# 22. Training ensembles

Veles automates the process of training ensembles. It consists of 3 separate steps:

1. Train the models which are to be included in the ensemble.
2. Evaluate those models on a separate part of the dataset (this ensures that the ensemble does not adapt to the validation set).
3. Train the top-level classifier, which uses the output from step 2 as features, on a separate part of the dataset.

## 22.1. (1) How to train the models

The following command:

```
veles -s --ensemble-train 20:0.9 --result-file ensemble.json <workflow> <config>
```


will result in 20 separate workflows, each trained on a 0.9 fraction of the training dataset. The information about the ensemble, such as the best snapshots, evaluated metrics, etc., is saved to ensemble.json. As usual, multiple models can be trained in parallel in master-slave mode.
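
Since the result file is plain JSON (see step 3 below), it can be inspected with standard tools. A minimal sketch that assumes nothing about the exact schema, which may vary between Veles versions:

```python
import json

# Peek at the ensemble description written by --ensemble-train.
# The exact layout of the file depends on the Veles version, so this
# only pretty-prints whatever was stored there.
with open("ensemble.json") as fin:
    ensemble = json.load(fin)

print(json.dumps(ensemble, indent=2))
```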

Internally, Veles launches an instance of veles.ensemble.model_workflow.EnsembleModelWorkflow instead of the user’s model. It is linked in a ring, and the veles.ensemble.model_workflow.EnsembleModelManager unit trains one model per run() call. This workflow also contains a histogram plotter which depicts the distribution of the “EvaluationResult” metric values.
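
The following is a conceptual sketch of what this loop amounts to, not the actual Veles code: each of the N models is trained on its own random 0.9 fraction of the training set, and the resulting snapshot and “EvaluationResult” value are recorded for the later steps. The train_one callable is a hypothetical placeholder for training a single workflow.

```python
import random

def train_ensemble(train_indices, train_one, n_models=20, fraction=0.9):
    """Conceptual sketch of ensemble training (not the Veles implementation).

    train_one(subset) is a hypothetical callable that trains one workflow
    on the given subset of the training set and returns a
    (snapshot_path, evaluation_result) pair.
    """
    results = []
    for model_id in range(n_models):
        # Each model sees its own random 90% of the training data.
        subset = random.sample(train_indices, int(len(train_indices) * fraction))
        snapshot, metric = train_one(subset)
        results.append({
            "id": model_id,
            "snapshot": snapshot,
            "EvaluationResult": metric,
        })
    return results
```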

By default, plotting and publishing are disabled in workflows included in the ensemble. To make the plotters work, set root.common.ensemble.disable.plotting to False:

```
veles ... root.common.ensemble.disable.plotting=False
```
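
The same override can also be placed in the configuration file instead of the command line. A minimal sketch, assuming the usual Veles Python configuration mechanism (veles.config.root); check the conventions of your Veles version:

```python
# In the workflow's configuration file (sketch; the exact config tree
# layout may differ between Veles versions).
from veles.config import root

root.common.ensemble.disable.plotting = False
```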


You will probably want to store snapshots in a database, which is possible with the ODBC snapshotter configuration. For more details, see Snapshotting.

## 22.2. (2) How to evaluate the models

The following command:

```
veles -s --ensemble-test ensemble.json --result-file ensemble_ev.json <workflow> <config>
```


writes the results of evaluating the models trained in step 1 on the test dataset to ensemble_ev.json. Each model is restored from its snapshot, runs in “test” mode (see Execution modes) and appends its output to ensemble_ev.json. Parallel evaluation via master-slave is supported. The user’s loader must support “test” mode and supply labelled (or targeted) data to the TEST set. However, the labels or targets are not used in this step; they are needed only in step 3.
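
Conceptually, this step turns every trained model into a block of features for the top-level classifier. The sketch below is not the Veles code itself; load_model, test_data and the "snapshot" key are illustrative placeholders:

```python
import json

def evaluate_ensemble(ensemble_file, load_model, test_data):
    """Sketch of step 2 (not the Veles implementation).

    load_model(snapshot_path) is a placeholder that restores a model from
    its snapshot and returns a callable; test_data stands for the TEST set
    supplied by the loader.
    """
    with open(ensemble_file) as fin:
        models = json.load(fin)
    features = []
    for entry in models:
        model = load_model(entry["snapshot"])  # "snapshot" key is illustrative
        features.append(model(test_data))      # outputs produced in "test" mode
    return features  # Veles writes these results to the --result-file
```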

## 22.3. (3) How to train the top-level classifier

Since the described steps are independent, one can generate the intermediate files by hand. They are just plain text JSON files. Thus it is possible, for example, to combine different neural network topologies by merging the --result-file outputs from step 1.
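
For instance, the following sketch merges the result files of two independent step 1 runs (say, a convolutional and a fully-connected topology) into a single ensemble description. The file names are hypothetical, and the only assumption made is that each file contains a JSON list of per-model records:

```python
import json

# Merge two step 1 result files into one ensemble description.
with open("ensemble_conv.json") as f1, open("ensemble_mlp.json") as f2:
    merged = json.load(f1) + json.load(f2)

with open("ensemble.json", "w") as fout:
    json.dump(merged, fout, indent=2)
```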