# 22. Training ensembles

Veles automates the process of training ensembles. It consists of 3 separate steps:

1. Train the models which are to be included in the ensemble.
2. Evaluate those models on a separate part of the dataset (this ensures that the ensemble does not adapt to the validation set).
3. Train the top-level classifier, which uses the output from step 2 as features, on a separate part of the dataset.

## 22.1. (1) How to train the models

The following command:

```
veles -s --ensemble-train 20:0.9 --result-file ensemble.json <workflow> <config>
```


will result in 20 separate workflows, each trained on a 0.9 fraction of the training dataset. The information about the ensemble, such as the best snapshots, evaluated metrics, etc., is saved to ensemble.json. As usual, multiple models can be trained in parallel in master-slave mode.
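
Since the result file is plain JSON (see step 3 below), it can be inspected with standard tools. A minimal sketch that assumes nothing about the exact schema, which may vary between Veles versions:

```python
import json

# Peek at the ensemble description written by --ensemble-train.
# The exact layout of the file depends on the Veles version, so this
# only pretty-prints whatever was stored there.
with open("ensemble.json") as fin:
    ensemble = json.load(fin)

print(json.dumps(ensemble, indent=2))
```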

Internally, Veles launches an instance of veles.ensemble.model_workflow.EnsembleModelWorkflow instead of the user’s model. It is linked in a ring, and the veles.ensemble.model_workflow.EnsembleModelManager unit trains one model per run() call. This workflow also contains a histogram plotter which depicts the distribution of the “EvaluationResult” metric values.
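
The following is a conceptual sketch of what this loop amounts to, not the actual Veles code: each of the N models is trained on its own random 0.9 fraction of the training set, and the resulting snapshot and “EvaluationResult” value are recorded for the later steps. The train_one callable is a hypothetical placeholder for training a single workflow.

```python
import random

def train_ensemble(train_indices, train_one, n_models=20, fraction=0.9):
    """Conceptual sketch of ensemble training (not the Veles implementation).

    train_one(subset) is a hypothetical callable that trains one workflow
    on the given subset of the training set and returns a
    (snapshot_path, evaluation_result) pair.
    """
    results = []
    for model_id in range(n_models):
        # Each model sees its own random 90% of the training data.
        subset = random.sample(train_indices, int(len(train_indices) * fraction))
        snapshot, metric = train_one(subset)
        results.append({
            "id": model_id,
            "snapshot": snapshot,
            "EvaluationResult": metric,
        })
    return results
```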

By default, plotting and publishing are disabled in workflows included in the ensemble. To make the plotters work, set root.common.ensemble.disable.plotting to False:

```
veles ... root.common.ensemble.disable.plotting=False
```
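
The same override can also be placed in the configuration file instead of the command line. A minimal sketch, assuming the usual Veles Python configuration mechanism (veles.config.root); check the conventions of your Veles version:

```python
# In the workflow's configuration file (sketch; the exact config tree
# layout may differ between Veles versions).
from veles.config import root

root.common.ensemble.disable.plotting = False
```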


You will probably want to store snapshots in a database, which is possible with the ODBC snapshotter configuration. For more details, see Snapshotting.

## 22.2. (2) How to evaluate the models

The following command:

```
veles -s --ensemble-test ensemble.json --result-file ensemble_ev.json <workflow> <config>
```


writes the results of evaluating the models trained in step 1 on the test dataset to ensemble_ev.json. Each model is restored from its snapshot, runs in “test” mode (see Execution modes) and appends its output to ensemble_ev.json. Parallel evaluation via master-slave is supported. The user’s loader must support “test” mode and supply labelled (or targeted) data to the TEST set. However, the labels or targets are not used in this step; they are needed only in step 3.
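
Conceptually, this step turns every trained model into a block of features for the top-level classifier. The sketch below is not the Veles code itself; load_model, test_data and the "snapshot" key are illustrative placeholders:

```python
import json

def evaluate_ensemble(ensemble_file, load_model, test_data):
    """Sketch of step 2 (not the Veles implementation).

    load_model(snapshot_path) is a placeholder that restores a model from
    its snapshot and returns a callable; test_data stands for the TEST set
    supplied by the loader.
    """
    with open(ensemble_file) as fin:
        models = json.load(fin)
    features = []
    for entry in models:
        model = load_model(entry["snapshot"])  # "snapshot" key is illustrative
        features.append(model(test_data))      # outputs produced in "test" mode
    return features  # Veles writes these results to the --result-file
```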

## 22.3. (3) How to train the top-level classifier

Since the described steps are independent, one can generate the intermediate files by hand. They are just plain text JSON files. Thus it is possible, for example, to combine different neural network topologies by merging the --result-file outputs from step 1.
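
For instance, the following sketch merges the result files of two independent step 1 runs (say, a convolutional and a fully-connected topology) into a single ensemble description. The file names are hypothetical, and the only assumption made is that each file contains a JSON list of per-model records:

```python
import json

# Merge two step 1 result files into one ensemble description.
with open("ensemble_conv.json") as f1, open("ensemble_mlp.json") as f2:
    merged = json.load(f1) + json.load(f2)

with open("ensemble.json", "w") as fout:
    json.dump(merged, fout, indent=2)
```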