Model Selection

When selecting a model, we split the data into three parts:

Training set	Validation/Dev set	Testing set
Model is trained (usually 80% of the data)	Model is assessed (usually 20% of the data)	Model gives predictions (unseen data)
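As a minimal sketch of the 80/20 split described in the table, the snippet below shuffles the available data and holds out a fraction for validation. The function name `train_val_split`, the `val_fraction` and `seed` parameters, and the toy `examples` list are assumptions made for illustration; the test set is assumed to be kept separately and never touched here.

```python
import random

def train_val_split(data, val_fraction=0.2, seed=0):
    """Shuffle `data` and return (train, validation) lists."""
    items = list(data)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(items)
    n_val = int(round(len(items) * val_fraction))
    # The first n_val shuffled items are held out for validation.
    return items[n_val:], items[:n_val]

# Hypothetical data: 100 observations, identified here by their index.
examples = list(range(100))
train, val = train_val_split(examples)
print(len(train), len(val))  # 80 20
```

Shuffling before splitting is a design choice that suits data without a natural order; for time series, a chronological split is usually preferred.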

Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. The three sets are represented in the figure below:

Cross-validation is a method used to select a model that does not rely too much on the initial training set. The main types are summarized in the table below:

k-fold	Leave-p-out
Training on $k-1$ folds and assessment on the remaining one.	Training on $n-p$ observations and assessment on the $p$ remaining ones.
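The two schemes in the table can be sketched as index generators over $n$ observations. This is an illustrative sketch, not a reference implementation: the names `k_fold_splits` and `leave_p_out_splits` are hypothetical, and the k-fold version assumes contiguous, unshuffled folds.

```python
from itertools import combinations

def k_fold_splits(n, k):
    """Yield (train_idx, val_idx) for each of the k folds over n observations."""
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        # Train on the other k-1 folds.
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop

def leave_p_out_splits(n, p):
    """Yield (train_idx, val_idx) for every choice of p held-out observations."""
    for held_out in combinations(range(n), p):
        held = set(held_out)
        # Train on the n-p observations that are not held out.
        train_idx = [i for i in range(n) if i not in held]
        yield train_idx, list(held_out)

folds = list(k_fold_splits(10, 5))
lpo = list(leave_p_out_splits(5, 2))
print(len(folds), len(lpo))  # 5 10
```

Leave-p-out produces one split per subset of size $p$, that is $\binom{n}{p}$ splits in total, so it quickly becomes expensive as $n$ grows; k-fold only ever needs $k$ model fits.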

Last updated on Dec 3, 2019