Model Selection

When selecting a model, we split the data into three parts, as follows:

| Training set | Validation/Dev set | Testing set |
|---|---|---|
| Model is trained (usually 80% of the data) | Model is assessed (usually 20% of the data) | Model gives predictions |
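
As a rough illustration of the 80%/20% split, here is a minimal sketch using only the Python standard library. The helper name `split_train_validation`, the fixed seed, and the toy data are assumptions made for this example; the testing set is assumed to be held aside separately and is not touched here.

```python
import random

def split_train_validation(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of `samples` and split it into (train, validation)."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Toy dataset of 100 observations (illustrative only).
data = list(range(100))
train, validation = split_train_validation(data)
print(len(train), len(validation))  # prints "80 20"
```

Shuffling before cutting avoids a split that simply mirrors the order in which the data was collected.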

Once the model has been chosen, it is retrained on the entire dataset (training and validation sets together) and tested on the unseen test set. These are represented in the figure below:
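
In code, the same workflow might look like the following sketch. The two candidate "models" (predicting the mean or the median of the targets), the mean squared error metric, and the toy numbers are all assumptions chosen to keep the example self-contained; any family of models and any assessment metric could take their place.

```python
from statistics import mean, median

def mse(prediction, targets):
    """Mean squared error of a single constant prediction."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)

# Hypothetical candidates: each "fits" one constant to the training targets.
candidates = {"mean": mean, "median": median}

# Toy targets (illustrative only); the training set contains one outlier.
train = [1.0, 2.0, 3.0, 4.0, 100.0]
validation = [2.0, 3.0, 4.0]
test = [2.5, 3.5]

# 1. Train each candidate on the training set, assess it on the validation set.
validation_error = {
    name: mse(fit(train), validation) for name, fit in candidates.items()
}
best = min(validation_error, key=validation_error.get)

# 2. Retrain the chosen candidate on training and validation data together.
final_prediction = candidates[best](train + validation)

# 3. Report performance once, on the unseen test set.
test_error = mse(final_prediction, test)
print(best, final_prediction, test_error)  # prints "median 3.0 0.25"
```

The test set plays no part in choosing between candidates, so the final error is measured on data the selection step never saw.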

Cross validation

Cross-validation (CV) is a method used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:

| $k$-fold | Leave-$p$-out |
|---|---|
| Training on $k-1$ folds and assessment on the remaining one | Training on $n-p$ observations and assessment on the $p$ remaining ones |
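
To make the two schemes concrete, the sketch below generates the index splits for each one using only the standard library. The function names `k_fold_splits` and `leave_p_out_splits` are hypothetical, and the folds are taken as contiguous blocks for simplicity; in practice the observations would usually be shuffled first.

```python
from itertools import combinations

def k_fold_splits(n, k):
    """Yield (train, assessment) index lists: train on k-1 folds, assess on one."""
    indices = list(range(n))
    # Fold sizes differ by at most one when k does not divide n.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size

def leave_p_out_splits(n, p):
    """Yield (train, assessment) index lists for every choice of p held-out points."""
    for held_out in combinations(range(n), p):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        yield train, list(held_out)

for train_idx, assess_idx in k_fold_splits(n=6, k=3):
    print(train_idx, assess_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Leave-$p$-out enumerates every subset of $p$ observations, so the number of splits grows as $\binom{n}{p}$, whereas $k$-fold always produces exactly $k$ splits.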