Regularization
Overfitting
If we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples.
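A quick numeric sketch of this failure mode, with made-up synthetic data (the degrees and sample sizes are illustrative): a degree-9 polynomial has one parameter per training point, so it can interpolate the noise.

```python
# A sketch of overfitting: the degree-9 fit drives training error to
# near zero but does much worse on fresh test points than the degree-3 fit.
import numpy as np

rng = np.random.default_rng(0)
x_tr = np.linspace(0, 1, 10)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.2, 10)   # 10 noisy samples
x_te = rng.uniform(0, 1, 200)
y_te = np.sin(2 * np.pi * x_te) + rng.normal(0, 0.2, 200)  # fresh examples

for deg in (3, 9):  # a modest model vs. one with too many parameters
    c = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    print(f"deg {deg}: train MSE {tr:.3f}, test MSE {te:.3f}")
```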
How to address?
Options:
- Reduce the number of features
  - Manually select which features to keep
  - Use a model selection algorithm
- Regularization
  - Keep all the features, but reduce the magnitude/values of the parameters $\theta_{j}$ (see the sketch after this list).
  - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
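As a concrete illustration of the second option, here is a minimal sketch of L2-regularized linear regression trained by gradient descent, assuming NumPy and made-up synthetic data; the names (`cost`, `step`, `lam`, `lr`) are illustrative, with `lam` playing the role of the regularization strength $\lambda$.

```python
# A sketch of L2-regularized (ridge) linear regression via gradient
# descent. The penalty keeps all parameters but shrinks their
# magnitudes; by convention the intercept theta_0 is not penalized.
import numpy as np

def cost(theta, X, y, lam):
    m = len(y)
    r = X @ theta - y
    return (r @ r + lam * theta[1:] @ theta[1:]) / (2 * m)

def step(theta, X, y, lam, lr=0.1):
    m = len(y)
    grad = X.T @ (X @ theta - y) / m       # gradient of the squared error
    grad[1:] += (lam / m) * theta[1:]      # gradient of the L2 penalty
    return theta - lr * grad

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 5))])  # leading 1s = intercept
y = X @ np.array([1.0, 4.0, -3.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.3, 50)

theta = np.zeros(6)
for _ in range(500):
    theta = step(theta, X, y, lam=10.0)
print(np.round(theta, 2), f"cost={cost(theta, X, y, 10.0):.3f}")
```

Compared with an unregularized fit (`lam=0.0`), the learned $\theta_{j}$ come out smaller in magnitude, which is exactly the "reduce magnitude/values" effect described above.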
Regularization types
| LASSO | Ridge | Elastic Net |
| --- | --- | --- |
| Shrinks coefficients to 0; good for variable selection | Makes coefficients smaller | Trade-off between variable selection and small coefficients |
| $\ldots + \lambda \lVert \theta \rVert_{1}$ | $\ldots + \lambda \lVert \theta \rVert_{2}^{2}$ | $\ldots + \lambda \left[ (1-\alpha) \lVert \theta \rVert_{1} + \alpha \lVert \theta \rVert_{2}^{2} \right]$ |
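The behavioral difference is easy to see empirically. Below is a minimal sketch using scikit-learn (an assumption; the notes name no library). Note that scikit-learn's `alpha` parameter plays the role of $\lambda$ above, and `ElasticNet`'s `l1_ratio` uses its own mixing convention (the fraction of L1 penalty) rather than the $\alpha$ in the table.

```python
# A sketch comparing the three penalties on synthetic data where only
# 3 of 10 features matter. Data and hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
theta_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ theta_true + rng.normal(0, 0.5, 100)

for model in (Lasso(alpha=0.1), Ridge(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    zeros = int(np.sum(model.coef_ == 0))
    print(f"{type(model).__name__:>10}: {zeros} coefficients driven exactly to 0")
```

Typically LASSO zeroes out most of the seven uninformative coefficients, Ridge zeroes none (it only shrinks them), and Elastic Net lands in between, matching the first row of the table.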