Regularization

Overfitting

Idea: If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

How can we address this?

Options:

  1. Reduce the number of features
    • Manually select which features to keep
    • Use a model selection algorithm
  2. Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters $\theta_{j}$ (see the sketch after this list).
    • Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
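
As a concrete illustration of option 2, here is a minimal sketch of $\ell_2$-regularized linear regression trained by gradient descent in NumPy. The function name, toy data, learning rate, and $\lambda$ value are all hypothetical choices for the example, not part of the notes.

```python
import numpy as np

def ridge_cost_and_grad(theta, X, y, lam):
    """L2-regularized linear regression cost and gradient.

    By convention theta[0] is the intercept and is not penalized.
    """
    m = len(y)
    residual = X @ theta - y
    reg = np.r_[0.0, theta[1:]]  # zero out the intercept before penalizing
    cost = (residual @ residual) / (2 * m) + lam / (2 * m) * (reg @ reg)
    grad = (X.T @ residual) / m + (lam / m) * reg
    return cost, grad

# Hypothetical toy data: an intercept column plus one feature.
rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=50)]
y = 3.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

theta = np.zeros(2)
for _ in range(500):  # plain gradient descent
    cost, grad = ridge_cost_and_grad(theta, X, y, lam=1.0)
    theta -= 0.1 * grad
print(theta)  # shrunk slightly toward zero relative to the unregularized fit
```

The penalty term $\frac{\lambda}{2m}\sum_{j \geq 1} \theta_{j}^{2}$ is what keeps the parameter magnitudes small; increasing $\lambda$ shrinks them further.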

Regularization types

| LASSO | Ridge | Elastic Net |
| --- | --- | --- |
| Shrinks coefficients to 0; good for variable selection | Makes coefficients smaller | Tradeoff between variable selection and small coefficients |
| $\ldots + \lambda \lVert \theta \rVert_{1}$ | $\ldots + \lambda \lVert \theta \rVert_{2}^{2}$ | $\ldots + \lambda \left[ (1-\alpha) \lVert \theta \rVert_{1} + \alpha \lVert \theta \rVert_{2}^{2} \right]$ |
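
A minimal sketch of the three penalties using scikit-learn, on hypothetical synthetic data where only the first three of ten features matter. Note that scikit-learn's `ElasticNet` mixes the two norms with an `l1_ratio` parameter, which is parameterized differently from the $\alpha$ in the table above.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Hypothetical data: 10 features, only the first 3 actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)    # L1: drives irrelevant weights to exactly 0
ridge = Ridge(alpha=0.1).fit(X, y)    # L2: shrinks all weights, none exactly 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of both

print("LASSO:      ", np.round(lasso.coef_, 2))
print("Ridge:      ", np.round(ridge.coef_, 2))
print("Elastic Net:", np.round(enet.coef_, 2))
```

The LASSO and Elastic Net coefficient vectors should come out sparse (the irrelevant features get exactly zero weight), while Ridge's coefficients are merely small, matching the variable-selection behavior described in the table.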