Regularization
Overfitting
Idea: If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
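A small sketch of this idea, assuming NumPy is available. The data are synthetic and every name here (`make_data`, `mse`, the degrees 1 and 9) is chosen only for illustration: a degree-9 polynomial has enough parameters to pass through all 10 training points, so its training error is near zero, yet it typically does much worse than a straight line on fresh examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(x):
    # Hypothetical ground truth: a straight line plus Gaussian noise.
    return 2.0 * x + rng.normal(scale=0.3, size=x.shape)

# 10 training examples, 200 new examples drawn from the same process.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = make_data(x_train)
x_test = rng.uniform(-1.0, 1.0, size=200)
y_test = make_data(x_test)

def mse(model, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 9):
    # Least-squares polynomial fit; degree 9 uses 10 parameters for 10 points.
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

The gap between the training and test error for the high-degree fit is the symptom to look for.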
How to address?
Options:
- Reduce the number of features
  - Manually select which features to keep
  - Use a model selection algorithm
- Regularization
  - Keep all features, but reduce the magnitude/values of the parameters $\theta_{j}$.
  - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
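To make the regularization option concrete, here is a minimal sketch. It assumes linear regression with a squared-error cost plus an L2 penalty $\lambda \sum_{j \ge 1} \theta_{j}^{2}$, solved in closed form. The notes above do not fix a particular cost, so the penalty form, the helper `fit_ridge`, the symbol `lam` (for $\lambda$), and the synthetic data are all assumptions made for illustration. The intercept $\theta_{0}$ is left unpenalized.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Solve (Xb^T Xb + lam * L) theta = Xb^T y, where L is the identity
    with a zero in the top-left entry so the intercept is not penalized."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend the intercept column
    penalty = lam * np.eye(n + 1)
    penalty[0, 0] = 0.0                    # do not shrink theta_0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

# Synthetic data: many features, each contributing a little to y.
rng = np.random.default_rng(0)
m, n = 30, 20
X = rng.normal(size=(m, n))
y = X @ np.full(n, 0.5) + rng.normal(scale=0.5, size=m)

norms = {}
for lam in (0.0, 1.0, 10.0):
    theta = fit_ridge(X, y, lam)
    # Size of the penalized parameters theta_1..theta_n.
    norms[lam] = float(np.linalg.norm(theta[1:]))
    print(f"lambda = {lam:5.1f}: ||theta[1:]|| = {norms[lam]:.3f}")
```

All 20 features stay in the model; increasing `lam` only shrinks the parameter magnitudes, which is the trade-off described in the list above.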