Linear Regression

Example data (House price prediction case)

| Size | Number of bedrooms | Number of floors | Age of house | Price |
| ---- | ------------------ | ---------------- | ------------ | ----- |
| 2104 | 5                  | 1                | 45           | 460   |
| 1416 | 3                  | 2                | 40           | 232   |
| 1534 | 3                  | 2                | 30           | 315   |
| 852  | 2                  | 1                | 36           | 178   |
| ...  | ...                | ...              | ...          | ...   |

Notation:

  • $n$ = number of features
  • $m$ = number of training examples
  • $x^{(i)}$ = input features of $i^{th}$ training example
  • $x^{(i)}_{j}$ = value of feature $j$ in $i^{th}$ training example

Model structure:

  • Hypothesis: $h_{\theta}(x) = \theta^{T}x$
  • Parameters: $\theta$, $n + 1$ dimensional vector
  • Cost function: $J(\theta) = \frac{1}{2m} \sum_{i = 1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^2$
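The hypothesis $h_{\theta}(x) = \theta^{T}x$ and the cost function can be sketched in NumPy. This is a minimal illustration with the first two table rows (sizes and bedroom counts only); the column of ones supplies the intercept term $x_0 = 1$:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2m) * sum((h(x^(i)) - y^(i))^2), with X containing x0 = 1."""
    m = len(y)
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every i
    return (residuals @ residuals) / (2 * m)

# First two training examples: [x0, size, bedrooms] and price.
X = np.array([[1.0, 2104.0, 5.0],
              [1.0, 1416.0, 3.0]])
y = np.array([460.0, 232.0])

print(cost(np.zeros(3), X, y))  # → 66356.0, the cost at theta = 0
```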

Gradient descent:

\[ \text{repeat until convergence: } \quad \theta_{j} := \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta) \]

(simultaneously update for every $j = 0, \ldots, n$)
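The update rule above can be sketched as batch gradient descent. The vectorized expression `X.T @ (X @ theta - y) / m` computes $\frac{\partial}{\partial \theta_j} J(\theta)$ for every $j$ at once, which gives the simultaneous update for free; the data here is a made-up 1-D example ($y = 2x$):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i),
    with all theta_j updated simultaneously via one vectorized step."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of J w.r.t. every theta_j
        theta -= alpha * grad
    return theta

# Hypothetical data: y = 2x, fit with an intercept column x0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = gradient_descent(X, y, alpha=0.1, iters=5000)  # converges near [0, 2]
```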

Gradient descent in practice

Feature scaling

Idea: Make sure features are on a similar scale.

  • Mean normalization
    • Replace $x_{j}$ with $x_{j} - \mu_{j}$ to make features have approximately zero mean.

\[ x_{j} \gets \frac{x_{j} - \mu_{j}}{\text{range of } x_{j}} \]
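Mean normalization can be sketched as follows, applied to the size and bedroom columns of the example table. Dividing by the column range is what the formula above uses; dividing by the standard deviation is another common choice:

```python
import numpy as np

def mean_normalize(X):
    """Subtract each column's mean, then divide by its range (max - min)."""
    mu = X.mean(axis=0)
    rng = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / rng, mu, rng

# Size and bedroom columns from the table: very different scales.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_scaled, mu, rng = mean_normalize(X)
# Each scaled column now has mean ~0 and values roughly in [-1, 1].
```

Remember to store `mu` and `rng`: new inputs at prediction time must be scaled with the same values.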

Learning rate

  • Debugging: Make sure gradient descent is working correctly.
  • Choose an appropriate learning rate $\alpha$.
    • if $\alpha$ is too small: slow convergence.
    • if $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.
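A practical debugging check is to record $J(\theta)$ every iteration: with a well-chosen $\alpha$ it should decrease on every step, while a too-large $\alpha$ makes it grow. A minimal sketch, reusing the hypothetical $y = 2x$ data (the specific $\alpha$ values are illustrative):

```python
import numpy as np

def descend_with_history(X, y, alpha, iters=50):
    """Run gradient descent and record J(theta) at each iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(iters):
        residuals = X @ theta - y
        history.append((residuals @ residuals) / (2 * m))  # J(theta)
        theta -= alpha * (X.T @ residuals) / m
    return history

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
good = descend_with_history(X, y, alpha=0.1)   # J shrinks each iteration
bad = descend_with_history(X, y, alpha=1.5)    # too large: J blows up
```

Plotting `history` against the iteration number makes the difference obvious at a glance.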

Normal equation

Idea: Solve for $\theta$ analytically in a single step, with no iterations:

\[ \theta = (X^{T}X)^{-1}X^{T}y \]
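The normal equation can be sketched in NumPy on the same hypothetical $y = 2x$ data. Solving the linear system $X^{T}X\,\theta = X^{T}y$ with `np.linalg.solve` is numerically preferable to forming the inverse explicitly, but computes the same $\theta$:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept column plus x
y = np.array([2.0, 4.0, 6.0])

# theta = (X^T X)^{-1} X^T y, computed as a linear solve.
theta = np.linalg.solve(X.T @ X, X.T @ y)  # → [0., 2.]
```

Note that $X^{T}X$ can be non-invertible (e.g. redundant features); `np.linalg.pinv` handles that case via the pseudo-inverse.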

| Gradient Descent | Normal Equation |
| ---------------- | --------------- |
| Need to choose $\alpha$. | No need to choose $\alpha$. |
| Needs many iterations. | Need to compute $(X^{T}X)^{-1}$, which is $O(n^3)$. |
| Works well even when $n$ is large. | Slow if $n$ is large. |