Recommendation System
Example data
Movie | Alice | Bob | Carol | Dave | $x_{1} (romance)$ | $x_{2} (action)$ |
---|---|---|---|---|---|---|
Love at last | 5 | 5 | 0 | 0 | 0.9 | 0 |
Romance forever | 5 | ? | ? | 0 | 1.0 | 0.01 |
Cute puppies of love | ? | 4 | 0 | ? | 0.99 | 0 |
Nonstop car chases | 0 | 0 | 5 | 4 | 0.1 | 1.0 |
Swords vs. karate | 0 | 0 | 5 | ? | 0 | 0.9 |
Notation
- $n_{u}$: number of user
- $n_{m}$: number of movie
- $r(i, j) = 1$ if user $j$ rated movie $i$
- $y^{(i, j)}$: rating by user $j$ on movie $i$ (if defined)
- $\theta^{(j)}$: parameter vector for user $j$
- $x^{(i)}$: parameter vector for movie $i$
- For user $j$, movie $i$, predicted rating: \( (\theta^{(j)})^{T}(x^{(i)}) \)
Content-based
Using hand-engineer feature $x_{1} (romance)$ and $x_{2} (action)$ to calculate the user preference.
Optimization objective
- To learn $\theta^{(j)}$ (parameter for user $j$)
\[ \mathop{\min}_{\theta^{(j)}} \frac{1}{2} \sum _{i:r(i, j)=1} ((\theta^{(j)})^{T}x^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2} \sum_{k=1}^{n} (\theta^{(j)}_{k})^2 \]
- To learn $\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_{u})}$
\[ \mathop{\min}_{\theta^{(1)}, \theta^{(2)}, ..., \theta^{(n_{u})}} \frac{1}{2} \sum_{j=1}^{n_{u}} \sum _{i:r(i, j)=1} ((\theta^{(j)})^{T}x^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_{u}} \sum_{k=1}^{n} (\theta^{(j)}_{k})^2 \]
Collaborative filtering
To address some of the limitations of content-based filtering like hand-engineer, collaborative filtering uses similarities between users and items simultaneously to provide recommendations.
Optimization objective:
- Given $ x^{(1)}, ..., x^{(n_{m})}$, estimate $ \theta^{(1)}, ..., \theta^{(n_{u})} $:
\[ \mathop{min}_{\theta^{(1)}, ..., \theta^{(n_{u})}} \frac{1}{2} \sum_{j=1}^{n_{u}} \sum _{i:r(i, j)=1} ((\theta^{(j)})^{T}x^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_{u}} \sum_{k=1}^{n} (\theta^{(j)}_{k})^2 \]
- Given $ \theta^{(1)}, ..., \theta^{(n_{u})} $, estimate $ x^{(1)}, ..., x^{(n_{m})}$:
\[ \mathop{min}_{\theta^{(1)}, ..., \theta^{(n_{u})}} \frac{1}{2} \sum_{i=1}^{n_{m}} \sum _{i:r(i, j)=1} ((\theta^{(j)})^{T}x^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_{m}} \sum_{k=1}^{n} (x^{(i)}_{k})^2 \]
- Minimizing $ x^{(1)}, ..., x^{(n_{m})}, \theta^{(1)}, ..., \theta^{(n_{u})} $ simultaneously:
\[ J(x^{(1)}, ..., x^{(n_{m})}, \theta^{(1)}, ..., \theta^{(n_{u})}) = \frac{1}{2} \sum_{i=1}^{n_{m}} \sum _{i:r(i, j)=1} ((\theta^{(j)})^{T}x^{(i)} - y^{(i, j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_{m}} \sum_{k=1}^{n} (x^{(i)}_{k})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_{u}} \sum_{k=1}^{n} (\theta^{(j)}_{k})^2 \]
namly,
\[ \mathop{min}_{x^{(1)}, ..., x^{(n_{m})}, \theta^{(1)}, ..., \theta^{(n_{u})}} J(x^{(1)}, ..., x^{(n_{m})}, \theta^{(1)}, ..., \theta^{(n_{u})}) \]
Collaborative filtering algorithm:
- Initialize $ x^{(1)}, ..., x^{(n_{m})}, \theta^{(1)}, ..., \theta^{(n_{u})} $ to small random values.
- Minimize $J(x^{(1)}, ..., x^{(n_{m})}, \theta^{(1)}, ..., \theta^{(n_{u})}) $ using gradient descent
- For a user with parameters $\theta$ and a movie with learned features $x$, predict a star rating of $\theta^{T}x$