Matrix Factorization for MovieLens Recommendations
This notebook is based on code from Nick Becker
Setting Up the Ratings Data
We read the data directly from the MovieLens website, since they don't allow redistribution. We want to include the metadata (movie titles, etc.), not just the ratings matrix.
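As a rough sketch of the loading step, the snippet below parses ratings and movie metadata into two dataframes. It assumes the `::`-delimited layout used by the MovieLens 1M files (`ratings.dat`, `movies.dat`). The helper name `load_movielens`, the column names, and the in-memory rows are made up for illustration, so the example runs without downloading anything.

```python
import io

import pandas as pd

# Toy stand-ins for ratings.dat and movies.dat (made-up rows, assumed "::" layout).
RATINGS_TEXT = (
    "1::10::5::978300760\n"
    "1::20::3::978302109\n"
    "2::10::4::978301968\n"
)
MOVIES_TEXT = (
    "10::Movie A (1995)::Comedy\n"
    "20::Movie B (1995)::Crime|Drama\n"
)


def load_movielens(ratings_file, movies_file):
    """Hypothetical helper: parse '::'-separated ratings and movie metadata."""
    # A multi-character separator requires the slower pure-Python parser.
    ratings_df = pd.read_csv(
        ratings_file, sep="::", engine="python", header=None,
        names=["UserID", "MovieID", "Rating", "Timestamp"],
    )
    movies_df = pd.read_csv(
        movies_file, sep="::", engine="python", header=None,
        names=["MovieID", "Title", "Genres"],
    )
    return ratings_df, movies_df


ratings_df, movies_df = load_movielens(
    io.StringIO(RATINGS_TEXT), io.StringIO(MOVIES_TEXT)
)
print(ratings_df.head())
print(movies_df.head())
```

With the real files you would pass file paths instead of `io.StringIO` objects, and the movie file may also need an explicit `encoding` argument.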
These look good, but I want the format of my ratings matrix to be one row per user and one column per movie. I'll pivot ratings_df to get that and call the new variable R.
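A minimal sketch of the pivot on a toy ratings table follows. The column names `UserID`, `MovieID`, and `Rating` are assumptions, and filling unrated entries with 0 is one common choice, not the only one.

```python
import pandas as pd

# Toy ratings in long format: one row per (user, movie) rating.
ratings_df = pd.DataFrame({
    "UserID": [1, 1, 2, 3],
    "MovieID": [10, 20, 10, 30],
    "Rating": [5, 3, 4, 2],
})

# One row per user, one column per movie; unrated entries become 0 (assumption).
R = ratings_df.pivot(index="UserID", columns="MovieID", values="Rating").fillna(0)
print(R)
```

Note that `pivot` raises an error if the same user rates the same movie twice, so duplicates would need to be resolved first.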
The last thing I need to do is de-mean the data (normalize by each user's mean) and convert it from a dataframe to a numpy array.
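The de-meaning step might look like the sketch below, run on a small hand-made matrix. The names `R_matrix`, `user_ratings_mean`, and `R_demeaned` are illustrative. The mean here is taken over every column, including the zeros that stand in for unrated movies, which is a simplifying assumption.

```python
import numpy as np
import pandas as pd

# Toy user-by-movie ratings (0 stands in for "not rated").
R = pd.DataFrame(
    [[5.0, 3.0, 0.0],
     [4.0, 0.0, 0.0],
     [0.0, 0.0, 2.0]],
    index=[1, 2, 3], columns=[10, 20, 30],
)

R_matrix = R.to_numpy()                      # dataframe -> numpy array
user_ratings_mean = R_matrix.mean(axis=1)    # one mean per user (row)
# Reshape to a column so the mean broadcasts across each user's row.
R_demeaned = R_matrix - user_ratings_mean.reshape(-1, 1)
print(R_demeaned)
```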
Singular Value Decomposition
Scipy and Numpy both have functions to do the singular value decomposition. I'm going to use the Scipy function svds because it lets me choose how many latent factors I want to use to approximate the original ratings matrix (instead of having to truncate it after).
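Here is a small sketch of a truncated SVD with `scipy.sparse.linalg.svds`, using a random stand-in for the de-meaned ratings matrix. The value `k = 3` is purely illustrative. `svds` requires `k` to be smaller than both matrix dimensions, and it returns the singular values as a 1-D array, so I convert them to a diagonal matrix for later multiplication.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Random stand-in for the de-meaned ratings matrix (8 users, 6 movies).
rng = np.random.default_rng(0)
R_demeaned = rng.normal(size=(8, 6))

k = 3  # number of latent factors (illustrative choice)
U, sigma, Vt = svds(R_demeaned, k=k)

# sigma is a 1-D array of k singular values; build the diagonal matrix.
Sigma = np.diag(sigma)
print(U.shape, Sigma.shape, Vt.shape)
```

The `k` values found this way should match the `k` largest singular values from a full decomposition such as `np.linalg.svd`, though not necessarily in the same order.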
Making Predictions from the Decomposed Matrices
I now have everything I need to make movie ratings predictions for every user. I can do it all at once by following the math and matrix multiply $U$, $\Sigma$, and $V^T$ back to get the rank-$k$ approximation of $R$.
I also need to add the user means back to get the actual star ratings prediction.
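Putting the pieces together, a sketch of the prediction step on a toy matrix could look like this. The variable names and `k = 2` are assumptions for illustration. The low-rank product is kept in its own variable (`low_rank`) so it is easy to inspect before the user means are added back.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Toy ratings matrix: 4 users by 5 movies, 0 meaning "not rated".
R = np.array([
    [5, 3, 0, 1, 4],
    [4, 0, 0, 1, 3],
    [1, 1, 0, 5, 0],
    [0, 1, 5, 4, 0],
], dtype=float)

user_ratings_mean = R.mean(axis=1)
R_demeaned = R - user_ratings_mean.reshape(-1, 1)

# Rank-2 truncated SVD of the de-meaned matrix.
U, sigma, Vt = svds(R_demeaned, k=2)

# Multiply the factors back together, then restore each user's mean.
low_rank = U @ np.diag(sigma) @ Vt
predicted_ratings = low_rank + user_ratings_mean.reshape(-1, 1)
print(np.round(predicted_ratings, 2))
```

The predictions are dense: every user gets a score for every movie, including the ones they have already rated.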
Making Movie Recommendations
Finally, it's time. With the predictions matrix for every user, I can build a function to recommend movies for any user. All I need to do is return the movies with the highest predicted rating that the specified user hasn't already rated. Though I didn't actually use any explicit movie content features (such as genre or title), I'll merge in that information to get a more complete picture of the recommendations.
I'll also return the list of movies the user has already rated, for the sake of comparison.
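One possible shape for such a function is sketched below on toy data. The function name `recommend_movies`, its parameters, the column names, and the placeholder titles are all hypothetical. It assumes the predictions are held in a dataframe `preds_df` indexed by user ID with one column per movie ID.

```python
import pandas as pd


def recommend_movies(preds_df, user_id, movies_df, ratings_df, num_recommendations=5):
    """Hypothetical helper: return (already_rated, recommendations) for one user."""
    # Movies the user has rated, with titles and genres merged in.
    already_rated = (
        ratings_df[ratings_df["UserID"] == user_id]
        .merge(movies_df, on="MovieID", how="left")
        .sort_values("Rating", ascending=False)
    )

    # This user's predicted ratings as a two-column frame: MovieID, Prediction.
    user_preds = (
        preds_df.loc[user_id]
        .rename("Prediction")
        .rename_axis("MovieID")
        .reset_index()
    )

    # Keep only unrated movies, attach predictions, take the top N.
    unseen = movies_df[~movies_df["MovieID"].isin(already_rated["MovieID"])]
    recommendations = (
        unseen.merge(user_preds, on="MovieID", how="left")
        .sort_values("Prediction", ascending=False)
        .head(num_recommendations)
    )
    return already_rated, recommendations


# Toy data with placeholder titles.
movies_df = pd.DataFrame({
    "MovieID": [10, 20, 30, 40],
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "Genres": ["Comedy", "Crime|Drama", "War", "Film-Noir"],
})
ratings_df = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [10, 20, 30],
    "Rating": [5, 3, 4],
})
preds_df = pd.DataFrame(
    [[4.8, 3.1, 2.0, 4.2],
     [1.5, 2.5, 4.1, 3.0]],
    index=[1, 2], columns=[10, 20, 30, 40],
)

already_rated, recommendations = recommend_movies(
    preds_df, 1, movies_df, ratings_df, num_recommendations=2
)
print(already_rated)
print(recommendations)
```

For user 1, who rated movies 10 and 20, the candidates are movies 30 and 40, ranked by predicted rating.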
So, how'd I do?
Pretty cool! These look like pretty good recommendations. It's also good to see that, though I didn't actually use the genre of the movie as a feature, the truncated matrix factorization features "picked up" on the underlying tastes and preferences of the user. I've recommended some film-noirs, crime, drama, and war movies - all of which were genres of some of this user's top rated movies.