Lesson 14 Homework: Recommender Systems 2
When asking questions about homework in Piazza please use a tag in the subject line like HW14.7 to refer to Homework 14, Question 7. So the subject line might be HW14.7 question. Note there are no spaces in "HW14.7". This really helps keep Piazza easily searchable for everyone!
For full credit, all code in this notebook must be both executed in this notebook and copied to the Canvas quiz where indicated.
Question 1 (2 points)
Which of the following recommenders is based on the user/item ratings? (Check all that apply.)
SVD item-based collaborative filter
KNN user-based collaborative filter
Content recommender
Knowledge-based recommender
Chart
Question 2 (2 points)
Which Surprise algorithm reduces the size of the problem space through matrix factorization?
NormalPredictor
KNNBasic
KNNWithMeans
BaselineOnly
SVD
KNNWithZScores
Data Exploration
(Note: This section is not included in the quiz and is ungraded.)
The file restaurant_ratings.csv (found in the presentation download for this lesson) contains user ratings for various New York City restaurants. You can read a little more about the data at Kaggle. We have modified the data to generate user ratings that match the star columns in this file.
Do the following:
read the data into a variable called "ratings"
display the first 5 lines of the data (get familiar with the data frame)
find the minimum restaurant rating
find the maximum restaurant rating
adjust the rating scale by shifting up 1 if 0 is included
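The exploration steps above can be sketched as follows. This is a minimal illustration on a made-up stand-in for restaurant_ratings.csv: the column names (user_id, restaurant_id, rating) and all values are assumptions, so check them against the real file before reusing any of it.

```python
import pandas as pd

# Stand-in for pd.read_csv("restaurant_ratings.csv"). The column names and
# values here are assumptions; compare with ratings.head() on the real file.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "restaurant_id": [10, 11, 10, 12, 11],
    "rating": [0, 3, 4, 2, 1],
})

print(ratings.head())  # first 5 rows

min_rating = ratings["rating"].min()
max_rating = ratings["rating"].max()

# Shift the whole scale up by 1 only when it starts at 0
if min_rating == 0:
    ratings["rating"] = ratings["rating"] + 1

shifted_min = ratings["rating"].min()
shifted_max = ratings["rating"].max()
```

Recording the minimum and maximum before the shift keeps the original scale visible, which helps when setting a rating scale for a reader later on.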
Question 3 (2 points)
What is the minimum restaurant rating?
Question 4 (2 points)
What is the maximum restaurant rating?
Question 5 (2 points)
What is the mean restaurant rating for all restaurants (rounded to 2 significant digits)?
Question 6 (2 points)
What is the median of the restaurant rating scale?
Train/Test Split and Score Setup
(Note: this section is not included in the quiz and is not graded.)
We've provided code below for a scoring function and for splitting the data into train and test sets. Use the train and test sets generated by this code to answer the next questions. Do not change this code, or your answers will not match the expected ones.
Question 7 (2 points)
Compute a baseline model that always returns the median of the rating scale (rounded to 2 significant digits). What is the RMSE on this model?
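As a rough sketch of what a constant baseline measures: every prediction is the same number, so the RMSE is the square root of the mean squared distance between the test ratings and that number. The 1-to-5 scale and the toy ratings below are assumptions for illustration only. For the graded answer, use the scoring function and test set supplied in the notebook.

```python
import math

def constant_baseline_rmse(actual_ratings, constant):
    """RMSE of a model that predicts `constant` for every rating."""
    squared_errors = [(r - constant) ** 2 for r in actual_ratings]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical 1-5 scale; its midpoint is the median of the scale
scale_min, scale_max = 1, 5
scale_median = (scale_min + scale_max) / 2

toy_test_ratings = [1, 3, 5, 5]  # made-up values, not the homework data
baseline_rmse = constant_baseline_rmse(toy_test_ratings, scale_median)
```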
Question 8 Build a Weighted Mean User-Based Filter (manually graded) (4 points)
From the data in the file restaurant_ratings.csv, build a ratings matrix from the data frame of users, restaurants, and ratings. Then build a user-based collaborative filtering model that computes a mean rating weighted by the cosine similarity among users.
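One way to picture a similarity-weighted mean is the NumPy sketch below, run on a toy 3-by-3 ratings matrix. It is not the lesson's implementation. Treating missing ratings as 0 when computing cosine similarity and falling back to the global mean are both assumptions made here for illustration.

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are restaurants, NaN = unrated
R = np.array([
    [5.0, 4.0, np.nan],
    [4.0, 5.0, 2.0],
    [1.0, np.nan, 5.0],
])

def cosine_similarity_matrix(matrix):
    """Cosine similarity between rows, treating missing ratings as 0."""
    filled = np.nan_to_num(matrix, nan=0.0)
    norms = np.linalg.norm(filled, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = filled / norms
    return unit @ unit.T

def predict_weighted_mean(matrix, sims, user, item):
    """Similarity-weighted mean of the other users' ratings for `item`."""
    rated = ~np.isnan(matrix[:, item])
    rated[user] = False  # exclude the target user's own rating
    weights = sims[user, rated]
    if weights.sum() == 0:
        return float(np.nanmean(matrix))  # fall back to the global mean
    return float(weights @ matrix[rated, item] / weights.sum())

sims = cosine_similarity_matrix(R)
pred = predict_weighted_mean(R, sims, user=0, item=2)
```

User 0 is far more similar to user 1 (who gave restaurant 2 a 2) than to user 2 (who gave it a 5), so the prediction lands much closer to 2 than to 5.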
Question 9 (2 points)
What is the RMSE (rounded to 2 significant digits) of the Weighted Mean algorithm?
Question 10 User-Based SVD - Hyperparameter tuning (Manually Graded) (4 points)
From data in the file restaurant_ratings.csv, use the surprise library in Python to build an SVD user-based collaborative filtering model for the restaurant ratings. Use gridsearch to tune the hyperparameters, reserving 15% of the data to get an unbiased estimate of the accuracy. For the grid, use the following options:
'n_epochs': [15, 20, 25] (The number of iterations of the Stochastic Gradient Descent minimization procedure.)
'lr_all': [.005, .025, .001] (The learning rate.)
'reg_all': [.01, .02, .05] (The penalty for complex models.)
Additionally, use the following:
3 folds for cross validation
a seed of 14
Use the example from the lesson and be sure to set the seed in the appropriate place. Note: this code will take several minutes to run.
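The pattern described above (hold out 15%, grid-search on the rest, then score on both parts) might look like the sketch below. It follows the hold-out recipe from the Surprise documentation rather than the lesson's own example, which is not reproduced here. The function name tune_svd, its arguments, and the three-column layout of ratings_df are assumptions. The third-party imports are deferred into the function so that the grid itself can be inspected without Surprise installed.

```python
import random
from itertools import product

# Grid from the assignment text
param_grid = {
    "n_epochs": [15, 20, 25],
    "lr_all": [0.005, 0.025, 0.001],
    "reg_all": [0.01, 0.02, 0.05],
}
n_combinations = len(list(product(*param_grid.values())))  # 3 * 3 * 3

def tune_svd(ratings_df, rating_scale, seed=14, holdout=0.15):
    """Grid-search SVD on 85% of the ratings, then score on both splits.

    Assumes ratings_df has exactly three columns, in the order
    user, item, rating.
    """
    import numpy as np
    from surprise import SVD, Dataset, Reader, accuracy
    from surprise.model_selection import GridSearchCV

    # Seed before shuffling so the split and the SGD runs are repeatable
    random.seed(seed)
    np.random.seed(seed)

    data = Dataset.load_from_df(ratings_df, Reader(rating_scale=rating_scale))
    raw = data.raw_ratings
    random.shuffle(raw)
    cutoff = int((1 - holdout) * len(raw))
    train_raw, test_raw = raw[:cutoff], raw[cutoff:]
    data.raw_ratings = train_raw  # the grid search only sees this part

    gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=3)
    gs.fit(data)

    algo = gs.best_estimator["rmse"]
    trainset = data.build_full_trainset()
    algo.fit(trainset)

    # Biased: scored on the same ratings the model was fit to
    biased = accuracy.rmse(algo.test(trainset.build_testset()), verbose=False)
    # Unbiased: scored on the held-out ratings the model never saw
    unbiased = accuracy.rmse(
        algo.test(data.construct_testset(test_raw)), verbose=False
    )
    return gs.best_params["rmse"], biased, unbiased
```

With 27 parameter combinations and 3 folds, the search fits 81 models, which is why the run takes several minutes.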
Question 11 (2 points)
What is the biased accuracy (rounded to 2 significant digits) of the algorithm?
Question 12 (2 points)
What is the unbiased accuracy (rounded to 2 significant digits) of the algorithm?
Question 13 (2 points)
What number of stochastic gradient descent iterations ('n_epochs') did the grid search choose?
Question 14 (2 points)
What is the learning rate ('lr_all') chosen by the grid search?
Question 15 (2 points)
What is the regularization ('reg_all') chosen by the grid search?
Question 16 (2 points)
Now that we know what our best parameters should be, we need to train our SVD model on all the available data. Do the following:
set the seeds for reproducibility
reset the data.raw_ratings to all of the ratings OR reload the data from the dataframe
use the build_full_trainset() method to build a full trainset
set up an SVD algorithm using the best parameters
fit the data to the trainset
predict the estimated rating for user 1061 and restaurant 347
What is the predicted estimated rating (rounded to 2 digits) for user 1061 and restaurant 347?
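The retraining steps listed above could be wrapped up roughly as follows, using the "reload the data from the dataframe" option. The function name, its arguments, and the three-column layout of ratings_df are assumptions. Imports are deferred so that the definition loads even without Surprise installed.

```python
import random

def fit_full_and_predict(ratings_df, rating_scale, best_params, uid, iid, seed=14):
    """Fit SVD with the tuned parameters on every rating, then predict one pair.

    Assumes ratings_df has three columns in the order user, item, rating,
    and that uid and iid use the same types as those columns.
    """
    import numpy as np
    from surprise import SVD, Dataset, Reader

    # Seed both generators so repeated runs give the same factors
    random.seed(seed)
    np.random.seed(seed)

    # Reloading from the data frame restores all ratings, including any
    # that were held out during tuning
    data = Dataset.load_from_df(ratings_df, Reader(rating_scale=rating_scale))
    trainset = data.build_full_trainset()

    algo = SVD(**best_params)
    algo.fit(trainset)
    return algo, algo.predict(uid, iid).est
```

Returning the fitted algo alongside the estimate makes it available for later steps that need a trained model.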
Hybrid Filter Setup
(Note: This section is not included in the quiz/solutions.)
From data in the files restaurant_ratings.csv and restaurants.csv build a recommender system that is a hybrid of a metadata content-based recommender and the SVD user-based collaborative filter that you just trained.
To set up your hybrid filter:
read in the restaurants.csv into a variable called rest
review the data in the dataframe (Note that we have pre-cleaned the data for you, including using TextBlob to extract just the relevant descriptors from the description. Not all restaurants have a description.)
make a soup from the following columns, which are all simple strings (Hint: the soup for the first item in the geoplaces dataframe should be: 'Contemporary American Average_price rustic airy adorable classic most distinguished uncommon innovative American proud only world-class week.IMPORTANT special welcome'):
restaurant_type
price_range
ambiance
descriptors
Instantiate a CountVectorizer with no stopwords (use stop_words = None). (We shouldn't have much in the way of stopwords, since it's all keywords.)
Use the provided fetchSimilarity function to get a countVectorizer similarity matrix using the soup column. (Hint: the similarity at [0,2] should be 0.2849014411490949.)
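To make the idea concrete, here is a dependency-free sketch of building a soup and comparing two soups by the cosine of their word-count vectors. It is a stand-in for the CountVectorizer step, not a replacement for it. That the provided fetchSimilarity uses cosine similarity is an assumption, and CountVectorizer tokenizes somewhat differently (for example, its default pattern drops one-character tokens). The two restaurant rows are made up.

```python
import math
from collections import Counter

# Two made-up rows; the column names come from the assignment text
restaurants = [
    {"restaurant_type": "Italian", "price_range": "Average_price",
     "ambiance": "rustic cozy", "descriptors": "classic fresh"},
    {"restaurant_type": "Italian", "price_range": "High_price",
     "ambiance": "rustic elegant", "descriptors": "classic innovative"},
]
soup_columns = ["restaurant_type", "price_range", "ambiance", "descriptors"]

def make_soup(row):
    """Join the keyword columns into one space-separated string."""
    return " ".join(row[col] for col in soup_columns)

def count_cosine(soup_a, soup_b):
    """Cosine similarity between the word-count vectors of two soups."""
    a = Counter(soup_a.lower().split())
    b = Counter(soup_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

soups = [make_soup(r) for r in restaurants]
similarity = count_cosine(soups[0], soups[1])
```

Each soup has six distinct words and the two share three of them, so the similarity works out to 3 / 6 = 0.5.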
Question 17 Use The Content Recommender (2 points)
Using the provided content recommender function and the code you've prepared, get the top 5 recommendations for 'Tao Uptown'. (Hint: the top restaurant for 'Becco' should be 'Scampi'.)
Which of these restaurants is the top recommendation?
Haru Sushi - Amsterdam Ave
Bistrot Leo
Rice & Gold
Zengo - NYC
Restaurant Nippon
Question 18 - Build the Hybrid Function (manually graded) (4 points)
Sometimes recommendation designers are less focused on recommending things that have the highest rating, and more focused on recommending things that will have an acceptable rating but are very similar to items the user has previously liked. For the homework, we're going to build a hybrid recommender that first identifies the most similar restaurants content-wise, estimates the ratings, and returns the most highly rated restaurants. We'll follow the example used in the lesson, in which we pre-fetch the content recommendations and pass those pre-fetched recommendations into the hybrid function.
The full list of parameters needed will be:
user: the userid for which we are making predictions
contentRecs: the dataframe that contains the content recommendations, with similarity scores (this is returned for you in the content_recommender function we provided)
algo: the trained algorithm to use for collaborative filtering
predCol: the column in your contentRecs that can be used for predictions
minRating: the minimum rating we'll accept (estimated ratings should be greater than or equal to this number)
N: the final number of recommendations to return
Your function should return a dataframe that contains all of the information that was in your contentRecs plus the estimated rating for the "N" number of rows.
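A possible shape for such a function is sketched below, with a stub standing in for a trained Surprise model. The parameter names match the list above. The est_rating column name, the StubAlgo class, and the toy data are hypothetical. The only thing assumed about algo is that predict(user, item) returns an object with an est attribute, as Surprise predictions do.

```python
from collections import namedtuple

import pandas as pd

def hybrid(user, contentRecs, algo, predCol, minRating, N):
    """Re-rank pre-fetched content recommendations by estimated rating."""
    recs = contentRecs.copy()
    # Estimate this user's rating for every candidate item
    recs["est_rating"] = [algo.predict(user, item).est for item in recs[predCol]]
    # Keep only acceptable estimates, then return the N highest
    recs = recs[recs["est_rating"] >= minRating]
    return recs.sort_values("est_rating", ascending=False).head(N)

# Stub standing in for a trained model; the estimates are made up
Prediction = namedtuple("Prediction", "est")

class StubAlgo:
    estimates = {101: 4.8, 102: 3.9, 103: 4.6}

    def predict(self, uid, iid):
        return Prediction(est=self.estimates[iid])

content_recs = pd.DataFrame({
    "restaurant_id": [101, 102, 103],
    "name": ["A", "B", "C"],
    "sim_score": [0.9, 0.8, 0.7],
})
top = hybrid(1, content_recs, StubAlgo(), "restaurant_id", 4.5, 2)
```

Working on a copy leaves the caller's contentRecs untouched, so the same pre-fetched recommendations can be reused with a different user or algorithm.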
Question 19 - Calling the Hybrid Function (2 points)
Use your hybrid function to find recommendations for user 1235 and restaurant 'Lido'.
Remember, you will need to call your content_recommender function first to get the similarity scores.
Pass the top 25 restaurants with the highest sim_scores from the content recommender to the collaborative recommender.
Use the SVD algorithm you trained in Question 10 to predict ratings.
The minimum allowed rating is 4.5.
Return the top 3 recommendations.
Which answer shows the top 3 recommendations, in order?
Hint: If you make recommendations for user 1061 and 'Schilling', keeping everything else the same, the top recommendation should be Trattoria Italienne.
Naples 45 Ristorante E Pizzeria, Obica Mozzarella Bar Pizza e Cucina, La Pecora Bianca - NoMad
Il Mulino New York - Downtown, Bocca di Bacco, Felice 64 Wine Bar
Becco, La Pecora Bianca - Midtown, Stella 34 Trattoria
La Pecora Bianca - NoMad, La Pecora Bianca - Midtown, Stella 34 Trattoria
Esca, Lincoln Ristorante, La Pecora Bianca - Midtown
Question 20 KNNWithMeans item-based collaborative filter (manually graded) (4 points)
Train a KNNWithMeans Surprise collaborative filter. We ran a gridsearch already and learned that the best k for this is 3, and we get the best results using an item-based similarity measure. You should:
Set seeds of 14
Read in the data and set up your reader
Set up a data object
Build a full trainset
set up a KNNWithMeans algorithm using the following parameters:
k of 3
set the sim_options 'user_based' to False (this switches it to an item-based similarity measure, instead of a user-based).
fit the algorithm using the full trainset
predict the rating for user 1000 and restaurant 300
Hint: the predicted rating for user 1000 and restaurant 300 should be 4.32
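The training steps above might be arranged as in this sketch. The function name, its arguments, and the three-column layout of ratings_df are assumptions. The similarity measure is left at Surprise's default, since the assignment only specifies user_based. Imports are deferred so that the settings can be inspected without Surprise installed.

```python
import random

# Settings from the assignment text: k of 3 and an item-based similarity
knn_settings = {"k": 3, "sim_options": {"user_based": False}}

def fit_item_knn(ratings_df, rating_scale, seed=14):
    """Fit an item-based KNNWithMeans model on every available rating.

    Assumes ratings_df has three columns in the order user, item, rating.
    """
    import numpy as np
    from surprise import Dataset, KNNWithMeans, Reader

    random.seed(seed)
    np.random.seed(seed)

    data = Dataset.load_from_df(ratings_df, Reader(rating_scale=rating_scale))
    trainset = data.build_full_trainset()

    algo = KNNWithMeans(**knn_settings)
    algo.fit(trainset)
    return algo
```

A call such as fit_item_knn(ratings, (1, 5)).predict(1000, 300).est would then give the estimate to compare against the hint, assuming a 1-to-5 scale and integer ids.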
Use your hybrid function again with user 1243 and restaurant 'Lido'.
Remember, you will need to call your content_recommender function first to get the similarity scores.
Pass the top 25 restaurants with the highest sim_scores from the content recommender to the collaborative recommender.
Use the KNN algorithm you just trained to predict ratings.
The minimum allowed rating is 4.5.
Return the top 3 recommendations.
Hint: If you call your function with user 1001 and Becco, the top recommendation should be Gran Morsi.
What are the top 3 restaurants, in order?
Bar Primi, Naples 45 Ristorante E Pizzeria, La Pecora Bianca - NoMad
Il Mulino New York - Uptown, Naples 45 Ristorante E Pizzeria, Bar Primi
Felice 64 Wine Bar, Lincoln Ristorante, Scampi
Trattoria Italienne, Taralluci e Vino Union Square, Felice 64 Wine Bar
La Pecora Bianca - Midtown, La Pecora Bianca - NoMad, Naples 45 Ristorante E Pizzeria