Lesson 13 - Self-Assessment Solutions
Self-Assessment: Modularize Fetching Unique Items
Self-Assessment: Load and Display - Solution
There's nothing too new here; you've done this kind of work before. More important than the code is taking a minute or two to understand the data you're pulling in. What columns do you have available to you? Which columns contain simple values and which contain lists? Think about how you could or couldn't use this data to make recommendations.
Self-Assessment: Pandas - Solution
Remember that shape gives you the number of rows first, followed by the number of columns.
There are 2550 TED talks in this data frame.
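As a quick reminder of how shape unpacks, here is a minimal sketch on a toy dataframe. The dataframe name talks and its columns are stand-ins for illustration, not the actual TED data.

```python
import pandas as pd

# Toy stand-in for the TED talks data; names and values are made up.
talks = pd.DataFrame({
    "title": ["Talk A", "Talk B", "Talk C"],
    "views": [1200, 45000, 980],
})

# shape is a (rows, columns) tuple, so it unpacks in that order.
n_rows, n_cols = talks.shape
print(f"{n_rows} talks, {n_cols} columns")
```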
Self-Assessment: Prerequisites - Solution
Remember that when you calculate a quantile for some piece of data, you'll get different results depending on whether you calculate it before or after your other subsetting. First, let's calculate the views quantile before we work out the rest of our prerequisites.
Let's compare that with calculating the quantile after we subset.
There is no universally "right" answer as to whether you should calculate the quantile before or after you've narrowed the initial dataset. It depends on what you're trying to accomplish. If you want the most viewed talks among those that meet your criteria, calculate it after you've subsetted. If you want talks that meet your criteria and are also among the most viewed overall, calculate it before you've subsetted.
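The difference is easiest to see on a small example. This is a sketch on made-up data; the languages column and the threshold of 10 are assumptions chosen only to give us something to subset on.

```python
import pandas as pd

# Made-up data; "languages" is a hypothetical filter column.
talks = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "views": [100, 200, 300, 400, 500, 600],
    "languages": [1, 5, 10, 20, 30, 40],
})

# Quantile computed BEFORE subsetting: cutoff reflects all talks.
cutoff_before = talks["views"].quantile(0.5)

# Narrow the data, then compute the quantile AFTER subsetting.
subset = talks[talks["languages"] >= 10]
cutoff_after = subset["views"].quantile(0.5)

# The two cutoffs keep different numbers of talks from the same subset.
top_overall = subset[subset["views"] >= cutoff_before]
top_in_subset = subset[subset["views"] >= cutoff_after]

print(cutoff_before, len(top_overall))
print(cutoff_after, len(top_in_subset))
```

Here the cutoff computed on the full data is lower, so more of the narrowed talks clear it.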
For our homework, we'll either tell you when to subset a dataframe or ask you to make the decision and give a justification for your decision.
Self-Assessment: Compute a Metric, Sort and Print - Solution
Note that here we are computing our metric on our narrowed dataset. We could have computed the metric on the entire dataset, but if we know that we're only interested in a portion of the talks, we should narrow our dataset before computing the metric.
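A minimal sketch of the narrow-first pattern follows. The metric used here (comments per view) and the column names are assumptions for illustration; substitute whatever metric the exercise asks for.

```python
import pandas as pd

# Made-up data; the "comments" column and the metric are hypothetical.
talks = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "views": [100, 1000, 2000, 4000],
    "comments": [50, 10, 100, 40],
})

# Narrow first. copy() avoids pandas warnings when adding a column later.
cutoff = talks["views"].quantile(0.5)
narrowed = talks[talks["views"] >= cutoff].copy()

# Compute the metric only on the rows we care about, then sort and print.
narrowed["comments_per_view"] = narrowed["comments"] / narrowed["views"]
ranked = narrowed.sort_values("comments_per_view", ascending=False)
print(ranked[["title", "comments_per_view"]])
```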
Self-Assessment: Create the Knowledge-Based Recommender - Solution
We're creating this as a function that takes in the dataframe and the percentile of views that we want to return. We'll first generate our list of unique words to present to users. We'll also stringify our list of ratings so we can use str.contains to filter.
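The steps just described might look roughly like the sketch below. The helper names unique_items and recommend, the toy data, and the choice to pass the word in as an argument (rather than prompting the user) are all assumptions made so the example runs on its own. It also computes the quantile after filtering, which is one of the two options discussed earlier.

```python
import pandas as pd

def unique_items(series):
    """Return the sorted unique values across a column of lists."""
    return sorted({item for items in series for item in items})

def recommend(df, word, percentile=0.8):
    """Return talks whose ratings mention `word`, keeping the most viewed."""
    df = df.copy()
    # Stringify each list so that str.contains can be used for filtering.
    df["ratings_str"] = df["ratings"].apply(lambda words: " ".join(words))
    # regex=False treats the word as plain text; note this is a substring match.
    matches = df[df["ratings_str"].str.contains(word, regex=False)]
    cutoff = matches["views"].quantile(percentile)
    top = matches[matches["views"] >= cutoff]
    return top.sort_values("views", ascending=False)

# Made-up data for illustration only.
talks = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "views": [100, 500, 900, 300],
    "ratings": [["Funny", "Inspiring"], ["Funny"],
                ["Informative"], ["Funny", "Informative"]],
})

print(unique_items(talks["ratings"]))  # words we could present to users
print(recommend(talks, "Funny", percentile=0.5)[["title", "views"]])
```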
Self-Assessment: TF-IDF Vectors - Solution
This is all straight from the book. More information about the TfidfVectorizer is available online here: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
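For reference, a minimal TfidfVectorizer sketch is shown below. The three descriptions are made up, and stop_words="english" is one reasonable setting rather than a requirement.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up descriptions standing in for the real text column.
descriptions = [
    "Robots learn to walk",
    "Why robots will not replace teachers",
    "The science of sleep",
]

# Drop common English stop words, then learn the vocabulary and
# build one TF-IDF row per document.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(descriptions)

# Rows are documents, columns are vocabulary terms.
print(tfidf_matrix.shape)
print(sorted(tfidf.vocabulary_))
```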
Self-Assessment: Create the Content-Based Recommender Based on Dot Product - Solution
This is also straight from the book. We don't expect you to understand everything to do with linear kernels. But if you're interested, the documentation is here: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.linear_kernel.html
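One detail that may help: TfidfVectorizer normalizes each row to unit length by default, so the dot product returned by linear_kernel on those rows equals the cosine similarity. The sketch below uses made-up titles and descriptions, and the helper most_similar is a hypothetical name, not the book's function.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up data for illustration only.
titles = ["Walking robots", "Robots and teachers", "Sleep science"]
descriptions = [
    "Robots learn to walk",
    "Why robots will not replace teachers",
    "The science of sleep",
]

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(descriptions)

# Pairwise dot products between every pair of TF-IDF rows.
sim = linear_kernel(tfidf_matrix, tfidf_matrix)

def most_similar(title, top_n=2):
    """Return the titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    scores = sorted(enumerate(sim[idx]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores if i != idx][:top_n]

print(most_similar("Walking robots", top_n=1))
```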
Self-Assessment: Metadata Recommender
Reminder: You are using the ratings and the tags. Sanitize both first. Use all the words from each to make the soup.
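A minimal sketch of the sanitize-then-combine step is below. The sanitize rule used here (lowercase and remove spaces) is an assumption about what sanitizing should mean, and the toy data is made up; adjust both to match the lesson.

```python
import pandas as pd

def sanitize(words):
    """Lowercase each entry and remove spaces so multi-word items stay whole."""
    return [str(word).lower().replace(" ", "") for word in words]

# Made-up data for illustration only.
talks = pd.DataFrame({
    "title": ["A", "B"],
    "ratings": [["Funny", "Jaw-dropping"], ["Informative"]],
    "tags": [["Global Issues", "TED Fellows"], ["Science"]],
})

# Sanitize both columns first.
for col in ("ratings", "tags"):
    talks[col] = talks[col].apply(sanitize)

# The soup is every word from both columns joined into one string.
talks["soup"] = talks.apply(
    lambda row: " ".join(row["ratings"] + row["tags"]), axis=1
)
print(talks["soup"].tolist())
```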