DataScienceUWL
GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Lessons/Lesson 13 - RecSys 1/Chapter_Notebooks/Simple Recommender.ipynb
Kernel: Python 3
import pandas as pd
import numpy as np

# Read the CSV file into df.
# Note: the dataset has been truncated to 5000 rows for illustration; the actual data has over 40000 rows.
# The full dataset is available on Kaggle:
# https://www.kaggle.com/rounakbanik/the-movies-dataset/downloads/the-movies-dataset.zip/7
# The recommenders work better with more data, of course.
df = pd.read_csv('./data/movies_metadata.csv')
df.head()
# Calculate the number of votes garnered by the 80th percentile movie
m = df['vote_count'].quantile(0.80)
m
255.20000000000027
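# Only consider movies with a runtime of at least 45 minutes and at most 300 minutes
q_movies = df[(df['runtime'] >= 45) & (df['runtime'] <= 300)]

# Only consider movies that have garnered at least m votes.
# .copy() makes q_movies an independent DataFrame, so adding a column to it later
# does not trigger pandas' SettingWithCopyWarning.
q_movies = q_movies[q_movies['vote_count'] >= m].copy()

# Inspect the number of movies that made the cut
q_movies.shape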
(999, 24)
# Calculate C, the mean vote average across all movies in df
C = df['vote_average'].mean()
C
6.06916
# Function to compute the IMDB weighted rating for each movie
def weighted_rating(x, m, C):
    v = x['vote_count']    # number of votes for this movie
    R = x['vote_average']  # mean rating of this movie
    # Compute the weighted score
    return (v/(v+m) * R) + (m/(m+v) * C)
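For reference, the expression returned by `weighted_rating` can be written out as a formula. The symbol $WR$ for the resulting score is a label chosen here for illustration; $v$, $R$, $m$, and $C$ are the names used in the code.

```latex
WR = \frac{v}{v + m}\, R + \frac{m}{v + m}\, C
```

The two weights sum to 1, so the score always lies between the movie's own average $R$ and the dataset-wide mean $C$. As $v$ grows relative to $m$, the score approaches $R$; with few votes it stays close to $C$.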
# Compute the score using the weighted_rating function defined above
q_movies['score'] = q_movies.apply(weighted_rating, args=(m, C), axis=1)
# Sort movies in descending order of their scores
q_movies = q_movies.sort_values('score', ascending=False)

# Display the top 25 movies
q_movies[['title', 'vote_count', 'vote_average', 'score', 'runtime']].head(25)
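The steps in this notebook can also be tried on a small, self-contained table. The sketch below is only an illustration under stated assumptions: the `toy` DataFrame, its titles, and its numbers are made up, and the vote threshold uses the median (`quantile(0.50)`) instead of the 80th percentile so that more than one row of a five-row table qualifies. It computes the score with vectorized column arithmetic, which should give the same values as a row-wise `apply`.

```python
import pandas as pd

# Hypothetical toy data (not from the movies dataset), for illustration only.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "vote_count": [10, 200, 1000, 50, 5000],
    "vote_average": [9.5, 8.0, 7.5, 6.0, 7.0],
})

# Assumption: use the median as the vote threshold so the tiny table keeps several rows.
m_toy = toy["vote_count"].quantile(0.50)

# Mean rating across all rows, playing the role of C.
C_toy = toy["vote_average"].mean()

# Vectorized weighted rating: operate on whole columns instead of row by row.
v = toy["vote_count"]
R = toy["vote_average"]
toy["score"] = (v / (v + m_toy)) * R + (m_toy / (v + m_toy)) * C_toy

# Keep rows with at least m_toy votes, then rank by score.
qualified = toy[toy["vote_count"] >= m_toy].sort_values("score", ascending=False)

print(toy[["title", "vote_count", "vote_average", "score"]])
print(qualified["title"].tolist())
```

In this toy table, movie A has the highest raw average but only 10 votes, so its score is pulled strongly toward the overall mean and ends up below that of movie B, which has a lower average backed by more votes.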