CoCalc -- Simple Recommender.ipynb

GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Lessons/Lesson 13 - RecSys 1/Chapter_Notebooks/Simple Recommender.ipynb

Kernel: Python 3

In [1]:

import pandas as pd
import numpy as np

# Read the CSV file into df.
# Note: the dataset is truncated to 5000 rows for illustration; the actual data has over 40000 rows.
# The full dataset is available on Kaggle:
# https://www.kaggle.com/rounakbanik/the-movies-dataset/downloads/the-movies-dataset.zip/7
# The recommenders work better with more data.

df = pd.read_csv('./data/movies_metadata.csv')
df.head()

Out[1]:

In [2]:

#Calculate the number of votes garnered by the 80th percentile movie
m = df['vote_count'].quantile(0.80)
m

Out[2]:

255.20000000000027
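The cutoff above is not a whole number because `quantile` interpolates linearly between neighboring sorted values by default. A minimal sketch of that behavior, using made-up vote counts (the `votes` series, `cutoff`, and `n_kept` are hypothetical names and are not taken from the movies dataset):

```python
import pandas as pd

# Hypothetical vote counts, not taken from the movies dataset
votes = pd.Series([0, 100, 200, 300, 1000])

# The 0.80 quantile sits at position 0.8 * (5 - 1) = 3.2 in the sorted values,
# so pandas interpolates 20% of the way from 300 to 1000 (roughly 440)
cutoff = votes.quantile(0.80)
print(cutoff)

# Count how many entries meet the cutoff
n_kept = int((votes >= cutoff).sum())
print(n_kept)
```

With the default linear interpolation, the cutoff can fall between two observed vote counts, which is why a fractional value is nothing to worry about.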

In [3]:

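# Only consider movies with a runtime of at least 45 minutes and at most 300 minutes
q_movies = df[(df['runtime'] >= 45) & (df['runtime'] <= 300)]

# Only consider movies that have garnered at least m votes
# (.copy() makes q_movies an independent DataFrame, so adding a column later does not trigger SettingWithCopyWarning)
q_movies = q_movies[q_movies['vote_count'] >= m].copy()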

#Inspect the number of movies that made the cut
q_movies.shape

Out[3]:

(999, 24)

In [4]:

# Calculate C, the mean vote average across all movies in df
C = df['vote_average'].mean()
C

Out[4]:

6.06916

In [5]:

# Function to compute the IMDB weighted rating for one movie (x is a single row of the DataFrame)
def weighted_rating(x, m, C):
    v = x['vote_count']
    R = x['vote_average']
    # Compute the weighted score
    return (v/(v+m) * R) + (m/(m+v) * C)
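To see what the formula does, it may help to evaluate it on plain numbers. The sketch below restates the same expression in a hypothetical helper, `weighted_rating_values`, and uses made-up values for `m`, `C`, and the two movies; none of these numbers come from the dataset.

```python
def weighted_rating_values(v, R, m, C):
    """Same formula as weighted_rating, applied to plain numbers."""
    return (v / (v + m)) * R + (m / (m + v)) * C

# Hypothetical cutoff and mean rating, chosen for easy arithmetic
m = 250.0
C = 6.0

# A movie with exactly m votes is weighted half toward its own rating, half toward C
few_votes = weighted_rating_values(v=250, R=9.0, m=m, C=C)

# A movie with many more votes than m keeps a score close to its own rating
many_votes = weighted_rating_values(v=25000, R=8.0, m=m, C=C)

print(few_votes)
print(many_votes)
```

In this toy case the 9.0-rated movie with few votes ends up below the 8.0-rated movie with many votes, because its score is pulled much harder toward C.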

In [7]:

# Compute the score using the weighted_rating function defined above
q_movies['score'] = q_movies.apply(weighted_rating, args=(m,C), axis=1)
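Row-wise `apply` calls the Python function once per movie. As an alternative sketch (not the notebook's method), the same score can be computed with column arithmetic, since the two terms share the denominator v + m. The `toy` DataFrame and the values of `m` and `C` below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for q_movies
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "vote_count": [300, 1200, 5000],
    "vote_average": [8.5, 7.0, 7.9],
})
m = 250.0  # hypothetical vote cutoff
C = 6.0    # hypothetical mean rating

def weighted_rating(x, m, C):
    v = x["vote_count"]
    R = x["vote_average"]
    return (v / (v + m) * R) + (m / (m + v) * C)

# Row-wise version, as in the cell above
row_wise = toy.apply(weighted_rating, args=(m, C), axis=1)

# Vectorized version: v/(v+m)*R + m/(v+m)*C == (v*R + m*C) / (v+m)
v = toy["vote_count"]
R = toy["vote_average"]
vectorized = (v * R + m * C) / (v + m)

# Largest absolute difference between the two results
print((row_wise - vectorized).abs().max())
```

The two versions should agree up to floating-point rounding; the vectorized form tends to matter more on the full 40000-row dataset than on the truncated one.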

In [8]:

#Sort movies in descending order of their scores
q_movies = q_movies.sort_values('score', ascending=False)

#Print the top 25 movies
q_movies[['title', 'vote_count', 'vote_average', 'score', 'runtime']].head(25)

Out[8]:
