Analysing the Edinburgh Fringe Festival Jokes
This is the IPython notebook for the blog post: Python, natural language processing and predicting funny.
Here are the libraries we are going to need:
Loading and tidying the data
Getting rid of the common words and tokenising the jokes
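As a rough illustration of this step, here is a minimal sketch that uses only the standard library. The stop-word list `STOP_WORDS` and the function name `tokenise` are hypothetical and chosen for this example; the notebook may well rely on a fuller list from an NLP library.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "i", "my", "to", "of", "and", "is", "it", "in", "was"}

def tokenise(joke):
    """Lower-case a joke, split it into words and drop the common ones."""
    words = re.findall(r"[a-z']+", joke.lower())
    return [word for word in words if word not in STOP_WORDS]

tokens = tokenise("I bought a dog and it was the best decision of my life")
print(tokens)
```

Only the less common words survive, which is what we want the classifier to look at.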
Training our classifier
From here on in we use the jokes up until 2013 as the training set.
We start by getting the entire set of words in all the jokes from the training set.
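A small sketch of what this could look like, assuming each joke is stored as a record with a `year` and a list of `tokens` (both field names are hypothetical), and assuming "up until 2013" includes 2013 itself:

```python
# Hypothetical records: each joke has a year and its list of tokens.
jokes = [
    {"year": 2012, "tokens": ["bought", "dog"]},
    {"year": 2013, "tokens": ["dog", "walked", "bar"]},
    {"year": 2014, "tokens": ["cheese", "joke"]},
]

# Assumption: the training set is every joke from 2013 or earlier.
training = [joke for joke in jokes if joke["year"] <= 2013]

# Collect every distinct word that appears in the training jokes.
all_words = set()
for joke in training:
    all_words.update(joke["tokens"])

print(sorted(all_words))
```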
Creating a function to extract features from a given joke
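One common way to do this is to record, for every word in the vocabulary, whether the joke contains it. The sketch below assumes that approach; the function name `extract_features` and the `contains(...)` key format are illustrative choices rather than the notebook's confirmed ones.

```python
def extract_features(joke_tokens, vocabulary):
    """Map each vocabulary word to True or False depending on whether the joke uses it."""
    present = set(joke_tokens)
    return {"contains(%s)" % word: (word in present) for word in vocabulary}

features = extract_features(["dog", "bar"], {"dog", "cheese"})
print(features)
```

Every joke ends up with the same set of keys, so jokes of different lengths can be compared directly.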
Labelling our jokes depending on what will be deemed as funny
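A minimal sketch of a labelling rule, assuming each joke carries a numeric rank where a lower rank means a funnier joke. Both the `rank` idea and the `threshold` parameter are assumptions made for this example.

```python
def is_funny(rank, threshold):
    """Deem a joke funny if its rank is at or above the cut-off (lower rank is better)."""
    return rank <= threshold

# Hypothetical ranks for four jokes, with the top three deemed funny.
ranks = [1, 3, 4, 10]
labels = [is_funny(rank, threshold=3) for rank in ranks]
print(labels)
```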
Creating a labelled feature set
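The features and the labels can then be paired up, one pair per joke. In this sketch the input format (a list of `(tokens, rank)` tuples) and the function name `build_labelled_features` are hypothetical.

```python
def build_labelled_features(jokes, vocabulary, threshold):
    """Return a list of (features, label) pairs, one for each joke."""
    labelled = []
    for tokens, rank in jokes:
        present = set(tokens)
        features = {word: (word in present) for word in vocabulary}
        # Assumption: a rank at or below the threshold counts as funny.
        labelled.append((features, rank <= threshold))
    return labelled

jokes = [(["dog", "bar"], 1), (["cheese"], 7)]
labelled = build_labelled_features(jokes, ["dog", "cheese"], threshold=3)
print(labelled)
```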
Creating our classifier
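The text here does not say which classifier is used. As a stand-in, the sketch below implements a small naive Bayes classifier over True/False features in pure Python, with add-one smoothing; the function names are hypothetical and an NLP library's classifier would normally replace all of this.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_features):
    """Count how often each (feature, value) pair occurs under each label."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # label -> counts of (feature, value)
    for features, label in labelled_features:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[label][(name, value)] += 1
    return label_counts, value_counts

def classify(model, features):
    """Return the label with the highest smoothed log-probability."""
    label_counts, value_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)
        for name, value in features.items():
            seen = value_counts[label][(name, value)]
            # Add-one smoothing over the two possible values, True and False.
            score += math.log((seen + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set: jokes about dogs are funny, jokes about tax are not.
training = [
    ({"dog": True, "tax": False}, True),
    ({"dog": True, "tax": False}, True),
    ({"dog": False, "tax": True}, False),
    ({"dog": False, "tax": True}, False),
]
model = train_naive_bayes(training)
print(classify(model, {"dog": True, "tax": False}))
```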
The real test comes from applying our classifier to this year's jokes
Wrapping all of the above in a function to see how our classifier performs for a given funniness threshold
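A sketch of such a wrapper, under a few assumptions: jokes arrive as `(features, rank)` pairs, a rank at or below the threshold counts as funny, and the classifier is passed in as a pair of `train` and `classify` functions. A trivial majority-vote classifier stands in for the real one so that the example stays short.

```python
def accuracy_for_threshold(training, testing, threshold, train, classify):
    """Label jokes by threshold, train on `training` and return the accuracy on `testing`."""
    def label(rank):
        return rank <= threshold

    model = train([(features, label(rank)) for features, rank in training])
    correct = sum(classify(model, features) == label(rank) for features, rank in testing)
    return correct / len(testing)

# Stand-in classifier for this demo: always predict the most common training label.
def train_majority(labelled):
    labels = [label for _, label in labelled]
    return max(set(labels), key=labels.count)

def classify_majority(model, features):
    return model

# Hypothetical data: empty feature dicts, since the stand-in ignores them.
training = [({}, 1), ({}, 2), ({}, 8)]
testing = [({}, 1), ({}, 9)]

for threshold in (3, 10):
    print(threshold, accuracy_for_threshold(training, testing, threshold,
                                            train_majority, classify_majority))
```

Passing the classifier in as functions keeps the wrapper independent of whichever classifier is actually being studied.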
Wrapping everything in another function to see the effect of the testing data set
We used previous years to train for this year. Here we will instead train on random samples of various sizes drawn from the data.
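A minimal sketch of a random split, assuming `ratio` is the fraction of jokes used for training. The function name `random_split` and the `seed` parameter are hypothetical.

```python
import random

def random_split(jokes, ratio, seed=None):
    """Shuffle a copy of the jokes and split it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(jokes)
    rng.shuffle(shuffled)
    cut = int(round(ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder jokes, 30% of which go into the training set.
training, testing = random_split(range(10), ratio=0.3, seed=0)
print(len(training), len(testing))
```

Fixing the seed makes a given split reproducible; leaving it as `None` gives a fresh sample each time.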
Here is a plot of the accuracy for varying ratios.
Here are all of the above on a single plot (not terribly helpful).