Applying Bayes' Theorem to Iris Classification
Can Bayes' theorem help us to solve a classification problem, namely predicting the species of an iris?
Preparing the Data
We'll read the iris data into a DataFrame and round each measurement up to the nearest integer.
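The rounding step can be sketched as follows. This is a minimal illustration on three rows laid out like the iris data, not the full 150-row dataset; the column names and the `sample` and `rounded` variables are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Three rows in the iris layout; the full dataset has 150 rows.
# Column names are assumptions chosen for this sketch.
sample = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "setosa"],
        [7.0, 3.2, 4.7, 1.4, "versicolor"],
        [6.3, 3.3, 6.0, 2.5, "virginica"],
    ],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
)

numeric_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# np.ceil rounds each value up to the nearest integer (7.0 stays 7).
rounded = sample.copy()
rounded[numeric_cols] = np.ceil(sample[numeric_cols]).astype(int)

print(rounded)
```

Rounding up collapses the continuous measurements into a small number of integer combinations, so several irises can share exactly the same four values and be counted together.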
Deciding How to Make a Prediction
Let's say I have an out-of-sample iris with the following measurements (sepal length, sepal width, petal length, petal width): 7, 3, 5, 2. How might I predict the species?
Let's frame this as a conditional probability problem: What is the probability of some particular species, given the measurements 7, 3, 5, and 2 (abbreviated below as 7352)?
We could calculate the conditional probability for each of the three species, and then predict the species with the highest probability:
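In symbols, writing 7352 for the measurements 7, 3, 5, 2, the three quantities to compare and the resulting decision rule can be written as below. The symbol $\hat{y}$ for the predicted species is notation introduced here for illustration.

```latex
P(\text{setosa} \mid 7352), \qquad
P(\text{versicolor} \mid 7352), \qquad
P(\text{virginica} \mid 7352)

\hat{y} = \arg\max_{s \,\in\, \{\text{setosa},\ \text{versicolor},\ \text{virginica}\}} P(s \mid 7352)
```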
Calculating the Probability of Each Species
Bayes' theorem gives us a way to calculate these conditional probabilities.
Let's start with versicolor:
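Written out for versicolor, with 7352 standing for the measurements 7, 3, 5, 2, Bayes' theorem takes this form:

```latex
P(\text{versicolor} \mid 7352)
  = \frac{P(7352 \mid \text{versicolor}) \, P(\text{versicolor})}{P(7352)}
```

The numerator multiplies the share of versicolors that have these measurements by the share of all irises that are versicolor. The denominator is the share of all irises that have these measurements.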
We can calculate each of the terms on the right side of the equation:
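Each term reduces to a ratio of counts. The sketch below computes them on a small hypothetical dataset of twelve rounded irises, so the resulting numbers are not those of the real iris data; the names `irises`, `target`, `likelihood`, `prior`, and `evidence` are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical toy data: (rounded measurements, species).
# These rows are made up and do NOT reproduce the real iris counts.
irises = [
    ((7, 3, 5, 2), "versicolor"),
    ((7, 3, 5, 2), "versicolor"),
    ((7, 3, 5, 2), "versicolor"),
    ((6, 3, 5, 2), "versicolor"),
    ((7, 3, 5, 2), "virginica"),
    ((7, 4, 6, 3), "virginica"),
    ((8, 3, 6, 2), "virginica"),
    ((7, 3, 6, 2), "virginica"),
    ((6, 4, 2, 1), "setosa"),
    ((5, 3, 2, 1), "setosa"),
    ((5, 4, 2, 1), "setosa"),
    ((6, 4, 2, 1), "setosa"),
]
target = (7, 3, 5, 2)
species = "versicolor"

n_total = len(irises)
n_species = sum(1 for _, s in irises if s == species)
n_target = sum(1 for m, _ in irises if m == target)
n_both = sum(1 for m, s in irises if m == target and s == species)

likelihood = Fraction(n_both, n_species)  # P(7352 | versicolor)
prior = Fraction(n_species, n_total)      # P(versicolor)
evidence = Fraction(n_target, n_total)    # P(7352)

print(likelihood, prior, evidence)
```

Using `Fraction` keeps the ratios exact, which makes it easy to check the arithmetic by hand.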
Therefore, Bayes' theorem says the probability of versicolor given these measurements is:
Let's repeat this process for virginica and setosa:
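Repeating the calculation for every species and picking the largest result can be sketched as one function. This is an illustrative sketch on hypothetical toy rows, not the real iris counts; the helper name `posteriors` and the `rows` data are assumptions made here.

```python
from collections import Counter
from fractions import Fraction


def posteriors(rows, target):
    """Return {species: P(species | target)} from (measurements, species) rows."""
    species_counts = Counter(s for _, s in rows)
    match_counts = Counter(s for m, s in rows if m == target)
    n_total = len(rows)
    n_match = sum(match_counts.values())
    if n_match == 0:
        # P(target) would be zero, so the posterior is undefined.
        raise ValueError("no iris in the data has these measurements")
    evidence = Fraction(n_match, n_total)           # P(target)
    result = {}
    for s, n_s in species_counts.items():
        likelihood = Fraction(match_counts[s], n_s)  # P(target | species)
        prior = Fraction(n_s, n_total)               # P(species)
        result[s] = likelihood * prior / evidence
    return result


# Hypothetical toy rows: (rounded measurements, species).
rows = (
    [((7, 3, 5, 2), "versicolor")] * 3
    + [((6, 3, 5, 2), "versicolor")]
    + [((7, 3, 5, 2), "virginica")]
    + [((7, 4, 6, 3), "virginica")] * 3
    + [((6, 4, 2, 1), "setosa")] * 4
)

post = posteriors(rows, (7, 3, 5, 2))
prediction = max(post, key=post.get)
print(post, prediction)
```

The three conditional probabilities sum to 1, because every iris with the target measurements belongs to exactly one species.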
We predict that the iris is a versicolor, since that species had the highest conditional probability.
Summary
We framed a classification problem as three conditional probability problems.
We used Bayes' theorem to calculate those conditional probabilities.
We made a prediction by choosing the species with the highest conditional probability.
Bonus: The Intuition Behind Bayes' Theorem
Let's make some hypothetical adjustments to the data to demonstrate how Bayes' theorem makes intuitive sense:
Pretend that more of the existing versicolors had measurements of 7352:
P(7352 | versicolor) would increase, thus increasing the numerator.
It would make sense that, given an iris with measurements of 7352, the probability of it being a versicolor would also increase.
Pretend that most of the existing irises were versicolor:
P(versicolor) would increase, thus increasing the numerator.
It would make sense that the probability of any iris being a versicolor (regardless of measurements) would also increase.
Pretend that 17 of the setosas had measurements of 7352:
P(7352) would double, thus doubling the denominator.
It would make sense that given an iris with measurements of 7352, the probability of it being a versicolor would be cut in half.
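The three adjustments can be checked numerically. The sketch below starts from hypothetical baseline values for the three terms (they are not taken from the iris data) and changes one term at a time while holding the others fixed, as the reasoning above does; the function name `posterior` and all numbers are assumptions made for illustration.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(species | data) = P(data | species) * P(species) / P(data)."""
    return likelihood * prior / evidence


# Hypothetical baseline values, chosen only to make the arithmetic easy.
likelihood = Fraction(1, 5)   # P(7352 | versicolor)
prior = Fraction(1, 3)        # P(versicolor)
evidence = Fraction(1, 10)    # P(7352)

baseline = posterior(likelihood, prior, evidence)

# 1. More versicolors have measurements 7352: the likelihood rises.
higher_likelihood = posterior(Fraction(1, 4), prior, evidence)

# 2. More of the irises are versicolor: the prior rises.
higher_prior = posterior(likelihood, Fraction(2, 5), evidence)

# 3. Twice as many irises overall have measurements 7352: the evidence doubles.
doubled_evidence = posterior(likelihood, prior, 2 * evidence)

print(baseline, higher_likelihood, higher_prior, doubled_evidence)
```

Raising either numerator term raises the result, and doubling the denominator halves it, matching the intuition in each case.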