Waffle Charts and Regression Plots
Objectives
Create waffle charts
Create regression plots with the seaborn library
Exploring Datasets with pandas and Matplotlib
Toolkits: The course relies heavily on pandas and NumPy for data wrangling, analysis, and visualization. The primary plotting library we will explore in the course is Matplotlib.
Dataset: Immigration to Canada from 1980 to 2013 - International migration flows to and from selected countries - The 2015 revision, from the United Nations website
The dataset contains annual data on the flows of international migrants as recorded by the countries of destination. The data presents both inflows and outflows according to place of birth, citizenship, or place of previous/next residence, for both foreigners and nationals. In this lab, we will focus on the Canadian immigration data.
Import Primary Modules:
Download the Canadian Immigration dataset and read it into a pandas dataframe.
Let's take a look at the first five rows of our dataset.
Let's find out how many entries there are in our dataset.
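The import, download, and inspection steps above can be sketched as follows. The lab's download URL is not reproduced here, so `load_canada_data` is a hypothetical helper that accepts any local path or URL. The sheet name `Canada by Citizenship`, the 20 skipped header rows, and the 2 skipped footer rows are assumptions about the layout of the UN workbook; adjust them if your copy differs.

```python
import numpy as np   # used later to build the waffle chart matrix
import pandas as pd  # primary data structure library


def load_canada_data(path):
    """Read the Canadian immigration workbook into a DataFrame.

    Hypothetical helper. Assumes the data sits on a sheet named
    'Canada by Citizenship', preceded by 20 header rows and followed
    by 2 footer rows (reading .xlsx files requires openpyxl).
    """
    return pd.read_excel(
        path,
        sheet_name="Canada by Citizenship",
        skiprows=range(20),  # skip the descriptive header block
        skipfooter=2,        # skip the trailing footer rows
    )
```

For example, `df_can = load_canada_data(path)` loads the data; `df_can.head()` then shows the first five rows and `df_can.shape` returns a `(rows, columns)` tuple.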
Clean up the data. We will make some modifications to the original dataset to make it easier to create our visualizations. Refer to the labs "Introduction to Matplotlib and Line Plots" and "Area Plots, Histograms, and Bar Plots" for a detailed description of this preprocessing.
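The cleanup can be sketched as a single function. This reflects our understanding of the preprocessing in the earlier labs, not a verbatim copy: the raw column names (`AREA`, `REG`, `DEV`, `Type`, `Coverage`, `OdName`, `AreaName`, `RegName`) and the 1980 to 2013 year columns are assumptions about the raw sheet, and `clean_canada_data` is a hypothetical name.

```python
import pandas as pd


def clean_canada_data(df_raw):
    """Return (cleaned DataFrame, list of year column labels).

    Assumed raw layout: administrative columns AREA, REG, DEV, Type and
    Coverage (dropped); OdName, AreaName and RegName (renamed); plus one
    column per year from 1980 to 2013.
    """
    df = df_raw.drop(columns=["AREA", "REG", "DEV", "Type", "Coverage"])
    df = df.rename(columns={"OdName": "Country", "AreaName": "Continent", "RegName": "Region"})
    # Make every column label a string so years can be selected as '1980' ... '2013'
    df.columns = [str(col) for col in df.columns]
    # Index rows by country so we can look them up with df.loc['Denmark']
    df = df.set_index("Country")
    years = [str(year) for year in range(1980, 2014)]
    # Total immigration per country over the whole period
    df["Total"] = df[years].sum(axis=1)
    return df, years
```

Usage: `df_can, years = clean_canada_data(df_can)`.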
Import and set up Matplotlib:
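A minimal setup sketch. The choice of the `ggplot` style is an assumption (any built-in Matplotlib style works), and the notebook-only magic `%matplotlib inline` appears only as a comment because it is not valid in a plain Python script.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# In Jupyter you would typically also run: %matplotlib inline
plt.style.use("ggplot")  # optional ggplot-like look for all later figures
print("Matplotlib version:", mpl.__version__)
```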
Let's revisit the previous case study about Denmark, Norway, and Sweden.
Unfortunately, unlike in R, waffle charts are not built into any of the Python visualization libraries. Therefore, we will learn how to create them from scratch.
Step 1. The first step in creating a waffle chart is determining the proportion of each category with respect to the total.
Step 2. The second step is defining the overall size of the waffle chart.
Step 3. The third step is using the proportion of each category to determine its respective number of tiles.
Based on the calculated proportions, Denmark will occupy 129 tiles of the waffle chart, Norway will occupy 77 tiles, and Sweden will occupy 194 tiles.
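Steps 1 to 3 can be sketched in a few lines of pandas. Two assumptions here: the totals are the 1980 to 2013 values we believe the lab's Denmark/Norway/Sweden selection holds (something like `df_can.loc[['Denmark', 'Norway', 'Sweden'], 'Total']`), and the grid is 40 × 10 tiles. Together they reproduce the 129/77/194 split quoted above; substitute your own totals and grid size as needed.

```python
import pandas as pd

# Assumed 1980-2013 totals; replace with your own selection from df_can
totals = pd.Series({"Denmark": 3901, "Norway": 2327, "Sweden": 5866})

# Step 1: proportion of each category with respect to the total
category_proportions = totals / totals.sum()

# Step 2: overall size of the waffle chart (assumed 40 x 10 grid)
width = 40   # tiles per row
height = 10  # tiles per column
total_num_tiles = width * height

# Step 3: whole number of tiles per category
tiles_per_category = (category_proportions * total_num_tiles).round().astype(int)

for country, proportion in category_proportions.items():
    print(f"{country}: {proportion:.3f} of the total -> {tiles_per_category[country]} tiles")
```

Rounding each share independently can make the tile counts sum to one or two more or fewer than `total_num_tiles`; here they add up exactly to 400.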
Step 4. The fourth step is creating a matrix that resembles the waffle chart and populating it.
Let's take a peek at what the matrix looks like.
As expected, the matrix consists of three categories and the total number of each category's instances matches the total number of tiles allocated to each category.
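Step 4 can be sketched as a nested loop, assuming the 40 × 10 grid and the 129/77/194 tile counts from Step 3. The loop fills the grid column by column, from top to bottom, storing a category number in each cell so that every category occupies one contiguous band of tiles. Variable names are our own.

```python
import numpy as np

width, height = 40, 10               # assumed grid size from Step 2
tiles_per_category = [129, 77, 194]  # Denmark, Norway, Sweden (Step 3)

# Each cell holds a category number: 1 = Denmark, 2 = Norway, 3 = Sweden
waffle_chart = np.zeros((height, width), dtype=int)
category_index = 0
tile_index = 0
for col in range(width):
    for row in range(height):
        tile_index += 1
        # Advance to the next category once all tiles of the earlier categories are used
        if tile_index > sum(tiles_per_category[:category_index]):
            category_index += 1
        waffle_chart[row, col] = category_index

print(waffle_chart)
```

The same matrix can be built without loops as `np.repeat([1, 2, 3], tiles_per_category).reshape((height, width), order="F")`: column-major (`"F"`) order reproduces the column-by-column fill.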
Step 5. Map the waffle chart matrix into a visual.
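A sketch of Step 5 using `Axes.matshow`, which draws a matrix as an image and maps each category number to a colour. The grid size and tile counts are the assumed values from Steps 2 and 3, and `coolwarm` is just one possible colormap.

```python
import numpy as np
import matplotlib.pyplot as plt

width, height = 40, 10               # assumed grid size
tiles_per_category = [129, 77, 194]  # Denmark, Norway, Sweden
# Same matrix as Step 4, built compactly with a column-major fill
waffle_chart = np.repeat([1, 2, 3], tiles_per_category).reshape((height, width), order="F")

# Step 5: draw the matrix; each category number gets a colour from the colormap
fig, ax = plt.subplots()
image = ax.matshow(waffle_chart, cmap=plt.cm.coolwarm)
fig.colorbar(image, ax=ax)
# In a notebook the figure is shown at the end of the cell; in a script call plt.show()
```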
Step 6. Prettify the chart.
Step 7. Create a legend and add it to the chart.
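Steps 6 and 7 can be sketched together: minor ticks placed at the tile boundaries carry white gridlines, the regular ticks are removed, and the legend uses one `matplotlib.patches.Patch` per category. Passing `vmin=1` and `vmax=3` pins the colour scale so each legend colour matches its tiles exactly; that choice, the grid size, and the colormap are our assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

width, height = 40, 10                       # assumed grid size
categories = ["Denmark", "Norway", "Sweden"]
tiles_per_category = [129, 77, 194]
waffle_chart = np.repeat([1, 2, 3], tiles_per_category).reshape((height, width), order="F")

colormap = plt.cm.coolwarm
n = len(categories)
fig, ax = plt.subplots(figsize=(10, 3))
ax.matshow(waffle_chart, cmap=colormap, vmin=1, vmax=n)  # category k -> colormap((k-1)/(n-1))

# Step 6: white gridlines between tiles, no ticks or tick labels
ax.set_xticks(np.arange(-0.5, width, 1), minor=True)
ax.set_yticks(np.arange(-0.5, height, 1), minor=True)
ax.grid(which="minor", color="w", linestyle="-", linewidth=2)
ax.tick_params(which="minor", length=0)  # hide the small minor tick marks
ax.set_xticks([])
ax.set_yticks([])

# Step 7: one legend patch per category, using the same colours as the tiles
legend_handles = [
    mpatches.Patch(color=colormap(i / (n - 1)), label=name)
    for i, name in enumerate(categories)
]
ax.legend(handles=legend_handles, loc="upper center", ncol=n, bbox_to_anchor=(0.5, -0.05))
```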
And there you go! What a good-looking, delicious waffle chart, don't you think?
Now, it would be very inefficient to repeat these seven steps every time we wish to create a waffle chart. So let's combine all seven steps into one function called create_waffle_chart. This function takes the following parameters as input:
categories: Unique categories or classes in dataframe.
values: Values corresponding to categories or classes.
height: Defined height of waffle chart.
width: Defined width of waffle chart.
colormap: A Matplotlib colormap, that is, a Colormap instance such as plt.cm.coolwarm.
value_sign: To make our function more generalizable, we add this parameter to address signs that could be associated with a value, such as %, $, and so on. value_sign defaults to an empty string.
Now, to create a waffle chart, all we have to do is call the function create_waffle_chart. Let's define the input parameters:
And now let's call our function to create a waffle chart.
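A sketch of such a function, together with the input parameters and the call. The parameter names follow the list above, but the body is our own reconstruction of Steps 1 to 7, not necessarily the lab's code. Two design choices are ours: rounding leftovers are trimmed or padded with the last category so the grid is always full, and a `%` sign is placed after the number while other signs (such as `$`) go in front. The totals are the same assumed Denmark/Norway/Sweden values as in Steps 1 to 3.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches


def create_waffle_chart(categories, values, height, width, colormap, value_sign=""):
    """Draw a waffle chart (Steps 1-7 combined) and return the Matplotlib figure."""
    counts = np.asarray(values, dtype=float)
    n = len(categories)

    # Steps 1-3: share of the total -> whole number of tiles per category
    total_num_tiles = width * height
    tiles_per_category = np.round(counts / counts.sum() * total_num_tiles).astype(int)

    # Step 4: fill the grid column by column with category numbers 1..n.
    # Rounding can leave the sum a tile or two off, so trim, or pad with the last value.
    flat = np.repeat(np.arange(1, n + 1), tiles_per_category)[:total_num_tiles]
    flat = np.pad(flat, (0, total_num_tiles - flat.size), mode="edge")
    waffle_chart = flat.reshape((height, width), order="F")

    # Step 5: draw the matrix; vmin/vmax pin the colour scale to the category numbers
    fig, ax = plt.subplots(figsize=(width / 4, height / 4 + 1))
    ax.matshow(waffle_chart, cmap=colormap, vmin=1, vmax=n)

    # Step 6: white gridlines between tiles, no ticks or tick labels
    ax.set_xticks(np.arange(-0.5, width, 1), minor=True)
    ax.set_yticks(np.arange(-0.5, height, 1), minor=True)
    ax.grid(which="minor", color="w", linestyle="-", linewidth=2)
    ax.tick_params(which="minor", length=0)
    ax.set_xticks([])
    ax.set_yticks([])

    # Step 7: one legend patch per category, coloured exactly like its tiles
    def label_for(name, value):
        # '%' reads naturally after the number; other signs such as '$' go in front
        if value_sign == "%":
            return f"{name} ({value}{value_sign})"
        return f"{name} ({value_sign}{value})"

    handles = [
        mpatches.Patch(color=colormap(i / max(n - 1, 1)), label=label_for(name, value))
        for i, (name, value) in enumerate(zip(categories, values))
    ]
    ax.legend(handles=handles, loc="upper center", ncol=n, bbox_to_anchor=(0.5, -0.05))
    return fig


# Input parameters (assumed totals, as in Steps 1-3)
categories = ["Denmark", "Norway", "Sweden"]
values = [3901, 2327, 5866]
width = 40
height = 10
colormap = plt.cm.coolwarm

fig = create_waffle_chart(categories, values, height, width, colormap)
```

Returning the figure instead of calling `plt.show()` inside the function leaves the caller free to save it, for example with `fig.savefig("waffle.png")`.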
There seems to be a new Python package for generating waffle charts called PyWaffle, but it looks like the repository is still under development. Feel free to check it out and play with it.
In the lab "Pie Charts, Box Plots, Scatter Plots, and Bubble Plots", we learned how to create a scatter plot and then fit a regression line. It took about 20 lines of code to create the scatter plot along with the regression fit. In this final section, we will explore seaborn and see how efficient it is to create regression lines and fits using this library!
Let's first install seaborn.
Create a new dataframe that stores the total number of landed immigrants to Canada per year from 1980 to 2013.
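A sketch of that dataframe. `total_per_year` is a hypothetical helper that assumes the cleaned `df_can` (countries as rows, one string column per year) and a `years` list of those column labels; the output column names `year` and `total` are our choice. Years are converted to floats so that seaborn treats them as numeric values.

```python
import pandas as pd


def total_per_year(df_can, years):
    """Return a DataFrame with one row per year and columns 'year' and 'total'."""
    # Sum over all countries (rows) for each year column
    df_tot = df_can[years].sum(axis=0).to_frame(name="total")
    # Year labels are strings such as '1980'; make them numeric for plotting
    df_tot.index = [float(year) for year in df_tot.index]
    df_tot.index.name = "year"
    return df_tot.reset_index()
```

Usage: `df_tot = total_per_year(df_can, years)`.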
With seaborn, generating a regression plot is as simple as calling the regplot function.
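A minimal sketch, assuming seaborn is installed (for example with `pip install seaborn`) and that `df_tot` has numeric `year` and `total` columns as described in the previous step. `regplot` draws the scatter points, a fitted linear regression line, and a shaded confidence band; the wrapper name is hypothetical.

```python
import seaborn as sns


def plot_total_immigration(df_tot):
    """Scatter plot of 'total' against 'year' with a linear regression fit."""
    return sns.regplot(data=df_tot, x="year", y="total")
```

Usage: `ax = plot_total_immigration(df_tot)`.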
This is not magic; it is seaborn! You can also customize the color of the scatter plot and regression line. Let's change the color to green.
You can always customize the marker shape, so instead of circular markers, let's use +.
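The colour and marker changes are just two keyword arguments to `regplot`: `color` applies to both the points and the regression line, and `marker` sets the scatter marker. As before, `df_tot` is assumed to have `year` and `total` columns, and the wrapper name is hypothetical.

```python
import seaborn as sns


def plot_total_immigration_green(df_tot):
    """regplot with green points and line, and '+' markers instead of circles."""
    return sns.regplot(data=df_tot, x="year", y="total", color="green", marker="+")
```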
Let's enlarge the plot a little so that it is easier on the eye.
And let's increase the size of markers so they match the new size of the figure, and add a title and x- and y-labels.
And finally increase the font size of the tickmark labels, the title, and the x- and y-labels so they don't feel left out!
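A sketch combining the last three customizations. The figure size (15 × 10 inches), marker size (`s=200` via `scatter_kws`), font scale (1.5), and title text are illustrative choices, not requirements. Note that `sns.set_theme` is a global setting: besides scaling fonts, it applies seaborn's default `darkgrid` style to every figure created afterwards.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_total_immigration_styled(df_tot):
    """Larger regression plot with bigger markers, a title, axis labels and larger fonts."""
    # Global: scale all fonts by 1.5 (also applies seaborn's default "darkgrid" style)
    sns.set_theme(font_scale=1.5)
    fig, ax = plt.subplots(figsize=(15, 10))  # enlarge the figure
    sns.regplot(data=df_tot, x="year", y="total", color="green", marker="+",
                scatter_kws={"s": 200}, ax=ax)  # larger markers
    ax.set(xlabel="Year", ylabel="Total Immigration")
    ax.set_title("Total Immigration to Canada from 1980 - 2013")
    return ax
```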
Amazing! A complete scatter plot with a regression fit in only 5 lines of code. Isn't this really amazing?
If you are not a big fan of the purple background, you can easily change the style to a plain white background.
Or to a white background with gridlines.
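seaborn's `set_style` switches the global style: `"ticks"` gives a plain white background and `"whitegrid"` adds gridlines. A sketch that redraws the customized plot with a chosen style; the function name is hypothetical and `df_tot` is assumed to have `year` and `total` columns.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_total_immigration_with_style(df_tot, style="ticks"):
    """Regression plot drawn with the given seaborn style ('ticks', 'whitegrid', ...)."""
    sns.set_style(style)  # global setting: affects figures created from now on
    fig, ax = plt.subplots(figsize=(15, 10))
    sns.regplot(data=df_tot, x="year", y="total", color="green", marker="+",
                scatter_kws={"s": 200}, ax=ax)
    ax.set(xlabel="Year", ylabel="Total Immigration")
    ax.set_title("Total Immigration to Canada from 1980 - 2013")
    return ax
```

Usage: `plot_total_immigration_with_style(df_tot, "ticks")` for a plain white background, or `plot_total_immigration_with_style(df_tot, "whitegrid")` for white with gridlines.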
Question: Use seaborn to create a scatter plot with a regression line to visualize the total immigration from Denmark, Sweden, and Norway to Canada from 1980 to 2013.
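One possible answer, offered as a sketch rather than the official solution. It assumes `df_can` is the cleaned dataframe indexed by country with string year columns, and `years` lists those columns; `plot_dsn_regression` is a hypothetical name. The three countries are summed year by year and reshaped into `year`/`total` columns before calling `regplot`.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_dsn_regression(df_can, years, countries=("Denmark", "Norway", "Sweden")):
    """Regression plot of the combined yearly immigration from the given countries."""
    # Sum the selected countries for each year, then reshape into 'year'/'total' columns
    df_total = df_can.loc[list(countries), years].sum(axis=0).to_frame(name="total")
    df_total.index = [float(year) for year in df_total.index]
    df_total.index.name = "year"
    df_total = df_total.reset_index()

    fig, ax = plt.subplots(figsize=(15, 10))
    sns.regplot(data=df_total, x="year", y="total", ax=ax)
    ax.set(xlabel="Year", ylabel="Total Immigration")
    ax.set_title("Total Immigration from Denmark, Norway, and Sweden to Canada from 1980 - 2013")
    return ax, df_total
```

Usage: `ax, df_total = plot_dsn_regression(df_can, years)`.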