Path: blob/master/lessons/lesson_12/python-notebooks-data-wrangling/Wrangling--Pivot-Tables-with-Pandas.ipynb
1904 views
Pivot Tables with Pandas: San Francisco Crime Data
Data background
The San Francisco crime data comes from the city Socrata portal:
https://data.sfgov.org/Public-Safety/SFPD-Incidents-from-1-January-2003/tmnf-yvry
The API call to get the data as CSV, for dates between '2006-01-01' and '2016-04-01':
https://data.sfgov.org/resource/cuks-n6tp.csv?$where=date between '2006-01-01' and '2016-04-01'&ParseError: KaTeX parse error: Expected 'EOF', got '&' at position 85: …ddress,x,y,pdid&̲limit=1000000000
The download is more than 200 MB+. A mirror of the data is included in this repo:
data/sf/raw/bulk-data.sfgov.org--20060101-20160401--cuks-n6tp.csv
An important caveat from the SF data portal:
As of July 19, 2015, the PD District boundaries have been updated through a redistricting process. These new boundaries are not reflected in the dataset yet so you cannot compare data from July 19, 2015 onward to official reports from PD with the Police District column. We are working on an update to the dataset to reflect the updated boundaries starting with data entered July 19 onward. Incidents derived from SFPD Crime Incident Reporting system Updated daily, showing data from 1/1/2003 up until two weeks ago from current date. Please note: San Francisco police have implemented a new system for tracking crime. The dataset included here is still coming from the old system, which is in the process of being retired (a multi-year process). Data included here is no longer the official SFPD data. We will migrate to the new system for DataSF in the upcoming months.
Simple pivot chart
http://pbpython.com/pandas-pivot-table-explained.html
Let's do a simple pivot: for every year, aggregate the count of records.
Pivot by category
A general count of crime is not interesting. Lets dive down into categories
Pivot the dataframe by category, and then by year
Larceny/Thefts
Go back to the original table, now facet by descript