GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_16/solution-code/02_rolling_statistics_solutions.ipynb
Kernel: Python 2

Time Series: Rolling Statistics

Independent Practice

Instructor Note: These are optional and can be assigned as student practice questions outside of class.

1) Load the Unemployment data set. Perform any necessary cleaning and preprocess the data by creating a datetime index.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
%matplotlib inline
unemp = pd.read_csv('./data/unemployment.csv')
unemp.head()
unemp.tail()
unemp.columns = ['year_quarter', 'unemployment_rate']
unemp['unemployment_rate'] = unemp['unemployment_rate'].map(lambda x: float(str(x).replace('%','')))
unemp.dropna(inplace=True)
unemp.head()
unemp.dtypes
year_quarter          object
unemployment_rate    float64
dtype: object
# This is quarterly data, so converting to datetime is a bit complicated.
# .dt.to_period('Q') converts the parsed dates to quarterly periods.
unemp['date'] = pd.to_datetime(unemp.year_quarter).dt.to_period('Q')
unemp.set_index('date', inplace=True)
unemp.head()
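To see what a quarterly period index looks like without the data file, here is a minimal sketch on made-up values. It assumes quarter labels of the form '1948Q1' and builds the index with pd.PeriodIndex directly, which is a different route from the pd.to_datetime(...).dt.to_period('Q') call used in the notebook; the names toy and toy_ts are hypothetical.

```python
import pandas as pd

# Made-up quarter labels and rates, for illustration only.
quarters = ['1948Q1', '1948Q2', '1948Q3', '1948Q4']
rates = [3.7, 3.7, 3.8, 3.8]

# Build a quarterly PeriodIndex straight from the strings.
toy = pd.Series(rates,
                index=pd.PeriodIndex(quarters, freq='Q'),
                name='unemployment_rate')

# Periods can be turned back into timestamps (start of each quarter),
# which is what the plotting cells later rely on.
toy_ts = toy.index.to_timestamp()

print(toy)
print(toy_ts)
```

Each period covers a whole quarter, so to_timestamp() has to pick a point inside it; by default it uses the first day of the quarter.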

2) Plot the unemployment rate.

unemp['unemployment_rate'].plot(lw=2.5, figsize=(12,5))
<matplotlib.axes._subplots.AxesSubplot at 0x109177240>
Image in a Jupyter notebook

3) Calculate the rolling mean of the annual averages with window=3, without centering, and plot both the unemployment rate and the rolling mean.

yearly = unemp['unemployment_rate'].resample('A').mean().rolling(window=3, center=False).mean()
yearly.head()
date
1948         NaN
1949         NaN
1950    5.002833
1951    4.847333
1952    3.838917
Freq: A-DEC, Name: unemployment_rate, dtype: float64
# Extract the dates from the index as timestamps.
date_ticks_orig = unemp.index.to_timestamp()
date_ticks_roll = yearly.index.to_timestamp()
plt.figure(figsize=(14,7))
plt.plot(date_ticks_orig, unemp.unemployment_rate.values, lw=2)
plt.plot(date_ticks_roll, yearly.values, lw=2)
plt.tick_params(labelsize=14)
Image in a Jupyter notebook
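The two leading NaN values in the output above follow from the window size: an uncentered window of 3 needs three observations before it can produce a value. A small sketch on a made-up series (not the unemployment data) shows this, and how center=True shifts the result instead.

```python
import pandas as pd

# Toy series, for illustration only.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Trailing window: each value averages the current and two previous points.
trailing = s.rolling(window=3, center=False).mean()

# Centered window: each value averages the previous, current, and next point.
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({'original': s, 'trailing': trailing, 'centered': centered}))
```

The trailing version is missing its first two values, while the centered version is missing one value at each end.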

4) Calculate the rolling median with window=5 and window=15. Plot both together with the original data.

uroll_w5 = unemp.unemployment_rate.rolling(window=5).median()
uroll_w15 = unemp.unemployment_rate.rolling(window=15).median()
plt.figure(figsize=(14,7))
plt.plot(date_ticks_orig, unemp.unemployment_rate.values, lw=2)
plt.plot(date_ticks_orig, uroll_w5, lw=2)
plt.plot(date_ticks_orig, uroll_w15, lw=2)
plt.tick_params(labelsize=14)
Image in a Jupyter notebook
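One reason to reach for a rolling median rather than a rolling mean is that a single extreme value barely moves it. The sketch below uses a made-up series with one spike to illustrate the difference; the values and variable names are hypothetical.

```python
import pandas as pd

# Flat toy series with a single spike, for illustration only.
spiky = pd.Series([5.0, 5.0, 5.0, 50.0, 5.0, 5.0, 5.0])

roll_median = spiky.rolling(window=3).median()
roll_mean = spiky.rolling(window=3).mean()

print(pd.DataFrame({'original': spiky,
                    'rolling median': roll_median,
                    'rolling mean': roll_mean}))
```

Every window that contains the spike still has a median of 5, whereas the mean of those windows jumps to 20.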

5) Calculate and plot the expanding mean. Resample by quarter. Plot the rolling mean and the expanding mean together.

date_ticks = unemp.index.to_timestamp()
# With window=1 the rolling mean is identical to the resampled quarterly series.
rolling_mean = unemp.unemployment_rate.resample('Q').sum().rolling(window=1, center=False).mean()
# The expanding mean averages every observation from the start up to each quarter.
expanding_mean = unemp.unemployment_rate.resample('Q').sum().expanding().mean()
plt.figure(figsize=(14,7))
plt.plot(date_ticks, rolling_mean, alpha=1, lw=2, label='rolling mean')
plt.plot(date_ticks, expanding_mean, alpha=1, lw=2, label='expanding mean')
plt.legend(loc='upper left')
plt.tick_params(labelsize=14)
Image in a Jupyter notebook
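The difference between the two statistics is easiest to see on a few numbers. This is a minimal sketch on made-up values: the expanding mean at each position is the average of everything seen so far, which is the same as the cumulative sum divided by the running count, while a rolling mean only looks at a fixed number of recent points.

```python
import numpy as np
import pandas as pd

# Toy series, for illustration only.
x = pd.Series([2.0, 4.0, 6.0, 8.0])

expanding = x.expanding().mean()          # window grows from the first point
rolling2 = x.rolling(window=2).mean()     # window is always the last two points

# The expanding mean written out by hand: cumulative sum / running count.
by_hand = x.cumsum() / np.arange(1, len(x) + 1)

print(pd.DataFrame({'original': x, 'expanding': expanding, 'rolling(2)': rolling2}))
```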

6) Calculate and plot the exponentially weighted mean along with the rolling mean.

rolling_mean = unemp.unemployment_rate.resample('Q').sum().rolling(window=2, center=True).mean()
exp_weighted_mean = unemp.unemployment_rate.resample('Q').sum().ewm(span=10).mean()
ax = rolling_mean.plot(lw=2.5, figsize=(14,7))
exp_weighted_mean.plot(ax=ax, lw=2.5)
<matplotlib.axes._subplots.AxesSubplot at 0x10e6ae748>
Image in a Jupyter notebook
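For intuition about what ewm(span=10).mean() computes, the sketch below reproduces it by hand on made-up values. It assumes the pandas defaults (adjust=True), where span maps to a smoothing factor alpha = 2 / (span + 1) and each output is a weighted average of all past points with weights (1 - alpha)^i for lag i. The helper manual_ewm is hypothetical and written only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy series, for illustration only.
values = pd.Series([4.0, 4.5, 4.2, 5.0, 5.5, 5.1])
span = 10
alpha = 2.0 / (span + 1)

ewm_pandas = values.ewm(span=span).mean()

def manual_ewm(x, alpha):
    """Exponentially weighted mean with normalized weights (adjust=True)."""
    out = []
    for t in range(len(x)):
        weights = (1 - alpha) ** np.arange(t + 1)   # weight for lag 0, 1, ..., t
        window = x[t::-1]                           # x[t], x[t-1], ..., x[0]
        out.append(np.sum(weights * window) / np.sum(weights))
    return np.array(out)

ewm_manual = manual_ewm(values.values, alpha)

print(pd.DataFrame({'pandas': ewm_pandas, 'manual': ewm_manual}))
```

A larger span gives a smaller alpha, so older observations decay more slowly and the curve is smoother.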

7) Difference the unemployment rate and plot.

unemp['unemp_diff'] = unemp['unemployment_rate'].diff()
unemp['unemp_diff'].plot(lw=2.5, figsize=(12,5))
<matplotlib.axes._subplots.AxesSubplot at 0x10eb191d0>
Image in a Jupyter notebook
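As a quick check of what .diff() does, here is a minimal sketch on made-up values: each entry is the current value minus the previous one, the first entry is NaN because it has no predecessor, and a cumulative sum of the differences added to the first value recovers the original series.

```python
import numpy as np
import pandas as pd

# Toy series, for illustration only.
rate = pd.Series([4.0, 4.5, 4.2, 5.0])

first_diff = rate.diff()   # rate[t] - rate[t-1]

# Undo the differencing: start value plus the running sum of the changes.
recovered = rate.iloc[0] + first_diff.fillna(0).cumsum()

print(pd.DataFrame({'rate': rate, 'diff': first_diff, 'recovered': recovered}))
```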