YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_16/04_decomposition.ipynb
Kernel: Python 3

Time Series: Decomposition

Learning Objectives

After this lesson, you will be able to:

  • Describe the different components of time series data (trend, seasonality, cyclical, and residual).

  • Decompose time series data into trend, seasonality, cyclical, and residual components.

  • Plot the decomposed components of a time series.

Splitting a time series into several components helps with both understanding the data and choosing an appropriate forecasting model. Each component represents an underlying pattern.

  • Trend: A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes, we will refer to a trend “changing direction” when, for example, it might go from an increasing trend to a decreasing trend.

  • Seasonal: A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.

  • Residual: The leftover or error component.
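To make the three components concrete, here is a minimal sketch that builds a synthetic monthly series from a known trend, a known seasonal pattern, and random noise. It assumes an additive model (observed = trend + seasonal + residual). All names and numbers below are made up for illustration and are not the bus data used later.

```python
import numpy as np
import pandas as pd

# Hypothetical example: 8 years of monthly data built from known components.
rng = np.random.default_rng(0)
n = 96
dates = pd.date_range("2000-01-01", periods=n, freq="MS")  # "MS" = month start

# Slow long-term rise.
trend = pd.Series(np.linspace(100, 160, n), index=dates)
# Pattern that repeats every 12 months.
seasonal = pd.Series(10 * np.sin(2 * np.pi * np.arange(n) / 12), index=dates)
# Leftover random noise.
residual = pd.Series(rng.normal(0, 2, n), index=dates)

# Additive model (an assumption of this sketch): the series is the sum of its parts.
observed = trend + seasonal + residual
print(observed.head())
```

Decomposition runs this construction in reverse: starting from the observed series alone, it estimates the three parts.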

Guided Practice

We are going to play around with some bus data from Portland, Oregon. Load in the data set below and check it out.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import datetime
from dateutil.relativedelta import relativedelta
%matplotlib inline

bus = pd.read_csv('./data/bus.csv')
bus.head()

bus.tail()

We'll need to clean this data a little. Let's simplify the names of the columns. There are also a couple of bad rows at the end of the file. We'll get rid of those. Additionally, we need to make the riders column an integer.

# Drop the two bad rows at the end of the file (positions 114 and 115).
bus = bus.drop(bus.index[[114, 115]])
# Simplify the column names.
bus.columns = ['index', 'riders']
# Convert the riders column to integers.
bus['riders'] = bus['riders'].astype(int)
bus.head()
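Dropping rows by position works when you know exactly where the bad rows are. As an alternative sketch, the snippet below coerces anything non-numeric to NaN and drops those rows instead. It uses a small made-up frame in place of bus.csv, so the values and the junk footer text are hypothetical.

```python
import pandas as pd

# Toy stand-in for the raw file: two good rows plus two junk rows at the end.
raw = pd.DataFrame({
    "index": ["2000-01", "2000-02", None, "some footer text"],
    "riders": ["100", "105", None, "n=2"],
})

# Turn anything that is not a number into NaN, then drop those rows.
raw["riders"] = pd.to_numeric(raw["riders"], errors="coerce")
clean = raw.dropna(subset=["riders"]).copy()

# With the NaNs gone, the column can safely become an integer column.
clean["riders"] = clean["riders"].astype(int)
print(clean)
```

This does not depend on how many junk rows there are, but it would also silently drop a genuinely malformed observation, so inspect what gets removed.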

We're going to create an artificial date index using the relativedelta() function, as shown below. We will simply start at 1973-01-01 and iterate up one month at a time.

start = datetime.datetime.strptime("1973-01-01", "%Y-%m-%d")
# One date per remaining row, so the list length always matches the DataFrame.
date_list = [start + relativedelta(months=x) for x in range(len(bus))]
bus['index'] = date_list
bus.set_index(['index'], inplace=True)
bus.index.name = None
bus.head()
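The same monthly index can also be built with pandas directly. The sketch below compares the two approaches; it assumes 114 monthly observations starting in January 1973, as in this lesson, and the variable names are illustrative only.

```python
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

n_months = 114  # number of monthly observations assumed here

# The lesson's approach: step forward one month at a time.
start = datetime.datetime(1973, 1, 1)
by_hand = [start + relativedelta(months=x) for x in range(n_months)]

# pandas equivalent: "MS" is the month-start frequency alias.
with_pandas = pd.date_range(start="1973-01-01", periods=n_months, freq="MS")

print(with_pandas[0], with_pandas[-1])
# Check that both approaches produce the same dates.
print(all(a == b for a, b in zip(with_pandas, by_hand)))
```

Either result can be assigned as the index; the date_range version is shorter and carries its monthly frequency with it.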

StatsModels Time Series Tools

The Python StatsModels library offers a wide variety of reliable time series analysis tools. We'll start by loading a function for decomposing time series; the autocorrelation and partial autocorrelation plotting functions are imported further down, when we examine the residuals.

from statsmodels.tsa.seasonal import seasonal_decompose

Plot the raw data.

We can look at the raw data first. Let's plot the time series.

bus.riders.plot(figsize=(10,6), title= 'Monthly Ridership (100,000s)', fontsize=14)

Using the seasonal_decompose() function, we can break the time series into its constituent parts.

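Use the function on the riders data with a period of 12, then plot the result. We use 12 because the data are monthly and the seasonal pattern repeats every year. (The argument is named period in recent StatsModels releases; older releases called it freq.)

The decomposition object returned by seasonal_decompose() has a .plot() method, much like Pandas DataFrames.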

bus.dtypes
riders    int64
dtype: object
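# Recent StatsModels releases name this argument period; older ones used freq.
decomposition = seasonal_decompose(bus.riders, period=12)
# decomposition.plot() creates and returns its own figure.
fig = decomposition.plot()
fig.set_size_inches(12, 6)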
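To see roughly what an additive decomposition does under the hood, here is a simplified pandas sketch in the spirit of the classical moving-average approach. It is not the StatsModels source code, and the series is synthetic (a made-up linear trend plus a yearly cycle plus noise), not the bus data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + yearly cycle + noise.
rng = np.random.default_rng(1)
n = 72
idx = pd.date_range("1990-01-01", periods=n, freq="MS")
y = pd.Series(
    np.linspace(50, 80, n)
    + 5 * np.sin(2 * np.pi * np.arange(n) / 12)
    + rng.normal(0, 0.5, n),
    index=idx,
)

period = 12

# 1) Trend: centered moving average. For an even period, take a 12-term mean,
#    average it with its neighbour, then shift the result back to the centre.
trend = y.rolling(period).mean().rolling(2).mean().shift(-(period // 2))

# 2) Seasonal: average the detrended values month by month, then centre the
#    twelve averages so they sum to zero.
detrended = y - trend
monthly_means = detrended.groupby(detrended.index.month).mean()
monthly_means = monthly_means - monthly_means.mean()
seasonal = pd.Series(monthly_means.loc[y.index.month].to_numpy(), index=y.index)

# 3) Residual: whatever the trend and seasonal parts do not explain.
resid = y - trend - seasonal

print(trend.isna().sum())  # the moving average cannot reach the series ends
```

Because the moving average needs half a period of data on each side, the trend (and therefore the residual) is missing for the first and last six months. The same gaps appear in a decomposition with a period of 12.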

Plot a single component of the decomposition plot.

We can pull out just one component of the decomposition plot.

seasonal = decomposition.seasonal
seasonal.plot()

trend = decomposition.trend
trend.plot()

Let's examine the residuals of our data.

resid = decomposition.resid
resid.plot()

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# The moving-average trend leaves NaNs at both ends of the residuals; drop them first.
resid_clean = resid.dropna()

fig, ax = plt.subplots(figsize=(9, 5))
plot_acf(resid_clean, lags=30, ax=ax)
plt.show()

fig, ax = plt.subplots(figsize=(9, 5))
plot_pacf(resid_clean, lags=30, ax=ax)
plt.show()

We notice that the residuals of our time series don't have significant autocorrelation. This is because the trend and seasonal components, which are what make neighboring observations resemble each other, have been removed.
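As a rough numerical counterpart to reading the plots, the sketch below compares a lag-12 autocorrelation against a rule-of-thumb 95% band of ±1.96/√n. It uses two made-up series (pure noise and a clean 12-step cycle), not the bus residuals. Series.autocorr() is a plain Pearson correlation with the shifted series, and plot_acf() computes its own confidence intervals, so treat this as an approximation only.

```python
import numpy as np
import pandas as pd

n = 120

# Hypothetical residual-like series: pure noise with no structure.
rng = np.random.default_rng(42)
noise = pd.Series(rng.normal(0, 1, n))

# For contrast, a series with an obvious 12-step pattern left in it.
cyclic = pd.Series(np.sin(2 * np.pi * np.arange(n) / 12))

# Rule-of-thumb 95% band for "no autocorrelation".
band = 1.96 / np.sqrt(n)

for name, series in [("noise", noise), ("cyclic", cyclic)]:
    lag12 = series.autocorr(lag=12)  # correlation with the series shifted by 12 steps
    verdict = "outside band" if abs(lag12) > band else "inside band"
    print(name, round(lag12, 3), verdict)
```

A residual series that still showed a strong spike at lag 12 would suggest that the seasonal component had not been fully removed.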

Recap

  • Trend is a long-term change in the data.

  • Seasonality is a pattern of a fixed period that repeats in the data.

  • Residuals are the error components of the data.

  • StatsModels contains a seasonal_decompose() function that breaks a time series into its components.

Instructor Note: These are optional and can be assigned as student practice questions outside of class.

1) Import the Airline Passengers data set, preprocess the data, and plot the raw time series.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import datetime
from dateutil.relativedelta import relativedelta
%matplotlib inline

airline = pd.read_csv('./data/airline.csv')

2) Decompose the time series using the seasonal_decompose() function and plot the result.

3) Interpret these plots.