Time Series: Decomposition
Learning Objectives
After this lesson, you will be able to:
Describe the different components of time series data (trend, seasonality, cyclical, and residual).
Decompose time series data into trend, seasonality, cyclical, and residual components.
Plot the decomposed components of a time series.
Splitting a time series into several components is useful both for understanding the data and for choosing an appropriate forecasting model. Each component represents an underlying pattern.
Trend: A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes, we refer to a trend as “changing direction” when, for example, it goes from increasing to decreasing.
Seasonal: A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.
Cyclical: A cyclical pattern exists when the data rise and fall over periods that are not fixed. In the decomposition used in this lesson, any cyclical movement is absorbed into the trend component rather than reported separately.
Residual: What remains of the series after the trend and seasonal components have been removed; also called the error component.
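To make the components concrete, here is a minimal sketch that builds a synthetic series as the sum of a trend, a seasonal pattern, and a residual. The numbers (48 months, the slope, the amplitude, the noise level) are made up for illustration and are not taken from the bus data, and an additive structure is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n_months = 48                    # hypothetical length: four years of monthly data
t = np.arange(n_months)

trend = 100 + 0.5 * t                        # slow long-term increase
seasonal = 10 * np.sin(2 * np.pi * t / 12)   # pattern that repeats every 12 months
residual = rng.normal(0, 1, n_months)        # leftover noise

# Additive model: the observed series is the sum of the three components.
series = trend + seasonal + residual
```

Decomposition runs this in reverse: given only the observed series, it estimates the three pieces.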
Guided Practice
We are going to play around with some bus data from Portland, Oregon. Load in the data set below and check it out.
We'll need to clean this data a little. Let's simplify the names of the columns. There are also a couple of bad rows at the end of the file. We'll get rid of those. Additionally, we need to make the riders column an integer.
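The cleaning steps described above might look roughly like the following sketch. The inline CSV is a small stand-in with invented rows and an invented header, since the lesson's actual file is not reproduced here; the column names month and riders are also assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in for the raw file: a long column header,
# three data rows, and two bad rows at the end.
raw = io.StringIO(
    "Month,Average monthly bus ridership\n"
    "1973-01,648\n"
    "1973-02,646\n"
    "1973-03,639\n"
    ",\n"
    "Footer text,not a number\n"
)
bus = pd.read_csv(raw)

# Simplify the column names.
bus.columns = ["month", "riders"]

# Turn anything non-numeric into NaN, then drop those bad rows.
bus["riders"] = pd.to_numeric(bus["riders"], errors="coerce")
bus = bus.dropna(subset=["riders"]).copy()

# With the bad rows gone, the column can be cast to integer.
bus["riders"] = bus["riders"].astype(int)
```

Dropping rows by content rather than by position means the sketch does not depend on knowing exactly how many bad rows the file ends with.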
We're going to create an artificial date index using the relativedelta() function from dateutil. We will simply start at 1973-01-01 and step forward one month at a time.
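A minimal sketch of building such an index follows. The length of 24 months is a placeholder; in practice it would be the number of rows in the cleaned data.

```python
from datetime import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

start = datetime(1973, 1, 1)
n_months = 24  # placeholder; use the number of rows in the data

# Step forward one calendar month at a time from the start date.
dates = [start + relativedelta(months=i) for i in range(n_months)]
date_index = pd.DatetimeIndex(dates)
```

Assigning date_index as the index of the DataFrame then gives Pandas and StatsModels a proper monthly time axis to work with.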
StatsModels Time Series Tools
The Python StatsModels library offers a wide variety of reliable time series analysis tools. We'll start off by loading the autocorrelation and partial autocorrelation functions, as well as a function for decomposing time series.
Plot the raw data.
We can look at the raw data first. Let's plot the time series.
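As a sketch of this step, the code below plots a monthly series with Pandas. Because the bus data are not reproduced here, it uses a synthetic riders series with invented values; with the real data, the same plot call would apply to the riders column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the riders series (values are invented).
rng = np.random.default_rng(0)
index = pd.date_range("1973-01-01", periods=72, freq="MS")
t = np.arange(72)
riders = pd.Series(
    600 + 2 * t + 50 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, 72),
    index=index,
    name="riders",
)

# Line plot of the raw series against its date index.
ax = riders.plot(figsize=(10, 4), title="Monthly riders (synthetic)")
ax.set_ylabel("riders")
```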
Using the seasonal_decompose() function, we can break the time series into its constituent parts.
Use the function on the riders data with a frequency of 12 (passed as the period argument in recent StatsModels releases; older releases call it freq), then plot the result. We use 12 because the data are monthly and the seasonal pattern repeats once a year.
The decomposition object returned by seasonal_decompose() has a .plot() method, much like Pandas DataFrames.
Plot a single component of the decomposition plot.
We can pull out just one component of the decomposition plot.
Let's examine the residuals of our data.
The residuals of our time series show no significant autocorrelation. This is because the trend and seasonal components have been removed.
Recap
Trend is a long-term change in the data.
Seasonality is a pattern of a fixed period that repeats in the data.
Residuals are the error components of the data.
StatsModels contains a seasonal_decompose() function that breaks a time series into its components.
Instructor Note: These are optional and can be assigned as student practice questions outside of class.