Time Series: Forecasting Models
Learning Objectives
After this lesson, you will be able to:
Define time series analysis and time series forecasting models.
Discuss the unique properties of time series models.
Explore time series forecasting methods.
Lesson Guide
Time Series: Forecasting Models
Installing StatsModels
We will use the StatsModels library to perform time series forecasting. StatsModels might already be installed in your Python environment, but the installed release may not support the forecasting methods we'll be using in this lesson. So, we'll clone the library from its repo and install it from the source code. To follow along with this lesson, complete these installation steps:
1. In your terminal, use `pip freeze` to see if `statsmodels` is already installed in your environment. If it is, remove it with `conda remove statsmodels`.
2. Initialize Git in your terminal using `git init`.
3. Clone the StatsModels repo by typing `git clone git://github.com/statsmodels/statsmodels.git` in your terminal.
4. Change to the StatsModels directory with `cd statsmodels`.
5. Build the package using the `python setup.py build` command.
6. When that is done, install it using `python setup.py install`.
7. Exit out of the terminal.
8. Restart the terminal.
9. Open Python by typing `python`.
10. Type `from statsmodels.tsa.api import ExponentialSmoothing` to verify the install. If you don't get any errors, you are all set!
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. (The analyses we've been doing in previous sections qualify as time series analysis.)
Time series forecasting is the use of a model to predict future values based on previously observed values. We will discuss these types of models in this section.
Time series forecasting models predict a future value in a time series. Like other predictive models, we will use prior history to predict the future. Unlike previous models, however, we will use the outcome variables from earlier in time as the inputs for prediction.
As with the modeling you're used to, we will have to evaluate different models to ensure that we have chosen the best one.
So, we will want to evaluate on test data to ensure that our model performs well on unseen data.
What Makes Time Series Data Special?
Time series data are time dependent. So, the basic assumption of a linear regression model — that the observations are independent — falls apart. Often, time series data include some sort of trend or may have seasonality.
Training and Testing Sets
Time series data have a natural temporal ordering, which makes them different from the studies we've seen in other lessons in which there was no natural ordering of observations.
Because these data are ordered, we cannot choose training and testing examples at random. As we are attempting to predict a sequence of future values, we must train on values from earlier (in time) in our data and then test our values at the end of the period.
Moving Averages and Autocorrelation
In previous sections, we learned about a few statistics for analyzing time series. A moving average is the average of k surrounding data points in time.
We also looked at autocorrelation to compute the relationship of the data with its prior values.
Autocorrelation is how correlated a variable is with itself; specifically, how related values from earlier in time are to values from later in time. Note the need for a mean value:
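As a point of reference, the sample autocorrelation at lag $k$ is commonly written as:

$$r_k = \frac{\sum_{t=1}^{n-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$

where $\bar{y}$ is the mean of the series, which is why a mean value is required.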
Stationarity
The criteria for classifying a series as stationary are the following:
The mean of the series should not be a function of time, but rather should be a constant. The image below has the left-hand graph satisfying this condition, whereas the graph in red has a time-dependent mean.
The variance of the series should not be a function of time. This property is known as homoscedasticity. The following graph depicts what is and what is not a stationary series. (Notice the varying spread of distribution in the right-hand graph.)
The covariance of the $i$th term and the $(i + m)$th term should not be a function of time. In the following graph, you'll notice that the spread becomes closer as time increases. Hence, the covariance is not constant with time for the "red series."
Why is this important? Many of the time series models work on the assumption that the time series is stationary. We can think of this as: If a time series has a particular behavior over time, there is a high probability that it will act similarly in the future.
As always, these assumptions may not apply to raw real-world data. (In fact, they rarely do!)
For example, typical stock market performance is not stationary. In this plot of Dow Jones performance since 1986, the mean is clearly increasing over time:
How can we test a series for stationarity? One way is by visualizing the data.
Testing for Stationarity
Import the libraries and data.
Change the `week` column to a `datetime` object and make it the index of the DataFrame.
Plot the data.
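A minimal sketch of these three steps might look like the following (the file name and the `week` column name are assumptions for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# The file name and column name below are assumptions for illustration.
data = pd.read_csv('data/drone_interest.csv')

# Convert the week column to datetime and make it the index of the DataFrame.
data['week'] = pd.to_datetime(data['week'])
data = data.set_index('week')

# Plot the series to look for trend and seasonality.
data.plot(figsize=(12, 5))
plt.show()
```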
It's clear that there is an overall increasing trend, as well as some seasonal variations.
But, if this visual check for stationarity wasn't so clear, we could use the following:
Plotting rolling statistics: This is another visual technique, but it can help us see if the moving average or moving variance varies over time.
Dickey-Fuller test: This is a statistical test for stationarity. The null hypothesis of this test is that the time series is non-stationary. The Dickey-Fuller test returns a test statistic that is compared to a critical value for different confidence levels. If the test statistic is less than the critical value, we reject the null hypothesis and say that the time series is stationary.
Here's a function for plotting rolling statistics and using the Dickey-Fuller test. We'll use it on our drone data.
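One possible version of such a function, using `adfuller` from statsmodels (the 52-week rolling window is an assumption that fits weekly data):

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries, window=52):
    """Plot rolling statistics and run the Dickey-Fuller test on a pandas Series."""
    # Rolling statistics
    rolmean = timeseries.rolling(window=window).mean()
    rolstd = timeseries.rolling(window=window).std()

    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling mean')
    plt.plot(rolstd, color='black', label='Rolling std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()

    # Dickey-Fuller test
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4],
                         index=['Test Statistic', 'p-value',
                                '# Lags Used', '# Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
```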
Although the standard deviation experiences one jump, the mean is clearly increasing with time. So, this is not a stationary series. Also, the test statistic of the Dickey-Fuller test is well above the critical values. (Compare the signed values, not the absolute values.)
Let's discuss methods for making this time series stationary.
Making a Time Series Stationary
To make a time series stationary, we first need to understand what makes it non-stationary. Two things can do this:
Trend: a long-term increase or decrease in the level of the series over time.
Seasonality: variations that repeat at specific time intervals.
Detrending
We can preprocess our data to achieve stationarity. Two common methods are detrending and differencing.
Detrending removes major trends in our data. The simplest way is to fit a line to the trend, then make a new series of the difference between the line and the true series.
Below are U.S. housing prices over time that demonstrate an upward trend. This makes the time series non-stationary, as the mean home price is increasing. The line fit through it represents the trend.
The bottom figure is the "detrended" data, where each data point is transformed by subtracting the value of the trend line at that point. The data now have a fixed mean and may be easier to model. This is similar to mean-scaling our features in earlier models with `StandardScaler`.
Example: Detrending
Fit a trendline to the data.
Detrend the time series. In its simplest form, we literally subtract the trendline from the time series.
There is also a helper function for this.
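A sketch of both approaches on a synthetic stand-in series is shown below. (The helper function the lesson refers to is likely `scipy.signal.detrend`, but that is an assumption.)

```python
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

# Synthetic stand-in for the housing-price series used above.
x = np.arange(200)
y = 100 + 0.5 * x + np.random.normal(scale=5, size=200)

# Fit a straight line (degree-1 polynomial) to capture the trend.
slope, intercept = np.polyfit(x, y, deg=1)
trend = slope * x + intercept

# Detrend by subtracting the fitted trendline from the series.
detrended = y - trend

# SciPy offers a helper that performs the same linear detrending in one call.
detrended_scipy = signal.detrend(y)

plt.plot(detrended, label='manual detrend')
plt.plot(detrended_scipy, '--', label='scipy.signal.detrend')
plt.legend()
plt.show()
```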
Differencing
A related method is differencing. Instead of predicting the (non-stationary) series, we can predict the difference between two consecutive values. ARIMA models incorporate this approach.
Recall that we used Pandas' `.diff()` method to find the difference in an earlier section.
Example: Differencing
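A minimal differencing sketch on a toy series (the values and dates below are made up for illustration):

```python
import pandas as pd

# Toy weekly series standing in for the data used above.
ts = pd.Series([10, 12, 15, 14, 18, 21],
               index=pd.date_range('2012-01-01', periods=6, freq='W'))

# First difference: each value minus the previous value.
ts_diff = ts.diff().dropna()
print(ts_diff)
```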
Method 1: Naive Approach
Let's look at Walmart's weekly sales data over a two-year period from 2010 to 2012. The data set is separated by store and department, but we'll focus on analyzing one store for simplicity.
Read in the data set and take a look at it.
Filter the DataFrame to Store 1 sales and aggregate over departments to compute the total sales per store.
If we think sales have been stable from the start, we can simply take the last time period's sales and estimate the same value for the next time period. Such a forecasting technique, which assumes that the next expected point is equal to the last observed point, is called a naive method.
We will subset the data to look at the accuracy of each time series forecasting method on the data set. We'll use the first two years (2010–2011) as the "training" data and the last year (2012) as the "testing" data for the purposes of our demonstration.
Let's plot the data to take a look at the subsetting.
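One way the reading, filtering, splitting, and plotting described above might look is sketched below. The file name, column names, and exact split date are assumptions based on the Kaggle Walmart weekly-sales data layout.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data/walmart_train.csv', parse_dates=['Date'])

# Keep Store 1 only and sum weekly sales across departments.
store1 = (df[df['Store'] == 1]
          .groupby('Date')[['Weekly_Sales']]
          .sum()
          .sort_index())

# Train on 2010-2011, test on 2012.
train = store1.loc[:'2011-12-31']
test = store1.loc['2012-01-01':]

train['Weekly_Sales'].plot(figsize=(12, 5), label='Train')
test['Weekly_Sales'].plot(label='Test')
plt.legend()
plt.show()
```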
Let's see how well the naive method does when forecasting sales.
Let's use RMSE to check the accuracy of our model on the test data set.
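A sketch of the naive forecast and its RMSE, assuming the `train`/`test` split and `Weekly_Sales` column from the sketch above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Naive forecast: repeat the last observed training value for every test week.
y_hat_naive = test.copy()
y_hat_naive['naive'] = train['Weekly_Sales'].iloc[-1]

rmse_naive = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_naive['naive']))
print('Naive RMSE:', rmse_naive)
```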
We can infer from the RMSE value and the graph above that the naive method isn’t suited for data sets with high variability. It's best suited for stable data sets. We can still improve our score by adopting different techniques, however. Let's try a different technique to attempt to improve the accuracy score.
Method 2: Simple Average
Simple average is a useful method when the values of interest are increasing and decreasing randomly by a small margin but the average remains constant. The simple average method forecasts the weekly sales of the next time period to be the average of all of the previous time periods.
Now, let's calculate RMSE to check the accuracy of our model.
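A sketch of this method, reusing the same assumed `train`/`test` split:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Simple average: forecast every test week as the mean of all training weeks.
y_hat_avg = test.copy()
y_hat_avg['avg_forecast'] = train['Weekly_Sales'].mean()

rmse_avg = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_avg['avg_forecast']))
print('Simple average RMSE:', rmse_avg)
```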
This model improved the score a bit. The simple average method works best when the average at each time period remains constant.
Method 3: Moving Average
When we have data sets in which the sales/value increased or decreased sharply some time periods ago, simply using the average of all of the data (like the simple average method) isn't appropriate. An improvement over the simple average in this case is to take the average of the sales for only the last few time periods, as we think that only recent values matter. This is called the moving average technique, and it uses a sliding time window to calculate the average.
Using a simple moving average model, we forecast the next value(s) in a time series based on the average of a fixed finite number `p` of the previous values. Thus, for all $i > p$:

$$\hat{y}_i = \frac{1}{p}\left(y_{i-1} + y_{i-2} + \dots + y_{i-p}\right)$$
A moving average can be quite effective if you pick the right `p` for the series.
We chose the data from the last eight weeks. We will now calculate RMSE to check the accuracy of our model.
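A sketch with `p = 8`, again assuming the `train`/`test` split above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Moving average: forecast using the mean of the last p = 8 training weeks.
y_hat_ma = test.copy()
y_hat_ma['ma_forecast'] = train['Weekly_Sales'].rolling(window=8).mean().iloc[-1]

rmse_ma = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_ma['ma_forecast']))
print('Moving average RMSE:', rmse_ma)
```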
This didn't perform very well. Perhaps we picked an incorrect `p`, or perhaps this method isn't the best for our data set.
An advancement over the moving average method is the weighted moving average method. In the moving average method, we weight the past `n` observations equally. But we might encounter situations where each of the past `n` observations impacts the forecast in a different way. Typically, weights are chosen so that more recent points matter more, as in the sketch below.
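A possible weighted moving average sketch; the weights below are an illustrative choice, not taken from the lesson.

```python
import numpy as np

# Weighted moving average: heavier weights on more recent weeks.
# The weights sum to 1, with the most recent week last.
weights = np.array([0.05, 0.05, 0.1, 0.1, 0.1, 0.15, 0.2, 0.25])
last_8_weeks = train['Weekly_Sales'].iloc[-8:].values

y_hat_wma = test.copy()
y_hat_wma['wma_forecast'] = np.dot(weights, last_8_weeks)
```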
Method 4: Simple Exponential Smoothing
Simple average and weighted moving average lie on opposite sides of the spectrum. Simple exponential smoothing lies between these two extremes and takes into account all of the data while weighting the data points differently. Simple exponential smoothing calculates the forecast using weighted averages where the weights decrease exponentially as observations come from further in the past (the closest data points in time are weighted most heavily).
The one-step-ahead forecast for time $t+1$ is a weighted average of all of the observations in the series ($y_1, \dots, y_t$). The rate at which the weights decrease is controlled by the smoothing parameter $\alpha$ (which is between 0 and 1).
Now, let's use RMSE to check the accuracy of our model.
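A sketch using statsmodels' `SimpleExpSmoothing`; the smoothing level of 0.2 is an illustrative choice, and the `train`/`test` split is the one assumed above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import SimpleExpSmoothing

# Simple exponential smoothing with a fixed (illustrative) smoothing level.
ses_model = SimpleExpSmoothing(train['Weekly_Sales']).fit(smoothing_level=0.2, optimized=False)

y_hat_ses = test.copy()
y_hat_ses['ses_forecast'] = ses_model.forecast(len(test)).values

rmse_ses = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_ses['ses_forecast']))
print('Simple exponential smoothing RMSE:', rmse_ses)
```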
Method 5: Holt's Linear Trend
The methods we've looked at so far don't do well when our data have high variations. If our data contain a trend, none of the previous methods would be able to take that into account. A method that can is Holt's linear trend method.
Recall that a time series data set can be decomposed into its trend, seasonality, and residual components. If the data set contains a trend, then Holt's linear trend method can be applied.
From these graphs, we can see that the data set follows an increasing trend, and we can use Holt's linear trend method to forecast the future sales.
Holt's linear trend is exponential smoothing applied to both the average value of the series (called "level") and trend.
Level Equation: $\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1})$

Trend Equation: $b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta)b_{t-1}$

We will add these equations together to generate the forecast equation.

Forecast Equation: $\hat{y}_{t+h} = \ell_t + h b_t$

We could have also multiplied the trend and level together to generate a multiplicative forecast equation. When the trend is linear, the additive equation is used. When the trend is exponential, the multiplicative equation is used. The multiplicative equation is usually a more stable predictor (but the additive equation is more intuitive).
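A sketch using statsmodels' `Holt` class, letting `.fit()` optimize the smoothing parameters, under the same assumed `train`/`test` split:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import Holt

# Holt's linear trend method; .fit() optimizes the smoothing parameters.
holt_model = Holt(train['Weekly_Sales']).fit()

y_hat_holt = test.copy()
y_hat_holt['holt_forecast'] = holt_model.forecast(len(test)).values

rmse_holt = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_holt['holt_forecast']))
print("Holt's linear trend RMSE:", rmse_holt)
```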
Method 6: Holt-Winters Method
So far, we haven't taken into account the seasonality of the data set while forecasting. The Holt-Winters method takes into account both trend and seasonality to forecast future sales. With this method, we apply exponential smoothing to the seasonal component as well as to the level and trend components.
Level Equation: $\ell_t = \alpha (y_t - c_{t-s}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})$

Trend Equation: $b_t = \beta(\ell_t - \ell_{t-1}) + (1 - \beta)b_{t-1}$

Seasonality Equation: $c_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma)c_{t-s}$

Forecast Equation: $\hat{y}_{t+h} = \ell_t + h b_t + c_{t+h-s}$

(Where $\alpha$, $\beta$, and $\gamma$ are the smoothing parameters, each between 0 and 1, and $s$ is the length of the seasonal cycle.)
The trend equation is identical to the equation we used for Holt's linear trend method. The level equation is a weighted average between the seasonally adjusted observation and the non-seasonal forecast for time $t$. The seasonal equation is a weighted average between the current seasonal index and the seasonal index of the same season $s$ time periods ago.
Just like in Holt's linear trend method, we can use either additive or multiplicative forecasting equations. When the seasonal variations are roughly constant throughout the series, we will use the additive method. When the seasonal variations change depending on the level of the series, we will use the multiplicative method.
Let's look at the RMSE to check accuracy.
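A sketch using statsmodels' `ExponentialSmoothing`; `seasonal_periods=52` assumes one seasonal cycle per year for weekly data, and the `train`/`test` split is the one assumed above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import ExponentialSmoothing

# Holt-Winters (triple exponential smoothing) with additive trend and seasonality.
hw_model = ExponentialSmoothing(train['Weekly_Sales'],
                                trend='add',
                                seasonal='add',
                                seasonal_periods=52).fit()

y_hat_hw = test.copy()
y_hat_hw['hw_forecast'] = hw_model.forecast(len(test)).values

rmse_hw = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_hw['hw_forecast']))
print('Holt-Winters RMSE:', rmse_hw)
```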
Method 7: ARIMA
A very popular forecasting method is ARIMA, which stands for autoregressive integrated moving average. With the exponential smoothing methods we've been using, we've been basing our forecasts on the trend and seasonality in the data. ARIMA models, in contrast, describe the autocorrelations in the data themselves.
The seasonal ARIMA model (SARIMA) also takes seasonality into account (as the Holt-Winters method did).
We can use the ACF and PACF to pick a model and our parameters for ARIMA.
Using the ACF and PACF Plots to Pick a Model
Recall from the section on autocorrelation that the autocorrelation function (ACF) is a plot of the total correlation between the series and its lagged values.
In a moving average series of lag $n$ (an MA($n$) series), we will not get any correlation between $x(t)$ and $x(t - n - 1)$, so the total correlation chart cuts off at the $n$th lag. In other words, we can determine the lag of an MA series by where the ACF drops off sharply. For an AR series, this correlation will gradually decay without any sharp cut-off.
If the ACF tells us it is an AR series, then we turn to the PACF. If we compute the partial correlation of each lag, it will cut off after the degree of the AR series. For instance, if we have an AR(1) series, once we exclude the effect of the first lag ($x(t-1)$), our second lag ($x(t-2)$) is independent of $x(t)$. Hence, the partial autocorrelation function (PACF) will drop sharply after the first lag.
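A sketch of the ACF and PACF plots using statsmodels' plotting helpers; the number of lags shown is an illustrative choice, and the training series is the one assumed above.

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# ACF and PACF of the training series, used to suggest the AR and MA orders.
fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(train['Weekly_Sales'], lags=40, ax=axes[0])
plot_pacf(train['Weekly_Sales'], lags=40, ax=axes[1])
plt.tight_layout()
plt.show()
```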
ARIMA Model
Let's check the RMSE for accuracy of the model.
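A sketch of a seasonal ARIMA fit with statsmodels' `SARIMAX`; the non-seasonal and seasonal orders below are illustrative rather than tuned for this data set, and the `train`/`test` split is the one assumed above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
import statsmodels.api as sm

# Seasonal ARIMA via SARIMAX with illustrative (p, d, q)(P, D, Q, s) orders.
sarima_model = sm.tsa.statespace.SARIMAX(train['Weekly_Sales'],
                                         order=(1, 1, 1),
                                         seasonal_order=(1, 1, 1, 52)).fit()

y_hat_sarima = test.copy()
y_hat_sarima['sarima_forecast'] = sarima_model.forecast(len(test)).values

rmse_sarima = np.sqrt(mean_squared_error(test['Weekly_Sales'], y_hat_sarima['sarima_forecast']))
print('SARIMA RMSE:', rmse_sarima)
```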