Framing Time Series as a Supervised Learning Problem
The gist behind time series analysis is that we are given some quantitative measures about the past and we wish to use this information to predict the future, enabling better planning, decision-making and so on. The main difference between a time series problem and a traditional prediction problem is this: in traditional prediction problems such as image classification, the data points are assumed to be independent of one another, whereas in time series analysis the data points have a temporal nature to them. The time dimension adds an explicit ordering to our data points that should be preserved, because it can provide additional, important information to the learning algorithms.
This is not to say machine learning methods like supervised learning can't be used for time series forecasting, but before we apply these supervised learning methods to our time series data, we need to do some preprocessing steps to make them applicable. There are 4 classes of time series based features that we can create out of our time series dataset:
Date & time features. e.g. Given the date 2019-08-02, we can extract features such as year, month and day to create 3 additional features out of the original timestamp. Or enumerate all the attributes of a timestamp.
Lag features, a.k.a. values at prior time steps.
Window features. These are summary statistics over a fixed window.
Time until next event / Time since last event.
We give some examples of how the lag features and window features are constructed in the following two sections; a quick sketch of extracting date & time features with pandas is shown directly below.
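For example, a minimal sketch using pandas' .dt accessor (the column name timestamp is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'timestamp': pd.to_datetime(['2019-08-02', '2019-08-03'])})

# enumerate attributes of the timestamp via the .dt accessor
df['year'] = df['timestamp'].dt.year
df['month'] = df['timestamp'].dt.month
df['day'] = df['timestamp'].dt.day
print(df)
```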
Lag Features
Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem by using previous time steps as input variables and the next time step as the output variable. Let's make this concrete with an example. Imagine we have a time series as follows (the values are purely illustrative):
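| time | measure |
|------|---------|
| 1    | 100     |
| 2    | 110     |
| 3    | 108     |
| 4    | 115     |
| 5    | 120     |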
Given the original data above, we can re-frame it into a format that's applicable for a supervised learning model:
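| X   | y   |
|-----|-----|
| ?   | 100 |
| 100 | 110 |
| 110 | 108 |
| 108 | 115 |
| 115 | 120 |
| 120 | ?   |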
The second row of the data shows that our input variable X is the measure at time 1 and our target variable y is the measure at time 2.
We can see that the order between the observations is preserved, and must continue to be preserved when using this dataset to train a supervised model.
Because we have no previous value that we can use to predict the first value in the sequence, we will need to delete this record. The same goes for the last value in the sequence (we do not have a known next value to predict for).
The use of prior time steps to predict the next time step is called the sliding window method (it may also be referred to as the lag method), and the number of previous time steps to look at is called the window width or the size of the lag. In the example above, we are using a window size of 1.
This sliding window approach forms the basis for how we can turn any time series dataset into a supervised learning problem and it can also be used on a time series that has more than one value, or so-called multivariate time series. An example of this is shown below:
Let's assume we have the contrived multivariate time series dataset below with two observed features at each time step, and we are concerned with predicting measure2. Then we would go from:
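| time | measure1 | measure2 |
|------|----------|----------|
| 1    | 0.2      | 88       |
| 2    | 0.5      | 89       |
| 3    | 0.7      | 87       |
| 4    | 0.4      | 88       |
| 5    | 1.0      | 90       |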
to:
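| X1  | X2  | y   |
|-----|-----|-----|
| ?   | ?   | 88  |
| 0.2 | 88  | 89  |
| 0.5 | 89  | 87  |
| 0.7 | 87  | 88  |
| 0.4 | 88  | 90  |

Here X1 and X2 are measure1 and measure2 at the prior time step, and y is measure2 at the current time step. As before, the first row would be removed since there are no prior values available.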
Window Features
One step beyond adding raw lagged values is to add aggregated/summary statistics of values at previous time steps. The most common summary statistic is the mean. Let's use a univariate time series as an example.
Pandas provides a few variants such as rolling, expanding and exponentially weighted windows for calculating these types of window statistics, e.g. the rolling() function creates a new data structure with the window of values at each time step. Below, we create a rolling window of size 3 and calculate the mean for each window. As we can see, the first non-NaN value is in the third row, calculated as the mean of the first 3 records: (100 + 110 + 108) / 3.
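A minimal sketch of this computation, reusing the contrived series from above:

```python
import pandas as pd

# the contrived univariate series from the lag features example
series = pd.Series([100, 110, 108, 115, 120])

# rolling window of size 3: the first two rows are NaN because
# a full window of 3 values is not yet available
rolling_mean = series.rolling(window=3).mean()
print(rolling_mean)
# 0           NaN
# 1           NaN
# 2    106.000000
# 3    111.000000
# 4    114.333333
```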
Implementation
If we are using pandas, one useful function that can help transform time series data into a format that's applicable for a supervised learning problem is the shift() function. Given a DataFrame, the shift() function (some other libraries call it lag) can be used to create copies of columns that are pushed forward or backward.
Let's first look at an example of the shift function in action. We start off by defining a toy time series dataset as a sequence of 10 numbers, then use the shift function to create the "lagged" time series.
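A minimal sketch of this (the column names are just for illustration):

```python
import pandas as pd

# toy time series: a sequence of 10 numbers
df = pd.DataFrame({'t': range(10)}, dtype=float)

# shift(-1) pulls each value up one row, so the new column
# holds the observation at the next time step
df['t+1'] = df['t'].shift(-1)
print(df)
```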
Running the code chunk above gives us two columns in the dataset. The first contains the original observations and the second has the shifted observation. Note that the last row would have to be discarded because of the NaN value (there's no value to shift up).
From the output above, we can see that shifting the series forward one time step gives us a primitive supervised learning problem, where the first row shows the input value of 0.0 corresponding to the output of the second column 1.0.
Moving on to an actual dataset, we will use real mobile game data that records the number of ads watched per hour.
Given this 1-dimensional time series data, we will need to do some feature engineering to make it applicable for a downstream supervised learning method, including:
Generating lagged features, and window statistics from them. To generate the window statistics, we'll take the mean of the lagged features we've created.
Adding date & time features such as hour of the day, day of the week and a boolean feature indicating whether this is a weekend.
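A minimal sketch of this step, assuming the hourly series lives in a DataFrame with a datetime index and a column named ads (the column name and the 24-hour lag range are assumptions for illustration):

```python
import pandas as pd

def make_features(df, lags=range(1, 25)):
    """Generate lag, window and date & time features from an hourly series."""
    out = df.copy()

    # lag features: the value at each of the prior `lags` hours
    for lag in lags:
        out[f'lag_{lag}'] = out['ads'].shift(lag)

    # window statistic: the mean over the lagged features we just created
    lag_cols = [f'lag_{lag}' for lag in lags]
    out['rolling_mean'] = out[lag_cols].mean(axis=1)

    # date & time features extracted from the datetime index
    out['hour'] = out.index.hour
    out['weekday'] = out.index.weekday
    out['is_weekend'] = (out.index.weekday >= 5).astype(int)

    # drop the leading rows that lack a full set of lags
    return out.dropna()
```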
We didn't rely on the .rolling() method to generate our rolling mean, as we can compute it by taking the mean of the lagged features we've already generated, but the code chunk below also shows how we can generate the same statistic using the .rolling() method.
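A sketch of the equivalent computation (again assuming an ads column and 24 lags):

```python
# shift(1) excludes the current value, so the window covers the
# previous 24 hours -- the same values as lag_1 through lag_24
df['rolling_mean'] = df['ads'].shift(1).rolling(window=24).mean()
```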
Now that we've prepared the data, the next couple of code chunks train a RandomForest model on the training set (based on a time series train and test split), evaluate the mean absolute percentage error on the test set, and look at the feature importance that comes with the RandomForest model.
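A minimal sketch of what those chunks can look like (the 70/30 split ratio and the hyperparameters are assumptions, and df_features stands in for the engineered DataFrame from above):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# time series split: no shuffling, the most recent 30% is the test set
features = df_features.drop(columns=['ads'])
target = df_features['ads']
split = int(len(df_features) * 0.7)
X_train, X_test = features.iloc[:split], features.iloc[split:]
y_train, y_test = target.iloc[:split], target.iloc[split:]

model = RandomForestRegressor(n_estimators=100, random_state=1234)
model.fit(X_train, y_train)

# mean absolute percentage error on the test set
pred = model.predict(X_test)
mape = np.mean(np.abs((y_test - pred) / y_test)) * 100
print(f'MAPE: {mape:.2f}%')

# feature importance that comes with the RandomForest model
importance = pd.Series(model.feature_importances_, index=features.columns)
print(importance.sort_values(ascending=False).head())
```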
Looking at the feature importance plot, the most important feature is lag_24 (each lag is an hour for this dataset), which is not a surprise, as yesterday's value is usually a good indicator of what the value is going to be today.
Hopefully, after reading through this documentation, you have an understanding of how we can re-frame a time series problem into a supervised machine learning problem and perform model training and forecasting using supervised learning models.
Side Note:
When communicating forecasting results, a useful summary can be something like:
"If nothing unexpected happens we expect to be within ±x %, but if assumptions a, b, or c perform differently than expected, we might be as much as ±y % off."
For the ±x % part, notice that in this documentation we added confidence interval functionality to our time series forecasting plot; this is where the confidence interval comes in handy.
Whereas Monte Carlo simulation can come in handy for the second part of the sentence (if assumptions a, b, or c perform differently ...). For example, when making a supply forecast for a specific event like a product launch, it makes sense to consider scenarios with different sales and production rates, as well as any sales bursts arising from media buzz. The scenarios can take the form of low, medium, and high alternatives, or Monte Carlo simulations of many possible outcomes from some distribution of impact magnitudes. Note that for the simulation, it helps to understand which drivers potentially have the most impact, so that we know exactly what to simulate.
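As a toy illustration of the Monte Carlo idea (the baseline number and the driver distributions below are entirely made up):

```python
import numpy as np

rng = np.random.default_rng(1234)
n_sims = 10_000
baseline = 1000  # illustrative baseline forecast, e.g. units sold

# assumed multiplicative impact of each uncertain driver
sales_rate = rng.normal(loc=1.0, scale=0.1, size=n_sims)
media_buzz = rng.lognormal(mean=0.0, sigma=0.2, size=n_sims)

# distribution of possible outcomes across all simulated scenarios
outcomes = baseline * sales_rate * media_buzz
low, medium, high = np.percentile(outcomes, [5, 50, 95])
print(f'low: {low:.0f}, medium: {medium:.0f}, high: {high:.0f}')
```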