To explore time series models, we will continue with the Rossmann sales data. This dataset contains sales figures for every Rossmann store over a 3-year period, as well as indicators of holidays and basic store information.
In the last class, we saw that we could plot the sales data at a particular store to identify how sales changed over time. Additionally, we computed the autocorrelation of the data at varying lag periods. This helps us identify whether previous time points are predictive of future data and which time points are most important: the previous day? week? month?
Check: Compute the autocorrelation of Sales in Store 1 for lag 1 and 2. Will we be able to use a predictive model, particularly an autoregressive one?
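A minimal sketch of the lag computation, assuming the store's sales are held in a pandas Series indexed by date. The Rossmann file is not loaded here; the `sales` series below is synthetic (a made-up weekly cycle with one very low day plus noise), so the printed numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one store's daily sales (assumption: NOT the real data).
rng = np.random.default_rng(0)
weekly_pattern = [5000, 4800, 4700, 4900, 5200, 5500, 0]  # hypothetical weekly cycle
values = np.tile(weekly_pattern, 52) + rng.normal(0, 300, 7 * 52)
sales = pd.Series(
    values,
    index=pd.date_range("2014-01-01", periods=7 * 52, freq="D"),
    name="Sales",
)

# Series.autocorr(lag=k) is the Pearson correlation between the series
# and a copy of itself shifted by k periods.
lag1 = sales.autocorr(lag=1)
lag2 = sales.autocorr(lag=2)
lag7 = sales.autocorr(lag=7)
print(f"lag 1: {lag1:.3f}  lag 2: {lag2:.3f}  lag 7: {lag7:.3f}")
```

With the real data, the same three calls would be made on the Sales column filtered to Store 1.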
Pandas and statsmodels both provide convenience plots for autocorrelations.
Check: What caused the spike at 7?
ARMA Model
Recall that ARMA(p, q) models are a sum of an AR(p) and an MA(q) model. So if we want just an AR(p) model, we use an ARMA(p, 0) model.
Just like with other types of regression, we can compute the model residuals.
Check: What are residuals? In linear regression, what did we expect of residuals?
Because of the structure remaining in the errors, it doesn't look like an AR model is good enough; the data isn't stationary. So let's expand to an ARMA model.