Path: blob/master/april_18/lessons/lesson-16/code/solution-code/solution-code-16.ipynb
Kernel: Python 2
In [3]:
Walmart Sales Data
For the independent practice, we will analyze the weekly sales data from Walmart over a two-year period from 2010 to 2012.
The data is again separated by store and by department, but we will focus on analyzing one store for simplicity.
The data includes:
Store - the store number
Dept - the department number
Date - the week
Weekly_Sales - sales for the given department in the given store
IsHoliday - whether the week is a special holiday week
Loading the data and setting the DateTimeIndex
In [5]:
Out[5]:
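A minimal sketch of the loading step, assuming a recent Python 3 and pandas rather than the Python 2 kernel used here. The rows in `csv_text` are made-up placeholders in the column layout listed above; the real exercise would pass the path of the Walmart CSV file to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Made-up placeholder rows in the documented column layout (not real Walmart figures).
csv_text = """Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,1000.0,False
1,1,2010-02-12,1100.0,True
1,2,2010-02-05,2000.0,False
1,2,2010-02-12,2100.0,True
2,1,2010-02-05,3000.0,False
"""

# parse_dates converts the Date column to datetimes,
# index_col makes it the index, giving a DatetimeIndex.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(data.head())
```

With a DatetimeIndex in place, date-based slicing such as `data.loc["2010-02"]` becomes available.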
Filter the DataFrame to Store 1 and aggregate over departments to compute the store's total weekly sales.
In [6]:
Out[6]:
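One way to do the filter and aggregation is sketched below, assuming the date is already the index. The small frame is made-up sample data, and grouping on the index is only one of several options (resampling by week would be another).

```python
import pandas as pd

# Made-up sample: two departments in Store 1 and one in Store 2, over two weeks.
dates = pd.to_datetime(
    ["2010-02-05", "2010-02-05", "2010-02-12", "2010-02-12", "2010-02-05", "2010-02-12"]
)
data = pd.DataFrame(
    {
        "Store": [1, 1, 1, 1, 2, 2],
        "Dept": [1, 2, 1, 2, 1, 1],
        "Weekly_Sales": [100.0, 200.0, 150.0, 250.0, 999.0, 999.0],
    },
    index=dates,
)
data.index.name = "Date"

# Keep only Store 1, then sum Weekly_Sales across departments for each date.
store1 = data[data["Store"] == 1]
store1_sales = store1[["Weekly_Sales"]].groupby(level=0).sum()
print(store1_sales)
```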
Plot the rolling_mean for Weekly_Sales. What general trends do you observe?
In [7]:
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x1120d7450>
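The `pd.rolling_mean` function referred to above is no longer available in recent pandas versions; `Series.rolling(...).mean()` is the equivalent. A small sketch on made-up numbers, with a window of 3 chosen only for illustration:

```python
import pandas as pd

# Made-up weekly series; the index uses weekly dates that fall on Fridays.
idx = pd.date_range("2010-02-05", periods=8, freq="W-FRI")
sales = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0], index=idx, name="Weekly_Sales"
)

# Each value is the mean of the current week and the two weeks before it.
# The first two entries are NaN because the window is not yet full.
rolling = sales.rolling(window=3).mean()
print(rolling)

# rolling.plot() would draw the smoothed series (requires matplotlib).
```

A wider window smooths more aggressively, which makes a long-run trend easier to see but hides short spikes.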
Compute the lag-1, lag-2, and lag-52 autocorrelations for Weekly_Sales and/or create an autocorrelation plot.
In [8]:
Out[8]:
('Autocorrelation 1: ', 0.30215827941131324)
('Autocorrelation 3: ', 0.059799235066717457)
('Autocorrelation 52: ', 0.89537602947770079)
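Lagged autocorrelations like these can be computed with `Series.autocorr`. The sketch below uses a synthetic series with a built-in 52-week cycle, so its numbers are not comparable to the Walmart output above.

```python
import numpy as np
import pandas as pd

# Synthetic weekly series: a 52-week sine cycle plus a little noise (an assumption,
# chosen so that the yearly lag stands out).
rng = np.random.RandomState(0)
weeks = np.arange(156)
values = 10.0 * np.sin(2 * np.pi * weeks / 52.0) + rng.normal(scale=1.0, size=156)
sales = pd.Series(values, index=pd.date_range("2010-02-05", periods=156, freq="W-FRI"))

# Pearson correlation between the series and a copy of itself shifted by `lag` weeks.
autocorrs = {lag: sales.autocorr(lag=lag) for lag in (1, 2, 52)}
for lag, value in autocorrs.items():
    print("Autocorrelation %d: %.3f" % (lag, value))

# pandas.plotting.autocorrelation_plot(sales) would plot all lags (requires matplotlib).
```

A high value at lag 52 indicates that a week resembles the same week one year earlier, which is the usual signature of yearly seasonality.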
In [9]:
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x111e58050>
In [10]:
Out[10]:
Split the weekly sales data into a training set and a test set, using 75% of the data for training.
In [11]:
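For time series the split is normally chronological, so the model is never trained on weeks that come after the ones it is tested on. A sketch on a made-up series:

```python
import numpy as np
import pandas as pd

# Made-up weekly series of 100 points.
idx = pd.date_range("2010-02-05", periods=100, freq="W-FRI")
sales = pd.Series(np.arange(100, dtype=float), index=idx, name="Weekly_Sales")

# First 75% of the weeks for training, the remaining 25% held out for testing.
n_train = int(len(sales) * 0.75)
train = sales.iloc[:n_train]
test = sales.iloc[n_train:]
print(len(train), len(test))
```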
Create an AR(1) model on the training data and compute the mean absolute error of the predictions.
In [12]:
In [13]:
Out[13]:
('Mean absolute error: ', 81839.338629691949)
/Users/arahuja/anaconda/lib/python2.7/site-packages/statsmodels/base/data.py:503: FutureWarning: TimeSeries is deprecated. Please use Series
return TimeSeries(result, index=self.predict_dates)
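To make the AR(1) step concrete, here is a sketch that fits y[t] = c + phi * y[t-1] by ordinary least squares with NumPy. This is a swapped-in technique, not the statsmodels fit that produced the output above, and the helper names `fit_ar1`, `forecast_ar1`, and `mean_absolute_error` are hypothetical. The data is synthetic, so the error is not comparable to the figure shown.

```python
import numpy as np

def fit_ar1(y):
    """Least-squares estimate of (c, phi) in y[t] = c + phi * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # intercept and lag-1 value
    c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return c, phi

def forecast_ar1(c, phi, last_value, steps):
    """Multi-step forecast: each prediction feeds the next one."""
    preds = []
    prev = last_value
    for _ in range(steps):
        prev = c + phi * prev
        preds.append(prev)
    return np.array(preds)

def mean_absolute_error(actual, predicted):
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

# Synthetic AR(1) data with c = 5 and phi = 0.6 (assumed values for illustration).
rng = np.random.RandomState(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 5.0 + 0.6 * y[t - 1] + rng.normal()

train, test = y[:150], y[150:]
c, phi = fit_ar1(train)
preds = forecast_ar1(c, phi, train[-1], len(test))
mae = mean_absolute_error(test, preds)
print("Mean absolute error:", mae)
```

Because each forecast is built on the previous forecast, the predictions settle toward the series mean after a few steps, which limits how well an AR(1) model can track seasonal spikes.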
Plot the residuals. Where are there significant errors?
In [14]:
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x119d926d0>
In [15]:
Out[15]:
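Besides plotting, the largest errors can be located directly from the residual series. The sketch below uses made-up actual and fitted values with one deliberate spike; it only illustrates the mechanics.

```python
import pandas as pd

# Made-up values: the fitted model predicts a flat 100, the actuals spike in week 4.
idx = pd.date_range("2010-02-05", periods=6, freq="W-FRI")
actual = pd.Series([100.0, 102.0, 98.0, 180.0, 101.0, 99.0], index=idx)
fitted = pd.Series([100.0] * 6, index=idx)

# Residual = actual minus fitted; positive means the model under-predicted.
residuals = actual - fitted

# Date of the largest absolute error, plus the three worst weeks overall.
worst_week = residuals.abs().idxmax()
print(worst_week, residuals.loc[worst_week])
print(residuals.abs().nlargest(3))
```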
Compute an AR(2) model and an ARMA(2, 2) model. Does this improve your mean absolute error on the held-out set?
In [16]:
Out[16]:
('Mean absolute error: ', 81203.240909485947)
In [17]:
Out[17]:
('Mean absolute error: ', 80502.745386798299)
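The comparison across autoregressive orders can be sketched with a least-squares AR(p) fit in NumPy. This covers only the AR part: the moving-average terms of an ARMA model need a likelihood-based fit from a library such as statsmodels. The helpers `fit_ar` and `forecast_ar` are hypothetical names and the data is synthetic.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of y[t] = c + phi_1*y[t-1] + ... + phi_p*y[t-p]."""
    y = np.asarray(y, dtype=float)
    # Column k holds y[t-k] for t = p .. len(y)-1.
    lag_cols = [y[p - k: len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lag_cols)
    return np.linalg.lstsq(X, y[p:], rcond=None)[0]  # [c, phi_1, ..., phi_p]

def forecast_ar(coef, history, steps):
    """Multi-step forecast that feeds each prediction back in as a lag."""
    p = len(coef) - 1
    hist = list(history[-p:])
    preds = []
    for _ in range(steps):
        lags = hist[::-1][:p]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        preds.append(nxt)
        hist.append(nxt)
    return np.array(preds)

# Synthetic AR(2) data (assumed coefficients 0.5 and 0.3).
rng = np.random.RandomState(1)
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 1.0 + 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

train, test = y[:150], y[150:]
maes = {}
for p in (1, 2):
    preds = forecast_ar(fit_ar(train, p), train, len(test))
    maes[p] = float(np.mean(np.abs(test - preds)))
    print("AR(%d) mean absolute error: %.4f" % (p, maes[p]))
```

As in the outputs above, adding a lag may change the held-out error only slightly; a higher order is not guaranteed to help.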
Finally, compute an ARIMA model to improve your prediction error. Iterate on the p, d, and q parameters, comparing each model's performance.
In [18]:
Out[18]:
('Mean absolute error: ', 77789.494825392394)
/Users/arahuja/anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
In [ ]: