GitHub Repository: YStrano/DataScience_GA
Path: blob/master/april_18/lessons/lesson-16/code/starter-code/starter-code-16.ipynb
Kernel: Python 2
import pandas as pd
import numpy as np
%matplotlib inline

Walmart Sales Data

For the independent practice, we will analyze weekly sales data from Walmart over a two-year period, from 2010 to 2012.

The data is again separated by store and by department, but we will focus on analyzing one store for simplicity.

The data includes:

  • Store - the store number

  • Dept - the department number

  • Date - the week

  • Weekly_Sales - sales for the given department in the given store

  • IsHoliday - whether the week is a special holiday week

Loading the data and setting the DatetimeIndex

data = pd.read_csv('../../assets/dataset/train.csv')
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)
data.head()

Filter the dataframe to Store 1 sales and aggregate over departments to compute the store's total sales for each week.

# TODO
# TODO
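One possible way to approach this step is sketched below. It assumes `data` has the `Date` index set in the loading cell and the `Store` and `Weekly_Sales` columns listed above. The helper name `store_weekly_sales` is hypothetical, not part of the lesson.

```python
import pandas as pd


def store_weekly_sales(data, store=1):
    """Total Weekly_Sales per week for one store, summed over departments."""
    # Keep only the rows that belong to the requested store.
    store_rows = data[data["Store"] == store]
    # Group on the Date index so each week collapses to a single total.
    weekly = store_rows.groupby(level=0)["Weekly_Sales"].sum()
    # Return a one-column DataFrame in chronological order.
    return weekly.sort_index().to_frame()
```

With the notebook's data loaded, `store1_sales = store_weekly_sales(data)` would give one row per week.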

Compute the lag-1, lag-2, and lag-52 autocorrelations for Weekly_Sales and/or create an autocorrelation plot.

# TODO
# TODO
# TODO
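A minimal sketch of the lag computation, using pandas' `Series.autocorr`, which returns the Pearson correlation between the series and a copy of itself shifted by `lag`. The helper name `lag_autocorrelations` is hypothetical, and the input is assumed to be a weekly sales Series in chronological order.

```python
import pandas as pd


def lag_autocorrelations(series, lags=(1, 2, 52)):
    """Map each lag to the autocorrelation of `series` at that lag."""
    return {lag: series.autocorr(lag=lag) for lag in lags}


# For the plot, pandas also ships a ready-made helper in recent releases:
#   from pandas.plotting import autocorrelation_plot
#   autocorrelation_plot(series)
```

With weekly data, a lag of 52 compares each week with the same week one year earlier, so a high value there would point to yearly seasonality.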

Split the weekly sales data into a training set and a test set, using 75% of the data for training.

# TODO
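For time series the split is usually chronological rather than random, so the model is only tested on weeks that come after the ones it was trained on. The sketch below assumes that convention. `chronological_split` is a hypothetical helper, and the `demo` series is toy data, not the Walmart file.

```python
import pandas as pd


def chronological_split(series, train_fraction=0.75):
    """Split a time-ordered Series into an early train part and a late test part."""
    series = series.sort_index()
    n_train = int(len(series) * train_fraction)
    return series.iloc[:n_train], series.iloc[n_train:]


# Toy illustration: 8 weekly observations give 6 for training and 2 for testing.
demo = pd.Series(range(8), index=pd.date_range("2010-02-05", periods=8, freq="W-FRI"))
demo_train, demo_test = chronological_split(demo)
```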

Create an AR(1) model on the training data and compute the mean absolute error of the predictions.

import statsmodels.api as sm
from sklearn.metrics import mean_absolute_error
# TODO

Plot the residuals. Where are there significant errors?

# TODO
# TODO
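One way to look at the errors is sketched below, assuming `actual` and `predicted` are Series sharing the same date index. Both helper names are hypothetical. The plot goes through pandas' `.plot()`, which needs matplotlib installed.

```python
import pandas as pd


def largest_residuals(actual, predicted, n=5):
    """Return the n residuals (actual - predicted) with the largest magnitude."""
    residuals = actual - predicted
    order = residuals.abs().sort_values(ascending=False).index[:n]
    return residuals.loc[order]


def plot_residuals(residuals):
    """Line plot of residuals over time with a zero reference line."""
    ax = residuals.plot(style="o-", figsize=(10, 4), title="Residuals")
    ax.axhline(0, color="gray", linewidth=1)
    return ax
```

A natural follow-up is to check whether the dates returned by `largest_residuals` coincide with weeks flagged by `IsHoliday`. For in-sample errors, a fitted statsmodels result also exposes its residuals as `.resid`.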

Compute an AR(2) model and an ARMA(2, 2) model. Does this improve your mean absolute error on the held-out set?

# TODO
# TODO

Finally, compute an ARIMA model to improve your prediction error. Iterate on the p, d, and q parameters, comparing each model's performance.

# TODO