YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_16/solution-code/05_independent_practice_solutions.ipynb
Kernel: Python 2

Time Series Independent Practice

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Walmart Sales Data

For this independent practice, we'll analyze Walmart's weekly sales data over a two-year period from 2010 to 2012.

The data set is again separated by store and department, but we'll focus on analyzing one store for simplicity.

The data include:

  • Store: The store number.

  • Dept: The department number.

  • Date: The week.

  • Weekly_Sales: Sales for the given department in the given store.

  • IsHoliday: Whether the week is a special holiday week.

1) Preprocess the data using Pandas.

  • Load the data.

  • Convert the Date column to a datetime object.

  • Set Date as the index of the DataFrame.

walmart = pd.read_csv('data/train.csv')
walmart.head()
walmart.dtypes
Store                    int64
Dept                     int64
Date                    object
Weekly_Sales           float64
IsHoliday                 bool
dtype: object
walmart['Date'] = pd.to_datetime(walmart['Date'])
walmart.dtypes
Store                    int64
Dept                     int64
Date            datetime64[ns]
Weekly_Sales           float64
IsHoliday                 bool
dtype: object
walmart.set_index('Date', inplace=True)
walmart.head()
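Setting a `DatetimeIndex` is worth the preprocessing step because it unlocks label-based time slicing. A minimal sketch on toy data (hypothetical dates, not the Walmart file):

```python
import pandas as pd

# Toy frame with a DatetimeIndex (invented values, for illustration only).
toy = pd.DataFrame(
    {"Weekly_Sales": [100.0, 120.0, 90.0, 110.0]},
    index=pd.to_datetime(["2010-02-05", "2010-02-12", "2011-02-04", "2011-02-11"]),
)

# With dates as the index, an entire year can be selected by label.
sales_2010 = toy.loc["2010"]
print(len(sales_2010))  # 2 rows fall in 2010
```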

2) Filter the DataFrame to Store 1 sales and aggregate over departments to compute the total weekly sales per store. Store this in a new DataFrame.

store1 = walmart[walmart.Store == 1][['Weekly_Sales']].resample('W').sum()
store1.head()
3) Plot the rolling mean of the Store 1 weekly sales.

store1[['Weekly_Sales']].rolling(window=3, center=True).mean().plot()
Image in a Jupyter notebook
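To see exactly what the centered rolling mean does, here is a sketch on a toy series (not the store data): with `window=3` and `center=True`, each point is averaged with one neighbor on each side, so the first and last points have no complete window and come out as `NaN`.

```python
import pandas as pd

# Toy series to illustrate a centered rolling mean of window 3.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = s.rolling(window=3, center=True).mean()

# Interior points average (left, self, right); the edges are NaN.
print(smoothed.tolist())  # [nan, 2.0, 3.0, 4.0, nan]
```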

4) Compute the 1, 13, and 52 autocorrelations for Weekly_Sales and/or create an autocorrelation plot.

print('Autocorrelation 1: ', store1['Weekly_Sales'].autocorr(1))
print('Autocorrelation 13: ', store1['Weekly_Sales'].autocorr(13))
print('Autocorrelation 52: ', store1['Weekly_Sales'].autocorr(52))
Autocorrelation 1:  0.302158279411
Autocorrelation 13:  0.10169228502
Autocorrelation 52:  0.895376029478
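As a sanity check on what `Series.autocorr(lag)` reports: it is simply the Pearson correlation between the series and a lagged copy of itself. A sketch on toy data (not the store series):

```python
import pandas as pd

# Toy series; values are invented for illustration.
s = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0, 9.0, 7.0, 5.0])

# Lag-1 autocorrelation computed "by hand" as corr(series, shifted series).
lag = 1
manual = s.corr(s.shift(lag))

# Matches the built-in autocorr, which is defined the same way.
print(abs(manual - s.autocorr(lag)) < 1e-12)  # True
```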
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(store1['Weekly_Sales'])
Image in a Jupyter notebook
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(store1['Weekly_Sales'], lags=60)
plt.show()
Image in a Jupyter notebook

5) Create a decomposition plot for the Store 1 sales data.

from statsmodels.tsa.seasonal import seasonal_decompose

# 'period' replaces the deprecated 'freq' argument in newer statsmodels.
decomposition = seasonal_decompose(store1.Weekly_Sales, period=13)
decomposition.plot()
plt.show()
Image in a Jupyter notebook

6) Based on the analyses above, what can we deduce about this time series?

# Big autocorrelation spikes occur around lag 52, indicating a yearly pattern.
# Autocorrelation is also elevated at lags 1 and 2 (perhaps up to 4), so an autoregressive model is likely to be useful.
# There are no isolated random spikes, so a moving average model probably adds little.