How to Forecast a Time Series with Python
Wouldn't it be nice to know the future? This notebook accompanies the blog post on Medium. Please check the blog for visualizations and explanations; this notebook is really just for the code 😃
Processing the Data
Let's explore the industrial production of electric and gas utilities in the United States from 1985 to 2018, measured as monthly production output.
You can access this data here: https://fred.stlouisfed.org/series/IPG2211A2N
This data measures the real output of all relevant establishments located in the United States, regardless of their ownership, but not those located in U.S. territories.
Right now our index is just a list of strings that look like dates. We'll want to convert these to timestamps so that our forecasting analysis can interpret the values:
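A minimal sketch of that conversion, assuming the CSV was loaded into a pandas DataFrame named `data` with the date strings as its index. The three values below are placeholders, not real FRED observations.

```python
import pandas as pd

# Placeholder values standing in for the downloaded series (not real FRED data).
data = pd.DataFrame(
    {"IPG2211A2N": [72.5, 70.7, 62.5]},
    index=["1985-01-01", "1985-02-01", "1985-03-01"],
)

# Convert the string index into a DatetimeIndex of pandas Timestamps.
data.index = pd.to_datetime(data.index)

print(type(data.index).__name__)
print(data.index[0])
```

Once the index is a `DatetimeIndex`, date-based slicing such as `data.loc["1985-02":]` becomes available.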
Let's first make sure that the data doesn't have any missing data points:
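One way to run this check is sketched below. The frame is a placeholder with one gap planted on purpose so the check has something to find; the real series may well have no gaps at all.

```python
import numpy as np
import pandas as pd

# Placeholder frame with one deliberately missing value (not real FRED data).
index = pd.date_range("1985-01-01", periods=4, freq="MS")
data = pd.DataFrame({"IPG2211A2N": [72.5, np.nan, 62.5, 57.5]}, index=index)

# Count missing values per column.
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Show the rows that contain at least one missing value.
rows_with_gaps = data[data.isnull().any(axis=1)]
print(rows_with_gaps)
```

If every count comes back as zero, the series is complete and no filling or dropping is needed.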
Let's also rename this column, since it's hard to remember what the "IPG2211A2N" code stands for:
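A short sketch of the rename. The new name `Energy Production` is an assumption chosen for illustration, and the values are placeholders.

```python
import pandas as pd

# Placeholder frame (not real FRED data).
index = pd.date_range("1985-01-01", periods=3, freq="MS")
data = pd.DataFrame({"IPG2211A2N": [72.5, 70.7, 62.5]}, index=index)

# Map the FRED series code to a readable name (the name itself is an assumption).
data = data.rename(columns={"IPG2211A2N": "Energy Production"})

print(list(data.columns))
```

Using `rename` with an explicit mapping touches only the named column, so it stays safe if more columns are added later.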
Requirement already satisfied: pyramid-arima in /anaconda3/lib/python3.6/site-packages
Requirement already satisfied: Cython>=0.23 in /anaconda3/lib/python3.6/site-packages (from pyramid-arima)
Requirement already satisfied: statsmodels>=0.8 in /anaconda3/lib/python3.6/site-packages (from pyramid-arima)
Requirement already satisfied: numpy>=1.9 in /anaconda3/lib/python3.6/site-packages (from pyramid-arima)
Requirement already satisfied: scikit-learn>=0.17 in /anaconda3/lib/python3.6/site-packages (from pyramid-arima)
Requirement already satisfied: scipy>=0.9 in /anaconda3/lib/python3.6/site-packages (from pyramid-arima)
The AIC measures how well a model fits the data while taking into account the overall complexity of the model. A model that fits the data very well while using lots of features will be assigned a larger AIC score than a model that uses fewer features to achieve the same goodness-of-fit. Therefore, we are interested in finding the model that yields the lowest AIC value.
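To make the complexity penalty concrete, here is a small sketch using the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The helper `aic` and the log-likelihood values are hypothetical, chosen only to show two models with the same fit and different parameter counts.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical models that reach the same maximized log-likelihood.
simple_model = aic(log_likelihood=-100.0, n_params=3)
complex_model = aic(log_likelihood=-100.0, n_params=8)

print(simple_model, complex_model)  # 206.0 216.0
```

With the fit held equal, the model with more parameters gets the larger score, so the simpler one would be preferred.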
Model Validation
Split "Train/Test" (i.e. use earlier data to predict later data)
Examine Residuals to make sure that there is no autocorrelation
Compare against actual data
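For the residual check in the list above, one rough approach is to look at the sample autocorrelation of the residuals at the first few lags. The helper `lag_autocorrelations` is hypothetical, and it relies on pandas' `Series.autocorr`, which computes a Pearson correlation between the series and a shifted copy of itself, so the numbers can differ slightly from a textbook ACF estimate. The input below is a synthetic trending series, used as an example of residuals that should be flagged.

```python
import numpy as np
import pandas as pd


def lag_autocorrelations(residuals, max_lag=12):
    """Return the sample autocorrelation of the residuals at lags 1..max_lag."""
    series = pd.Series(residuals, dtype=float)
    return pd.Series(
        {lag: series.autocorr(lag=lag) for lag in range(1, max_lag + 1)}
    )


# Synthetic "residuals" with an obvious trend left in them.
trending = np.arange(48, dtype=float)
acf = lag_autocorrelations(trending, max_lag=3)
print(acf)

# Common rule of thumb: values beyond roughly 1.96 / sqrt(n) suggest
# autocorrelation the model has not captured.
threshold = 1.96 / np.sqrt(len(trending))
print((acf.abs() > threshold).any())
```

For well-behaved residuals the values should mostly sit inside the threshold band; large values at seasonal lags (such as 12 for monthly data) hint that the seasonal terms need another look.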
We'll train on roughly 30 years of data, from 1985 through 2015, then test our forecast on the years after that and compare it to the real data.
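A sketch of a date-based split, assuming the frame has a monthly `DatetimeIndex`. The values are synthetic, and the exact cutoff (December 2015 as the last training month) is an assumption about where the boundary falls.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering 1985-01 through 2018-01 (not real FRED data).
index = pd.date_range("1985-01-01", "2018-01-01", freq="MS")
data = pd.DataFrame(
    {"Energy Production": np.linspace(60.0, 110.0, len(index))},
    index=index,
)

# Earlier observations train the model; later ones are held out for testing.
train = data.loc["1985-01-01":"2015-12-01"]
test = data.loc["2016-01-01":]

print(len(train), len(test))
```

Splitting by date rather than at random keeps the test set strictly in the future relative to the training set, which is what a forecast has to face in practice.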
Let's look at a close-up of our predicted values versus the actual values:
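One way to line the two up is to put the held-out values and the forecast side by side in a single frame. Everything below is a placeholder: the `Prediction` column stands in for whatever the fitted model returns, and the mean absolute error is added only as one possible summary number.

```python
import pandas as pd

index = pd.date_range("2016-01-01", periods=4, freq="MS")

# Placeholder actuals and forecast (not real data or real model output).
test = pd.DataFrame({"Energy Production": [117.0, 106.0, 95.0, 89.0]}, index=index)
future_forecast = pd.DataFrame({"Prediction": [115.0, 108.0, 96.0, 88.0]}, index=index)

# Align on the shared date index so each row holds actual and predicted values.
comparison = pd.concat([test, future_forecast], axis=1)
comparison["Error"] = comparison["Prediction"] - comparison["Energy Production"]

mae = comparison["Error"].abs().mean()
print(comparison)
print(mae)
```

Calling `comparison[["Energy Production", "Prediction"]].plot()` on the result would give the close-up view, provided matplotlib is installed.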
Note that the interpretation is relative to other models: a single value means little on its own and is best judged against alternatives.