Path: blob/master/Time Forecasting using Python/2.1 Auto Regressive Model .ipynb
A basic framework for implementing Autoregressive (AR) models for time series forecasting in Python
Autoregression (AR) is a type of time series model used for predicting future values based on past values. It assumes that the current value of the series is a linear combination of its previous values, plus a random error term. The basic idea is to exploit the temporal dependence in data, where the value at time t can be explained by its own previous values.
Key points about autoregression:
- An AR(p) model uses the previous p values (lags) of the series as predictors.
- The model assumes the series is stationary; for AR(1) this means the lag coefficient satisfies |φ| < 1 (equivalently, the root of the AR polynomial lies outside the unit circle).
- The order p is typically chosen using the partial autocorrelation function (PACF) or information criteria such as AIC and BIC.
Example: AR(1) Model
Assume we have a time series which follows an AR(1) model. The AR(1) model is given by:

y_t = c + φ·y_{t−1} + ε_t

where c is a constant, φ is the lag-1 coefficient, and ε_t is a white-noise error term.
We can use the statsmodels library in Python to create an AR(1) model and generate sample data.
Dep. Variable: This section specifies the dependent variable used in the model. In this case, it's labeled as "y".
No. Observations: Indicates the number of observations in the dataset used to fit the model. Here, it's 100.
Model: Specifies the type of model used. In this case, it's an AutoReg(1) model, which means it's an autoregressive model of order 1.
Log Likelihood: This is the value of the log-likelihood function at the maximum likelihood estimates of the parameters. It measures the goodness-of-fit of the model. A higher log-likelihood indicates a better fit.
Method: Indicates the method used for parameter estimation. In this case, it's "Conditional MLE", which stands for Conditional Maximum Likelihood Estimation.
S.D. of innovations: Represents the standard deviation of the innovations (residuals) of the model. It gives an idea of the spread of the errors around the fitted values.
AIC (Akaike Information Criterion): AIC is a measure of the model's goodness-of-fit, penalized for the number of parameters in the model. Lower AIC values indicate a better trade-off between model fit and complexity.
BIC (Bayesian Information Criterion): Similar to AIC, BIC is another measure of model goodness-of-fit, but it penalizes more heavily for model complexity. It often results in more parsimonious models compared to AIC.
Sample: Specifies the range of observations used in the estimation. In this case, it's from observation 1 to 100.
Coefficients: Lists the estimated coefficients of the model.
const: Represents the intercept term.
y.L1: Represents the coefficient for the lag 1 term of the autoregressive process.
Standard Error: Provides the standard errors associated with the estimated coefficients.
z-Value and P>|z|: These values are associated with the significance tests for the coefficients. The z-value is the ratio of the estimated coefficient to its standard error.
P>|z| is the p-value associated with the null hypothesis that the coefficient is equal to zero. Lower p-values indicate greater significance.
Confidence Intervals [0.025 0.975]: Provides the 95% confidence intervals for the estimated coefficients.
Roots: Lists the roots of the autoregressive polynomial. In this case, the AR polynomial has one real root at approximately 1.7072.
Example to predict attrition rate using AR
Determining the Best Lag using AIC and BIC
| Aspect | Autocorrelation | Partial Autocorrelation |
|---|---|---|
| Definition | Correlation of the series with lagged versions of itself | Correlation with lagged versions, controlling for the intermediate lags |
| Influence | Reflects both direct and indirect influence of previous observations | Removes indirect effects that pass through intermediate observations |
| Calculation Method | Autocorrelation Function (ACF) | Partial Autocorrelation Function (PACF) |
| Interpretation | Identifies serial correlation in a time series | Determines the order of autoregressive models |
| Role in Time Series | Detects patterns like trends or seasonality | Identifies significant lags in AR models |
| Mathematical Notation | Autocorrelation at lag k: ρ_k | Partial autocorrelation at lag k: ϕ_{k,k} |
A small worked example. Suppose daily sales are:

| Day | Sales |
|---|---|
| 23 | 1000 |
| 24 | 2000 |
| 25 | 3000 |
| 26 | 4000 |

To forecast sales for day 27: an AR(1) model uses only the previous value (day 26), with the coefficient estimated from the consecutive pairs (23→24, 24→25, 25→26); an AR(2) model uses the two previous values (days 26 and 25).
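The AR(1) arithmetic in this example can be sketched directly with least squares: regress each day's sales on the previous day's, then forecast day 27. Since the series is exactly linear, the fit is exact.

```python
import numpy as np

sales = np.array([1000.0, 2000.0, 3000.0, 4000.0])   # days 23..26

# Fit y_t = c + phi * y_{t-1} on the three (previous, current) pairs
X = np.column_stack([np.ones(3), sales[:-1]])
targets = sales[1:]
c, phi = np.linalg.lstsq(X, targets, rcond=None)[0]

forecast_day_27 = c + phi * sales[-1]
print(c, phi, forecast_day_27)   # c ~ 1000, phi ~ 1, forecast ~ 5000
```

The conditional MLE used by `AutoReg` reduces to exactly this ordinary least squares fit on lagged values.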