CoCalc -- 1.1 Augmented Dickey-Fuller (ADF) test.ipynb

GitHub Repository: suyashi29/python-su
Path: blob/master/Time Forecasting using Python/1.1 Augmented Dickey-Fuller (ADF) test.ipynb

Kernel: Python 3 (ipykernel)

The Augmented Dickey-Fuller (ADF) test is a statistical hypothesis test used to determine whether a unit root is present in a time series dataset. In simpler terms, it assesses whether a time series is stationary or non-stationary.

Stationarity is a key assumption in many time series analysis techniques, including ARIMA modeling. A stationary time series is one whose statistical properties (such as mean, variance, and autocorrelation) remain constant over time. Non-stationary time series, on the other hand, exhibit trends, seasonality, or other patterns that change over time.
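To make "constant over time" concrete, the sketch below compares the variance across many simulated paths at an early and a late time point. It is illustrative only: the number of paths, the path length, and the helper name `variance_at` are assumptions and not part of the notebook. White noise should keep roughly the same variance at both points, while a random walk's variance grows with time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200

# Each row is one simulated path
noise = rng.standard_normal((n_paths, n_steps))  # stationary white noise
walks = noise.cumsum(axis=1)                     # random walks (non-stationary)

def variance_at(paths, t):
    """Variance across all simulated paths at time index t (hypothetical helper)."""
    return paths[:, t].var()

for name, paths in [("white noise", noise), ("random walk", walks)]:
    early, late = variance_at(paths, 49), variance_at(paths, 199)
    print(f"{name}: variance at t=50 is {early:.2f}, at t=200 is {late:.2f}")
```

In practice only one realization of a series is available, which is why a formal test such as ADF is used rather than this kind of across-path comparison.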

What is Unit Root?

The ADF test belongs to a family of tests called unit root tests, a standard way to test whether a time series is stationary.
A unit root is a characteristic of a time series that makes it non-stationary. Technically, a unit root is said to exist in a time series if alpha = 1 in the equation below:

$$Y_t = \alpha Y_{t-1} + \varepsilon_t$$

where Y_t is the value of the series at time t and ε_t is a random error term.

The presence of a unit root means the time series is non-stationary. Moreover, the number of unit roots contained in the series corresponds to the number of differencing operations required to make the series stationary.
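The link between unit roots and differencing can be checked numerically. In this minimal sketch (variable names are illustrative), a series with one unit root is built as the cumulative sum of white noise, and a series with two unit roots as the cumulative sum of that. One and two rounds of differencing, respectively, recover the original noise.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(200)

one_unit_root = noise.cumsum()           # random walk: needs one difference
two_unit_roots = one_unit_root.cumsum()  # integrated twice: needs two differences

# np.diff drops one observation per round of differencing
recovered_once = np.diff(one_unit_root)
recovered_twice = np.diff(two_unit_roots, n=2)

print(np.allclose(recovered_once, noise[1:]))   # True
print(np.allclose(recovered_twice, noise[2:]))  # True
```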

The Augmented Dickey-Fuller (ADF) test holds significant importance in time series analysis for several reasons:

Assessing Stationarity: One of the fundamental assumptions in time series analysis is stationarity. The ADF test helps in determining whether a time series is stationary or non-stationary.
Model Selection: Stationarity is a prerequisite for many time series models, including autoregressive integrated moving average (ARIMA) models. The ADF test aids in model selection by guiding the choice of appropriate differencing orders.
Avoiding Spurious Regression: In regression analysis involving non-stationary time series, there is a risk of obtaining spurious regression results, where apparent relationships between variables are purely coincidental. By confirming stationarity through the ADF test, analysts can mitigate this risk and ensure the validity of their regression models.
Forecasting Accuracy: Stationary time series are typically easier to model and forecast accurately. By confirming stationarity using the ADF test, analysts can proceed with confidence in building forecasting models, leading to more reliable predictions.
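The spurious regression risk mentioned above can be illustrated with a small simulation. This is a sketch under assumed settings (500 pairs of length 200, a 0.5 correlation threshold, and the helper `rowwise_corr` are all choices made here). It counts how often two independent series appear strongly correlated, first for white noise and then for random walks.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, n_steps = 500, 200

def rowwise_corr(a, b):
    """Pearson correlation between matching rows of a and b (hypothetical helper)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

# Independent stationary pairs
noise_x = rng.standard_normal((n_pairs, n_steps))
noise_y = rng.standard_normal((n_pairs, n_steps))

# Independent non-stationary pairs (random walks built from the same noise)
walk_x, walk_y = noise_x.cumsum(axis=1), noise_y.cumsum(axis=1)

share_noise = np.mean(np.abs(rowwise_corr(noise_x, noise_y)) > 0.5)
share_walks = np.mean(np.abs(rowwise_corr(walk_x, walk_y)) > 0.5)

print(f"Share of pairs with |correlation| > 0.5, white noise:  {share_noise:.2%}")
print(f"Share of pairs with |correlation| > 0.5, random walks: {share_walks:.2%}")
```

None of these pairs is related by construction, so any strong correlation among the random walks is spurious.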

Dickey-Fuller (DF)

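A Dickey-Fuller test is a unit root test of the null hypothesis that α = 1 in the following model equation, where α is the coefficient on the first lag of Y, c is a constant, and βt is an optional time trend:

$$Y_t = c + \beta t + \alpha Y_{t-1} + \varepsilon_t$$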

Null Hypothesis (H0): alpha=1
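To see what α looks like in data, the sketch below simulates the simplest form of the model, Y_t = α·Y_{t-1} + ε_t with no constant or trend, and estimates α by least squares. The function names, seed, and sample size are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
shocks = rng.standard_normal(1000)

def simulate_ar1(alpha, shocks):
    """Generate Y_t = alpha * Y_{t-1} + shock_t, starting from Y_0 = 0."""
    y = np.zeros(len(shocks))
    for t in range(1, len(shocks)):
        y[t] = alpha * y[t - 1] + shocks[t]
    return y

def estimate_alpha(y):
    """Least-squares slope of Y_t on Y_{t-1}, without an intercept."""
    lagged, current = y[:-1], y[1:]
    return (lagged @ current) / (lagged @ lagged)

alpha_unit_root = estimate_alpha(simulate_ar1(1.0, shocks))
alpha_stationary = estimate_alpha(simulate_ar1(0.5, shocks))

print(f"Estimated alpha, unit-root series:  {alpha_unit_root:.3f}")
print(f"Estimated alpha, stationary series: {alpha_stationary:.3f}")
```

Under the null hypothesis the estimate does not follow the usual t distribution, which is why the Dickey-Fuller test relies on its own critical values rather than a standard t-test.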

ADF

The ADF test is an ‘augmented’ version of the Dickey-Fuller test.
It expands the Dickey-Fuller test equation to include a higher-order autoregressive process in the model:

$$Y_t = c + \beta t + \alpha Y_{t-1} + \phi_1 \Delta Y_{t-1} + \phi_2 \Delta Y_{t-2} + \dots + \phi_p \Delta Y_{t-p} + \varepsilon_t$$

Only lagged differencing terms have been added; the rest of the equation remains the same. The extra terms absorb serial correlation in the errors, which makes the test more robust.

The null hypothesis, however, is still the same as in the Dickey-Fuller test.

A key point to remember: since the null hypothesis assumes the presence of a unit root (α = 1), the p-value must be less than the significance level (say 0.05) in order to reject the null hypothesis and infer that the series is stationary.
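The test statistic itself can be sketched by hand. The ADF regression is commonly estimated in difference form, ΔY_t = c + γ·Y_{t-1} + φ_1·ΔY_{t-1} + ... + φ_p·ΔY_{t-p} + ε_t, where γ = α − 1, so that α = 1 corresponds to γ = 0. The function below is a simplified illustration with a constant and no trend term, not the statsmodels implementation; `adf_statistic` and its `lags` argument are names chosen here.

```python
import numpy as np

def adf_statistic(y, lags=1):
    """t-statistic of gamma in dY_t = c + gamma*Y_{t-1} + sum_i phi_i*dY_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)

    # Dependent variable: differences that have `lags` earlier differences available
    target = dy[lags:]

    # Regressors: constant, lagged level, then the lagged differences
    columns = [np.ones_like(target), y[lags:-1]]
    for i in range(1, lags + 1):
        columns.append(dy[lags - i:len(dy) - i])
    X = np.column_stack(columns)

    # Ordinary least squares and the standard error of gamma
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
white = rng.standard_normal(500)  # stationary
walk = white.cumsum()             # has a unit root

stat_white = adf_statistic(white)
stat_walk = adf_statistic(walk)
print(f"ADF-style statistic, white noise: {stat_white:.2f}")
print(f"ADF-style statistic, random walk: {stat_walk:.2f}")
```

A strongly negative statistic is evidence against the unit root. The p-value still has to come from Dickey-Fuller critical values, which this sketch does not compute.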

ADF Test in Python:

The statsmodels package provides a reliable implementation of the ADF test via the adfuller() function in statsmodels.tsa.stattools. It returns a tuple containing, in order:

The value of the test statistic
The p-value
The number of lags used in the test
The number of observations used
The critical value cutoffs (a dict keyed by '1%', '5%' and '10%')
The maximized information criterion (only when autolag is not None)

When the test statistic is lower (more negative) than the critical value at a given level, you reject the null hypothesis at that level and infer that the time series is stationary.
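The critical-value rule can be written as a small helper. This sketch needs no library: `rejected_levels` is a hypothetical function, the cutoffs are copied from the first example's output in this notebook, and the statistics -3.0 and -1.0 are made-up values for illustration. Real cutoffs depend on the sample size and regression specification, so they should be taken from each adfuller() call.

```python
def rejected_levels(adf_stat, critical_values):
    """Return the significance levels at which the unit-root null is rejected."""
    return [level for level, cutoff in critical_values.items() if adf_stat < cutoff]

# Cutoffs as printed in the first example of this notebook
critical_values = {
    "1%": -3.5019123847798657,
    "5%": -2.892815255482889,
    "10%": -2.583453861475781,
}

print(rejected_levels(-5.763202872521945, critical_values))  # ['1%', '5%', '10%']
print(rejected_levels(-3.0, critical_values))                # ['5%', '10%']
print(rejected_levels(-1.0, critical_values))                # []
```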

In [28]:

import numpy as np
np.random.seed(0)
series = np.random.randn(150)
series

Out[28]:

array([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
       -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ,
        0.14404357,  1.45427351,  0.76103773,  0.12167502,  0.44386323,
        0.33367433,  1.49407907, -0.20515826,  0.3130677 , -0.85409574,
       -2.55298982,  0.6536186 ,  0.8644362 , -0.74216502,  2.26975462,
       -1.45436567,  0.04575852, -0.18718385,  1.53277921,  1.46935877,
        0.15494743,  0.37816252, -0.88778575, -1.98079647, -0.34791215,
        0.15634897,  1.23029068,  1.20237985, -0.38732682, -0.30230275,
       -1.04855297, -1.42001794, -1.70627019,  1.9507754 , -0.50965218,
       -0.4380743 , -1.25279536,  0.77749036, -1.61389785, -0.21274028,
       -0.89546656,  0.3869025 , -0.51080514, -1.18063218, -0.02818223,
        0.42833187,  0.06651722,  0.3024719 , -0.63432209, -0.36274117,
       -0.67246045, -0.35955316, -0.81314628, -1.7262826 ,  0.17742614,
       -0.40178094, -1.63019835,  0.46278226, -0.90729836,  0.0519454 ,
        0.72909056,  0.12898291,  1.13940068, -1.23482582,  0.40234164,
       -0.68481009, -0.87079715, -0.57884966, -0.31155253,  0.05616534,
       -1.16514984,  0.90082649,  0.46566244, -1.53624369,  1.48825219,
        1.89588918,  1.17877957, -0.17992484, -1.07075262,  1.05445173,
       -0.40317695,  1.22244507,  0.20827498,  0.97663904,  0.3563664 ,
        0.70657317,  0.01050002,  1.78587049,  0.12691209,  0.40198936,
        1.8831507 , -1.34775906, -1.270485  ,  0.96939671, -1.17312341,
        1.94362119, -0.41361898, -0.74745481,  1.92294203,  1.48051479,
        1.86755896,  0.90604466, -0.86122569,  1.91006495, -0.26800337,
        0.8024564 ,  0.94725197, -0.15501009,  0.61407937,  0.92220667,
        0.37642553, -1.09940079,  0.29823817,  1.3263859 , -0.69456786,
       -0.14963454, -0.43515355,  1.84926373,  0.67229476,  0.40746184,
       -0.76991607,  0.53924919, -0.67433266,  0.03183056, -0.63584608,
        0.67643329,  0.57659082, -0.20829876,  0.39600671, -1.09306151,
       -1.49125759,  0.4393917 ,  0.1666735 ,  0.63503144,  2.38314477,
        0.94447949, -0.91282223,  1.11701629, -1.31590741, -0.4615846 ])

In [26]:

#ADF Test on stationary series
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# White noise is stationary by construction
series = np.random.randn(100)

# autolag='AIC' picks the lag length that minimizes the Akaike information criterion
result = adfuller(series, autolag='AIC')
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

# result[4] maps significance levels to critical values; print the header once
print('Critical Values:')
for key, value in result[4].items():
    print(f'   {key}, {value}')

Out[26]:

ADF Statistic: -5.763202872521945
p-value: 5.60986653760901e-07
Critical Values:
   1%, -3.5019123847798657
   5%, -2.892815255482889
   10%, -2.583453861475781

The p-value is far below the 0.05 significance level, and the test statistic is more negative than even the 1% critical value, so we can reject the null hypothesis and conclude that the series is stationary.

Visual representation

In [7]:

import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(figsize=(12,7))
plt.plot(series);
plt.title('Random');

Out[7]:

Example 2: ADF Test

In [29]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
# Generate sample time series data
np.random.seed(0)
dates = pd.date_range(start='2022-01-01', periods=100, freq='D')
trend = 0.1 * np.arange(100)
seasonal_pattern = np.sin(np.arange(100) * np.pi / 6) * 2
noise = np.random.normal(loc=0, scale=0.1, size=len(dates))
series = trend + seasonal_pattern + noise
series

Out[29]:

array([ 0.17640523,  1.14001572,  2.02992461,  2.52408932,  2.31880661,
        1.40227221,  0.69500884, -0.31513572, -0.94237269, -1.05894015,
       -0.71764645,  0.24542735,  1.27610377,  2.3121675 ,  3.17643713,
        3.53336743,  3.48145871,  2.67948417,  1.83130677,  0.81459043,
        0.01265021,  0.16536186,  0.55439281,  1.2257835 ,  2.62697546,
        3.35456343,  4.33662666,  4.68128161,  4.68532873,  4.04693588,
        3.01549474,  2.13781625,  1.37917062,  1.10192035,  1.63315798,
        2.5156349 ,  3.72302907,  4.82023798,  5.49331813,  5.86976972,
        5.62719551,  4.95799821,  4.02937298,  3.49507754,  2.61698397,
        2.45619257,  2.74266966,  3.77774904,  4.63861022,  5.87872597,
        6.64250415,  7.13869025,  6.88097029,  6.18193678,  5.39718178,
        4.54283319,  3.87460091,  3.73024719,  4.00451698,  4.86372588,
        5.93275396,  7.06404468,  7.85073618,  8.12737174,  8.14979342,
        7.45982191,  6.43698017,  5.74627823,  4.97721936,  4.90519454,
        5.34085825,  6.11289829,  7.31394007,  8.17651742,  9.17228497,
        9.43151899,  9.24497109,  8.64211503,  7.76884475,  6.90561653,
        6.15143421,  6.19008265,  6.51451544,  7.14637563,  8.54882522,
        9.68958892, 10.44992876, 10.68200752, 10.42497555, 10.00544517,
        8.95968231,  8.22224451,  7.48877669,  7.3976639 ,  7.70358583,
        8.57065732,  9.60105   , 10.87858705, 11.54474202, 11.94019894])

In [30]:

# Perform Augmented Dickey-Fuller (ADF) test
result = adfuller(series)

# Extract and print test statistics and p-value
adf_statistic = result[0]
p_value = result[1]
print("ADF Statistic:", adf_statistic)
print("p-value:", p_value)

# Interpret the test result
if p_value < 0.05:
    print("Reject the null hypothesis: The time series is stationary.")
else:
    print("Fail to reject the null hypothesis: The time series is non-stationary.")

Out[30]:

ADF Statistic: 2.3983961997386816
p-value: 0.9990095528726992
Fail to reject the null hypothesis: The time series is non-stationary.

In [32]:


# Plot the time series data
plt.figure(figsize=(15, 7))
plt.plot(dates, series)
plt.title('Sample Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True)
plt.show()

Out[32]: