Path: blob/master/Time Forecasting using Python/1.2 Understanding Moving Average and Its Implementation.ipynb
Moving average forecast is a simple and widely used method for time series forecasting. It works by taking the average of a subset of the most recent data points to make predictions for future values.
The moving average method is a simple yet effective technique used for smoothing time series data and identifying trends or patterns. It involves calculating the average of a fixed number of consecutive data points, referred to as the window size or the period.
The moving average method smooths out short-term fluctuations in the data and highlights long-term trends. It is commonly used for forecasting and trend analysis in various fields, including finance, economics, and signal processing.
How the moving average forecast typically works:
Select a Window Size: Decide on the number of previous data points (the window size) to include in the moving average calculation. This window size determines the smoothing effect of the forecast.
Calculate the Moving Average: For each time step, calculate the average of the data points within the selected window.
Make Predictions: Use the calculated moving average as the forecast for the next time step.
Repeat: As new data becomes available, update the moving average by including the latest data point and removing the oldest data point from the window. Then, repeat the process of making predictions.
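The four steps above can be sketched in plain Python (the series and window size here are illustrative assumptions):

```python
def moving_average_forecast(data, window):
    """One-step-ahead forecasts: each entry is the mean of the previous `window` points."""
    forecasts = []
    for i in range(window, len(data) + 1):
        # Step 2: average the points inside the current window
        forecasts.append(sum(data[i - window:i]) / window)
    # Steps 3-4: the last entry serves as the forecast for the next, unseen
    # time step; as new data arrives, the window slides forward and repeats.
    return forecasts

print(moving_average_forecast([3, 5, 7, 9, 11], 3))  # [5.0, 7.0, 9.0]
```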
Moving average forecast can be implemented using different types of moving averages, such as:
Simple Moving Average (SMA): This is the most basic form of moving average, where each data point in the window is given equal weight.
Weighted Moving Average (WMA): In WMA, different weights are assigned to each data point in the window. Usually, more recent data points are given higher weights.
Exponential Moving Average (EMA): EMA gives more weight to recent observations while still considering older data. It is calculated using a smoothing factor that exponentially decreases with time.
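A minimal NumPy sketch of the three variants; the window size, weighting scheme, and smoothing factor are illustrative choices, not values from this notebook:

```python
import numpy as np

x = np.array([5, 7, 9, 6, 10], dtype=float)

def sma(series, window):
    # Simple Moving Average: every point in the window gets equal weight
    return np.convolve(series, np.ones(window) / window, mode="valid")

def wma(series, window):
    # Weighted Moving Average: linear weights 1..window, newest point weighted most
    weights = np.arange(1, window + 1, dtype=float)
    weights /= weights.sum()
    return np.convolve(series, weights[::-1], mode="valid")  # convolve flips the kernel

def ema(series, alpha):
    # Exponential Moving Average: recursive form; weight on older data decays by (1 - alpha)
    out = [series[0]]
    for value in series[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return np.array(out)

print(sma(x, 3))  # first value: (5 + 7 + 9) / 3 = 7.0
```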
A simple example to show how Moving Average works
x=[5,7,9,6,10,8,12,11]
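Working through this list with a window of 3 (the window size is an assumed choice for illustration):

```python
x = [5, 7, 9, 6, 10, 8, 12, 11]
window = 3

# Slide a 3-point window across the list and average each window
moving_averages = [sum(x[i:i + window]) / window
                   for i in range(len(x) - window + 1)]

print([round(m, 2) for m in moving_averages])
# [7.0, 7.33, 8.33, 8.0, 10.0, 10.33]
```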
Calculating Moving Average using Python Libraries
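With pandas, these calculations reduce to one-liners; a sketch using the example list above (the window size and weights are illustrative assumptions):

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 7, 9, 6, 10, 8, 12, 11])

# Simple moving average over a 3-point window (first two values are NaN)
sma = s.rolling(window=3).mean()

# Weighted moving average: linear weights 1, 2, 3 (newest point weighted most)
wma = s.rolling(window=3).apply(lambda w: np.dot(w, [1, 2, 3]) / 6, raw=True)

# Exponential moving average with smoothing factor alpha = 0.5
ema = s.ewm(alpha=0.5, adjust=False).mean()

print(sma.round(2).tolist())
```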
Forecasting using Moving Average
Forecasting using a moving average in Python involves extending the concept of a moving average to predict future values based on past data.
Example Walkthrough:
Input Data: Suppose you input the following data: 10 12 14 16 18 20
Window Size: You choose a window size of 3.
Moving Averages: The moving averages will be calculated as:
(10 + 12 + 14) / 3 = 12
(12 + 14 + 16) / 3 = 14
(14 + 16 + 18) / 3 = 16
(16 + 18 + 20) / 3 = 18
Forecast: The forecast for the next value is the most recent moving average, (16 + 18 + 20) / 3 = 18.
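The walkthrough can be reproduced in a few lines, using the last moving average as the next-step forecast (one common convention):

```python
data = [10, 12, 14, 16, 18, 20]
window = 3

# Moving averages over each full 3-point window
moving_averages = [sum(data[i:i + window]) / window
                   for i in range(len(data) - window + 1)]

# The last moving average serves as the forecast for the next time step
forecast = moving_averages[-1]

print(moving_averages)  # [12.0, 14.0, 16.0, 18.0]
print(forecast)         # 18.0
```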
A basic workflow for time series analysis with ARIMA: data generation, visualization, stationarity testing, differencing, model fitting, forecasting, and visualization of the forecasted values.
Fitting the MA Model:
Key Terms Explained:
Dependent Variable: y
This indicates the variable being modeled (the time series data).
No. Observations: 24
The number of data points used in the model. In this case, there are 24 observations in the time series data.
Model: ARIMA(0, 0, 1)
This specifies the model configuration, ARIMA(p, d, q):
p: the number of autoregressive (AR) terms. Here, it is 0.
d: the number of differences applied to make the series stationary. Here, it is 0.
q: the number of moving average (MA) terms. Here, it is 1, so the model includes a moving average component with one lag.
Log Likelihood: -94.595
This is a measure of how well the model fits the data. A higher (less negative) log likelihood indicates a better fit. It is used in calculating the information criteria.
AIC (Akaike Information Criterion): 195.189
AIC is a metric used to compare different models. It takes into account the model's goodness of fit and the number of parameters. Lower AIC values indicate a better model fit.
BIC (Bayesian Information Criterion): 198.724
Similar to AIC, BIC also measures model fit but with a greater penalty for models with more parameters. Lower BIC values suggest a better model.
HQIC (Hannan-Quinn Information Criterion): 196.127
HQIC is another criterion for model comparison, balancing fit and complexity. It typically falls between AIC and BIC in terms of penalty for the number of parameters.
Covariance Type: opg
This refers to the type of covariance matrix used for estimating the model's standard errors. "opg" stands for outer product of gradients.
Coefficients Table:
const (Constant Term): 133.3215
This is the estimated constant (intercept) of the model. It represents the baseline level of the series.
ma.L1 (Moving Average Term): 0.7537
This is the coefficient for the MA(1) term. It indicates the influence of the lagged forecast error on the current value.
sigma2 (Variance of the Residuals): 149.9109
This is the estimated variance of the model residuals (errors). It measures the variability of the forecast errors.
Statistical Tests
Ljung-Box (L1) (Q): 2.13
This test checks whether the residuals from the model are autocorrelated (i.e., not independent). A higher Q statistic suggests remaining autocorrelation.
Prob(Q): 0.14
The p-value for the Ljung-Box test. A higher p-value indicates that the residuals are likely independent, suggesting that the model has captured the time series structure well.
Jarque-Bera (JB): 0.87
A test for normality of residuals. It checks whether the residuals are normally distributed.
Prob(JB): 0.65
The p-value for the Jarque-Bera test. A higher p-value indicates that the residuals are likely normally distributed.
Heteroskedasticity (H): 3.31
This test checks for non-constant variance in the residuals. Heteroskedasticity means the residual variance changes over time.
Prob(H) (two-sided): 0.11
The p-value for the heteroskedasticity test. A higher p-value suggests that the residuals' variance does not vary significantly over time.
Skew: 0.17
Measures the asymmetry of the residuals. Values close to 0 indicate a symmetrical distribution.
Kurtosis: 2.13
Measures the "tailedness" of the residuals. A value of 3 would indicate a normal distribution, with lower values indicating lighter tails and higher values indicating heavier tails.
Summary
In summary, the output gives insights into how well the ARIMA(0, 0, 1) model fits the data and whether the residuals (errors) from the model meet the assumptions of the model (e.g., normality, independence). The AIC, BIC, and HQIC help in comparing this model to others, while statistical tests evaluate the model's residuals.
Model: By fitting an MA(3) model (order=(0, 0, 3)), the forecast is based solely on the error terms from the last few observations.
Forecast: The output provides a prediction for the next 5 time steps, reflecting the pattern of past data.
Why Use Only MA?
Using only the MA component is appropriate when the time series data exhibits random fluctuations around a mean without strong trends or seasonal patterns. This model is simpler than a full ARIMA model and can be effective for short-term forecasting in such cases.