Understanding Moving Average and Its Implementation

The moving average forecast is a simple and widely used method for time series forecasting. It works by taking the average of a subset of the most recent data points to make predictions for future values.

  • The moving average method is a simple yet effective technique used for smoothing time series data and identifying trends or patterns. It involves calculating the average of a fixed number of consecutive data points, referred to as the window size or the period.


  • The moving average method smooths out short-term fluctuations in the data and highlights long-term trends. It is commonly used for forecasting and trend analysis in various fields, including finance, economics, and signal processing.

How the moving average forecast typically works:

  • Select a Window Size: Decide on the number of previous data points (the window size) to include in the moving average calculation. This window size determines the smoothing effect of the forecast.

  • Calculate the Moving Average: For each time step, calculate the average of the data points within the selected window.

  • Make Predictions: Use the calculated moving average as the forecast for the next time step.

  • Repeat: As new data becomes available, update the moving average by including the latest data point and removing the oldest data point from the window. Then, repeat the process of making predictions.

  • A moving average forecast can be implemented using different types of moving averages (see the sketch after this list), such as:

  • Simple Moving Average (SMA): This is the most basic form of moving average, where each data point in the window is given equal weight.

  • Weighted Moving Average (WMA): In WMA, different weights are assigned to each data point in the window. Usually, more recent data points are given higher weights.

  • Exponential Moving Average (EMA): EMA gives more weight to recent observations while still considering older data. It is calculated with a smoothing factor, so that the weights assigned to older observations decay exponentially.
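To make the three variants concrete, here is a minimal sketch using pandas. The window size of 3 and the linearly increasing WMA weights are illustrative assumptions, not choices made in this notebook.

import numpy as np
import pandas as pd

# Small illustrative series and window size (assumed for this sketch)
s = pd.Series([5, 7, 9, 6, 10, 8, 12, 11])
window = 3

# Simple Moving Average: every point in the window gets equal weight
sma = s.rolling(window).mean()

# Weighted Moving Average: more recent points get higher (linearly increasing) weights
weights = np.arange(1, window + 1)  # [1, 2, 3], newest point weighted most
wma = s.rolling(window).apply(lambda w: np.dot(w, weights) / weights.sum(), raw=True)

# Exponential Moving Average: weights decay exponentially with the age of the observation
ema = s.ewm(span=window, adjust=False).mean()

print(pd.DataFrame({"SMA": sma, "WMA": wma, "EMA": ema}))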

A simple example to show how Moving Average works

x=[5,7,9,6,10,8,12,11]

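With a window size of 3 (an assumed choice, since the original figure is not reproduced here), the moving averages of x are:

  • (5 + 7 + 9) / 3 = 7.00

  • (7 + 9 + 6) / 3 ≈ 7.33

  • (9 + 6 + 10) / 3 ≈ 8.33

  • (6 + 10 + 8) / 3 = 8.00

  • (10 + 8 + 12) / 3 = 10.00

  • (8 + 12 + 11) / 3 ≈ 10.33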

Calculating Moving Average using Python Libraries

## Example 1
# Get the list from the user
user_input = input("Enter numbers separated by spaces: ")
simple_list = list(map(float, user_input.split()))

# Define the window size for the moving average
window_size = int(input("Enter the window size for the moving average: "))

# Calculate the moving average
moving_averages = []
for i in range(len(simple_list) - window_size + 1):
    window = simple_list[i:i + window_size]
    window_average = sum(window) / window_size
    moving_averages.append(window_average)

# Display the results
print(f"Original List: {simple_list}")
print(f"Moving Averages: {moving_averages}")
Enter numbers separated by spaces: 22 24 26 28 55 45 35 35 46
Enter the window size for the moving average: 2
Original List: [22.0, 24.0, 26.0, 28.0, 55.0, 45.0, 35.0, 35.0, 46.0]
Moving Averages: [23.0, 25.0, 27.0, 41.5, 50.0, 40.0, 35.0, 40.5]
### Example 2
# Get the list from the user
user_input = input("Enter numbers separated by spaces: ")
simple_list = list(map(float, user_input.split()))

# Define the window size for the moving average
window_size = int(input("Enter the window size for the moving average: "))

# Calculate the moving average using a more Pythonic way (list comprehension)
moving_averages = [sum(simple_list[i:i + window_size]) / window_size
                   for i in range(len(simple_list) - window_size + 1)]

# Display the results
print(f"Original List: {simple_list}")
print(f"Moving Averages: {moving_averages}")
### The numpy.random.seed() function is used to set the seed for generating random numbers in NumPy.
### Setting the seed ensures reproducibility of the random numbers generated.
import numpy as np

# Set the seed
np.random.seed(42)

# Generate 5 random numbers
random_numbers = np.random.rand(5)
print("Random numbers generated with seed 42:", random_numbers)

# Reset the seed and generate the same random numbers again
np.random.seed(42)
random_numbers_again = np.random.rand(5)
print("Random numbers generated again with seed 42:", random_numbers_again)
import numpy as np

# List of integers
data = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

# Window size
window_size = 4

# Calculate the moving average
moving_avg = np.convolve(data, np.ones(window_size) / window_size, mode='valid')
print("Moving Average (Window Size 4):", moving_avg)
Moving Average (Window Size 4): [12.5 17.5 22.5 27.5 32.5 37.5 42.5]
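The same result can be obtained with a pandas rolling window. A minimal sketch (pandas is not used in the original cell, so this is an alternative, not the notebook's method):

import pandas as pd

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# rolling(4).mean() yields NaN until a full window is available; dropna() keeps
# only the complete windows, matching np.convolve with mode='valid'
moving_avg = pd.Series(data).rolling(window=4).mean().dropna()
print("Moving Average (Window Size 4):", moving_avg.tolist())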

Forecasting using Moving Average

Forecasting using a moving average in Python involves extending the concept of a moving average to predict future values based on past data.

# Example: Forecasting with Moving Average
# Import necessary libraries
import numpy as np

# Get the list from the user
user_input = input("Enter numbers separated by spaces: ")
data = list(map(float, user_input.split()))

# Define the window size for the moving average
window_size = int(input("Enter the window size for the moving average: "))

# Calculate the moving averages
moving_averages = [np.mean(data[i:i + window_size])
                   for i in range(len(data) - window_size + 1)]

# Forecast the next value
# The next forecasted value is the average of the last 'window_size' elements
forecast = np.mean(data[-window_size:])

# Display the results
print(f"Original Data: {data}")
print(f"Moving Averages: {moving_averages}")
print(f"Forecasted Next Value: {forecast}")

Example Walkthrough:

  • Input Data: Suppose you input the following data: 10 12 14 16 18 20

  • Window Size: You choose a window size of 3.

Moving Averages: The moving averages will be calculated as:

  • (10 + 12 + 14) / 3 = 12

  • (12 + 14 + 16) / 3 = 14

  • (14 + 16 + 18) / 3 = 16

  • (16 + 18 + 20) / 3 = 18

Forecast: With the same window size of 3, the forecasted next value is the average of the last three observations: (16 + 18 + 20) / 3 = 18.
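Running the walkthrough numbers through the same logic as the forecasting cell above confirms these values:

import numpy as np

data = [10, 12, 14, 16, 18, 20]
window_size = 3

# Moving averages over each full window of three values
moving_averages = [float(np.mean(data[i:i + window_size]))
                   for i in range(len(data) - window_size + 1)]

# One-step-ahead forecast: average of the last three observations
forecast = float(np.mean(data[-window_size:]))

print(moving_averages)  # [12.0, 14.0, 16.0, 18.0]
print(forecast)         # 18.0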

A basic workflow for time series analysis with ARIMA includes data generation, visualization, stationarity testing, differencing, model fitting, forecasting, and visualization of the forecasted values.
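The steps above can be sketched end to end with statsmodels. This is a minimal illustration only: the synthetic random-walk data and the ARIMA(0, 1, 1) order are assumptions made here, not choices from this notebook.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# 1. Data generation: a random walk, which is non-stationary
np.random.seed(42)
series = pd.Series(np.cumsum(np.random.randn(100)))

# 2. Visualization
series.plot(title="Synthetic series")
plt.show()

# 3. Stationarity test (Augmented Dickey-Fuller); a large p-value suggests non-stationarity
print("ADF p-value:", adfuller(series)[1])

# 4. Differencing once usually makes a random walk stationary
print("ADF p-value after differencing:", adfuller(series.diff().dropna())[1])

# 5. Model fitting: d=1 tells ARIMA to apply that differencing internally
model_fit = ARIMA(series, order=(0, 1, 1)).fit()

# 6. Forecasting and 7. visualization of the forecasted values
forecast = model_fit.forecast(steps=10)
series.plot(label="Observed")
forecast.plot(label="Forecast", color="red")
plt.legend()
plt.show()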

Fitting the MA Model:

(Model summary output for the ARIMA(0, 0, 1) fit; the key fields are explained below.)

Key Terms Explained:

Dependent Variable: y

This indicates the variable being modeled (the time series data).

No. Observations: 24

The number of data points used in the model. In this case, there are 24 observations in the time series data.

Model: ARIMA(0, 0, 1)

This specifies the model configuration, ARIMA(p, d, q):

  • p: the number of autoregressive (AR) terms. Here it is 0.

  • d: the number of differences applied to make the series stationary. Here it is 0.

  • q: the number of moving average (MA) terms. Here it is 1, which means the model includes a moving average component with one lag.

Log Likelihood: -94.595

This is a measure of how well the model fits the data. A higher (less negative) log likelihood indicates a better fit. It is used in calculating the information criteria.

AIC (Akaike Information Criterion): 195.189

AIC is a metric used to compare different models. It takes into account the model's goodness of fit and the number of parameters. Lower AIC values indicate a better model fit.

BIC (Bayesian Information Criterion): 198.724

Similar to AIC, BIC also measures model fit but with a greater penalty for models with more parameters. Lower BIC values suggest a better model.

HQIC (Hannan-Quinn Information Criterion): 196.127

HQIC is another criterion for model comparison, balancing fit and complexity. It typically falls between AIC and BIC in terms of penalty for the number of parameters.

Covariance Type: opg

This refers to the type of covariance matrix used for estimating the model's standard errors. "opg" stands for outer product of gradients.
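Because AIC, BIC, and HQIC are mainly used to compare candidate models, a typical workflow fits several orders and keeps the one with the lowest values. A minimal sketch (the candidate orders 1 to 3 are an arbitrary choice, and the data is the monthly-sales style series used in the example further below):

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
        115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]
series = pd.Series(data)

# Fit MA(q) models for a few candidate orders and compare information criteria;
# lower AIC/BIC/HQIC suggests a better trade-off between fit and complexity
for q in (1, 2, 3):
    fit = ARIMA(series, order=(0, 0, q)).fit()
    print(f"MA({q}): AIC={fit.aic:.3f}  BIC={fit.bic:.3f}  HQIC={fit.hqic:.3f}")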

Coefficients Table:

const (Constant Term): 133.3215

This is the estimated constant (intercept) of the model. It represents the baseline level of the series.

ma.L1 (Moving Average Term): 0.7537

This is the coefficient for the MA(1) term. It indicates the influence of the lagged forecast error on the current value.

sigma2 (Variance of the Residuals): 149.9109

This is the estimated variance of the model residuals (errors). It measures the variability of the forecast errors.
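Putting the three estimates together, the fitted MA(1) model can be written as

y(t) = 133.3215 + e(t) + 0.7537 · e(t-1)

where e(t) is the forecast error (white noise) at time t, with estimated variance sigma2 = 149.9109.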

Statistical Tests

Ljung-Box (L1) (Q): 2.13

This test checks whether the residuals from the model are independent. A higher Q statistic suggests less independence.

Prob(Q): 0.14

The p-value for the Ljung-Box test. A higher p-value indicates that the residuals are likely independent, suggesting that the model has captured the time series structure well.

Jarque-Bera (JB): 0.87

A test for normality of the residuals. It checks if the residuals are normally distributed.

Prob(JB): 0.65

The p-value for the Jarque-Bera test. A higher p-value indicates that the residuals are likely normally distributed.

Heteroskedasticity (H): 3.31

This test checks for variability in the residuals. Heteroskedasticity suggests varying residual variance over time.

Prob(H) (two-sided): 0.11

The p-value for the heteroskedasticity test. A higher p-value suggests that the residuals' variance is not significantly varying.

Skew: 0.17

Measures the asymmetry of the residuals. Values close to 0 indicate a symmetrical distribution.

Kurtosis: 2.13

Measures the "tailedness" of the residuals. A value of 3 would indicate a normal distribution, with lower values indicating lighter tails and higher values indicating heavier tails.

Summary

In summary, the output gives insights into how well the ARIMA(0, 0, 1) model fits the data and whether the residuals (errors) from the model meet the assumptions of the model (e.g., normality, independence). The AIC, BIC, and HQIC help in comparing this model to others, while statistical tests evaluate the model's residuals.

### Example: A simple sequence of numbers representing, for example, monthly sales
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Example data: Assume this is your time series data
data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
        115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

# Convert the list to a pandas Series
series = pd.Series(data)

# Plot the data
series.plot()
plt.title("Original Data")
plt.show()
(Plot: Original Data)
# Fit an MA model (ARIMA with p=0, d=0, q=2)
# Here, order=(0, 0, 2) means we are only using the Moving Average component, with two lags
moving_Avg = ARIMA(series, order=(0, 0, 2))
moving_Avg_fit = moving_Avg.fit()

# Print the model summary
print(moving_Avg_fit.summary())
                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   24
Model:                 ARIMA(0, 0, 2)   Log Likelihood                 -91.648
Date:                Wed, 04 Sep 2024   AIC                            191.296
Time:                        15:56:38   BIC                            196.008
Sample:                             0   HQIC                           192.546
                                 - 24
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        134.2272      6.232     21.537      0.000     122.012     146.442
ma.L1          1.3624      0.188      7.255      0.000       0.994       1.730
ma.L2          0.6649      0.197      3.375      0.001       0.279       1.051
sigma2       110.4585     39.532      2.794      0.005      32.978     187.939
===================================================================================
Ljung-Box (L1) (Q):                   0.10   Jarque-Bera (JB):                 0.46
Prob(Q):                              0.76   Prob(JB):                         0.79
Heteroskedasticity (H):               3.18   Skew:                             0.10
Prob(H) (two-sided):                  0.12   Kurtosis:                         2.35
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
# Forecast the next 10 values
forecast = moving_Avg_fit.forecast(steps=10)
print("Forecasted Values: ", forecast)

# Plot the forecasted values
series.plot(label='Original')
forecast.plot(label='Forecast', color='red')
plt.legend()
plt.title("Forecast Using MA Model")
plt.show()
Forecasted Values:
24    160.217111
25    148.859203
26    134.227215
27    134.227215
28    134.227215
29    134.227215
30    134.227215
31    134.227215
32    134.227215
33    134.227215
Name: predicted_mean, dtype: float64
(Plot: Forecast Using MA Model)

Model: By fitting an MA(2) model (order=(0, 0, 2)), the forecast is based solely on a constant plus the error terms from the last two observations.

Forecast: The output provides predictions for the next 10 time steps. Because an MA(2) model only uses the last two forecast errors, the first two forecasts differ, and from the third step onward the forecast reverts to the estimated mean (134.2272).

Why Use Only MA?

Using only the MA component is appropriate when the time series data exhibits random fluctuations around a mean without strong trends or seasonal patterns. This model is simpler than a full ARIMA model and can be effective for short-term forecasting in such cases.