Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
packtpublishing
GitHub Repository: packtpublishing/machine-learning-for-algorithmic-trading-second-edition
Path: blob/master/10_bayesian_machine_learning/05_stochastic_volatility.ipynb
2923 views
Kernel: Python 3

Stochastic Volatility model

Imports & Settings

import warnings warnings.filterwarnings('ignore')
%matplotlib inline from pathlib import Path import numpy as np import pandas as pd import matplotlib.pyplot as plt from matplotlib.ticker import FuncFormatter import seaborn as sns import pymc3 as pm import arviz from pymc3.distributions.timeseries import GaussianRandomWalk
sns.set_style('whitegrid') # model_path = Path('models')

Model assumptions

Asset prices have time-varying volatility (variance of day over day returns). In some periods, returns are highly variable, while in others very stable. Stochastic volatility models model this with a latent volatility variable, modeled as a stochastic process. The following model is similar to the one described in the No-U-Turn Sampler paper, Hoffman (2011) p21.

σExponential(50)νExponential(.1)siNormal(si1,σ2)log(ri)t(ν,0,exp(2si))\begin{align*} \sigma &\sim \text{Exponential}(50)\\ \nu &\sim \text{Exponential}(.1)\\ s_i &\sim \text{Normal}(s_{i-1}, \sigma^{-2})\\ \log(r_i) &\sim t(\nu, 0, \exp(-2 s_i)) \end{align*}

Here, rr is the daily return series and ss is the latent log volatility process.

Get Return Data

First we load some daily returns of the S&P 500.

prices = pd.read_hdf('../data/assets.h5', key='sp500/stooq').loc['2000':, 'close'] log_returns = np.log(prices).diff().dropna()
ax = log_returns.plot(figsize=(15, 4), title='S&P 500 | Daily Log Returns', rot=0) ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) sns.despine() plt.tight_layout();
Image in a Jupyter notebook

As you can see, the volatility seems to change over time quite a bit while clustering around certain time-periods, most notably the 2009 financial crash.

Specify Model in PyMC3

Specifying the model in PyMC3 mirrors its statistical specification.

with pm.Model() as model: step_size = pm.Exponential('sigma', 50.) s = GaussianRandomWalk('s', sd=step_size, shape=len(log_returns)) nu = pm.Exponential('nu', .1) r = pm.StudentT('r', nu=nu, lam=pm.math.exp(-2*s), observed=log_returns)
pm.model_to_graphviz(model)
Image in a Jupyter notebook

Fit Model

For this model, the full maximum a posteriori (MAP) point is degenerate and has infinite density. NUTS, however, gives the correct posterior.

with model: trace = pm.sample(tune=2000, draws=5000, chains=4, cores=1, target_accept=.9)
Auto-assigning NUTS sampler... Initializing NUTS using jitter+adapt_diag... Sequential sampling (4 chains in 1 job) NUTS: [nu, s, sigma]
Sampling 4 chains for 2_000 tune and 5_000 draw iterations (8_000 + 20_000 draws total) took 4590 seconds. The estimated number of effective samples is smaller than 200 for some parameters.

Optionally, persist result as pickle:

# with open('model_vol.pkl', 'wb') as buff: # pickle.dump({'model': model, 'trace': trace}, buff)

Evaluate results

Trace Plot

arviz.plot_trace(trace, var_names=['sigma', 'nu']);
Image in a Jupyter notebook

Looking at the returns over time and overlaying the estimated standard deviation we can see how the model tracks the volatility over time.

In-Sample Predictions

pm.trace_to_dataframe(trace).info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 20000 entries, 0 to 19999 Columns: 5032 entries, s__0 to nu dtypes: float64(5032) memory usage: 767.8 MB
fig, ax = plt.subplots(figsize=(15, 5)) log_returns.plot(ax=ax, lw=.5, xlim=('2000', '2020'), rot=0, title='In-Sample Fit of Stochastic Volatility Model') ax.plot(log_returns.index, np.exp(trace[s]).T, 'r', alpha=.03, lw=.5) ax.set(xlabel='Time', ylabel='Returns') ax.legend(['S&P 500 (log returns)', 'Stochastic Volatility Model']) ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) sns.despine() fig.tight_layout()
Image in a Jupyter notebook