
Risk measure estimation

Camilo Garcia Trillos 2020

In this notebook

In this notebook, we build a first complete application: estimating Value at Risk (at level 99%) and Expected Shortfall (at level 97.5%) for a single stock (the simplest case). We also look at backtesting.

# As usual, we start by importing the modules we require. These are:
%matplotlib inline
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from numpy.random import normal

Acquire and clean data

We start by reading the data from the CSV file 'AIG_20171201_15y.csv'. It corresponds to AIG, an insurance company.

Let us read the data and perform some basic cleaning procedures:

  • Understand what the entries of the data represent

  • Guarantee that the type of each entry corresponds to its meaning

  • Check whether there are missing data

  • Choose an appropriate index

  • Visually inspect the data in search of abnormal observations

We start by reading and inspecting the data

aig_data = pd.read_csv('~/Data/AIG_20171201_15y.csv')
aig_data.head()
aig_data.tail()

It looks like it contains the same information as the Apple database, but this time the data starts in January 2003. Let us check some summary statistics.

aig_data.describe()

First, comparing the 'count' row across columns suggests that there are no missing data. Then, looking at each of the summary statistics (mean, std, min, ...) for open, high, low and close (and their respective adjusted versions), they seem rather close to each other, which is a good sign of coherence in the information.
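As a quick sanity check of the 'count' argument, we can compare the counts reported by describe() with the total number of rows. This is only a small sketch, not part of the original analysis:

# If every numeric column has as many non-missing entries as the dataframe has rows,
# there is no missing numeric data (sketch, not in the original notebook).
(aig_data.describe().loc['count'] == len(aig_data)).all()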

We can now turn our attention to the information within the content. It looks like, unlike Apple, this share did not split but rather reduced the number of shares on the market (split ratio less than one).

Also, there are huge gaps in the prices, but unlike the Apple share this happens in the adjusted prices. Finally, there are also big differences in the traded volume of shares.

These are things to take into account when looking at the plots.

Let us now see what the datatypes of the entries are.

aig_data.dtypes
ticker          object
date            object
open           float64
high           float64
low            float64
close          float64
volume         float64
ex-dividend    float64
split_ratio    float64
adj_open       float64
adj_high       float64
adj_low        float64
adj_close      float64
adj_volume     float64
dtype: object

Once again we need to fix the date entry type.

aig_data['date']=pd.to_datetime(aig_data['date'])
aig_data.dtypes
ticker                 object
date           datetime64[ns]
open                  float64
high                  float64
low                   float64
close                 float64
volume                float64
ex-dividend           float64
split_ratio           float64
adj_open              float64
adj_high              float64
adj_low               float64
adj_close             float64
adj_volume            float64
dtype: object

We can now check we have no missing data, and that the date column is unique (in view of defining it as the index of the database).

aig_data.isnull().any()
ticker         False
date           False
open           False
high           False
low            False
close          False
volume         False
ex-dividend    False
split_ratio    False
adj_open       False
adj_high       False
adj_low        False
adj_close      False
adj_volume     False
dtype: bool
print( np.unique(aig_data['date']).shape, aig_data['date'].shape)
(3756,) (3756,)
aig_data.set_index('date', inplace=True)
aig_data.index
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06', '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10', '2003-01-13', '2003-01-14', '2003-01-15', ... '2017-11-17', '2017-11-20', '2017-11-21', '2017-11-22', '2017-11-24', '2017-11-27', '2017-11-28', '2017-11-29', '2017-11-30', '2017-12-01'], dtype='datetime64[ns]', name='date', length=3756, freq=None)

We are ready to make some plots. Let us make four separate plots:

  • One with open, close, low, and high for each date

  • One with adjusted open, close, low, and high for each date

  • One with split ratio per day

  • One with adjusted volume per day

aig_data.loc[:,['open','close','high','low']].plot(alpha=0.5, title='AIG quote')
<AxesSubplot:title={'center':'AIG quote'}, xlabel='date'>
Image in a Jupyter notebook

There are two very steep changes within a couple of days. Are they caused by splits? Before trying to answer, let us zoom in on this area.

aux_data = aig_data.loc[(aig_data.index<'20091231') & (aig_data.index>'20080101'),:]
aux_data[['open','close','high','low']].plot(alpha=0.5, title='AIG quote (2008 and 2009)')
<AxesSubplot:title={'center':'AIG quote (2008 and 2009)'}, xlabel='date'>
Image in a Jupyter notebook

The fall is more gradual, although there is a sharp drop around August 2008, when the actual effects of the subprime crisis on AIG were being revealed.

The upward jump occurs quite quickly, starting from 2009-07. Now, let us look at the splits.

aig_data['split_ratio'].plot(title='AIG split ratio')
<AxesSubplot:title={'center':'AIG split ratio'}, xlabel='date'>
Image in a Jupyter notebook
aig_data.loc[:,'split_ratio'].idxmin()
Timestamp('2009-07-01 00:00:00')
aig_data['split_ratio'].min()
0.05

This corresponds to the upward jump seen before. Unlike Apple, rather than splitting, this share actually merged stocks (a reverse split)!

We can now check the adjusted prices

aig_data[['adj_open','adj_close','adj_high','adj_low']].plot(alpha=0.5, title='AIG adjusted prices')
<AxesSubplot:title={'center':'AIG adjusted prices'}, xlabel='date'>
Image in a Jupyter notebook
aux_data = aig_data.loc[(aig_data.index>'2009-01-01'),:]
aux_data[['adj_open','adj_close','adj_high','adj_low']].plot(alpha=0.5, title='AIG adjusted prices (2009 onwards)')
<AxesSubplot:title={'center':'AIG adjusted prices (2009 onwards)'}, xlabel='date'>
Image in a Jupyter notebook

The picture is grim. The stock lost a large part of its value and has since recovered only some ground (negligible with respect to the initial losses).

Finally, we look at the traded adjusted volume

aig_data['adj_volume'].plot()
<AxesSubplot:xlabel='date'>
Image in a Jupyter notebook
aig_data.loc[:,'adj_volume'].idxmax()
Timestamp('2009-07-01 00:00:00')
aig_data['adj_volume'].argmax()
1635
aig_data['adj_volume'].max()
816446000.0

The volume has a very large peak precisely on the date of the split. This is not implausible, but if this number were relevant to us, we would need to find out what happened that day.
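If this mattered for our analysis, a natural first step would be to inspect the raw and adjusted figures around that day. A minimal sketch (the date window below is purely illustrative and not part of the original notebook):

# Inspect a few trading sessions around the 2009-07-01 reverse split (illustrative window)
aig_data.loc['2009-06-25':'2009-07-08', ['close', 'adj_close', 'split_ratio', 'volume', 'adj_volume']]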

Risk measure estimation: one day

In the following, we implement and compare the closed-form and historical-simulation approaches for estimating the value at risk of the changes in the AIG adjusted close price.

To simplify the exercise, we suppose that we want to estimate a one-day value at risk. Moreover, we suppose that we are making the calculations on 01/01/2008. Therefore, we will only consider the data before this date.

Let us build another dataset that only takes these dates into consideration.

sub_aig_data = aig_data[aig_data.index< '2008-01-01']
sub_aig_data = sub_aig_data['adj_close']
sub_aig_data.tail()
date
2007-12-24    925.172558
2007-12-26    918.077203
2007-12-27    894.785930
2007-12-28    893.706202
2007-12-31    899.259088
Name: adj_close, dtype: float64

Closed form

Remember that the key point in using a closed-form approximation is to find a suitable risk mapping where the risk factors are Gaussian.

Our candidate is to define as risk factor $F_{t+1} := \log(S_{t+1}) - \log(S_t)$.

Let us examine whether

  1. This factor seems stationary

  2. This factor seems Gaussian

Stationarity

log_returns = np.log(sub_aig_data).diff()
log_returns.head()
date
2003-01-02         NaN
2003-01-03   -0.002990
2003-01-06    0.033047
2003-01-07   -0.019664
2003-01-08   -0.012552
Name: adj_close, dtype: float64
log_returns.loc[~log_returns.isna()]
date
2003-01-03   -0.002990
2003-01-06    0.033047
2003-01-07   -0.019664
2003-01-08   -0.012552
2003-01-09    0.036391
                ...
2007-12-24    0.027895
2007-12-26   -0.007699
2007-12-27   -0.025697
2007-12-28   -0.001207
2007-12-31    0.006194
Name: adj_close, Length: 1257, dtype: float64
log_returns.drop(log_returns.index[0],inplace=True)
log_returns.head()
date
2003-01-03   -0.002990
2003-01-06    0.033047
2003-01-07   -0.019664
2003-01-08   -0.012552
2003-01-09    0.036391
Name: adj_close, dtype: float64
log_returns.plot()
<AxesSubplot:xlabel='date'>
Image in a Jupyter notebook

Let us also look at the means and standard deviations within subsamples of 60 days.

log_returns.resample('60d').mean().head()
date
2003-01-03   -0.005522
2003-03-04    0.004358
2003-05-03   -0.000936
2003-07-02    0.001668
2003-08-31    0.000551
Freq: 60D, Name: adj_close, dtype: float64
log_returns.resample('60d').mean().plot(title='Mean for subsamples of size 60 days')
<AxesSubplot:title={'center':'Mean for subsamples of size 60 days'}, xlabel='date'>
Image in a Jupyter notebook

The mean of each subsample seems to fluctuate randomly around a constant value. This is not evidence against stationarity. Let us look at the standard deviation as well.

log_returns.resample('60d').std().plot(title='Std. Dev for subsamples of size 60 days')
<AxesSubplot:title={'center':'Std. Dev for subsamples of size 60 days'}, xlabel='date'>
Image in a Jupyter notebook

This plot seems slightly less random, but still reasonable.

autocorr_vect = [log_returns.autocorr(lag=_) for _ in range(1,30)]
plt.plot(range(1,30),autocorr_vect)
plt.title('Estimated autocorrelation vs. lag')
plt.xlabel('lag')
plt.ylabel('autocorrelation')
Text(0, 0.5, 'autocorrelation')
Image in a Jupyter notebook

The estimated autocorrelations are very small. Finally, let us run the augmented Dickey-Fuller test.

adfuller(log_returns)[1]
0.0

The ADF test rejects the null hypothesis of a unit root (the p-value is essentially zero). All in all, we do not seem to have enough evidence against stationarity. How about normality?

Normality

numbins = int(1+ np.log2(log_returns.count()))
log_returns.hist(bins = numbins, density=True)
mu_log = np.mean(log_returns)
sigma_log = np.std(log_returns)
x_ticks = np.linspace(min(log_returns), max(log_returns),100)
plt.plot(x_ticks, 1./((2.*np.pi)**0.5 *sigma_log)*np.exp(-((x_ticks - mu_log)/sigma_log)**2/2), 'r')
plt.title('Histogram of log-returns and Gaussian with the same mean and variance')
Text(0.5, 1.0, 'Histogram of log-returns and Gaussian with the same mean and variance')
Image in a Jupyter notebook

Visually, the fit is not clear, although the divergences are not extraordinary. Let us look at another tool to check this.

stats.probplot(log_returns,dist='norm', plot=plt)
((array([-3.2629588 , -3.00266919, -2.857728 , ..., 2.857728 , 3.00266919, 3.2629588 ]), array([-0.11019879, -0.08391532, -0.08365994, ..., 0.05826549, 0.0649579 , 0.07269368])), (0.014012626662414988, 1.0022667870343473e-05, 0.9547558083408232))
Image in a Jupyter notebook
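For reference, the second tuple returned by stats.probplot is (slope, intercept, r) of the least-squares fit of the ordered data against the theoretical normal quantiles; the value r ≈ 0.955 here is noticeably below 1, consistent with the imperfect fit seen in the histogram.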
stats.normaltest(log_returns)
NormaltestResult(statistic=237.92592871947468, pvalue=2.162922844964407e-52)

So the p-value is very small. Yet another indication against normality.

This means we cannot assume full normality of the whole series.

Despite the above, for the sake of comparison, let us see what the risk measures would be if we assumed normality. Recall from the lecture that the delta approximation gives

ΔVk+1VkFk+1\Delta V_{k+1} \approx V_k F_{k+1}

and so the formulas for the risk measures are

$${\rm VaR}(\Delta V_{k+1}) = V_k\sigma\, \Phi^{-1}(\alpha) - V_k \mu_k$$

$${\rm ES}(\Delta V_{k+1}) = V_k\sigma\, \frac{\phi(\Phi^{-1}(\alpha))}{1-\alpha} - V_k \mu_k$$
sub_aig_data.tail()
date
2007-12-24    925.172558
2007-12-26    918.077203
2007-12-27    894.785930
2007-12-28    893.706202
2007-12-31    899.259088
Name: adj_close, dtype: float64
alpha_var=0.99
alpha_es=0.975
VaR_CF =sub_aig_data.iloc[-1]*( -mu_log + sigma_log*stats.norm.ppf(alpha_var))
ES_CF =sub_aig_data.iloc[-1]*( -mu_log + sigma_log*stats.norm.pdf(stats.norm.ppf(alpha_es))/(1-alpha_es))
print('CLOSED FORM: ')
print('Each unit of stock of AIG with closing date ', sub_aig_data.index[-1], ' and closing (adjusted) value of ', sub_aig_data.iloc[-1], ' has for one day a VaR at level ', alpha_var, ' of ', VaR_CF, ' and a one day ES at level ', alpha_es, ' of ', ES_CF,'.')
CLOSED FORM: Each unit of stock of AIG with closing date 2007-12-31 00:00:00 and closing (adjusted) value of 899.2590883868303 has for one day a VaR at level 0.99 of 30.624608057569294 and a one day ES at level 0.975 of 30.775447760353096 .
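For reference, the same computation can be wrapped into a small reusable function. This is only a sketch, and the name closed_form_var_es is not part of the original notebook:

# Closed-form (Gaussian, delta-approximation) one-day VaR and ES for a position of value V
# (sketch; should reproduce VaR_CF and ES_CF above)
def closed_form_var_es(V, mu, sigma, alpha_var, alpha_es):
    var = V*(-mu + sigma*stats.norm.ppf(alpha_var))
    es = V*(-mu + sigma*stats.norm.pdf(stats.norm.ppf(alpha_es))/(1-alpha_es))
    return var, es

closed_form_var_es(sub_aig_data.iloc[-1], mu_log, sigma_log, alpha_var, alpha_es)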

Historical simulation: one day

Since we have already tested stationarity, we can directly calculate the one-day historical-simulation estimates of value at risk and expected shortfall at the assumed levels.

To do that, we first calculate what the P&L between 'today' and 'tomorrow' would be (that is, from 2007/12/31 to 2008/01/01), using the past values of the log-returns as possible outcomes. The formula in this case is simply

$$\Delta V_{k+1}^{i} = V_k\,(\exp(F_{k-i})-1)$$
hs_prices = sub_aig_data.iloc[-1] * (np.exp(log_returns)-1)
hs_prices.head()
date
2003-01-03    -2.684355
2003-01-06    30.214627
2003-01-07   -17.510517
2003-01-08   -11.216755
2003-01-09    33.328033
Name: adj_close, dtype: float64

Let us look quickly at the histogram of possible changes

hs_prices.hist(bins=numbins)
<AxesSubplot:>
Image in a Jupyter notebook

Now we calculate the estimators of the value at risk and expected shortfall of the position. We first order the samples and choose one of the estimators we saw in class: I will simply take the '+' value at risk and its corresponding expected shortfall. Please remember that observations are indexed starting at zero.

hs_prices.sort_values(inplace=True)
hs_prices.head()
date
2004-10-14   -93.832229
2005-04-01   -72.382161
2003-01-24   -72.170966
2007-11-07   -60.143839
2003-02-04   -58.997117
Name: adj_close, dtype: float64
ind_var = int( hs_prices.size * (1-alpha_var))-1
ind_es = int( hs_prices.size * (1-alpha_es))-1
print(ind_var,ind_es)
11 30
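Indeed, there are 1257 historical scenarios, so int(1257 × 0.01) − 1 = 11 and int(1257 × 0.025) − 1 = 30: the VaR estimate is (minus) the 12th worst outcome, and the ES estimate averages the 31 worst outcomes.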
VaR_HS = -1*hs_prices.iloc[ind_var]
ES_HS = -1*hs_prices.iloc[:ind_es+1].mean()
print('Historical Simulation: ')
print('Each unit of stock of AIG with closing date ', sub_aig_data.index[-1], ' and closing (adjusted) value of ', sub_aig_data.iloc[-1], ' has for one day a VaR at level ', alpha_var, ' of ', VaR_HS, ' and a one day ES at level ', alpha_es, ' of ', ES_HS,'.')
Historical Simulation: Each unit of stock of AIG with closing date 2007-12-31 00:00:00 and closing (adjusted) value of 899.2590883868303 has for one day a VaR at level 0.99 of 44.16004451899178 and a one day ES at level 0.975 of 40.86661378600221 .
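The same estimator can also be written as a small generic function. Again, this is only a sketch, and the name hist_sim_var_es is not from the original notebook:

# Historical-simulation VaR and ES from a sample of P&L values
# (sketch; should reproduce VaR_HS and ES_HS above)
def hist_sim_var_es(pnl, alpha_var, alpha_es):
    pnl_sorted = np.sort(np.asarray(pnl))            # ascending: largest losses first
    i_var = int(pnl_sorted.size*(1-alpha_var)) - 1
    i_es = int(pnl_sorted.size*(1-alpha_es)) - 1
    return -pnl_sorted[i_var], -pnl_sorted[:i_es+1].mean()

hist_sim_var_es(hs_prices, alpha_var, alpha_es)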

Clearly these values represent a significant increase, of about 44% and 32.7% respectively, with respect to the closed-form estimates!

print('Increase (in %) w.r.t. close form: VaR:',100*(VaR_HS/VaR_CF-1) , 'ES:',100*(ES_HS/ES_CF-1))
Increase (in %) w.r.t. close form: VaR: 44.197909197655896 ES: 32.78966435916231