Mean-variance frontier with market data
**Camilo A. Garcia Trillos 2020**
In this notebook
We revisit the mean-variance frontier construction, this time using market data. We use a sub-sample of data, belonging to a selection of stocks in the market.
Data cleaning
We start by reading in the data.
```python
import pandas as pd

data = pd.read_csv('~/Data/basket_20171201_10y.csv')
```

(In this environment the cell raises `FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Data/basket_20171201_10y.csv'` because the data file is not present; with the file in place, the call loads the CSV into the dataframe `data`.)
This corresponds to 10 years of daily information for a selection of assets.
First of all, we need to summarise, understand, visualise and clean the data. Let us look first at the different entries and their types:
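A minimal sketch of this inspection (assuming the dataframe `data` loaded above):

```python
# Overview of the inferred column types and the first few rows
print(data.dtypes)
print(data.head())
```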
Note that once again the 'date' field is not recognised as a date but as an object. Let us have a look at its contents before we decide how to transform it.
It does look like a date with the format 'yyyy-mm-dd'. We can then transform this field and update it.
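A one-line conversion along these lines should do (assuming the column is named 'date'):

```python
# Parse the 'date' strings into proper datetime64 values
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
```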
Let us check that it looks the same, but that the field is now a date
Let us check whether the entries are unique per date.
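One quick way to check (a sketch; the exact cell may differ):

```python
# If dates were unique, the number of rows would equal
# the number of distinct dates
print(len(data), data['date'].nunique())
```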
Not equal... because we have several tickers per date. So this time the index should be a combination of ticker and date. However, since from now on we are only going to work with adjusted_close prices, an alternative is to "pivot" the table, so that 'ticker', which identifies the asset, becomes a column.
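A sketch of the pivot (assuming the columns are named 'ticker' and 'adjusted_close'):

```python
# One column of adjusted close prices per ticker, indexed by date
P = data.pivot(index='date', columns='ticker', values='adjusted_close')
```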
The dataframe 'P' contains only the 'adjusted close price' for each date, for each of the companies in the database, namely BAC (Bank of America), C (Citigroup), GE (General Electric), JNJ (Johnson & Johnson), MSFT (Microsoft), PG (Procter & Gamble), T (AT&T), WMT (Walmart), and XOM (Exxon Mobil). All of these are large companies from different economic sectors, including finance, technology, energy, retail and pharmaceuticals.
Let us check that we did not lose any data when creating our pivot table.
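For example:

```python
# Any missing values introduced by the pivot would show up here
print(P.isnull().sum())
```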
No null data, so all fields are complete. Now, let us check that the data makes sense visually.
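A quick plot along these lines (matplotlib assumed available):

```python
import matplotlib.pyplot as plt

# Visual sanity check of the price histories
P.plot(figsize=(12, 6), title='Adjusted close prices')
plt.show()
```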
That 'C' line looks suspicious... until we realise it belongs to Citigroup: a huge drop between 2008 and 2010 is not so strange if we think about the financial crisis of the time.
Efficient frontier calculation
We are now going to build the efficient frontier in this reduced portfolio.
Using one week as the period, find the mean-variance frontier, assuming that these are all the stocks in the market and using only information before 01/01/2012.
We find the minimal variance portfolio (note there is no risk-free asset).
Building the log-returns
Let us now resample using only weekly data...
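A sketch, keeping the last observed price of each week:

```python
# Weekly prices: last available adjusted close of each week
P_weekly = P.resample('W').last()
```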
Replot to check we did not do anything wrong
Good. Now, let us calculate both the returns and the log-returns.
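For instance:

```python
import numpy as np

# Simple returns and log-returns over the weekly periods
returns = P_weekly.pct_change().dropna()
log_returns = np.log(P_weekly).diff().dropna()
```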
Remember that negative log-returns mean gross returns lower than 1 (i.e. effective losses).
Stationarity check
How stationary are the data? Let us look first at the log-returns. I will simply plot the mean and variance with a rolling window of 10 observations.
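Something like the following (a sketch of the rolling plots):

```python
import matplotlib.pyplot as plt

# Rolling mean and variance over a 10-observation window
log_returns.rolling(window=10).mean().plot(figsize=(12, 4), title='Rolling mean (10 obs.)')
log_returns.rolling(window=10).var().plot(figsize=(12, 4), title='Rolling variance (10 obs.)')
plt.show()
```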
These values seem to be significantly bigger initially. What about the standard deviation?
It looks like there is a significant "change of regime" around 2010. It is of course associated with the effects of the financial crisis. We are going to consider first the mean-variance frontier using the entire history from 2007 to 2010, and then compare with the result from 2010 on.
Note: In class we worked with mean-variance analysis of returns. It is often convenient to work with log-returns instead: on the one hand, log-returns are usually closer to stationary; on the other hand, they tend to look closer to Gaussian.
Mean-variance frontier construction
We start by calculating the mean and variance for the selected history in the first case.
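A sketch (here I cut, as an assumption, at the end of 2011 to match the 01/01/2012 cutoff stated earlier; adjust the slice to the sub-sample you want):

```python
# Estimates on the first sub-sample
sub = log_returns.loc[:'2011']
mu = sub.mean()
Sigma = sub.cov()
print(mu)
print(sub.corr())  # correlations, to eyeball the signs
```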
Correlations are positive, as expected... and the means seem reasonable.
Before calculating the optimal portfolio, we transform the variable Sigma into a numpy array and calculate its condition number to assess the stability of its inverse.
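For example:

```python
import numpy as np

# A very large condition number would mean a numerically unstable inverse
Sigma_np = Sigma.to_numpy()
print(np.linalg.cond(Sigma_np))
```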
It is a large number, but still OK. We do not need to stabilise the inverse.
Now, we are going to re-define the Markowitz weights function to find the mean-variance frontier.
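A sketch of such a function, using the classical closed-form solution of the constrained problem min_w wᵀΣw subject to wᵀμ = m and wᵀ1 = 1 (the standard Lagrangian computation; not necessarily the exact code used in class):

```python
import numpy as np

def markowitz_weights(mu, Sigma, m):
    """Frontier weights for target mean m, with no risk-free asset."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    ones = np.ones_like(mu)
    # Solve linear systems rather than forming the inverse explicitly
    Sinv_ones = np.linalg.solve(Sigma, ones)
    Sinv_mu = np.linalg.solve(Sigma, mu)
    A = ones @ Sinv_ones
    B = ones @ Sinv_mu
    C = mu @ Sinv_mu
    D = A * C - B ** 2
    return ((C - m * B) * Sinv_ones + (m * A - B) * Sinv_mu) / D
```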
Let us calculate the means and variances of those portfolios.
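For instance, sweeping a grid of target means with the estimates `mu` and `Sigma_np` from above:

```python
mu_np = mu.to_numpy()
targets = np.linspace(mu_np.min(), mu_np.max(), 100)
weights = [markowitz_weights(mu_np, Sigma_np, m) for m in targets]
frontier_means = np.array([w @ mu_np for w in weights])
frontier_stds = np.array([np.sqrt(w @ Sigma_np @ w) for w in weights])
```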
We can also calculate the (quasi-)minimal variance portfolio.
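This one has the explicit form w = Σ⁻¹1 / (1ᵀΣ⁻¹1); a sketch:

```python
# Minimum variance portfolio (no target mean constraint)
ones = np.ones(len(mu_np))
w_mv = np.linalg.solve(Sigma_np, ones)
w_mv /= ones @ w_mv
print(pd.Series(w_mv, index=mu.index).round(3))
```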
So the minimal variance portfolio invests mostly in health and consumer staples (Johnson & Johnson, Procter & Gamble, and Walmart). Note that this is a risky strategy because it is too concentrated on one type of asset.
Let us check that everything is correct. We plot our frontier and the minimum variance portfolio. To do so, we generate a bunch of random portfolios to visually verify that our calculated values are right.
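A sketch of the verification plot (here with long-only random portfolios drawn from a Dirichlet distribution, an illustrative choice; any weights summing to one would do):

```python
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(len(mu_np)), size=2000)  # rows sum to one
rand_means = W @ mu_np
rand_stds = np.sqrt(np.einsum('ij,jk,ik->i', W, Sigma_np, W))

plt.scatter(rand_stds, rand_means, s=5, alpha=0.3, label='random portfolios')
plt.plot(frontier_stds, frontier_means, 'r', label='frontier')
plt.scatter(np.sqrt(w_mv @ Sigma_np @ w_mv), w_mv @ mu_np,
            color='k', zorder=3, label='min variance')
plt.xlabel('standard deviation')
plt.ylabel('mean')
plt.legend()
plt.show()
```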
Stability in time
What happens if we consider only recent history? If we had perfect stationarity, the optimal portfolio should still be very close to the one we found in the previous part. Let us make some calculations.
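For instance, re-estimating on the later sub-sample only (from 2010 on, an assumption consistent with the regime change noted above):

```python
# Re-estimate mean and covariance on recent history
sub2 = log_returns.loc['2010':]
mu2 = sub2.mean().to_numpy()
Sigma2 = sub2.cov().to_numpy()

ones = np.ones(len(mu2))
w_mv2 = np.linalg.solve(Sigma2, ones)
w_mv2 /= ones @ w_mv2
print(pd.Series(w_mv2, index=sub2.columns).round(3))
```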
The first big change is that in the minimal variance portfolio there is no longer room for a positive position in GE or Exxon Mobil. On the other hand, Bank of America makes a comeback. Note also the reduction in exposure to JNJ.
Note that with these data, the mean-variance frontier achieves the same returns with much smaller variance.
How does the efficient portfolio calculated with the data of the first part do on this second part?
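One way to check (a sketch, reusing the frontier `weights` computed on the first sub-sample): evaluate those portfolios under the second sub-sample's estimates.

```python
# First-part frontier portfolios under second-period estimates
old_means = np.array([w @ mu2 for w in weights])
old_stds = np.array([np.sqrt(w @ Sigma2 @ w) for w in weights])

plt.plot(old_stds, old_means, 'b', label='old frontier, new estimates')
plt.xlabel('standard deviation')
plt.ylabel('mean')
plt.legend()
plt.show()
```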
We can see that the efficient part of the frontier is not doing that badly.
Let us move forward in time a couple of years and evaluate how these two portfolios would do in the period from 2015 on.
Note that the efficient portfolios calculated with the first dataset (all information before 2010) are underperforming (even if they are not that bad). This shows that periodic recalculation is needed. If trading costs are significant, this might limit the possibility of maintaining truly efficient portfolios over time.