Mean-variance frontier with market data
**Camilo A. Garcia Trillos 2020**
In this notebook
We revisit the mean-variance frontier construction, this time using market data. We use a sub-sample of data, belonging to a selection of stocks in the market.
Data cleaning
We start by reading in the data.
```python
import pandas as pd

data = pd.read_csv('~/Data/basket_20171201_10y.csv')
```

(In this environment the cell raises `FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Data/basket_20171201_10y.csv'` because the data file is not present; with the file in place, the call loads the CSV into the dataframe `data`.)
This corresponds to 10 years of daily information for a selection of assets.
First of all, we need to summarise, understand, visualise and clean the data. Let us look first at the different entries and their types:
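A minimal sketch of this inspection (assuming the dataframe `data` loaded above):

```python
# Overview of the inferred column types and the first few rows
print(data.dtypes)
print(data.head())
```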
Note that once again the 'date' field is not recognised as a date but as an object. Let us have a look at its contents before we decide how to transform it.
It does look like a date with the format 'yyyy-mm-dd'. We can then transform this field and update it.
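A one-line conversion along these lines should do (assuming the column is named 'date'):

```python
# Parse the 'date' strings into proper datetime64 values
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
```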
Let us check that it looks the same, but that the field is now a date
Let us check whether the entries are unique per date.
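One quick way to check (a sketch; the exact cell may differ):

```python
# If dates were unique, the number of rows would equal
# the number of distinct dates
print(len(data), data['date'].nunique())
```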
Not equal... because we have several tickers per date. So this time the index should be a combination of ticker and date. However, since from now on we are only going to work with adjusted_close prices, an alternative is to "pivot" the table, so that 'ticker', which identifies the asset, becomes a column.
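A sketch of the pivot (assuming the columns are named 'ticker' and 'adjusted_close'):

```python
# One column of adjusted close prices per ticker, indexed by date
P = data.pivot(index='date', columns='ticker', values='adjusted_close')
```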
The dataframe 'P' contains only the 'adjusted close price' for each date, for each of the companies in the database, namely BAC (Bank of America), C (Citigroup), GE (General Electric), JNJ (Johnson & Johnson), MSFT (Microsoft), PG (Procter & Gamble), T (AT&T), WMT (Walmart), and XOM (Exxon Mobil). All of these are large companies from different economic sectors, including finance, technology, energy, retail and pharmaceuticals.
Let us check that we did not lose any data when creating our pivot table.
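For example:

```python
# Any missing values introduced by the pivot would show up here
print(P.isnull().sum())
```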
No null data, so all fields are complete. Now, let us check that the data makes sense visually.
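A quick plot along these lines (matplotlib assumed available):

```python
import matplotlib.pyplot as plt

# Visual sanity check of the price histories
P.plot(figsize=(12, 6), title='Adjusted close prices')
plt.show()
```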
That 'C' line looks suspicious... until we realise it belongs to Citigroup: a huge drop between 2008 and 2010 is not so strange if we think about the financial crisis of the time.
Efficient frontier calculation
We are now going to build the efficient frontier in this reduced portfolio.
Using one week as the period, find the mean-variance frontier, assuming that these are all the stocks in the market and using only information before 01/01/2012.
We find the minimal variance portfolio (note there is no risk-free asset).
Building the log-returns
Let us now resample using only weekly data...
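A sketch, keeping the last observed price of each week:

```python
# Weekly prices: last available adjusted close of each week
P_weekly = P.resample('W').last()
```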
Replot to check we did not do anything wrong
Good. Now, let us calculate both the returns and the log-returns.
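For instance:

```python
import numpy as np

# Simple returns and log-returns over the weekly periods
returns = P_weekly.pct_change().dropna()
log_returns = np.log(P_weekly).diff().dropna()
```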
Remember that negative log-returns mean gross returns lower than 1 (i.e. effective losses).
Stationarity check
How stationary are the data? Let us look first at the log-returns. I will simply plot the mean and variance with a rolling window of 10 observations.
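Something like the following (a sketch of the rolling plots):

```python
import matplotlib.pyplot as plt

# Rolling mean and variance over a 10-observation window
log_returns.rolling(window=10).mean().plot(figsize=(12, 4), title='Rolling mean (10 obs.)')
log_returns.rolling(window=10).var().plot(figsize=(12, 4), title='Rolling variance (10 obs.)')
plt.show()
```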
These values seem to be significantly bigger initially. What about the standard deviation?
It looks like there is a significant "change of regime" around 2010. It is of course associated with the effects of the financial crisis. We are going to consider first the mean-variance frontier using the entire history from 2007 to 2010, and then compare with the result from 2010 on.
Note: In class we worked with mean-variance analysis of returns. It is often convenient to work with log-returns instead: on the one hand, log-returns are usually closer to stationary; on the other hand, they tend to look closer to Gaussian.
Mean-variance frontier construction
We start by calculating the mean and variance for the selected history in the first case.
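A sketch (here I cut, as an assumption, at the end of 2011 to match the 01/01/2012 cutoff stated earlier; adjust the slice to the sub-sample you want):

```python
# Estimates on the first sub-sample
sub = log_returns.loc[:'2011']
mu = sub.mean()
Sigma = sub.cov()
print(mu)
print(sub.corr())  # correlations, to eyeball the signs
```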
Correlations are positive, as expected... and the means seem reasonable.
Before calculating the optimal portfolio, we transform the variable Sigma into a numpy array and calculate its condition number to assess the stability of its inverse.
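For example:

```python
import numpy as np

# A very large condition number would mean a numerically unstable inverse
Sigma_np = Sigma.to_numpy()
print(np.linalg.cond(Sigma_np))
```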
It is a large number, but still OK. We do not need to stabilise the inverse.
Now, we are going to re-define the Markowitz weights function to find the mean-variance frontier.
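A sketch of such a function, using the classical closed-form solution of the constrained problem min_w wᵀΣw subject to wᵀμ = m and wᵀ1 = 1 (the standard Lagrangian computation; not necessarily the exact code used in class):

```python
import numpy as np

def markowitz_weights(mu, Sigma, m):
    """Frontier weights for target mean m, with no risk-free asset."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    ones = np.ones_like(mu)
    # Solve linear systems rather than forming the inverse explicitly
    Sinv_ones = np.linalg.solve(Sigma, ones)
    Sinv_mu = np.linalg.solve(Sigma, mu)
    A = ones @ Sinv_ones
    B = ones @ Sinv_mu
    C = mu @ Sinv_mu
    D = A * C - B ** 2
    return ((C - m * B) * Sinv_ones + (m * A - B) * Sinv_mu) / D
```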
Let us calculate the means and variances of those portfolios.
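For instance, sweeping a grid of target means with the estimates `mu` and `Sigma_np` from above:

```python
mu_np = mu.to_numpy()
targets = np.linspace(mu_np.min(), mu_np.max(), 100)
weights = [markowitz_weights(mu_np, Sigma_np, m) for m in targets]
frontier_means = np.array([w @ mu_np for w in weights])
frontier_stds = np.array([np.sqrt(w @ Sigma_np @ w) for w in weights])
```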
We can also calculate the (quasi-)minimal variance portfolio.
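This one has the explicit form w = Σ⁻¹1 / (1ᵀΣ⁻¹1); a sketch:

```python
# Minimum variance portfolio (no target mean constraint)
ones = np.ones(len(mu_np))
w_mv = np.linalg.solve(Sigma_np, ones)
w_mv /= ones @ w_mv
print(pd.Series(w_mv, index=mu.index).round(3))
```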
So the minimal variance portfolio invests mostly in health and consumer staples (Johnson & Johnson, Procter & Gamble, and Walmart). Note that this is a risky strategy because it is too concentrated on one type of asset.
Let us check that everything is correct. We plot our frontier and the minimum variance portfolio. To do so, we generate a bunch of random portfolios to visually verify that our calculated values are right.
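A sketch of the verification plot (here with long-only random portfolios drawn from a Dirichlet distribution, an illustrative choice; any weights summing to one would do):

```python
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(len(mu_np)), size=2000)  # rows sum to one
rand_means = W @ mu_np
rand_stds = np.sqrt(np.einsum('ij,jk,ik->i', W, Sigma_np, W))

plt.scatter(rand_stds, rand_means, s=5, alpha=0.3, label='random portfolios')
plt.plot(frontier_stds, frontier_means, 'r', label='frontier')
plt.scatter(np.sqrt(w_mv @ Sigma_np @ w_mv), w_mv @ mu_np,
            color='k', zorder=3, label='min variance')
plt.xlabel('standard deviation')
plt.ylabel('mean')
plt.legend()
plt.show()
```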
Stability in time
What happens if we consider only recent history? If we had perfect stationarity, the optimal portfolio should still be very close to the one we found in the previous part. Let us make some calculations.
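For instance, re-estimating on the later sub-sample only (from 2010 on, an assumption consistent with the regime change noted above):

```python
# Re-estimate mean and covariance on recent history
sub2 = log_returns.loc['2010':]
mu2 = sub2.mean().to_numpy()
Sigma2 = sub2.cov().to_numpy()

ones = np.ones(len(mu2))
w_mv2 = np.linalg.solve(Sigma2, ones)
w_mv2 /= ones @ w_mv2
print(pd.Series(w_mv2, index=sub2.columns).round(3))
```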
The first big change is that in the minimal variance portfolio there is no longer room for a positive position in GE or Exxon Mobil. On the other hand, Bank of America makes a comeback. Note also the reduction in exposure to JNJ.
Note that with these data, the mean-variance frontier achieves the same returns with much smaller variance.
How does the efficient portfolio calculated with the data of the first part do on this second part?
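One way to check (a sketch, reusing the frontier `weights` computed on the first sub-sample): evaluate those portfolios under the second sub-sample's estimates.

```python
# First-part frontier portfolios under second-period estimates
old_means = np.array([w @ mu2 for w in weights])
old_stds = np.array([np.sqrt(w @ Sigma2 @ w) for w in weights])

plt.plot(old_stds, old_means, 'b', label='old frontier, new estimates')
plt.xlabel('standard deviation')
plt.ylabel('mean')
plt.legend()
plt.show()
```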
We can see that the efficient part of the frontier is not doing that badly.
Let us move forward in time a couple of years and evaluate how these two portfolios would do in the period from 2015 on.
Note that the efficient portfolios calculated with the first dataset (all information before 2010) are underperforming (even if they are not that bad). This shows that periodic recalculation is needed. If trading costs are significant, this might limit the possibility of maintaining truly efficient portfolios over time.