How to build a linear factor model
Algorithmic trading strategies use linear factor models to quantify the relationship between the return of an asset and the sources of risk that represent the main drivers of these returns. Each factor risk carries a premium, and the total asset return can be expected to correspond to a weighted sum of these risk premia, with the asset's factor exposures as the weights.
There are several practical applications of factor models across the portfolio management process, from construction and asset selection to risk management and performance evaluation. The importance of factor models continues to grow as common risk factors are now tradeable:
- A summary of the returns of many assets by a much smaller number of factors reduces the amount of data required to estimate the covariance matrix when optimizing a portfolio
- An estimate of the exposure of an asset or a portfolio to these factors allows for the management of the resultant risk, for instance by entering suitable hedges when risk factors are themselves traded
- A factor model also permits the assessment of the incremental signal content of new alpha factors
- A factor model can also help assess whether a manager's performance relative to a benchmark is indeed due to skill in selecting assets and timing the market, or if instead, the performance can be explained by portfolio tilts towards known return drivers that can today be replicated as low-cost, passively managed funds without incurring active management fees
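To make the first point concrete, the sketch below counts the free parameters of an unrestricted covariance matrix against those of a factor-structured one. It assumes that idiosyncratic risks are uncorrelated across assets (a diagonal residual covariance), which the text does not state; the function name and the example sizes are purely illustrative.

```python
def covariance_parameter_counts(n_assets: int, n_factors: int) -> tuple[int, int]:
    """Return (unrestricted, factor_model) counts of free covariance parameters."""
    # Unrestricted symmetric N x N covariance matrix
    unrestricted = n_assets * (n_assets + 1) // 2
    # Factor structure (assumption: diagonal idiosyncratic covariance):
    # N*K loadings + K(K+1)/2 factor covariances + N idiosyncratic variances
    factor_model = (
        n_assets * n_factors + n_factors * (n_factors + 1) // 2 + n_assets
    )
    return unrestricted, factor_model


# Hypothetical universe of 500 assets described by 5 factors
unrestricted, factor_model = covariance_parameter_counts(n_assets=500, n_factors=5)
print(unrestricted, factor_model)
```

Under these assumptions, the factor structure needs only a small fraction of the parameters that the unrestricted matrix requires.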
Imports & Settings
Get Data
Fama and French make updated risk factor and research portfolio data available through their website, and you can use the pandas_datareader package to obtain the data.
Risk Factors
In particular, we will be using the five Fama-French factors that result from sorting stocks first into two size groups and then into three groups for each of the remaining three firm-specific characteristics.
Hence, the factors involve three sets of value-weighted portfolios formed as 2 x 3 sorts on size and book-to-market, size and operating profitability, and size and investment. The risk factor values are computed as the average returns of the portfolios (PF) as outlined in the following table:
Label | Name | Description |
---|---|---|
SMB | Small Minus Big | Average return on the nine small stock portfolios minus the average return on the nine big stock portfolios |
HML | High Minus Low | Average return on the two value portfolios minus the average return on the two growth portfolios |
RMW | Robust Minus Weak | Average return on the two robust operating profitability portfolios minus the average return on the two weak operating profitability portfolios |
CMA | Conservative Minus Aggressive | Average return on the two conservative investment portfolios minus the average return on the two aggressive investment portfolios |
Rm-Rf | Excess return on the market | Value-weight return of all firms incorporated in the US and listed on the NYSE, AMEX, or NASDAQ at the beginning of month t with 'good' data for t minus the one-month Treasury bill rate |
The Fama-French 5 factors are based on the 6 value-weight portfolios formed on size and book-to-market, the 6 value-weight portfolios formed on size and operating profitability, and the 6 value-weight portfolios formed on size and investment.
We will use returns at a monthly frequency that we obtain for the period 2010 – 2017 as follows:
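The notebook's own download cell is not shown here, so the following is only a sketch of what such a call might look like. It assumes the `famafrench` reader of `pandas_datareader` and the dataset name `F-F_Research_Data_5_Factors_2x3`; the function name `load_ff_factors` is hypothetical, and the call requires network access.

```python
def load_ff_factors(start="2010", end="2017-12"):
    """Download monthly Fama-French five-factor returns (reported in percent)."""
    # Imported lazily so the function can be defined without the package installed
    import pandas_datareader.data as web

    # Dataset name as exposed by pandas_datareader's 'famafrench' reader (assumed)
    dataset = "F-F_Research_Data_5_Factors_2x3"
    # DataReader returns a dict of tables; key 0 holds the monthly series
    return web.DataReader(dataset, "famafrench", start=start, end=end)[0]
```

The monthly table is expected to contain the five factors plus the risk-free rate `RF`, which is needed later to compute excess returns.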
Portfolios
Fama and French also make available numerous portfolios that we can use to illustrate the estimation of the factor exposures, as well as the value of the risk premia available in the market for a given time period. We will use a panel of the 17 industry portfolios at a monthly frequency.
We will subtract the risk-free rate from the returns because the factor model works with excess returns:
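As a minimal illustration of this subtraction, the sketch below uses a small made-up DataFrame rather than the downloaded data; the column names and values are placeholders, not actual Fama-French figures.

```python
import pandas as pd

# Toy monthly data in percent; the values are made up for illustration
idx = pd.period_range("2010-01", periods=3, freq="M")
portfolio_returns = pd.DataFrame(
    {"Food": [1.0, -0.5, 2.0], "Mines": [3.0, 0.5, -1.0]}, index=idx
)
risk_free = pd.Series([0.01, 0.02, 0.01], index=idx, name="RF")

# Subtract the risk-free rate from every portfolio column, aligned on the dates
excess_returns = portfolio_returns.sub(risk_free, axis=0)
print(excess_returns)
```

Passing `axis=0` makes pandas align the series with the row index, so each month's rate is subtracted from all portfolios in that month.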
Equity Data
Align data
Compute excess Returns
Fama-MacBeth Regression
Given data on risk factors and portfolio returns, it is useful to estimate the portfolio's exposure, that is, how much the risk factors drive portfolio returns, as well as how much the exposure to a given factor is worth, that is, what the market's risk premium for that factor is. The risk premium then makes it possible to estimate the return of any portfolio, provided its factor exposure is known or can be assumed.
To address the inference problem caused by the cross-sectional correlation of the residuals, Fama and MacBeth proposed a two-step methodology for a cross-sectional regression of returns on factors. The two-stage Fama-MacBeth regression is designed to estimate the premium that the market pays for exposure to a particular risk factor. The two stages consist of:
First stage: N time-series regressions, one for each asset or portfolio, of its excess returns on the factors to estimate the factor loadings.
Second stage: T cross-sectional regressions, one for each time period, to estimate the risk premium.
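In equation form, the two stages can be written as follows. The notation is an assumption introduced here for illustration: $r_{i,t}$ is the excess return of portfolio $i$ in period $t$, $F_t$ the vector of factor returns, $\beta_i$ the vector of factor loadings, and $\lambda_t$ the vector of period-$t$ risk premia. The second stage is shown without an intercept; some variants include one.

$$
r_{i,t} = \alpha_i + \beta_i^\top F_t + \epsilon_{i,t}, \qquad t = 1, \dots, T \quad \text{(one regression per portfolio } i\text{)}
$$

$$
r_{i,t} = \hat{\beta}_i^\top \lambda_t + \eta_{i,t}, \qquad i = 1, \dots, N \quad \text{(one regression per period } t\text{)}
$$

$$
\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t
$$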
See corresponding section in Chapter 7 of Machine Learning for Trading for details.
Now we can compute the factor risk premia as the time average and compute t-statistics to assess their individual significance, using the assumption that the risk premia estimates are independent over time.
If we had a very large and representative data sample on traded risk factors, we could use the sample mean as a risk premium estimate. However, we typically do not have a sufficiently long history, and the margin of error around the sample mean could be quite large.
The Fama-MacBeth methodology leverages the covariance of the factors with other assets to determine the factor premia. The second moment of asset returns is easier to estimate than the first moment, and obtaining more granular data improves its estimation considerably, which is not true of mean estimation.
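The averaging and the t-statistics can be sketched as below. The array `lambdas` is a synthetic stand-in for the T x K matrix of per-period premium estimates that the second stage produces; the standard error is that of a sample mean, which relies on the independence assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 96, 5  # periods and factors (illustrative sizes)

# Synthetic stand-in for the T x K matrix of second-stage estimates
lambdas = rng.normal(0.5, 2.0, size=(T, K))

# Risk premium per factor: time-series average of the period estimates
premia = lambdas.mean(axis=0)
# Standard error of the mean, valid if the estimates are independent over time
std_err = lambdas.std(axis=0, ddof=1) / np.sqrt(T)
t_stats = premia / std_err
print(np.round(t_stats, 2))
```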
Step 1: Factor Exposures
We can implement the first stage to obtain the factor loading estimates for each of the 17 portfolios as follows:
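The original code cell is not reproduced here, so the following is a sketch on synthetic data rather than the notebook's implementation. It uses NumPy's least-squares solver in place of a statsmodels OLS loop; all array names and the simulated values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
T, N, K = 96, 17, 5  # months, portfolios, factors

# Synthetic stand-ins for factor returns (T x K) and excess returns (T x N)
factors = rng.normal(0.5, 2.0, size=(T, K))
true_betas = rng.normal(1.0, 0.3, size=(N, K))
returns = factors @ true_betas.T + rng.normal(0.0, 0.5, size=(T, N))

# Design matrix with an intercept column, shared by all N time-series regressions
X = np.column_stack([np.ones(T), factors])

# lstsq solves all N regressions at once: one column of `returns` per portfolio
coefs, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (K + 1, N)
alphas = coefs[0]    # intercepts, shape (N,)
betas = coefs[1:].T  # factor loadings, shape (N, K)
print(betas.shape)
```

Each row of `betas` holds the five estimated loadings of one portfolio, which become the regressors of the second stage.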
Step 2: Risk Premia
For the second stage, we run 96 cross-sectional regressions, one per month, of the period returns for the cross section of portfolios on the factor loadings:
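A sketch of this stage on synthetic data follows. As before, the loadings and returns are simulated stand-ins, NumPy's solver replaces a per-period OLS loop, and the regressions are run without an intercept, which is an assumption about the setup.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 96, 17, 5  # months, portfolios, factors

# Synthetic stand-ins: first-stage loadings and per-period premia
betas = rng.normal(0.0, 1.0, size=(N, K))
true_lambdas = rng.normal(0.5, 1.0, size=(T, K))
returns = true_lambdas @ betas.T + rng.normal(0.0, 0.05, size=(T, N))

# Each column of returns.T is one period's cross section of N returns, so a
# single lstsq call runs all T cross-sectional regressions on the loadings
lambdas_by_factor, *_ = np.linalg.lstsq(betas, returns.T, rcond=None)  # (K, T)
lambdas = lambdas_by_factor.T  # shape (T, K): one estimate per period and factor
print(lambdas.shape)
```

Averaging the rows of `lambdas` over time then gives one risk premium estimate per factor.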
Results
Fama-MacBeth with the linearmodels library
The linearmodels library extends statsmodels with various models for panel data and also implements the two-stage Fama-MacBeth procedure:
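A call along the following lines is one way to use it. The wrapper function `fit_fama_macbeth` is hypothetical, and the inputs are assumed to be DataFrames of excess portfolio returns (T x N) and factor returns (T x K) that share the same index.

```python
def fit_fama_macbeth(portfolios, factors):
    """Fit a two-stage linear factor model with the linearmodels package."""
    # Imported lazily so the function can be defined without linearmodels installed
    from linearmodels.asset_pricing import LinearFactorModel

    # portfolios: T x N excess returns; factors: T x K factor returns (same index)
    model = LinearFactorModel(portfolios=portfolios, factors=factors)
    return model.fit()
```

The fitted result exposes the estimated premia and loadings through attributes such as `risk_premia` and `betas`, and a formatted report through `summary`.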
This provides us with the same result: