Path: blob/master/Analysis/02 Kalman Filter Based Pairs Trading.ipynb
Pairs Trading
Pairs trading is a market-neutral trading strategy that belongs to the family of statistical arbitrage strategies. The basic idea is to select two stocks that move similarly and, when their prices diverge, sell the relatively high-priced stock and buy the relatively low-priced stock.
Cointegration
Before applying pairs trading, we need to understand cointegration. Cointegration is a statistical property of time series (that is, of sequences of random variables).
Correlation describes the co-movement of returns; it is a short-term relationship.
Cointegration describes the co-movement of prices; it is a long-term relationship.
Generally speaking, a weakly stationary process is one whose mean and autocovariance do not vary over time. For example, white noise $\varepsilon_t$ is a stationary time series because $E[\varepsilon_t] = 0$ and $\mathrm{Var}(\varepsilon_t) = \sigma^2$ for all $t$.
If two series $\{x_t\}$ and $\{y_t\}$ are not stationary but a linear combination of them, $y_t - \beta x_t$ for some constant $\beta$, is a stationary process, then we say $\{x_t\}$ and $\{y_t\}$ are cointegrated. If $x_t$ and $y_t$ each become stationary after taking the first-order difference, we call their relationship first-order cointegration. Most financial time series are integrated of order 1.
In general, we use the Augmented Dickey-Fuller (ADF) test to test for cointegration, by checking whether the linear combination is stationary.
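To make the definition concrete, here is a minimal sketch on synthetic data (not XOM/CVX prices). The values `alpha_true` and `beta_true` are illustrative assumptions. The series `x` is a random walk, `y` shares its trend, and the linear combination `y - beta_true * x` removes that trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# x is a random walk, so it is integrated of order 1 (non-stationary in levels).
x = 50.0 + np.cumsum(rng.normal(0.0, 1.0, n))

# y shares the stochastic trend of x: y_t = alpha + beta * x_t + stationary noise.
alpha_true, beta_true = 5.0, 1.5  # illustrative values, not estimated from real prices
y = alpha_true + beta_true * x + rng.normal(0.0, 1.0, n)

# The linear combination y - beta * x cancels the shared trend.
spread = y - beta_true * x

print("std of x levels:      ", x.std())
print("std of x differences: ", np.diff(x).std())
print("std of spread:        ", spread.std())
print("mean of spread:       ", spread.mean())
```

The levels of `x` wander without settling, while both its first difference and the spread fluctuate around a fixed mean with a stable spread. That is the pattern cointegration describes.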
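Can we apply this idea to a trading strategy?
The most widely used model of stock price is $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, where $S_t$ is the stock price, $\mu$ is the average return (drift term), $\sigma$ is the volatility (diffusion term), and $W_t$ is a Brownian motion. From Itô's lemma we get $d\ln S_t = \left(\mu - \tfrac{\sigma^2}{2}\right)dt + \sigma\,dW_t$. The increments of the log price are therefore stationary, so the log price is integrated of order 1.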
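The claim about the log price can be checked numerically. The sketch below simulates the model with assumed parameters (`mu`, `sigma`, and the daily step `dt` are illustrative choices) and compares the first difference of the log price with the values implied by Itô's lemma.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.08, 0.20   # assumed annual drift and volatility
dt = 1.0 / 252           # one trading day, in years
n = 252 * 40             # 40 years of daily steps

# Exact discretisation of d ln S = (mu - sigma^2 / 2) dt + sigma dW
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
log_price = np.log(100.0) + np.cumsum(log_returns)
price = np.exp(log_price)

# The first difference of the log price should look like i.i.d. noise around a constant.
diff = np.diff(log_price)
print("mean of diff:", diff.mean(), " theory:", (mu - 0.5 * sigma**2) * dt)
print("std of diff: ", diff.std(), " theory:", sigma * np.sqrt(dt))
```

The log price itself drifts and wanders, but one round of differencing leaves a series with constant mean and variance, which is what "integrated of order 1" means.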
Step 1: Find two likely cointegrated stocks
We choose stock A: XOM and stock B: CVX. From the above plot, we can see that the two series are highly correlated with each other. Next, we explore this relationship further.
Step 2: Estimate Spreads
If two stocks, X and Y, are cointegrated in their price movements, then any divergence of the spread from 0 should be temporary and mean-reverting. In the next step we estimate the spread series.
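One common way to estimate the spread is an ordinary least squares regression of Y on X, taking the residual as the spread. The sketch below assumes that approach and uses synthetic stand-ins for the two price series. The helper name `estimate_spread` is hypothetical, and the notebook's own data-loading calls are not reproduced here.

```python
import numpy as np

def estimate_spread(x, y):
    """Fit y = alpha + beta * x by least squares and return (alpha, beta, spread)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, alpha = np.polyfit(x, y, 1)  # polyfit returns the slope first, then the intercept
    spread = y - beta * x - alpha      # regression residual
    return alpha, beta, spread

# Synthetic stand-ins for the (log) prices of stock A and stock B.
rng = np.random.default_rng(2)
x = 4.0 + np.cumsum(rng.normal(0.0, 0.02, 1000))
y = 0.3 + 1.2 * x + rng.normal(0.0, 0.02, 1000)

alpha, beta, spread = estimate_spread(x, y)
print("alpha:", alpha, "beta:", beta)
print("spread mean:", spread.mean(), "spread std:", spread.std())
```

Because the regression includes an intercept, the residual spread has zero mean by construction, which matches the statement that divergences from 0 should be temporary.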
Step 3: Check Stationarity
From the above plot, the first-order difference seems to be stationary and mean-reverting. Next, we test whether it is stationary.
Running the example prints a test statistic of -3.45. The more negative this statistic, the more likely we are to reject the null hypothesis (that there is a unit root).
As part of the output, we get a table of critical values to help interpret the ADF statistic. Our statistic of -3.45 is less than the critical value of -3.435 at the 1% level, and the p-value of 0.009 is less than 0.05.
This suggests that we can reject the null hypothesis at the 1% significance level. Rejecting the null hypothesis means that the process has no unit root, which in turn means that the time series is stationary, that is, it has no time-dependent structure.
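The idea behind the test statistic can be sketched with NumPy alone. The function below is a simplified Dickey-Fuller regression with a constant and no lagged-difference terms, so it is not the full augmented test and will not reproduce the -3.45 above. The name `dickey_fuller_stat` is hypothetical, and the 1% critical value is the one quoted in the text. Critical values depend on the sample size and the regression specification.

```python
import numpy as np

def dickey_fuller_stat(series):
    """t-statistic of gamma in  diff(s)_t = c + gamma * s_{t-1} + e_t  (no lag terms)."""
    s = np.asarray(series, dtype=float)
    ds = np.diff(s)
    X = np.column_stack([np.ones(len(ds)), s[:-1]])  # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, ds, rcond=None)
    resid = ds - X @ coef
    dof = len(ds) - X.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS covariance of the coefficients
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(3)
noise = rng.normal(size=1000)
stat_white_noise = dickey_fuller_stat(noise)             # stationary series
stat_random_walk = dickey_fuller_stat(np.cumsum(noise))  # series with a unit root

CRITICAL_1PCT = -3.435  # 1% critical value quoted in the text above
print("white noise:", stat_white_noise, "reject unit root:", stat_white_noise < CRITICAL_1PCT)
print("random walk:", stat_random_walk, "reject unit root:", stat_random_walk < CRITICAL_1PCT)
```

A stationary series pulls back toward its mean, so the coefficient on the lagged level is strongly negative and the statistic falls far below the critical value. A random walk has no such pull.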
Step 4: Create Trading Signal
Here we use 1.96 times the standard deviation as our threshold. 1.96 is the approximate value of the 97.5th percentile of the standard normal distribution, so about 95% of the area under a normal curve lies within 1.96 standard deviations of the mean. If an observation falls outside this range, we treat it as a price divergence between the two stocks, which signals a pairs trading opportunity.
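The two numbers quoted above can be verified with the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# 97.5th percentile of the standard normal distribution
z = std_normal.inv_cdf(0.975)

# probability mass within 1.96 standard deviations of the mean
inside = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(round(z, 2), round(inside, 3))
```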
Here buying the spread means buying 1 unit of stock B (CVX) and selling $\beta$ units of stock A (XOM). We expect the relationship between x and y to hold in the future. We buy the spread when it is more than 1.96 standard deviations below its mean and close out the position when it returns to the mean. Selling the spread means selling 1 unit of stock B (CVX) and buying $\beta$ units of stock A (XOM); we do so when the spread is more than 1.96 standard deviations above its mean, and close the position when it reaches the long-term mean to realize a profit.
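The entry and exit rules above can be sketched as a small state machine. The function name `spread_positions` is hypothetical. As a simplifying assumption, the mean and standard deviation are computed over the whole sample, which uses information a live strategy would not yet have.

```python
import numpy as np

def spread_positions(spread, z=1.96):
    """Return +1 (long spread), -1 (short spread) or 0 (flat) for each observation."""
    spread = np.asarray(spread, dtype=float)
    mean, std = spread.mean(), spread.std()  # full-sample estimates (assumption)
    upper, lower = mean + z * std, mean - z * std
    positions = np.zeros(len(spread), dtype=int)
    current = 0
    for t, s in enumerate(spread):
        if current == 0:
            if s < lower:
                current = 1    # buy spread: long 1 unit of B, short beta units of A
            elif s > upper:
                current = -1   # sell spread: short 1 unit of B, long beta units of A
        elif current == 1 and s >= mean:
            current = 0        # long spread has reverted to the mean: close
        elif current == -1 and s <= mean:
            current = 0        # short spread has reverted to the mean: close
        positions[t] = current
    return positions

# Small hand-made spread: one dip below the lower band, one spike above the upper band.
demo = np.zeros(20)
demo[5:8] = [-10.0, -3.0, 1.0]
demo[12:15] = [10.0, 3.0, -1.0]
demo_positions = spread_positions(demo)
print(demo_positions)
```

In the demo series the position opens on the bar where the spread breaches a band, stays open while the spread is still on the far side of the mean, and closes on the first bar at or past the mean.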
Method: Kalman Filter
If we use linear regression to estimate the two parameters of the spread calculation, the intercept $\alpha$ and the slope $\beta$, the main issue is that we have to pick an arbitrary lookback window and assume that the relationship will persist in the near future and that the spread will converge to its long-term equilibrium. In practice, $\alpha$ and $\beta$ are not constants; they vary over time and are not market observables. Moreover, the long-term relationship can break down.
In the following research, we use a Kalman filter to model the spread. This is an adaptive filter that updates itself iteratively and produces $\alpha$ and $\beta$ dynamically. We use the Python package pykalman, whose EM method calibrates the covariance matrices over the training period.
For the model details, see https://www.quantstart.com/articles/Dynamic-Hedge-Ratio-Between-ETF-Pairs-Using-the-Kalman-Filter
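To show what the filter does at each step, here is a hand-written NumPy sketch of a Kalman filter for a regression with time-varying slope and intercept. It does not use pykalman and does not run EM calibration. The noise settings `delta` and `obs_var` are fixed by assumption, the function name `kalman_regression` is hypothetical, and the data are synthetic. The hidden state is $[\beta_t, \alpha_t]$, modelled as a random walk, and each observation is $y_t = \beta_t x_t + \alpha_t + \text{noise}$.

```python
import numpy as np

def kalman_regression(x, y, delta=1e-4, obs_var=1e-3):
    """Filter the state [beta_t, alpha_t] in y_t = beta_t * x_t + alpha_t + noise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    theta = np.zeros(2)                  # initial guess for [beta, alpha]
    P = np.eye(2)                        # initial state covariance
    Q = delta / (1.0 - delta) * np.eye(2)  # random-walk (transition) covariance, assumed
    states = np.zeros((n, 2))
    for t in range(n):
        H = np.array([x[t], 1.0])        # observation vector for this step
        P = P + Q                        # predict: the state is a random walk
        innovation = y[t] - H @ theta    # one-step-ahead forecast error
        S = H @ P @ H + obs_var          # innovation variance
        K = P @ H / S                    # Kalman gain
        theta = theta + K * innovation   # update the state estimate
        P = P - np.outer(K, H) @ P       # update the state covariance
        states[t] = theta
    return states

# Synthetic pair whose true slope drifts slowly from 1.0 to 1.5.
rng = np.random.default_rng(4)
n = 500
x = 50.0 + np.cumsum(rng.normal(0.0, 1.0, n))
beta_true = np.linspace(1.0, 1.5, n)
y = 2.0 + beta_true * x + rng.normal(0.0, 0.1, n)

states = kalman_regression(x, y)
kf_spread = y - states[:, 0] * x - states[:, 1]  # spread implied by the filtered state
print("last filtered [beta, alpha]:", states[-1])
print("std of Kalman spread (second half):", kf_spread[n // 2:].std())
```

Unlike a fixed-window regression, the estimate at time t depends only on data up to t, and the ratio between `delta` and `obs_var` controls how quickly the slope and intercept are allowed to move.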
From the result above, the slope increases over time while the intercept does not change much.
Next, we plot the original spread and the spread estimated using the Kalman filter.