Selecting tradeable pairs: which measure to use?


Statistical arbitrage pairs trading is a strategy that bets that divergences between pairs of stocks sharing common factor risk will revert to a stationary mean, so that we can fade those divergences and make money.

In this analysis I look at methods for selecting tradeable pairs. Some traders have suggested using correlation coefficients such as Pearson correlation, or variants of it such as Spearman, which measures rank correlation. A different approach is to test for cointegration between the two stocks in a pair and to select on that basis.

But which approach is best? To find out I take a universe of stocks with similar fundamental characteristics and define a set of basic trading rules: sell one stock and buy the other when their price ratio exceeds a threshold, and exit the trade when the ratio touches a rolling mean. This generates some approximate measures of profitability (% wins, cumulative profitability). I then run a series of regressions in which the profitability metric(s) generated above are regressed on each of the statistical measures mentioned previously, computed on the same data.

I start with a universe of 36 stocks in the Oil Well Services and Equipment sector.  These are companies engaged in a common business area and should have a large amount of common factor risk, making them good candidates for pairs trading.

A simple trading system


We create a basic mean reversion trading system from the following rules:

§  1082 unique pairs are generated from the 36 stocks

§  1 year of data (13/11/08 to 13/11/09)

§  Daily dividend adjusted close prices are used

§  Open a trade when the price ratio moves more than 2 standard deviations from its 14 day moving average

§  Close the trade when the ratio reverts to the moving average
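The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the results: the function names `backtest_pair` and `win_rate` are hypothetical, the rolling mean and standard deviation are taken over the previous 14 days (excluding the current day), profit is measured as the change in the ratio rather than in currency, and costs are ignored.

```python
from statistics import mean, stdev

def backtest_pair(prices_a, prices_b, window=14, entry_z=2.0):
    """Return per-trade P&L (in ratio points) for one pair of price series."""
    ratio = [a / b for a, b in zip(prices_a, prices_b)]
    trades = []
    position = 0   # +1 = long the ratio, -1 = short the ratio, 0 = flat
    entry = 0.0
    for t in range(window, len(ratio)):
        hist = ratio[t - window:t]           # assumption: previous days only
        mu, sd = mean(hist), stdev(hist)
        if position == 0:
            if sd > 0 and ratio[t] > mu + entry_z * sd:
                position, entry = -1, ratio[t]   # sell A, buy B
            elif sd > 0 and ratio[t] < mu - entry_z * sd:
                position, entry = 1, ratio[t]    # buy A, sell B
        elif (position == -1 and ratio[t] <= mu) or (position == 1 and ratio[t] >= mu):
            trades.append(position * (ratio[t] - entry))  # exit at the mean
            position = 0
    return trades   # a trade still open at the end of the data is ignored

def win_rate(trades):
    """Fraction of closed trades with positive P&L (NaN if there are none)."""
    return sum(p > 0 for p in trades) / len(trades) if trades else float("nan")

# Toy data: a quiet ratio, one spike above the band, then a return to the mean.
demo_a = [0.99, 1.01] * 7 + [1.10, 1.00]
demo_b = [1.0] * 16
demo_trades = backtest_pair(demo_a, demo_b)
print(demo_trades, win_rate(demo_trades))
```

In the toy data the spike on day 15 opens a short-ratio trade, which is closed on the following day when the ratio falls back to the rolling mean.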

The results are shown below. We can see that the average win rate is > 50% and that the win rate appears to be roughly lognormally distributed.

Applying some statistics


So far so good: we have a universe of stocks we suspect may share common movement, and a simple system that trades them. Let’s now apply some statistical measures to the same set of data:

§  Pearson correlation coefficient

What is it? The covariance of the two stocks scaled by the product of their standard deviations. This measures their tendency to move together.

§  Spearman correlation coefficient

What is it? Just Pearson correlation applied to the rank of each price within its own past series rather than to the actual prices. It is arguably a better proxy for cointegration, because it measures correlation in the ranks of prices, which penalises the unequal movement in prices that would cause the spread to diverge.

§  Original Dickey Fuller (p-value).

What is it? A unit root test based on a first-order autoregression. It checks whether a series wanders off rather than reverting, by regressing current values against their lagged equivalents. If the regression coefficient is significantly < 1 the values are assumed not to diverge (this is good!)

§  Augmented Dickey Fuller (p-value)

What is it? Same as the above, but with additional lagged terms added to compensate for residual autocorrelation, where the result of the simple test would otherwise be “muddied” by the influence of data points 2, 3 ... or n periods back.

§  Model specification tests to select lags

What is it? We can use the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to select the optimal lag length, based on a signal to noise tradeoff between the explanatory power gained and the degrees of freedom used up by each extra lag. The CADF test offers this functionality.

§  Other variants of the ADF that either switch on or off adjustment for intercept and drift in the regression calculation.
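As a rough illustration of the two correlation measures listed above, here is a minimal standard-library sketch. The helper names `pearson`, `ranks` and `spearman` are hypothetical, and the inputs are assumed to be two equal-length price series, following the description above of ranking prices rather than returns.

```python
from math import sqrt

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Pearson correlation computed on ranks instead of raw values."""
    return pearson(ranks(x), ranks(y))

# Toy series: y always rises when x rises, but not in proportion.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

On the toy series the rank correlation is perfect while the Pearson figure is slightly below 1, because the second series moves in the same direction but by unequal amounts.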

OK.  First we apply the above statistical measures to the same set of data we used to generate the win rate.  Now let’s eyeball both the correlation and cointegration values.

 

 

 

The illustration shows Pearson’s correlation coefficient on the x-axis and the win rate of our simple system on the y-axis. A couple of observations:

§  We see a clustering of points where the correlation coefficient (on the x-axis) exceeds 0.5. This suggests that most pairs tend to move together, which is not surprising as they come from the same industry subgroup.

§  Contrary to what we might expect, the points do not appear to trend upward, and higher correlation coefficients do not appear to coincide with a higher win rate. For this data set, Pearson correlation does not appear to be associated with profitability in any meaningful way.

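Now let’s consider a Dickey-Fuller test for cointegration and compare the win rate of each pair with the test’s p-value. The p-value is the probability of seeing a test statistic at least this extreme if the ratio actually had a unit root (no cointegration), so a lower p-value is stronger evidence that the stocks are cointegrated.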

 

Here the illustration depicts the ADF p-value on the x-axis and the win rate of our simple system on the y-axis.

We can observe that the points drift downward as the p-value rises. This suggests there may be a meaningful relationship between the ADF p-value and the win rate of our system. As lower p-values are associated with cointegration, and we would expect cointegrated stocks to perform well in a mean reversion pairs system, these results are consistent with what we might expect.
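To make the test behind these p-values more concrete, here is a minimal sketch of the original (zero lag) Dickey-Fuller regression with an intercept, written with the standard library only. It regresses the daily change in a series on the previous day’s level; the slope equals the lagged-value coefficient minus 1, so a clearly negative slope corresponds to a coefficient below 1. The function name `dickey_fuller_stat` and the simulated series are assumptions for illustration. Turning the statistic into a p-value needs the Dickey-Fuller distribution rather than the usual t-distribution, which this sketch does not attempt; in practice a library routine such as `adfuller` in statsmodels would normally be used.

```python
import random
from math import sqrt

def dickey_fuller_stat(series):
    """Fit diff(y)_t = alpha + gamma * y_(t-1) + e_t; return (gamma, t-stat of gamma)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(lagged)
    mx, my = sum(lagged) / n, sum(diffs) / n
    sxx = sum((v - mx) ** 2 for v in lagged)
    sxy = sum((v - mx) * (d - my) for v, d in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = my - gamma * mx
    rss = sum((d - alpha - gamma * v) ** 2 for v, d in zip(lagged, diffs))
    std_err = sqrt(rss / (n - 2) / sxx)
    return gamma, gamma / std_err

# Simulated data (not market data): a mean-reverting series and a random walk.
random.seed(1)
mean_reverting, random_walk = [0.0], [0.0]
for _ in range(499):
    mean_reverting.append(0.5 * mean_reverting[-1] + random.gauss(0, 1))
    random_walk.append(random_walk[-1] + random.gauss(0, 1))

gamma_mr, tau_mr = dickey_fuller_stat(mean_reverting)
gamma_rw, tau_rw = dickey_fuller_stat(random_walk)
print(tau_mr, tau_rw)
```

The mean-reverting series should produce a far more negative statistic than the random walk. For reference, the asymptotic 5% critical value for the intercept-only version of the test is roughly -2.86, so statistics well below that level reject the unit root.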

Regression data


To judge which statistical measure (if any) best explains the system’s win rate, we can regress the % win rate on the value produced by each statistical test. Measures with more explanatory power should show a higher R-squared, and an absolute t-value above about 2.0 indicates statistical significance.

Measure                        R-squared    Absolute t-value |t|

Pearson correlation            -0.0009      0.2714
Spearman correlation            0.0001      1.04074
ADF (0 lag)                     0.1222      12.3001
ADF (1 lag)                     0.1110      11.653
ADF (2 lags)                    0.0915      10.4785
ADF (2 lags)                    0.0876      10.2323
ADF (3 lags)                    0.0589      8.28
CADF AIC selected lag           0.1071      11.4246
CADF BIC selected lag           0.1143      11.8463
CADF No drift                   0.0294      5.8095
CADF No drift or intercept     -0.0008      0.35436
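For readers who want to reproduce this kind of figure, the sketch below shows how an R-squared and a t-value can be obtained from a one-variable least-squares regression of win rate on a statistical measure. The function name `simple_ols` is hypothetical and the six data points are made up purely to exercise the code; they are not taken from the study. The adjusted R-squared is included because, unlike the plain version, it can fall slightly below zero when a measure explains nothing, which may be why small negative values appear above (assuming those figures are adjusted).

```python
from math import sqrt

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x with basic diagnostics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    r2 = 1 - rss / syy
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)   # penalise the one regressor
    t_value = slope / sqrt(rss / (n - 2) / sxx)  # slope / its standard error
    return {"slope": slope, "intercept": intercept,
            "r2": r2, "adj_r2": adj_r2, "t": t_value}

# Made-up illustration: one (p-value, win rate) point per pair.
p_values = [0.01, 0.05, 0.20, 0.40, 0.60, 0.90]
win_rates = [0.70, 0.66, 0.61, 0.58, 0.52, 0.50]
fit = simple_ols(p_values, win_rates)
print(fit)
```

In a one-variable regression the two columns are tied together: R-squared equals t squared divided by (t squared plus n minus 2), so a large |t| with a modest R-squared simply reflects a large number of pairs.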

 

Observations and Conclusions


§  Neither correlation measure appears to have any statistical significance.

 

§  The original Dickey Fuller test (or ADF with zero lag) is the most significant. As we introduce lags, the power of the test and the explanatory power of the variable degrade.

 

§  Using a signal to noise measure (AIC or BIC) to choose the lag length, which we might instinctively prefer, also gives good results.

 

§  The tests that remove (or control for) drift and intercept performed better than the ones where the effects of intercept and trend/drift were ignored.

In interpreting the results, we have to be mindful of the limitations of the test. We used a restricted universe of 36 stocks in a particular industry group, considered a one year window of data and used a single simple trading strategy as a proxy for profitability. Nevertheless the exercise appears to show the power of simple cointegration tests and the limits of optimising them. Possible improvements would be to regress profitability on statistical output from a prior, lagged period, to see whether the measures have any real predictive power, and to introduce newer approaches such as threshold cointegration.

 

Paul Farrington, December ‘09

pfarrington@gmail.com