Statistical arbitrage pairs trading is a strategy that bets that divergences between stocks sharing common factor risk will revert to a
stationary mean, allowing us to fade these divergences and make money.
In this analysis I look at methods for selecting tradeable pairs. Some traders have
suggested using correlation coefficients such as Pearson correlation or
variants thereof, such as Spearman, which measures rank correlation. A different approach is to test for
cointegration between stock pairs.
But which approach is best? To find out, I take a universe of stocks with
similar fundamental characteristics and define a set of basic trading rules:
sell one stock and buy the other when their price ratio moves beyond a
threshold, and exit the trade when the ratio touches its rolling mean. This generates some approximate measures of
profitability (% wins, cumulative profitability). I then run a series of regressions of
the profitability metric(s) against the statistical
measures mentioned above, computed on the same data.
I start with a universe of 36 stocks in the Oil Well
Services and Equipment sector. These are
companies engaged in a common business area and should have a large amount of
common factor risk, making them good candidates for pairs trading.
We create a basic mean-reversion trading system from the
following rules:
§ 1082 unique pairs are generated from the 36 stocks
§ 1 year of data (13/11/08 to 13/11/09)
§ Daily dividend-adjusted close prices are used
§ Open a trade when the price ratio moves more than 2 standard deviations from its 14-day moving average
§ Close the trade when the ratio reverts to the moving average
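The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce the results. The function names `simulate_pair` and `win_rate` are hypothetical. The ratio is taken as stock A over stock B. The mean and standard deviation are computed over the 14 trading days before the current day. Each trade's return is measured on the ratio itself rather than on the two legs. Any position still open at the end of the data is ignored.

```python
import numpy as np

def simulate_pair(price_a, price_b, window=14, n_std=2.0):
    """Hypothetical sketch of the pair rules on the ratio A/B.

    Opens a trade when the ratio is more than n_std standard deviations
    away from its trailing `window`-day mean, and closes it when the ratio
    touches that (moving) mean again. Returns a list of per-trade returns
    measured on the ratio (an approximation of the two-leg P&L).
    """
    ratio = np.asarray(price_a, dtype=float) / np.asarray(price_b, dtype=float)
    trades = []
    position = 0      # +1 = long ratio (buy A, sell B), -1 = short ratio
    entry = 0.0
    for t in range(window, len(ratio)):
        hist = ratio[t - window:t]           # the `window` days before today
        mean, std = hist.mean(), hist.std(ddof=1)
        band = n_std * std
        r = ratio[t]
        if position == 0:
            if band > 0 and r > mean + band:
                position, entry = -1, r      # ratio rich: sell A, buy B
            elif band > 0 and r < mean - band:
                position, entry = 1, r       # ratio cheap: buy A, sell B
        elif (position == -1 and r <= mean) or (position == 1 and r >= mean):
            # Exit on touching the rolling mean; profit if the ratio moved our way
            trades.append(position * (r - entry) / entry)
            position = 0
    return trades                            # an open trade at the end is dropped

def win_rate(trades):
    """Fraction of closed trades with a positive return (NaN if none)."""
    return sum(1 for x in trades if x > 0) / len(trades) if trades else float("nan")
```

Running `simulate_pair` over every pair and applying `win_rate` to each result gives one win rate per pair. That is the quantity the later regressions use.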
The results are shown below. The average win rate is above 50%, and the win rates
appear to be lognormally distributed.

So far so good: we have a universe of stocks we suspect
may share common movement. Let's now apply some statistical measures to
the same set of data.
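§ Pearson correlation coefficient
What is it? The covariance of the two series divided by the product of their
standard deviations. Its square is the share of one stock's variance explained
by a linear relationship with the other. It measures their tendency to move together.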
§ Spearman correlation coefficient
What is it? Pearson correlation computed on the rank of each price within its
own past series rather than on the actual prices. It is arguably a better proxy
for cointegration because it captures any consistently monotonic relationship
between the two prices, not only a linear one, and it is less sensitive to outliers.
§ Original Dickey-Fuller test (p-value)
What is it? A unit root test based on a first-order autoregression. It tests
whether a series wanders like a random walk by regressing current values on
their lagged values. If the regression coefficient is significantly less than 1,
we reject the unit root and treat the series as stationary, i.e. it does not diverge (this is good!)
§ Augmented Dickey-Fuller test (p-value)
What is it? Same as the above, but with lagged differences added to absorb
residual autocorrelation. Without them, the result of the test will be "muddied"
by the influence of data points 2, 3, ... or n periods back.
§ Model specification tests to select lags
What is it? We can use the Akaike Information Criterion (AIC) or the Bayesian
Information Criterion (BIC) to select the optimal lag length. Both trade off
explanatory power (signal) against the degrees of freedom used up by each extra lag (noise). The CADF test offers this functionality.
§ Other variants of the ADF test that switch on or off the adjustment for intercept and
drift in the regression.
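For reference, the Dickey-Fuller family of tests can be written as a single regression in the standard textbook form. The notation is mine and is not taken from the original analysis. How the "No drift" and "No drift or intercept" variants reported later map onto the cases below is an assumption.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t ,
\qquad \gamma = \rho - 1

H_0:\ \gamma = 0 \ \text{(unit root: the series wanders)}
\qquad
H_1:\ \gamma < 0 \ \text{(stationary: the series mean-reverts)}

\tau = \hat{\gamma} \,/\, \operatorname{SE}(\hat{\gamma})
```

Setting p = 0 gives the original Dickey-Fuller test, and p > 0 gives the augmented test. The deterministic variants are:

- constant and trend (α and β estimated);
- constant only (β = 0);
- neither (α = β = 0).

The statistic τ is compared with Dickey-Fuller critical values, not the usual Student t table. A coefficient ρ below 1 on the lagged level corresponds to γ below 0.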
OK. First we apply
the above statistical measures to the same set of data we used to generate the
win rate. Now let’s eyeball both the
correlation and cointegration values.
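Here is a NumPy sketch of how the two correlation measures might be computed for one pair. The helper names `pearson`, `ranks` and `spearman` are hypothetical. The inputs are assumed to be the two stocks' price series, with no tied prices. `scipy.stats.spearmanr` handles ties and would be the more robust choice.

```python
import numpy as np

def pearson(x, y):
    """Covariance of x and y scaled by the product of their standard deviations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))

def ranks(x):
    """Rank of each observation within its own series (0 = lowest).
    Assumes no ties; tied values would receive arbitrary distinct ranks."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank series."""
    return pearson(ranks(x), ranks(y))
```

For example, prices that rise together but non-linearly (1, 2, 3, 4, 5 against 1, 4, 9, 16, 25) have a Spearman coefficient of 1 but a Pearson coefficient of about 0.98.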

The chart shows Pearson's correlation coefficient
on the x-axis and the win rate of our simple system on the y-axis. A couple of observations:
§ Where the correlation coefficient (on the x-axis) exceeds 0.5 we see a clustering of
points. This suggests most pairs tend to move together, which is not surprising as
the stocks come from the same industry subgroup.
§ Contrary to what we might expect, the points do not trend upward: higher
correlation coefficients do not coincide with a higher win rate. For this data set, Pearson correlation does
not appear to be associated with profitability in any meaningful way.
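One possible reason, offered as an illustration rather than something tested in this analysis, is that correlation measures co-movement, not whether the gap between two stocks closes. The hypothetical prices below both rise steadily, so their Pearson correlation is 1. Their ratio, however, climbs throughout and never reverts to a mean, which is the situation a mean-reversion pairs system loses money on.

```python
import numpy as np

# Two hypothetical stocks trending up linearly; B grows twice as fast as A.
t = np.arange(250)
a = 10 + 0.05 * t
b = 10 + 0.10 * t

corr = np.corrcoef(a, b)[0, 1]   # b is an exact linear function of a, so corr = 1
ratio = b / a                    # climbs from 1.0 to about 1.55 and never comes back
```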
Now let's consider a Dickey-Fuller test for
cointegration and compare each pair's win rate with the test's p-value. A lower
p-value is stronger evidence against the null hypothesis of a unit root (no cointegration), i.e. stronger evidence that the stocks are cointegrated.
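The original does not say exactly which series was tested. This sketch assumes the common Engle-Granger two-step approach. First, regress one log price on the other to get a hedge ratio. Then apply the Dickey-Fuller regression (constant, no lags) to the residual spread. The helper names are hypothetical.

Residual-based tests need their own critical values: roughly -3.34 at the 5% level for two series with a constant (asymptotic, from MacKinnon's tables). P-values come from the same tables. `statsmodels.tsa.stattools.coint` and `adfuller` report them if a library is preferred.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients, their standard errors, and the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid

def df_stat(series):
    """Dickey-Fuller t-statistic (constant, no lags): regress the change in
    the series on its lagged level and return gamma_hat / se(gamma_hat)."""
    s = np.asarray(series, dtype=float)
    dy = np.diff(s)
    X = np.column_stack([np.ones(len(dy)), s[:-1]])
    beta, se, _ = ols(dy, X)
    return beta[1] / se[1]

def engle_granger_stat(log_a, log_b):
    """Step 1: hedge regression log_a = c + h * log_b + u.
    Step 2: Dickey-Fuller statistic on the residual spread u.
    More negative values are stronger evidence of cointegration."""
    log_b = np.asarray(log_b, dtype=float)
    X = np.column_stack([np.ones(len(log_b)), log_b])
    _, _, spread = ols(np.asarray(log_a, dtype=float), X)
    return df_stat(spread)
```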

Here the chart shows the ADF p-value on the
x-axis and the win rate of our simple system on the y-axis.
The points drift downward as the p-value rises, suggesting a meaningful
relationship between the ADF p-value and the win rate of our system. Lower p-values indicate
stronger evidence of cointegration, and we would expect cointegrated pairs to perform well in a
mean-reversion pairs system, so these
results are consistent with what we might expect.
To judge which statistical measure (if any) best explains
the system's win rate, we can regress the % win rate on the metric generated by
each statistical test.
Measures with higher explanatory power should have a higher R-squared, and an absolute
t-value above about 2.0 indicates statistical significance (roughly the 5% level).
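Here is a minimal sketch of that regression for one measure. It assumes one observation per pair: `measure` holds the test statistic or p-value, and `win_rate` holds the system's win rate. The function name is hypothetical. Ordinary R-squared cannot be negative when the regression includes an intercept, so the negative entries in the results table suggest adjusted R-squared is being reported. The sketch therefore returns both.

```python
import numpy as np

def regress_win_rate(measure, win_rate):
    """OLS of win rate on a single statistical measure across pairs.
    Returns (r_squared, adjusted_r_squared, slope_t_value)."""
    x = np.asarray(measure, dtype=float)
    y = np.asarray(win_rate, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])        # intercept + measure
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)   # penalises the extra parameter
    se_slope = np.sqrt(ss_res / (n - 2) / ((x - x.mean()) ** 2).sum())
    return r2, adj_r2, beta[1] / se_slope
```

With a single regressor, R-squared and the slope's t-value carry the same information: R² = t² / (t² + n - 2). Over the same set of pairs, ranking measures by either gives the same order.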
| Measure | Adjusted R-squared | Absolute t-value |
|---|---|---|
| Pearson correlation | -0.0009 | 0.2714 |
| Spearman correlation | 0.0001 | 1.04074 |
| ADF (0 lag) | 0.1222 | 12.3001 |
| ADF (1 lag) | 0.1110 | 11.653 |
| ADF (2 lags) | 0.0915 | 10.4785 |
| ADF (2 lags) | 0.0876 | 10.2323 |
| ADF (3 lags) | 0.0589 | 8.28 |
| CADF AIC-selected lag | 0.1071 | 11.4246 |
| CADF BIC-selected lag | 0.1143 | 11.8463 |
| CADF no drift | 0.0294 | 5.8095 |
| CADF no drift or intercept | -0.0008 | 0.35436 |
§ Neither correlation measure appears to have any statistical significance.
§ The original Dickey-Fuller test (ADF with zero lags) has the highest
significance. As we introduce lags, the
power of the test and the explanatory power of the variable degrade.
§ Using an information criterion (AIC or BIC) to trade signal against noise when choosing the lag length, which we might
instinctively prefer, also gives good results.
§ The tests that remove (or control for) drift and intercept performed better than
ones where the effects of intercept and trend/drift were ignored.
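The lag-selection idea can be sketched as follows. Fit the ADF regression for every lag length from 0 up to a maximum on a common sample, then keep the lag with the lowest AIC or BIC. This is an illustration with hypothetical function names. It uses a constant but no trend, and Gaussian-likelihood information criteria with constant terms dropped. Library implementations such as statsmodels' `adfuller(..., autolag="AIC")` differ in details, for example in the sample used for the final fit.

```python
import numpy as np

def adf_fit(y, lags, max_lag):
    """Fit the ADF regression (constant + `lags` lagged differences) on a sample
    that is identical for every lag length up to max_lag, so AIC/BIC values are
    comparable. Returns (t_stat_on_lagged_level, aic, bic)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[k] = y[k+1] - y[k]
    start = max_lag                          # drop the same leading rows for every lag
    target = dy[start:]
    cols = [np.ones(len(target)), y[start:-1]]   # constant, lagged level
    for i in range(1, lags + 1):
        cols.append(dy[start - i:-i])            # i-th lagged difference
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    n, k = X.shape
    ssr = resid @ resid
    se = np.sqrt(ssr / (n - k) * np.diag(np.linalg.inv(X.T @ X)))
    aic = n * np.log(ssr / n) + 2 * k
    bic = n * np.log(ssr / n) + k * np.log(n)
    return beta[1] / se[1], aic, bic

def select_lag(y, max_lag=5, criterion="bic"):
    """Return (chosen_lag, adf_t_stat) minimising AIC or BIC over 0..max_lag."""
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    idx = 1 if criterion == "aic" else 2
    fits = {p: adf_fit(y, p, max_lag) for p in range(max_lag + 1)}
    best = min(fits, key=lambda p: fits[p][idx])
    return best, fits[best][0]
```

BIC penalises extra lags more heavily than AIC, so it tends to choose shorter lags. That is consistent with the zero-lag test doing best in the table above.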
In interpreting the results, we have to be mindful of the
limitations of the test. We used a
restricted universe of 36 stocks in a particular industry group,
considered a one-year window of data and
used a single simple trading strategy as a proxy for profitability. Nevertheless, the exercise appears to show the
power of simple cointegration tests and the limits of optimising them. Improvements to the approach might be to
regress profitability on statistical output from a prior, lagged period to see
whether the measures have any real predictive power, and to introduce newer
approaches such as threshold cointegration.
Paul Farrington, December ‘09
pfarrington@gmail.com