Common Pitfalls in Cointegration Testing: How to Avoid Spurious Relationships

The Allure and Danger of Spurious Relationships

Cointegration analysis is an effective technique for identifying stable, long-run relationships between non-stationary time series. However, the process is fraught with pitfalls that can lead to spurious relationships: correlations that appear statistically significant but are economically meaningless and break down out-of-sample. A trading strategy built on a spurious cointegrating relationship is not just ineffective; it is a recipe for significant losses. Understanding the common mistakes and how to avoid them is essential for any quantitative trader.

Pitfall 1: Ignoring the Order of Integration

A set of I(1) variables is cointegrated if some linear combination of them is I(0). The very first step, therefore, must be to correctly determine the order of integration of each time series in the system.
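For the two-series case, the definition can be written out as follows. This is only a sketch: $\theta$ is the cointegrating coefficient referred to later in this article, while $y_t$, $x_t$, and $u_t$ are labels introduced here for illustration.

$$
y_t \sim I(1), \qquad x_t \sim I(1), \qquad u_t = y_t - \theta x_t \sim I(0)
$$

The spread $u_t$ is the quantity a pairs strategy trades, which is why both inputs must be I(1) before the question of cointegration is even well posed.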

The Mistake: A common error is to proceed with cointegration testing without first rigorously testing each individual series for a unit root, using tests like the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is a unit root, or the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, whose null hypothesis is stationarity.
The Consequence: If you apply a cointegration test to a mix of I(0) and I(1) variables, or to variables that are I(2), the assumptions of the test are violated. The direction of the error depends on the test. In an Engle-Granger regression of an I(1) series on a stationary (I(0)) series, the residuals inherit the unit root of the dependent variable, so the test fails to reject; this is a false negative if a valid relationship exists among the remaining I(1) variables. In a Johansen system, a stationary series counts as a trivial cointegrating vector on its own, which inflates the estimated rank and can be mistaken for a genuine relationship.
The Solution: Pre-test all series. Ensure they are all I(1). If a series is I(0), it should not be included in the cointegrating relationship. If a series is I(2), it needs to be differenced twice to become stationary, and standard cointegration tests are not applicable.

Pitfall 2: Inappropriate Lag Length Selection

Both the ADF test for unit roots and the Johansen test for cointegration require specifying the number of lags of the differenced terms to include in the model. The purpose of these lags is to soak up any serial correlation in the residuals, ensuring they are white noise.

The Mistake: Choosing a lag length that is either too short or too long.
The Consequence:
- Too few lags: If the chosen lag length is insufficient to remove serial correlation in the residuals, the test statistics will be biased. This often leads to an oversized test, meaning you will reject the null hypothesis of no cointegration too often, leading to false positives (spurious relationships).
- Too many lags: Including unnecessary lags reduces the power of the test. The model becomes over-parameterized, and the additional noise can make it harder to detect a true cointegrating relationship, leading to false negatives.
The Solution: Use a systematic approach. Start with a reasonably high maximum number of lags and test down. Use information criteria like the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to guide the selection. AIC tends to choose more lags than BIC. For smaller samples, BIC is often preferred. It is also important to run diagnostic tests on the residuals of the final model to confirm the absence of serial correlation.

Pitfall 3: Mishandling Deterministic Components

The Johansen test and the Engle-Granger test can include deterministic components: a constant (intercept) and/or a linear time trend. The choice of which to include is important and has a significant impact on the test results.

The Mistake: Arbitrarily including or excluding these terms without economic justification or visual inspection of the data.
The Consequence: The critical values for the cointegration tests depend on the specification of the deterministic terms. Using the wrong set of critical values will lead to incorrect inferences. For example, if the data has a linear trend but you use a model with only a constant, you may fail to reject the null of no cointegration. Conversely, including a trend when none exists can reduce the power of the test.
The Solution: There are five standard cases for the Johansen test, ranging from no constant or trend to a quadratic trend in the data. The choice should be guided by:
1. Visual Inspection: Plot the time series. Do they appear to drift upwards over time? If so, a trend term might be appropriate.
2. Economic Theory: Is there a reason to believe the long-run relationship should have a non-zero mean (requiring a constant in the cointegrating equation) or that the variables are trending together (requiring a trend in the model)?
3. Formal Testing: Some software packages provide formal tests to help select the appropriate model for deterministic trends.

Pitfall 4: The Small Sample Size Problem

Cointegration tests are asymptotic: their statistical properties are derived for the limit of very large samples. In practice, traders often work with limited historical data.

The Mistake: Applying cointegration tests to very short time series and over-relying on the p-values.
The Consequence: In small samples, the tests are known to have size distortions and low power. The estimated cointegrating vector can be highly imprecise. A relationship that appears significant over a two-year period might completely disappear when tested over a five-year period.
The Solution:
- Use more data: Whenever possible, use the longest available time series. For a meaningful cointegration analysis, several years of daily data (e.g., >1000 data points) is a good rule of thumb.
- Be skeptical: Treat results from small samples with extreme caution. A p-value of 0.04 from a short series is not strong evidence.
- Out-of-sample testing: The most important step. Reserve a portion of your data (e.g., the last 20%) for out-of-sample validation. Estimate the cointegrating vector on the in-sample data, then apply it to the out-of-sample data and test if the resulting spread is stationary. If it is not, the relationship is not stable.

Pitfall 5: Ignoring Structural Breaks

Financial markets are subject to sudden changes in regime, policy, or technology. These are known as structural breaks.

The Mistake: Assuming that the cointegrating relationship is stable over the entire sample period when a structural break may have occurred.
The Consequence: A standard cointegration test applied to data with a structural break loses power: it is biased toward failing to reject the null hypothesis of no cointegration. You might conclude there is no relationship when in fact there was a stable relationship whose parameters (e.g., the cointegrating vector $\theta$) changed at some point in time. For example, the relationship between oil prices and airline stocks might have fundamentally changed after the 2008 financial crisis.
The Solution:
- Test for breaks: Use tests like the Gregory-Hansen test, which explicitly tests for cointegration in the presence of a potential structural break.
- Rolling analysis: Perform the cointegration test on a rolling window of data. This can help identify periods where the relationship is stable and periods where it breaks down. A stable cointegrating vector should not change dramatically over time.

Conclusion

Cointegration testing is not a black box. A statistically significant result is the beginning, not the end, of the analysis. To avoid building strategies on foundations of sand, a trader must be diligent and methodical. This involves a careful process of pre-testing for integration order, systematically selecting lag lengths, thoughtfully specifying deterministic terms, validating results on out-of-sample data, and being ever-vigilant for the presence of structural breaks. By understanding and mitigating these common pitfalls, one can move from finding spurious correlations to identifying genuine, tradable economic equilibria.

Category	Pairs Cointegration
Read time	8 minutes
Published	Feb 28, 2026