Statistical Arbitrage: A Vectorized Approach Using Pandas for Pairs Trading

From TradingHabits, the trading encyclopedia · 7 min read · February 28, 2026

Statistical arbitrage is a class of mean-reversion strategies that exploit temporary price discrepancies between related financial instruments. Pairs trading is a classic example of statistical arbitrage, where two historically correlated assets are traded in a market-neutral way. This article provides a guide to implementing a pairs trading strategy using a vectorized approach with Pandas, covering cointegration analysis, spread calculation, and z-score modeling.

The Theory of Pairs Trading

The fundamental idea behind pairs trading is that two assets that have a long-term economic relationship should move together. When the prices of these assets diverge, a trading opportunity arises. The strategy involves shorting the outperforming asset and buying the underperforming asset, with the expectation that the spread between them will eventually converge to its historical mean.

Identifying Cointegrated Pairs

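The first step in a pairs trading strategy is to identify pairs of assets that are cointegrated. Cointegration is a statistical property of two or more time series: each series is non-stationary on its own, but some linear combination of them is stationary, which indicates a long-term equilibrium relationship between them. The Engle-Granger two-step method is a common approach for testing for cointegration. Its two steps are the regression and the residual test; because the method assumes both series are integrated of order 1, that prerequisite is checked first:

  1. Test for Stationarity (prerequisite): Use the Augmented Dickey-Fuller (ADF) test to confirm that both time series are integrated of order 1 (i.e., I(1)), meaning non-stationary in levels but stationary after first differencing.
  2. Regress One Series on the Other: Run a linear regression of one time series on the other and obtain the residuals. The slope of this regression is the hedge ratio.
  3. Test Residuals for Stationarity: Use the ADF test to check if the residuals are stationary (i.e., I(0)). If they are, the two time series are cointegrated. Because the residuals come from an estimated regression, the test statistic is compared against Engle-Granger critical values rather than the standard ADF ones; the coint function from statsmodels handles this.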
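
```python
import pandas as pd
from statsmodels.tsa.stattools import coint

# Load closing prices for two assets
asset1 = pd.read_csv('asset1.csv', index_col='date', parse_dates=True)['close']
asset2 = pd.read_csv('asset2.csv', index_col='date', parse_dates=True)['close']

# Keep only dates present in both series; coint needs aligned, NaN-free inputs
prices = pd.concat([asset1, asset2], axis=1, keys=['asset1', 'asset2']).dropna()
asset1, asset2 = prices['asset1'], prices['asset2']

# Engle-Granger test; the null hypothesis is "no cointegration"
score, p_value, _ = coint(asset1, asset2)

if p_value < 0.05:
    print('The two assets appear to be cointegrated (null rejected at the 5% level)')
else:
    print('No evidence that the two assets are cointegrated')
```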

Modeling the Spread

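Once a cointegrated pair has been identified, the next step is to model the spread between the two assets. The spread can be calculated as the simple difference between the two prices, or as the residual from the cointegration regression. The residual form weights the second asset by the estimated hedge ratio, so it is the better choice whenever the two prices do not move one-for-one.

A common approach is to normalize the spread by calculating its z-score. The z-score measures how many standard deviations the current spread is from its historical mean. A mean and standard deviation computed over the full sample include data that was not yet available at each point in time, so a backtest built on them is subject to look-ahead bias.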

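```python
# Calculate the spread as the simple price difference (implies a 1:1 hedge ratio)
spread = asset1 - asset2

# Calculate the z-score using the full-sample mean and standard deviation
z_score = (spread - spread.mean()) / spread.std()
```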
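
As a hedged alternative, the sketch below builds the spread from a regression hedge ratio and normalizes it with a trailing window, so each z-score uses only past and current observations. The function names hedge_ratio_spread and rolling_zscore, the 60-bar window, and the synthetic data are all assumptions made for illustration. The hedge ratio here is still fitted on the whole sample; in practice it would be estimated on an earlier formation period.

```python
import numpy as np
import pandas as pd

def hedge_ratio_spread(y, x):
    """Return (spread, beta): spread = y - beta * x, beta = OLS slope of y on x."""
    beta, _intercept = np.polyfit(x.to_numpy(), y.to_numpy(), deg=1)
    return y - beta * x, float(beta)

def rolling_zscore(spread, window=60):
    """Z-score each point against the trailing `window` bars (current bar included)."""
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()
    return (spread - mean) / std

# Synthetic, illustrative data: y tracks 1.5 * x plus stationary noise
rng = np.random.default_rng(0)
n = 500
x = pd.Series(100 + np.cumsum(rng.normal(0, 1, n)))
y = 1.5 * x + rng.normal(0, 1, n)

spread, beta = hedge_ratio_spread(y, x)
z = rolling_zscore(spread, window=60)

print(f'Estimated hedge ratio: {beta:.3f}')
print(z.dropna().tail())
```

The first window - 1 values of the rolling z-score are NaN because the window is not yet full. A shorter window reacts faster to changes in the spread but produces noisier estimates of its mean and standard deviation.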

Generating Trading Signals

Trading signals can be generated based on the z-score of the spread. A common strategy is to:

  • Go long the spread (buy asset 1, short asset 2) when the z-score falls below a certain threshold (e.g., -2.0).
  • Go short the spread (short asset 1, buy asset 2) when the z-score rises above a certain threshold (e.g., 2.0).
  • Exit the position when the z-score crosses back to zero.
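
```python
import numpy as np

# +1 = long the spread, -1 = short the spread, 0 = flat, NaN = keep the previous state
signals = pd.DataFrame(index=z_score.index)
signals['signal'] = np.nan
signals.loc[z_score < -2.0, 'signal'] = 1.0
signals.loc[z_score > 2.0, 'signal'] = -1.0
# Exit when the z-score changes sign, then carry each state forward
crossed_zero = np.sign(z_score) != np.sign(z_score.shift(1))
signals.loc[crossed_zero & (z_score.abs() < 2.0), 'signal'] = 0.0
signals['signal'] = signals['signal'].ffill().fillna(0.0)
signals['positions'] = signals['signal'].diff()  # non-zero where the position changes
```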
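
To see how the entry and exit rules interact, the following sketch applies them to a short hand-made z-score series where the result can be checked by eye. The function name zscore_positions and the sample values are hypothetical. A position opened at a threshold is held until the z-score changes sign, rather than being closed as soon as the z-score moves back inside the band.

```python
import numpy as np
import pandas as pd

def zscore_positions(z, entry=2.0):
    """Return +1 (long spread), -1 (short spread) or 0 (flat) for each bar."""
    pos = pd.Series(np.nan, index=z.index)
    pos.loc[z < -entry] = 1.0                     # spread unusually low: go long
    pos.loc[z > entry] = -1.0                     # spread unusually high: go short
    crossed_zero = np.sign(z) != np.sign(z.shift(1))
    pos.loc[crossed_zero & (z.abs() < entry)] = 0.0   # sign change: go flat
    return pos.ffill().fillna(0.0)                # otherwise hold the previous state

# Hand-made z-scores: one long trade, then one short trade
z = pd.Series([0.5, -1.0, -2.3, -1.2, -0.4, 0.3, 1.1, 2.4, 1.5, 0.2, -0.1, -0.8])
positions = zscore_positions(z)

print(positions.astype(int).tolist())
# [0, 0, 1, 1, 1, 0, 0, -1, -1, -1, 0, 0]
```

The long position opened at -2.3 is held through -1.2 and -0.4 and closed when the z-score turns positive at 0.3; the short position opened at 2.4 is closed when it turns negative at -0.1.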

Backtesting the Strategy

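The final step is to backtest the strategy to evaluate its performance. This involves simulating the trades based on the generated signals, applying each signal from the following bar onward so that no trade uses information from the bar it acts on, and calculating key performance metrics such as the Sharpe ratio, maximum drawdown, and compound annual growth rate (CAGR).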
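
A minimal sketch of such a backtest is shown below. It is not a complete simulator: it trades one unit of the spread, assumes daily bars (252 per year), a zero risk-free rate, and a flat cost per unit traded. The names backtest_spread and performance, the starting capital, and the toy inputs are hypothetical.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumption: daily bars

def backtest_spread(spread, positions, capital=10_000.0, cost_per_unit=0.0):
    """Equity curve from holding `positions` units of the spread, lagged one bar."""
    held = positions.shift(1).fillna(0.0)        # act on the previous bar's signal
    pnl = held * spread.diff().fillna(0.0)       # P&L in price units of the spread
    traded = (positions - held).abs()            # units traded at each bar's close
    return capital + (pnl - traded * cost_per_unit).cumsum()

def performance(equity):
    """Annualized Sharpe ratio, maximum drawdown and CAGR of an equity curve."""
    returns = equity.pct_change().dropna()
    sharpe = np.sqrt(TRADING_DAYS) * returns.mean() / returns.std()
    drawdown = equity / equity.cummax() - 1.0
    years = len(returns) / TRADING_DAYS
    cagr = (equity.iloc[-1] / equity.iloc[0]) ** (1.0 / years) - 1.0
    return {'sharpe': sharpe, 'max_drawdown': drawdown.min(), 'cagr': cagr}

# Toy inputs: go long after bar 1, go flat after bar 4
spread = pd.Series([10.0, 8.0, 9.0, 10.0, 11.0, 10.0])
positions = pd.Series([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])

equity = backtest_spread(spread, positions, capital=100.0, cost_per_unit=0.1)
stats = performance(equity)

print(equity.round(2).tolist())
print(stats)
```

In the toy run the position earns 3.0 over three bars and pays 0.2 in costs for the entry and exit, so the equity curve ends at 102.8. Annualized figures computed from six bars are meaningless in themselves; they are included only to show the calculation.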

Conclusion

Pairs trading is an effective statistical arbitrage strategy that can be implemented using a vectorized approach with Pandas. By identifying cointegrated pairs, modeling the spread, and generating trading signals based on z-scores, you can build a systematic strategy whose robustness and profitability can then be checked in a backtest. However, it is important to be aware of the potential pitfalls, such as the breakdown of cointegration relationships and the impact of transaction costs.