William Gann's Advanced Feature Engineering with Pandas and NumPy for Financial Data

From TradingHabits, the trading encyclopedia · 5 min read · February 28, 2026

Financial data, with its inherent time-series nature and noise, presents unique challenges for feature engineering. While basic technical indicators can provide some signal, a truly robust algorithmic trading model requires a more sophisticated approach to feature creation. This is where the power of Pandas and NumPy comes to the forefront, enabling the creation of a diverse and informative feature set.

Time-Series Specific Operations

Lagging and Differencing

Lagging a time series shifts it by a fixed number of periods, so that each row carries past values of the series alongside the current one. Lagged values are the basis for features that capture the momentum or trend of an asset. Differencing subtracts the previous value from the current value. It often helps turn a non-stationary series, such as a price level, into one that is closer to stationary.

python
import pandas as pd
import numpy as np

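# Synthetic closing prices between 100 and 110 (illustrative only)
data = pd.DataFrame({'close': np.random.rand(10) * 10 + 100})
# Previous period's close; the first row is NaN
data['lag_1'] = data['close'].shift(1)
# One-period change, close[t] - close[t-1]; the first row is NaN
data['diff_1'] = data['close'].diff(1)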

Rolling Window Statistics

Rolling window statistics are an effective way to capture the local dynamics of a time series. This involves calculating a statistic, such as the mean or standard deviation, over a moving window of a specified size.

python
# The first window - 1 rows are NaN, so a 10-row frame yields a value only in its last row
data['rolling_mean_10'] = data['close'].rolling(window=10).mean()
data['rolling_std_10'] = data['close'].rolling(window=10).std()

Data Cleaning and Preprocessing

Handling Missing Values (NaNs)

Missing values are a common problem in financial data. They can arise from a variety of sources, such as data provider errors or illiquid assets. It is important to handle these missing values appropriately, as they can have a significant impact on model performance. Common techniques include forward-filling, back-filling, or interpolation.

python
# Forward-fill: carry the last observed value forward
# (fillna(method='ffill') is deprecated in recent pandas versions)
data = data.ffill()
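
The three techniques named above behave differently on the same gaps. The following is a minimal sketch on a made-up five-point series (the values and the `prices` name are illustrative assumptions, not real market data). Note that back-filling and interpolation both draw on later observations, which may matter when features are used in a backtest.

```python
import numpy as np
import pandas as pd

# Hypothetical price series with two gaps (values are illustrative only)
prices = pd.Series([100.0, np.nan, 102.0, np.nan, 104.0], name="close")

filled = pd.DataFrame({
    "raw": prices,
    "ffill": prices.ffill(),         # carry the last known value forward
    "bfill": prices.bfill(),         # pull the next known value backward (uses future data)
    "interp": prices.interpolate(),  # linear interpolation between neighbours (uses future data)
})
print(filled)
```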

Outlier Detection and Treatment

Outliers are extreme values that deviate significantly from the rest of the data. They can be caused by data errors or by genuine market events. It is important to identify and handle outliers, as they can skew the results of a machine learning model. One common technique is to cap outliers at a certain number of standard deviations from the mean.
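
The capping technique can be sketched as follows. The helper name `cap_outliers`, the default of three standard deviations, and the synthetic return series are all assumptions made for illustration. The mean and standard deviation here are computed over the whole sample and are themselves pulled by the outlier, so this is a starting point rather than a definitive treatment.

```python
import numpy as np
import pandas as pd

def cap_outliers(series: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Clip values to mean +/- n_std standard deviations (hypothetical helper)."""
    mean = series.mean()
    std = series.std()
    return series.clip(lower=mean - n_std * std, upper=mean + n_std * std)

# Synthetic daily returns with one injected extreme value (illustrative only)
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, 500))
returns.iloc[100] = 0.5

capped = cap_outliers(returns, n_std=3.0)
print(returns.max(), capped.max())
```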

Formulas and Code Examples for Technical Indicators

Relative Strength Index (RSI)

The RSI is a momentum oscillator that measures the speed and change of price movements. It is calculated as:

$$RSI = 100 - \frac{100}{1 + RS}$$

Where $RS$ is the average gain of up periods during a specified time frame divided by the average loss of down periods.

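| Date | Close Price | Gain | Loss | Avg Gain | Avg Loss | RS | RSI |
|---|---|---|---|---|---|---|---|
| 2023-01-01 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0 |
| 2023-01-02 | 102 | 2 | 0 | 2 | 0 | inf | 100.0 |
| 2023-01-03 | 101 | 0 | 1 | 1 | 0.5 | 2 | 66.7 |
| ... | ... | ... | ... | ... | ... | ... | ... |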
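
A minimal pandas sketch of the RSI formula follows. It assumes simple rolling averages of gains and losses; other smoothing schemes (such as Wilder's) are common and give different values. The function name `rsi` and the `period` parameter are hypothetical, and the three prices are the closes from the table rows above.

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses (illustrative sketch)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)        # positive moves, zero otherwise
    loss = (-delta).clip(lower=0.0)     # size of negative moves, zero otherwise
    avg_gain = gain.rolling(window=period).mean()
    avg_loss = loss.rolling(window=period).mean()
    rs = avg_gain / avg_loss            # inf when there are no losses in the window
    return 100.0 - 100.0 / (1.0 + rs)

close = pd.Series([100.0, 102.0, 101.0])
rsi_values = rsi(close, period=2)
print(rsi_values)
```

With `period=2`, the last value averages a gain of 2 and a loss of 1 over two periods, giving RS = 2 and an RSI of about 66.7, in line with the 2023-01-03 row.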

Moving Average Convergence Divergence (MACD)

The MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price. The MACD is calculated by subtracting the 26-period Exponential Moving Average (EMA) from the 12-period EMA.
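
The subtraction described above can be sketched with pandas' exponentially weighted mean. The function name `macd_line`, the `adjust=False` setting, and the synthetic price series are assumptions for illustration; EMA initialization conventions vary between platforms, so early values may not match a given charting tool.

```python
import numpy as np
import pandas as pd

def macd_line(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA (illustrative sketch)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

# Steadily rising synthetic prices (illustrative only)
close = pd.Series(np.arange(100, dtype=float) + 100.0)
macd = macd_line(close)
print(macd.tail())
```

In a steady uptrend the 12-period EMA tracks price more closely than the 26-period EMA, so the MACD line is positive.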

Bollinger Bands

Bollinger Bands are a type of statistical chart characterizing the prices and volatility over time of a financial instrument or commodity. They consist of a middle band being an N-period simple moving average (SMA), an upper band at K standard deviations above the middle band, and a lower band at K standard deviations below the middle band.
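
A minimal sketch of the three bands using rolling statistics is shown below. The function name `bollinger_bands`, the defaults N = 20 and K = 2, and the synthetic random-walk prices are assumptions for illustration. pandas' rolling `std()` uses the sample standard deviation (`ddof=1`) by default; some implementations use the population version instead.

```python
import numpy as np
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Middle (SMA), upper and lower bands (illustrative sketch)."""
    middle = close.rolling(window=window).mean()
    std = close.rolling(window=window).std()   # sample std (ddof=1) by default
    return pd.DataFrame({
        "middle": middle,
        "upper": middle + k * std,
        "lower": middle - k * std,
    })

# Synthetic random-walk prices (illustrative only)
rng = np.random.default_rng(42)
close = pd.Series(100.0 + rng.normal(0.0, 1.0, 60).cumsum())
bands = bollinger_bands(close, window=20, k=2.0)
print(bands.tail())
```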

By combining these feature engineering techniques, traders can build a richer and more informative feature set, which in turn can improve the performance of their algorithmic trading models.