William Gann: The Important Role of Feature Scaling and Normalization in Trading Models
Feature scaling is an important preprocessing step for many machine learning algorithms. It transforms the features so that they lie on a similar scale. This matters because many algorithms, such as Support Vector Machines (SVMs) and models with L1 or L2 regularization, are sensitive to the scale of the features: a feature measured in large units can dominate distance calculations and penalty terms simply because its numbers are bigger.
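To make the scale sensitivity concrete, here is a minimal sketch. It assumes a hypothetical feature matrix with a price-like column and a return-like column (both invented for illustration) and compares Euclidean distances before and after standardization.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: column 0 is a price level, column 1 a daily return.
X = np.array([
    [100.0,  0.010],
    [101.0, -0.020],
    [150.0,  0.011],
])

def pairwise_distance(A, i, j):
    """Euclidean distance between rows i and j of A."""
    return float(np.linalg.norm(A[i] - A[j]))

# Unscaled: the return column contributes almost nothing to either distance,
# so the price column alone decides which rows look "close".
raw_01 = pairwise_distance(X, 0, 1)
raw_02 = pairwise_distance(X, 0, 2)

# After standardization both columns contribute on a comparable footing.
X_scaled = StandardScaler().fit_transform(X)
scaled_01 = pairwise_distance(X_scaled, 0, 1)
scaled_02 = pairwise_distance(X_scaled, 0, 2)

print(f"raw:    d(0,1)={raw_01:.3f}  d(0,2)={raw_02:.3f}")
print(f"scaled: d(0,1)={scaled_01:.3f}  d(0,2)={scaled_02:.3f}")
```

On the raw data, rows 0 and 1 look nearly identical even though their returns have opposite signs. After scaling, that difference in returns counts about as much as the price gap between rows 0 and 2.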
Scikit-Learn provides several scaling techniques, each with its own advantages and disadvantages.
StandardScaler
StandardScaler scales the features to have a mean of 0 and a standard deviation of 1. This is the most common scaling technique and is a good default choice.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
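The snippet above assumes `X` already exists. The sketch below fills that in with synthetic, time-ordered data (the column meanings and the 150/50 split are assumptions for illustration). It fits the scaler on the earlier rows only and reuses those statistics for the later rows, which is one common way to avoid leaking future information into a trading model.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features: 200 time-ordered rows, 2 columns on very different scales.
X = np.column_stack([
    rng.normal(loc=100.0, scale=15.0, size=200),  # e.g. a price-like feature
    rng.normal(loc=0.0, scale=0.01, size=200),    # e.g. a return-like feature
])

# Chronological split: first 150 rows for fitting, last 50 held out.
X_train, X_test = X[:150], X[150:]

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training rows only
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

train_means = X_train_scaled.mean(axis=0)  # approximately 0 per column
train_stds = X_train_scaled.std(axis=0)    # approximately 1 per column
```

The held-out rows will generally not have a mean of exactly 0 or a standard deviation of exactly 1, because they are transformed with the training statistics.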
MinMaxScaler
MinMaxScaler scales the features to be between a given minimum and maximum value, typically 0 and 1.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
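As a small worked example, the sketch below applies `MinMaxScaler` to a made-up single-column feature, first with the default range and then with `feature_range=(-1, 1)`. The input values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single feature, e.g. an indicator reading.
X = np.array([[10.0], [20.0], [25.0], [50.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_unit = scaler.fit_transform(X)

# feature_range selects a different target interval, here [-1, 1].
X_sym = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

# Values outside the fitted min/max are not clipped by default,
# so a new observation above the training maximum maps above 1.
X_new = scaler.transform(np.array([[60.0]]))
```

Because the mapping depends only on the observed minimum and maximum, a single extreme value in the fitting data compresses every other value into a narrow band.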
RobustScaler
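RobustScaler is similar to StandardScaler but is more robust to outliers. By default it centers each feature on its median and scales it by the interquartile range (the range between the 25th and 75th percentiles), so a few extreme values have little effect on the result.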
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)
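To illustrate the difference, the sketch below scales a hypothetical feature containing one extreme value with both `RobustScaler` and `StandardScaler`. The data are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical feature with one extreme value (e.g. a volume spike).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Defaults: center on the median, scale by the 25th-75th percentile range.
robust = RobustScaler()
X_robust = robust.fit_transform(X)

X_standard = StandardScaler().fit_transform(X)

# Spread of the four ordinary observations under each scaler.
robust_spread = float(X_robust[:4].max() - X_robust[:4].min())
standard_spread = float(X_standard[:4].max() - X_standard[:4].min())

print("RobustScaler:  ", X_robust.ravel())
print("StandardScaler:", X_standard.ravel())
```

With `RobustScaler` the four ordinary observations keep a usable spread and the spike remains visibly extreme. With `StandardScaler` the spike inflates the mean and standard deviation, which squeezes the ordinary observations close together.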
When to Use Each Scaler
| Scaler | When to Use |
|---|---|
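| StandardScaler | When the features are roughly Gaussian and contain no extreme outliers. |
| MinMaxScaler | When the features are not Gaussian, or a bounded range such as [0, 1] is required, and there are no extreme outliers. |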
| RobustScaler | When the data contains outliers. |
Mathematical Formulation: Standardization
The formula for standardization is:
$$z = \frac{x - \mu}{\sigma}$$
Where:
- $x$ is the original feature vector.
- $\mu$ is the mean of the feature vector.
- $\sigma$ is the standard deviation of the feature vector.
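The formula can be checked by hand against `StandardScaler`. The sketch below uses a small hypothetical feature vector and assumes the population standard deviation (NumPy's default, `ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature values.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()     # mean of the feature
sigma = x.std()   # population standard deviation (ddof=0)
z = (x - mu) / sigma

# StandardScaler expects a 2D array of shape (n_samples, n_features).
z_sklearn = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()

print(z)
```

Using the sample standard deviation (`ddof=1`) instead would give slightly smaller absolute z-scores and would not match the scaler's output exactly.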
By choosing a scaler that matches the distribution of your features, you can improve the performance and stability of trading models that are sensitive to feature scale.
