Integrating Machine Learning into a Python Trading Bot

Using ML Models to Generate Trading Signals

The integration of machine learning (ML) into trading bots represents a significant step up from purely technical or rule-based systems. Instead of relying on predefined indicators and static rules, an ML-powered bot can learn from historical data to identify complex patterns and relationships that may not be apparent to human traders. These learned patterns can then be used to generate more nuanced and potentially more profitable trading signals.

Several types of ML models can be applied to trading:

Supervised Learning: This is the most common approach, where a model is trained on a labeled dataset. In trading, the "label" is typically the future price movement (e.g., up or down). The model learns to predict this label based on a set of input "features." Common supervised learning models for trading include:
- Logistic Regression: A simple yet effective model for binary classification (e.g., will the price go up or down?).
- Support Vector Machines (SVMs): An effective model that, with a suitable kernel, can capture complex non-linear relationships in the data.
- Random Forests and Gradient Boosting Machines: Ensemble methods that combine multiple decision trees to create a more robust and accurate model.
Unsupervised Learning: These models are used to find hidden structures in unlabeled data. In trading, they can be used for tasks like clustering similar assets or identifying different market regimes.
Reinforcement Learning (RL): In this paradigm, an "agent" learns to make decisions by interacting with an environment and receiving rewards or penalties. An RL-based trading bot can learn a trading strategy from scratch by being rewarded for profitable trades and penalized for losses.
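
For the supervised case, the "label" has to be built from prices before any model can be trained. The sketch below shows one possible way to do that, assuming a pandas Series of closing prices; the function name make_direction_labels and the horizon parameter are illustrative, not part of any library.

```python
import pandas as pd

def make_direction_labels(close: pd.Series, horizon: int = 1) -> pd.Series:
    """Label each bar 1 if the close `horizon` bars later is higher, else 0."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Return realised over the next `horizon` bars (NaN for the final rows)
    future_return = close.shift(-horizon) / close - 1.0
    labels = (future_return > 0).astype(int)
    # The last `horizon` rows have no future price yet, so drop them
    return labels.iloc[:-horizon]

prices = pd.Series([100.0, 101.0, 100.5, 102.0, 101.0])
labels = make_direction_labels(prices)
print(labels.tolist())  # [1, 0, 1, 0]
```

Because each label looks forward in time, the matching feature row must only use information available at or before that bar.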

Feature Engineering for Financial Time-Series Data

The performance of any ML model is highly dependent on the quality of the features it is trained on. Feature engineering is the process of creating meaningful input variables for the model from raw data. For financial time-series data, this is an important and often time-consuming step. Raw price data is often non-stationary and noisy, making it difficult for ML models to learn from directly.

Effective features for financial time-series often capture different aspects of price action, momentum, and volatility. Some common feature engineering techniques include:

Technical Indicators: Standard technical indicators like Moving Averages, Relative Strength Index (RSI), and Bollinger Bands can be used as features.
Lagged Returns: The returns of the asset over various past periods (e.g., 1-day, 5-day, 20-day returns) can be effective features.
Volatility Measures: Features that capture the asset's volatility, such as the standard deviation of returns or the Average True Range (ATR).
Time-Based Features: Features derived from the time of day, day of the week, or month of the year can capture seasonal patterns.
Cross-Asset Features: The returns or volatility of related assets (e.g., the S&P 500 for a stock, or a competing currency for a forex pair) can provide valuable context.

python

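import numpy as np
import pandas as pd
import ta  # third-party technical analysis package (pip install ta)

# Assuming 'df' is a pandas DataFrame with 'High', 'Low', and 'Close' columns
def create_features(df):
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    # One-period log return
    df['log_returns'] = np.log(df['Close'] / df['Close'].shift(1))
    # Short- and long-window simple moving averages
    df['sma_10'] = df['Close'].rolling(window=10).mean()
    df['sma_50'] = df['Close'].rolling(window=50).mean()
    # Momentum and volatility indicators (library default window of 14 periods)
    df['rsi'] = ta.momentum.RSIIndicator(df['Close']).rsi()
    df['atr'] = ta.volatility.AverageTrueRange(df['High'], df['Low'], df['Close']).average_true_range()
    # Drop rows with missing values left by the shift and rolling windows
    return df.dropna()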
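
The example above covers technical indicators. Lagged returns, a rolling volatility measure, and time-based features from the list can be sketched in a similar way with pandas alone. This is a minimal illustration that assumes the DataFrame has a DatetimeIndex and a 'Close' column; the function name and column names are hypothetical.

```python
import numpy as np
import pandas as pd

def add_return_and_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lagged log returns, rolling volatility, and calendar features."""
    out = df.copy()  # leave the input untouched
    log_close = np.log(out["Close"])
    # Log returns over the past 1, 5, and 20 bars
    for lag in (1, 5, 20):
        out[f"ret_{lag}d"] = log_close.diff(lag)
    # Volatility: standard deviation of 1-bar returns over a 20-bar window
    out["vol_20d"] = out["ret_1d"].rolling(window=20).std()
    # Calendar features taken from the DatetimeIndex (Monday = 0)
    out["day_of_week"] = out.index.dayofweek
    out["month"] = out.index.month
    # Drop the warm-up rows where the windows are not yet filled
    return out.dropna()
```

Every feature here uses only current and past prices, so it can be computed the same way on live data.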

Training and Evaluating ML Models

Once a set of features has been created, the next step is to train and evaluate the ML model. This process involves splitting the historical data into three sets:

Training Set: The largest portion of the data, used to train the model.
Validation Set: Used to tune the model's hyperparameters (e.g., the number of trees in a random forest) and to prevent overfitting.
Test Set: A completely unseen portion of the data used to evaluate the final performance of the trained model.

It is important that the data is split chronologically. The training set should come first, followed by the validation set, and finally the test set. This simulates how the model would be used in a real-world scenario, where it is trained on past data and used to predict future outcomes. Using a random split would introduce lookahead bias, as the model would be trained on data from the future relative to the data it is being tested on.
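
A chronological three-way split can be sketched as follows. This assumes the rows of the DataFrame are already sorted oldest to newest; the function name chronological_split and the 60/20/20 proportions are illustrative choices rather than a prescribed standard.

```python
import pandas as pd

def chronological_split(data: pd.DataFrame, train_frac: float = 0.6, val_frac: float = 0.2):
    """Split time-ordered rows into train, validation, and test without shuffling."""
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    n = len(data)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Earliest rows train the model, the next block tunes it, the latest rows test it
    return data.iloc[:train_end], data.iloc[train_end:val_end], data.iloc[val_end:]

frame = pd.DataFrame(
    {"Close": range(100)},
    index=pd.date_range("2024-01-01", periods=100),
)
train, val, test = chronological_split(frame)
print(len(train), len(val), len(test))  # 60 20 20
```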

The scikit-learn library in Python provides a comprehensive set of tools for training and evaluating ML models.

python

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Assuming 'X' is a DataFrame of features and 'y' is the target variable (e.g., 1 for up, 0 for down)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False) # Chronological split

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2%}")
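
The example above holds out only a test set. The validation set described earlier could be used to choose a hyperparameter such as the number of trees, roughly as in the sketch below. The helper name pick_n_estimators, the candidate values, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def pick_n_estimators(X_train, y_train, X_val, y_val, candidates=(50, 100, 200)):
    """Return the candidate tree count with the best validation accuracy, plus all scores."""
    scores = {}
    for n in candidates:
        model = RandomForestClassifier(n_estimators=n, random_state=42)
        model.fit(X_train, y_train)
        # Score on the validation set only; the test set stays untouched
        scores[n] = accuracy_score(y_val, model.predict(X_val))
    best_n = max(scores, key=scores.get)
    return best_n, scores

# Synthetic stand-in data, already in time order (not real market data)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]

best_n, scores = pick_n_estimators(X_train, y_train, X_val, y_val, candidates=(10, 50))
print(best_n, scores)
```

Once a setting has been chosen, the test set is evaluated a single time, so the reported accuracy is not influenced by the tuning.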

Integrating a Trained Model into the Bot's Strategy Component

After a model has been trained and evaluated, the final step is to integrate it into the trading bot's strategy component. The trained model can be saved to a file using a library like joblib or pickle. The strategy component can then load this model and use it to generate trading signals.

When a new MarketEvent arrives, the strategy component will perform the same feature engineering steps that were used during training. The resulting features are then fed into the loaded ML model, which will output a prediction. This prediction (e.g., a probability of the price going up) is then used to generate a SignalEvent.

python

import joblib

# --- In the training script ---
# model.fit(X_train, y_train)
# joblib.dump(model, 'trading_model.pkl')

# --- In the Strategy component of the bot ---
# SignalEvent, get_latest_data, and create_features are assumed to be
# provided elsewhere in the bot; they are not defined in this snippet.
class MLStrategy:
    def __init__(self, events):
        self.events = events  # shared event queue
        self.model = joblib.load('trading_model.pkl')

    def on_market_event(self, event):
        # 1. Get the latest market data
        latest_data = self.get_latest_data(event.symbol)

        # 2. Engineer features (same steps as during training)
        features = self.create_features(latest_data)

        # 3. Get a prediction for the most recent bar only;
        #    predict() returns an array, so take its single element
        prediction = self.model.predict(features.tail(1))[0]

        # 4. Generate a SignalEvent based on the prediction
        #    (labels follow the training convention: 1 = up, 0 = down)
        if prediction == 1:
            signal = SignalEvent(event.symbol, 'LONG')
            self.events.put(signal)
        elif prediction == 0:
            signal = SignalEvent(event.symbol, 'SHORT')
            self.events.put(signal)
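
The strategy above acts on the hard class label. Since the prediction can also be a probability of the price going up, one option is to trade only when the model is reasonably confident. The sketch below shows that mapping in isolation; the function name and the 0.55/0.45 thresholds are hypothetical and would need tuning on validation data. With a scikit-learn classifier trained on labels 0 and 1, the probability of the up class could be read from the second column of model.predict_proba(...).

```python
def signal_from_probability(p_up, long_threshold=0.55, short_threshold=0.45):
    """Map a predicted probability of an up move to 'LONG', 'SHORT', or None."""
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be a probability between 0 and 1")
    if p_up >= long_threshold:
        return "LONG"
    if p_up <= short_threshold:
        return "SHORT"
    return None  # not confident enough either way: emit no signal

print(signal_from_probability(0.70))  # LONG
print(signal_from_probability(0.30))  # SHORT
print(signal_from_probability(0.50))  # None
```

The gap between the two thresholds acts as a no-trade zone, which reduces the number of signals generated from near coin-flip predictions.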

By integrating a machine learning model, the trading bot can move beyond simple, static rules and adapt its strategy to the ever-changing dynamics of the market. However, it is important to remember that ML models are not a silver bullet. They require careful feature engineering, rigorous testing, and continuous monitoring to be effective in a live trading environment.

Category	Machine Learning Trading
Read time	5 minutes
Published	Feb 28, 2026
