Adaptive Backtesting: Machine Learning for Dynamic Position Sizing
Strategy Overview
This strategy employs machine learning to dynamically adjust position sizing, moving beyond fixed risk percentages. A reinforcement learning (RL) agent learns the optimal leverage or capital allocation for each trade, aiming to maximize risk-adjusted returns while adapting to changing market volatility and regime shifts. The core idea is to train the agent to choose a position size from a discrete set of options, each corresponding to a different risk exposure. The agent receives rewards based on portfolio performance, with a penalty for large drawdowns.
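As a minimal sketch of the idea above, a tabular Q-learning agent can select from a discrete set of position size multipliers. The action values, number of states, and learning parameters here are illustrative assumptions, not prescriptions; a production system would use a richer state representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete action set: position size multipliers (assumption).
ACTIONS = [0.5, 1.0, 1.5, 2.0, 2.5]
N_STATES = 4  # e.g., coarse volatility-regime buckets (assumption)

q_table = np.zeros((N_STATES, len(ACTIONS)))

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy selection over the discrete multipliers."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_table[state]))

def update(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update toward the bootstrapped target."""
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
```

In practice the state space is continuous, so a function approximator (e.g., a small neural network) would replace the table, but the action structure stays the same.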
System Design
The system uses a supervised learning model to predict short-term price movements; its output feeds into the RL agent. The supervised model uses features such as the VIX, bond yields, currency strength, and intermarket correlations, and outputs a probability distribution over future price direction. The RL agent takes this distribution, along with current portfolio metrics (e.g., Sharpe ratio, maximum drawdown, current equity), as its state and chooses an action: a position size multiplier (e.g., 0.5x, 1x, 1.5x, 2x, 2.5x). The reward function balances daily profit and loss against a penalty for exceeding a predefined maximum drawdown threshold, encouraging profitable actions while preserving capital.
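A minimal sketch of the reward shaping described above might look like the following; the drawdown limit and penalty scale are illustrative assumptions, as is the example state layout in the comment.

```python
def rl_reward(daily_pnl_pct, drawdown_pct, dd_limit=0.05, penalty_scale=20.0):
    """Balance daily P&L against a drawdown penalty.

    Reward is the day's P&L (as a fraction of equity), minus a penalty
    proportional to how far the current drawdown exceeds the limit.
    dd_limit and penalty_scale are illustrative assumptions.
    """
    penalty = penalty_scale * max(0.0, drawdown_pct - dd_limit)
    return daily_pnl_pct - penalty

# The state could concatenate the supervised model's direction
# probabilities with portfolio metrics, e.g. (hypothetical layout):
# state = [p_up, p_flat, p_down, sharpe, max_dd, equity_ratio]
```

Below the drawdown limit the reward is pure P&L; beyond it, the penalty dominates quickly, which is what steers the agent away from aggressive sizing near the constraint.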
Entry/Exit Rules
Entry signals originate from a separate, established trading strategy, such as a simple moving average crossover or a more complex algorithmic pattern recognition system. The RL agent does not generate entry signals; it only determines position size after an entry signal triggers. For example, if a 10-period EMA crosses above a 30-period EMA on a 1-hour chart, a long signal activates, and the RL agent then evaluates the market state and decides the position size. Exit rules also follow the base strategy; common exits include trailing stops, time-based exits, and profit targets. The RL agent does not interfere with exit logic. Its sole function is position sizing at entry.
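The EMA crossover example above can be sketched as follows; the function names are hypothetical, and the standard recursive EMA formula is used.

```python
def ema(prices, period):
    """Exponential moving average (standard recursive form, k = 2/(n+1))."""
    k = 2.0 / (period + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(p * k + out[-1] * (1 - k))
    return out

def long_signal(prices, fast=10, slow=30):
    """True when the fast EMA crosses above the slow EMA on the last bar."""
    f, s = ema(prices, fast), ema(prices, slow)
    return f[-2] <= s[-2] and f[-1] > s[-1]
```

When `long_signal` fires, the base strategy opens a long position and hands control to the RL agent only for the size decision.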
Risk Parameters
The primary risk parameter is the maximum allowable drawdown, enforced as a hard constraint in the RL agent's reward function: if a trade pushes portfolio drawdown beyond 5%, the agent receives a heavily negative reward, discouraging overly aggressive sizing. The discrete position size multipliers are also risk parameters, since they cap maximum leverage; a 2.5x multiplier, for instance, allows the agent to allocate 2.5 times the 'standard' position size. The standard size is itself a function of account equity and asset volatility; a standard position might target 1% of account equity at risk per trade, which the RL agent then scales up or down. Volatility adjustments come from an Average True Range (ATR) based calculation for stop loss placement: a wider stop loss due to higher ATR results in a smaller base position size, maintaining the 1% risk per trade. The RL agent then applies its multiplier to this volatility-adjusted base size.
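The volatility-adjusted sizing above can be sketched like this; the 1% risk target and 2x-ATR stop distance are taken from the text as illustrative defaults, and the function names are hypothetical.

```python
def atr(highs, lows, closes, period=14):
    """Average True Range as a simple average of the last `period` true ranges."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    return sum(trs[-period:]) / min(period, len(trs))

def position_size(equity, atr_value, risk_pct=0.01,
                  stop_atr_mult=2.0, rl_multiplier=1.0):
    """Base size risks `risk_pct` of equity to an ATR-based stop;
    the RL multiplier then scales that base. Parameters are illustrative."""
    stop_distance = stop_atr_mult * atr_value  # stop placed 2 ATRs away
    base_units = (equity * risk_pct) / stop_distance
    return base_units * rl_multiplier
```

Note how a larger ATR widens the stop and automatically shrinks the base size, so the dollar risk per trade stays constant before the RL multiplier is applied.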
Practical Applications
Implement this system on highly liquid assets such as S&P 500 futures (ES), EUR/USD, or major equities; these markets offer sufficient data for machine learning training. Data requirements include historical price data, volume, and relevant intermarket indicators. A minimum of 5 years of daily data is advisable for training the RL agent; for higher frequency trading, 1-minute or 5-minute data over 1-2 years can suffice. Training the RL agent requires significant computational resources, and out-of-sample testing is crucial: use a walk-forward optimization approach and retrain the agent periodically (e.g., quarterly or semi-annually) to adapt to new market conditions. Monitor the agent's performance metrics (average position size, win rate, profit factor, and maximum drawdown), compare them to a fixed-position-size benchmark, and look for consistent outperformance in risk-adjusted returns. The system's adaptive nature aims to smooth equity curves and reduce tail risk during adverse market events. For example, during periods of high VIX, the RL agent might learn to reduce position sizes significantly, protecting capital during turbulent times; conversely, during stable, trending markets, it might increase allocation to capitalize on sustained momentum. The system requires continuous monitoring and occasional retraining to remain effective. It is not a set-and-forget solution.
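The walk-forward scheme described above can be sketched as an index generator: train on a rolling window, evaluate on the next block, then slide forward by one evaluation block. Window lengths here are arbitrary placeholders, not recommendations.

```python
def walk_forward_windows(n_bars, train_len, test_len):
    """Yield (train_slice, test_slice) index pairs for walk-forward
    retraining: fit on a rolling window, evaluate on the next block,
    then advance by one test block."""
    start = 0
    while start + train_len + test_len <= n_bars:
        yield (slice(start, start + train_len),
               slice(start + train_len, start + train_len + test_len))
        start += test_len

# Example: 100 bars, 60-bar training window, 20-bar out-of-sample block
# yields two windows: (0:60, 60:80) and (20:80, 80:100).
```

Each test block stays strictly out-of-sample relative to the window it follows, which is what makes the periodic-retraining comparison against a fixed-size benchmark meaningful.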
