Feature Interaction and Polynomial Feature Creation for Non-Linear Models

Machine learning models are increasingly being used to predict rare events in financial markets, such as flash crashes, market crises, and bankruptcies. However, these models face a major challenge: the data is highly imbalanced. The number of instances of the rare event (the positive class) is dwarfed by the number of instances of the normal event (the negative class). This can lead to models that are biased towards the majority class and have poor predictive performance on the minority class. This article explores the concept of target engineering, a set of techniques for addressing the problem of imbalanced classes in rare event prediction.

The Problem with Imbalanced Data

When a machine learning model is trained on an imbalanced dataset, it can achieve a high level of accuracy by simply predicting the majority class all the time. For example, if a dataset has 99% negative instances and 1% positive instances, a model that always predicts the negative class will have an accuracy of 99%. However, this model is useless for predicting the positive class. This is a major problem in rare event prediction, where the goal is to identify the rare event.
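The 99% figure above can be checked with a few lines of arithmetic. The sketch below uses made-up labels (10 positives among 1,000 observations, an assumption chosen to match the 99%/1% split) and two small helper functions written for this illustration. It shows that the always-negative baseline scores 99% on accuracy while identifying none of the rare events.

```python
# Hypothetical labels matching the 99% / 1% split described above.
y_true = [1] * 10 + [0] * 990          # 1 = rare event, 0 = normal
y_pred = [0] * len(y_true)             # baseline that always predicts "normal"

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return hits / actual if actual else 0.0

baseline_accuracy = accuracy(y_true, y_pred)   # 0.99
baseline_recall = recall(y_true, y_pred)       # 0.0
```

This is why recall on the positive class, rather than accuracy alone, is the more informative number to watch when the event is rare.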

There are a number of ways to address the problem of imbalanced data. One common approach is to use a resampling technique, such as oversampling the minority class or undersampling the majority class. Another approach is to use a cost-sensitive learning algorithm, which assigns a higher cost to misclassifying the minority class. However, each has drawbacks: oversampling by duplication can encourage overfitting to the few minority examples, undersampling discards information from the majority class, and cost-sensitive learning requires choosing the misclassification costs. None of them is guaranteed to help.
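As a rough illustration of the two conventional approaches, the sketch below implements random oversampling by duplication and inverse-frequency class weights using only the standard library. The function names, the toy data, and the weighting formula (number of samples divided by the number of classes times the class count, one common heuristic) are assumptions made for this example and not a prescribed method.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Toy data (assumed): 2 rare events among 10 observations.
rows = [[float(i)] for i in range(10)]
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

resampled_rows, resampled_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)   # {0: 0.625, 1: 2.5}
```

Resampling should be applied to the training split only. Duplicating rows before splitting lets copies of the same observation land in both the training and the test data, which inflates the measured performance.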

Target Engineering: A More Direct Approach

Target engineering is a more direct approach to addressing the problem of imbalanced data. The idea is to modify the target variable itself to make it more balanced. There are a number of different ways to do this. One common approach is to use a technique called "time-to-event" analysis. Instead of predicting whether a rare event will occur, we predict the time until the next rare event. This replaces a binary label that is almost always zero with a numeric target that takes an informative value at every time step, turning the classification problem into a regression problem, which can be easier to solve.
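To make the transformation concrete, here is a minimal sketch that converts a sequence of binary event indicators into a time-to-next-event target. The function name and the toy series are hypothetical. One design choice is also an assumption: steps after the last observed event are marked None, because the time to the next event is not yet known for them.

```python
def time_to_next_event(events):
    """For each time step, count the steps until the next event (0 on an event step).

    Steps after the last observed event have no known answer, so they get None
    rather than a guessed value.
    """
    target = [None] * len(events)
    steps_ahead = None
    for i in range(len(events) - 1, -1, -1):   # walk backwards from the end
        if events[i]:
            steps_ahead = 0
        elif steps_ahead is not None:
            steps_ahead += 1
        target[i] = steps_ahead
    return target

events = [0, 0, 1, 0, 0, 0, 1, 0, 0]
target = time_to_next_event(events)
# target == [2, 1, 0, 3, 2, 1, 0, None, None]
```

The target is built from future observations, which is expected for a label. The features paired with each row should still use only information available at or before that time step, and the rows marked None need to be dropped or handled separately before fitting a regression model.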

Another approach to target engineering is to use a technique called "propensity score matching." The idea is to create a matched dataset in which the number of positive and negative instances is balanced. This is done by pairing each positive instance with a negative instance that has a similar propensity score. Here, the propensity score is the estimated probability of an instance being positive, and it can be estimated using a logistic regression model.

Target Engineering in Practice

Let's consider the example of predicting a flash crash. A flash crash is a rare event, so the data is highly imbalanced. To address this, we could use target engineering to transform the problem. Instead of predicting whether or not a flash crash will occur, we could predict the time until the next flash crash. This would give us a more balanced target variable, which would make it easier to train a machine learning model.

We could also use propensity score matching to create a synthetic dataset. We would first train a logistic regression model to predict the probability of a flash crash. We would then use this model to calculate the propensity score for each instance in the dataset. Finally, we would match each positive instance (i.e., each flash crash) with a negative instance that has a similar propensity score. This would give us a balanced dataset that we could then use to train a machine learning model.
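The matching step described above can be sketched as follows. The example assumes the propensity scores have already been produced by a fitted model such as a logistic regression, so the scores shown are invented for illustration. The function name, the greedy one-to-one strategy without replacement, and the caliper (a maximum allowed score gap) are assumptions of this sketch, one of several reasonable ways to perform the matching.

```python
def match_on_propensity(scores, labels, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    scores: estimated P(positive) per instance (assumed to come from a fitted model)
    labels: 1 for the rare event, 0 otherwise
    caliper: maximum allowed score gap for a match (hypothetical default)
    Returns a list of (positive_index, negative_index) pairs.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    available = {i for i, y in enumerate(labels) if y == 0}
    pairs = []
    for p in positives:
        if not available:
            break
        # Closest unused negative; ties go to the lowest index for determinism.
        best = min(available, key=lambda n: (abs(scores[n] - scores[p]), n))
        if abs(scores[best] - scores[p]) <= caliper:
            pairs.append((p, best))
            available.remove(best)
    return pairs

# Invented scores: indices 1 and 5 are the rare events.
scores = [0.02, 0.40, 0.05, 0.38, 0.01, 0.70, 0.66, 0.03]
labels = [0, 1, 0, 0, 0, 1, 0, 0]

pairs = match_on_propensity(scores, labels)
balanced_indices = sorted(i for pair in pairs for i in pair)
# pairs == [(1, 3), (5, 6)]
```

The resulting index list selects an equal number of positive and negative rows. Positives with no negative inside the caliper are left unmatched, so the balanced dataset can be smaller than twice the number of rare events.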

Conclusion

Target engineering is an effective set of techniques for addressing the problem of imbalanced classes in rare event prediction. By modifying the target variable itself, we can create a more balanced dataset that is easier to model. This can lead to machine learning models that have better predictive performance on the minority class, which is essential for predicting rare events in financial markets.

Category	Quantitative Methods
Read time	7 minutes
Published	Feb 28, 2026
