Now that we have completed the initial exploratory data analysis, we can start building our models. We fit several machine learning models to the training data and assess their quality on test data, i.e. data the models did not see during training. The performance visualizations that follow are based on this test data, so they indicate how each model behaves on unseen data. We sort the models by log-loss on the test set, because a lower log-loss means the predicted probabilities agree more closely with the actual outcomes. For the performance visualization we consider the top five models; based on the log-loss metric, we prefer the first three.
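To make the ranking step concrete, here is a minimal sketch of sorting models by test-set log-loss. The model names, labels, and predicted probabilities are made-up illustrative values, not the models or data from this analysis, and the `log_loss` helper is written out by hand so the example is self-contained.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy over the samples; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical held-out labels and predicted probabilities (illustration only).
y_test = [1, 0, 1, 1, 0]
predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.7, 0.1],
    "model_b": [0.6, 0.4, 0.6, 0.5, 0.5],
    "model_c": [0.3, 0.7, 0.4, 0.2, 0.9],
}

# Score every model on the same test set, then sort ascending (best first).
scores = {name: log_loss(y_test, probs) for name, probs in predictions.items()}
ranking = sorted(scores, key=scores.get)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

Taking the first few entries of `ranking` corresponds to selecting the top models for the visualization.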
In general, the models perform similarly to one another. An important diagram is the gain and lift chart, which shows how much the model improves on selecting strategies at random. If we rank the strategies by predicted probability, the first 30% of the ranking contains 53% of all strategies that are not classified as RL3.
Put differently, think of lift as the ratio of what we gained with the model to what we would expect without it. For example, in the first 30% of the ranked strategies we capture 53% of the non-RL3 strategies, whereas random selection would be expected to capture only 30% of them. The lift is therefore about 1.77x (53/30), meaning the model targets non-RL3 strategies about 1.77 times better than random selection.
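The gain and lift calculation can be sketched as follows. This is a minimal illustration, assuming that a label of 1 marks a non-RL3 strategy and that a higher score means a higher predicted probability of being non-RL3; the function names and the toy labels and scores are hypothetical.

```python
def cumulative_gain(y_true, y_score, fraction):
    """Share of all positives found in the top `fraction` of cases ranked by score."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_top = int(round(fraction * len(order)))
    hits = sum(y_true[i] for i in order[:n_top])
    return hits / sum(y_true)

def lift(gain, fraction):
    """Gain relative to random selection, which captures `fraction` of positives."""
    return gain / fraction

# The figures quoted in the text: 53% of non-RL3 strategies in the first 30%.
print(round(lift(0.53, 0.30), 2))  # 1.77

# Toy data (illustration only): 1 = non-RL3 strategy, already scored by a model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

gain = cumulative_gain(y_true, y_score, 0.30)
print(gain, lift(gain, 0.30))
```

In the toy data the top three cases contain two of the four positives, so the gain at 30% is 0.5 and the lift is 0.5 / 0.3.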