What were the models trained on? Thanks, Dave, for the detailed explanation. I naturally have an inquisitive mind, so I'm trying to get a better read of the system at hand so I don't make wrong assumptions. Of course, it is for you to judge which parts you consider trade secrets...no issues there.
Also, I am not in any way trying to dismiss the early betting results you shared with us last week. They do support the idea that there are valuable signals for picking winners and identifying profitable bets.
About out-of-sample testing: usually when folks build machine learning models, they set a subset of the data aside, e.g., a randomly sampled 10%. The model is then trained on the remaining data using whatever algorithm suits the task at hand. After training, the held-out test data is fed to the model to judge how it performs before the model is used in real life. The whole idea is to avoid overfitting, which most ML models are prone to, some more than others.
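To make the random-holdout idea concrete, here's a minimal Python sketch. The function name and the 10% fraction are just illustrative (this isn't claiming anything about your actual pipeline):

```python
import random

def train_test_split(rows, test_frac=0.10, seed=42):
    """Randomly hold out test_frac of the rows for out-of-sample testing."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = rows[:]                 # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    # first n_test rows become the held-out test set, the rest is for training
    return shuffled[n_test:], shuffled[:n_test]

# e.g., 1000 observations -> 900 for training, 100 held out for testing
rows = list(range(1000))
train, test = train_test_split(rows)
```

You would fit the model only on `train`, then score it on `test`, which it has never seen, to estimate real-life performance.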
Another way to do it is setting aside a specific time slice instead of a randomly selected percentage of the observations/rows. For instance, I may be working with historical data from 2017 to 2023 for a given problem; I use the data from 2017 to 2022 to train my model and then validate it against the 2023 data. Because the model has not been exposed to the 2023 data during training, this gives me a sense of its real-life performance. Some algorithms are very good at memorizing training datasets, which can result in superb in-sample accuracy that cannot be replicated on new data. Hope this explains...
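The time-slice version of the split looks something like this in Python (the record layout and field names here are made up for illustration):

```python
# Toy dataset: each record carries the year it was observed.
records = [{"year": y, "x": i}
           for i, y in enumerate([2017] * 3 + [2020] * 3 + [2023] * 2)]

# Train on everything up to and including 2022; hold out 2023 entirely.
train = [r for r in records if r["year"] <= 2022]
test  = [r for r in records if r["year"] == 2023]
```

The key difference from a random split is that no 2023 observation can leak into training, which matters when the data has a time dimension (as betting results obviously do).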