As a follow up to a question I asked Dave during last week's meeting, can we get a general sense about what data the AI models are trained on? Is it a fair assumption that they were fed all of the 3000+ factors from HSH during training? Similarly, were they then evaluated against out-of-sample test data? — Atakante
A GENERAL answer would be: It started with one year of everything, plus multiple years for some of the small sample types.
As for what factors were permitted, the answer is all but a handful that had some known issues.
I do not understand what "evaluated against out-of-sample test data" means. If you mean was the AI TESTED, the answer is no.
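For readers unfamiliar with the term: "out-of-sample" usually means scoring a model on races it never saw during training, most often races run after the training period. Below is a minimal sketch of that kind of split. The record layout and field names (`date`, `id`) are hypothetical and are not how HSH stores its data.

```python
from datetime import date

def chronological_split(races, cutoff):
    """Split races into (in_sample, out_of_sample) by race date.

    Races before the cutoff would be used for training; races on or
    after it are held back and only used to check the predictions.
    """
    in_sample = [r for r in races if r["date"] < cutoff]
    out_of_sample = [r for r in races if r["date"] >= cutoff]
    return in_sample, out_of_sample

# Hypothetical race records, for illustration only.
races = [
    {"date": date(2023, 1, 5), "id": "R1"},
    {"date": date(2023, 6, 1), "id": "R2"},
    {"date": date(2024, 2, 10), "id": "R3"},
]

train, test = chronological_split(races, date(2024, 1, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

Splitting by date rather than at random keeps later races from leaking into the training set.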
I ask because it feels right to look deeper into how the AI predictions fare in real life, with an eye towards their performance on factors/variables the models were NOT trained on. Given the tens of thousands of pace lines the models use to make inferences, it seems to me an uphill battle trying to prove them right or wrong with even a couple hundred manually tracked races for those variables they already considered during training. — Atakante
You are correct.
There is zero proof.
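To put a rough number on why a couple hundred races settles so little, here is a sketch of a normal-approximation confidence interval for a win rate. The 60-wins-in-200-bets figure is invented purely for illustration and does not come from any tracked results.

```python
import math

def win_rate_interval(wins, bets, z=1.96):
    """Approximate 95% confidence interval for a win rate.

    Uses the normal approximation to the binomial, which is
    reasonable when both wins and losses number in the dozens.
    """
    p = wins / bets
    se = math.sqrt(p * (1 - p) / bets)  # standard error of the proportion
    return p - z * se, p + z * se

# Invented example: 60 winners from 200 bets (a 30% hit rate).
low, high = win_rate_interval(60, 200)
print(f"{low:.3f} to {high:.3f}")
```

With those made-up numbers the interval runs from roughly 24% to 36%, wide enough that a small manual sample cannot tell a good model from a mediocre one.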
If memory serves, the chaos race variable/factor was NOT an input to the models so that's one candidate factor worthwhile manually diving into. Is it a safe assumption that race-level factors were excluded from training data? If not, is there a short list of other candidates or a shorthand logic to find others to manually "study"? — Atakante
Chaos = correct, but there was a fitness function involved.
Race Level: Nothing was left out.
___________________
The questions you have asked are actually pushing the envelope of need-to-know.
The AI is a very strong engine.
After my recent small-sample test, run on a day chosen at random, I am actually surprised that it performed so well.
I simply threw together a logical "system" based upon what has been discussed and it crushed the game. After going back and correcting a handful of races I had tagged, and fixing a few mistakes, I manually assembled these statistics.
This is pretty amazing output.
DUTCHING: TOTE ODDS represents the actual dutched betting for the 55 races.
FLAT BETTING (A, A+) represents betting $2 on each A or A+ horse. In other words, betting only the horses that computed as having a $Net of $2.20 or higher.
FLAT BETTING (BAD BETS) represents all the horses bet that were not A or A+ HORSES (according to the rules provided).
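The arithmetic behind the three categories can be sketched as follows. The post does not say how the dutched stakes were actually sized, so this assumes the standard equal-return dutch against to-1 tote odds. The function names and all sample numbers are hypothetical; only the $2 stake and the $2.20 $Net cutoff come from the definitions above.

```python
def dutch_stakes(odds_to_one, total):
    """Split `total` across horses so each returns the same amount if it wins.

    Assumption: odds are quoted "to 1", so a winner returns stake * (odds + 1).
    """
    weights = [1.0 / (o + 1.0) for o in odds_to_one]
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]

def is_a_horse(net, threshold=2.20):
    """True if the horse's computed $Net meets the A / A+ cutoff."""
    return net >= threshold

def flat_bet_roi(payoffs, stake=2.0):
    """ROI for flat bets, given each bet's payoff (0 for a loser)."""
    wagered = stake * len(payoffs)
    return (sum(payoffs) - wagered) / wagered

# Hypothetical two-horse dutch at 2-1 and 4-1 with $100 to spread.
stakes = dutch_stakes([2.0, 4.0], total=100.0)
returns = [s * (o + 1.0) for s, o in zip(stakes, [2.0, 4.0])]
print(stakes, returns)

# Hypothetical $2 flat bets: two winners and two losers.
roi = flat_bet_roi([6.40, 0.0, 0.0, 4.20])
print(f"{roi:.3f}")
```

Running `flat_bet_roi` separately on the A/A+ bets and on the remaining "bad bets" gives the two flat-betting figures side by side.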