10
Realized I was feeding my AI model the wrong type of data for 6 months straight
A buddy pointed out that my training set was 90% forum text instead of structured product specs. How many of you have caught yourselves doing something similarly dumb with your dataset prep?
2 comments
tarag2823d ago
Oof, yeah, been there. Spent three months training a model on customer reviews when I should have been using technical documentation. The real kicker was it started generating feedback about products that didn't even exist yet. Had to backtrack and rebuild the whole pipeline from scratch, checking every single source file one by one. Saved myself by writing a quick script that flagged any text with more than 10% forum language patterns before it hit the training set.
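For anyone wanting to try the same idea, here is a rough sketch of what a filter like that could look like. It is not the actual script: the marker list, the token-based ratio, and the function names (`forum_ratio`, `is_forum_like`, `filter_training_texts`) are all assumptions made for illustration. Only the "more than 10%" cutoff comes from the comment above.

```python
import re

# Hypothetical marker list: the comment does not say which patterns were used,
# so these are placeholders to swap for whatever shows up in your own data.
FORUM_MARKERS = {
    "lol", "imo", "imho", "tbh", "afaik",
    "op", "upvote", "downvote", "thx", "btw",
}

# Lowercase word tokens; apostrophes are kept so contractions stay whole.
TOKEN_RE = re.compile(r"[a-z']+")


def forum_ratio(text: str) -> float:
    """Return the fraction of word tokens that are forum markers."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FORUM_MARKERS)
    return hits / len(tokens)


def is_forum_like(text: str, threshold: float = 0.10) -> bool:
    """Flag text whose marker ratio is strictly above the threshold."""
    return forum_ratio(text) > threshold


def filter_training_texts(texts, threshold: float = 0.10):
    """Split texts into (kept, flagged) before they reach the training set."""
    kept, flagged = [], []
    for text in texts:
        if is_forum_like(text, threshold):
            flagged.append(text)
        else:
            kept.append(text)
    return kept, flagged


if __name__ == "__main__":
    spec = "Voltage: 5 V. Current draw: 200 mA. Operating temperature: 0 to 70 C."
    post = "lol imo this thing is junk tbh btw thx op"
    kept, flagged = filter_training_texts([spec, post])
    print(len(kept), "kept,", len(flagged), "flagged")
```

A plain word list like this will miss a lot (quoted replies, signatures, usernames), so treat it as a first pass and spot-check whatever it flags before dropping anything.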
9
patriciap5223d ago
What percentage threshold did you end up using to catch the forum patterns...
2