What began as a warning label on financial statements has become useful advice for how to think about almost anything: “Past performance is no guarantee of future results.” So why do so many in the AI field insist on believing the opposite? Too many researchers and practitioners remain stuck on the idea that the data they gathered in the past will produce flawless predictions for future data: if the past data are good, the thinking goes, then future outcomes will be good too.

That line of thinking received a major wake-up call recently when an MIT study found that the 10 most-cited data sets were riddled with label errors (a picture of a dog labeled as a cat in a training data set, for example). These data sets form the foundation of how many AI systems are built and tested, so pervasive errors could mean that AI isn’t as advanced as we may think. After all, if AI can’t tell the difference between a mushroom and a spoon, or between the sound of Ariana Grande hitting a high note and a whistle (as the MIT study found and this MIT Tech Review article describes), then why should we trust it to make decisions about our health or to drive our cars?

The knee-jerk response from academia has been to refocus on cleaning up these benchmark data sets