Today’s AI isn’t prepared for the messiness of reality
What began as a warning label on financial statements has become useful advice for how to think about almost anything: “Past performance is no guarantee of future results.” So why do so many in the AI field insist on believing the opposite? Too many researchers and practitioners remain stuck on the idea that the data they gathered in the past will produce flawless predictions for future data: if the past data are good, the future outcomes will be good too.

That line of thinking received a major wake-up call recently when an MIT study found that the 10 most-cited data sets are riddled with label errors (a picture of a dog labeled as a cat in a training dataset, for example). These data sets form the foundation of how many AI systems are built and tested, so pervasive errors could mean that AI isn’t as advanced as we may think. After all, if AI can’t tell the difference between a mushroom and a spoon, or between the sound of Ariana Grande hitting a high note and a whistle (as the MIT study found and as this MIT Tech Review article describes), why should we trust it to make decisions about our health or to drive our cars?

The knee-jerk response from academia has been to refocus on cleaning up these benchmark data sets. We can continue to obsess over creating clean data for AI to learn from in a sterile environment, or we can put AI in the real world and watch it grow. Currently, AI is like a mouse raised to thrive in a lab: let loose in a crowded, polluted city, its chances of surviving are pretty slim.

Every AI Will Always Be Wrong

Because AI started in academia, it suffers from a fundamental problem of that environment: the drive to control how things are tested. This, of course, becomes a problem when academia meets the real world, where conditions are anything but controlled. Tellingly, AI’s relative success in academic settings has begun to work against it as businesses adopt it.