Your Digital Trash Is AI’s Next Gold Rush: The Coming Battle for 'Exhaust Data'

Forgotten bug reports. Awkward email drafts. Chaotic Slack threads. These aren’t just the digital clutter of our lives. They’re going to be the gold veins that power Big AI’s next gold rush.

Consider this: large language models have already been trained on most of the available data on the internet, and fervent media articles and AI CEOs lament the upcoming shortage of training data. Meanwhile, we’re drowning in our own digital detritus: half-baked emails, Slack DMs, random notes, and document drafts. This data is more valuable than ever because it reflects the unpredictability of the real world, and yet, ironically, it has never seen the light of a training set.

This so-called “exhaust data” isn’t glamorous, but it’s cheap, unfiltered, and full of real-world nuance.

That is exactly what an AI model needs to learn how to handle edge cases. While curated datasets can give us 90% of what needs to be done, that last 10% will come from understanding how humans try, fail, and iterate at their tasks. Midnight bug reports and chaotic email drafts aren’t just noise; they’re the edge-case scenarios AI needs to become truly adaptive.

But there’s an uncomfortable truth lurking here. Who owns this data? Is it you, the creator? Is it your employer? Or is it the tool you created the data with? Big AI and scrappy startups are both racing to get to this data first. They’re not waiting to ask for permission, and they’re certainly not going to compensate you for it.

The upcoming gold rush is not going to be about better algorithms. Those gains are incremental. What will define the winners is access to the staggering volumes of exhaust data needed to fine-tune models. As models like DeepSeek R1 challenge the dominance of American AI, data, not innovation, will be where the next battles are fought.

Our digital leftovers are on the brink of becoming indispensable. But as your discarded files become someone else’s treasure, we’re forced to confront a sobering reality: in a world driven by surveillance and commodified data, how much control are we willing to give up over even our most mundane moments?

--

If you have any questions or thoughts, don't hesitate to reach out. You can find me as @viksit on Twitter.