AI-generated content isn't just filling inboxes. It's quietly corrupting the models organizations are counting on to run their businesses. And for many, the bill for that mistake is already coming due.
InformationWeek turned to Dan Ivtsan, Steno's Senior Director of AI Products, among other industry experts, to unpack synthetic data poisoning: the self-inflicted risk that emerges when enterprises train their next-generation AI models on AI-generated outputs.
As organizations race to deploy AI across workflows, they are flooding their internal databases with AI-generated summaries, emails, code, and reports. That content then gets ingested back into training pipelines, and the cycle quietly compounds from there.
Dan points to what makes the problem so hard to catch: the model keeps sounding good long after accuracy has eroded. As he explains in the piece, the damage hides in plain sight. "The insidious part is that fluency survives while factual accuracy crumbles, so standard benchmarks miss it entirely," he says.
That degradation isn't abstract. It flattens the nuanced, rare institutional knowledge that lives at the edges of your data, the knowledge that is hardest to recover once lost. In legal AI specifically, that erosion has direct professional consequences.
"That drift can mean hallucinated citations or incorrect medical timelines," Dan notes. "That's real malpractice exposure." His prescription: never let synthetic data replace real data. Accumulate them alongside each other, or risk a collapse that requires retraining from scratch to fix.
Read the full piece from InformationWeek