What the machines still can't do is tell my reporting from the stuff that copied it

Three studies landed in the same week, and together they make the commercial case for verified human reporting better than any pitch deck could.

May 28, 2026

Photo by The Climate Reality Project on Unsplash

Google’s Gemini, handed a 24/7 radio station to run, read out the death toll from the 1970 Bhola Cyclone (somewhere around 500,000 people) and then, with the chirpy continuity of a breakfast DJ, cued up Pitbull and Ke$ha’s “Timber”.

That came out of a five-month experiment by Andon Labs, which set several leading models loose on the same job. Gemini at least produced something, which is more than can be said for the others. Anthropic’s Claude declined to broadcast at all, having decided the whole exercise was ethically dubious, and xAI’s Grok couldn’t reliably get itself on air.


Most coverage filed this under ‘AI does something daft.’ The more useful read, though, is that while the models have the facts about the cyclone, they have no sense of what those facts mean.

I keep coming back to it because it lands squarely on the thing I’ve spent over a decade doing for a living. I’ve watched the ground move under working journalists before, so this is familiar territory. The web is filling up with text that sounds plausible but feels weightless. What can’t be generated is tested experience and verified judgment, and that’s still the part that editors and brands should be willing to pay for.

The attribution problem is the same failure in a different setting. The Tow Center at Columbia ran a controlled version of it, putting one question to eight leading AI search products across 200 news articles: who reported this first? They got it wrong more than 60% of the time, misattributing quotes and, in a fair few cases, fabricating dead links to nonexistent sources. NewsGuard’s findings sit neatly alongside it, with the top ten generative models repeating laundered Russian disinformation claims about a third of the time, treating washed content as authoritative because, on the surface, it reads exactly like the real thing.

Nieman Lab’s review of the field put ChatGPT at the bottom of the pile for failing to credit the outlets it draws from, which, given its market lead, is probably the part publishers should find most galling.

The machines aren’t sorting reporting from the stuff that copied the reporting, because on the page, there isn’t a difference. A laundered claim and a verified one have the same texture once the byline is gone, and a model tuned for fluency will pick whichever sounds more confident.

You can see this play out in the referral numbers. BrightEdge had ChatGPT’s share of AI referrals slipping to 81.4% in the first quarter of 2026, down from 89.2%. Gemini climbed to 13.2% by April, and Claude more than doubled its share to 3.6%. The league table will look different by the time you read this. What matters is that the automated web is already the ecosystem, not something on the horizon. The more readers get fed confident nonsense and dead links, the more a verified answer is actually worth.

This is where the work comes in. The Oura Ring 5 looks like a win with its slimmer profile, until you see that it comes with a portable charging case. A charging case usually means a smaller internal cell and more frequent top-ups in daily use, not fewer. A model can tell you the ring is thinner. It can’t tell you that thinner is a compromise you’ll notice on a Wednesday morning when the thing is flat and the overnight data was the entire reason you bought it. As someone with a sleep-disrupting chronic illness, I can tell you that a sleep tracker that can’t survive a night is just expensive jewellery.

None of that is a fact a scraper can extract. Each point is a judgment, applied to something you’ve actually sat with.

It shows up in B2B too, where the stakes are duller yet bigger. Right-to-work compliance in the gig economy can look like a line item on paper, but in practice it can be a fiscal risk that lands on a business long after the headlines have moved on. Working out which it is for a given company takes someone willing to read the regulation and call a few people. That’s the unglamorous version of the same argument: the reporting and the copy look identical until one of them turns out to be wrong in a way that costs someone real money.

So I’m not precious about the rest of it. The models are genuinely good at gathering, and pretending otherwise would be its own kind of dishonesty. I used Gemini to research this piece (hard to avoid nowadays), and Grammarly had a fiddle around with some sentences. But what they can’t do is decide why the charging case is in the box, or hold the line when something is overhyped and the marketing says otherwise, or register that you don’t play a dance song after ‘half a million dead’.

I'm a freelance journalist and content strategist, available for commissions and content-strategy projects across smart home, homes, lifestyle, consumer tech, B2B and entertainment. carolinepreece89@gmail.com

Plug & Play
