BBC probe finds AI chatbots mangle nearly half of news summaries
Summary
A large BBC-led study for the European Broadcasting Union (EBU) analysed more than 3,000 answers from four popular AI assistants (OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini and Perplexity) and found widespread problems with AI-produced news summaries.
The investigation found that 45% of responses contained at least one significant issue, 31% had serious sourcing problems and one in five had major accuracy faults such as hallucinations or outdated facts. Counting smaller slip-ups, 81% of replies included some kind of mistake. Google Gemini performed worst, with significant issues in 76% of its responses, largely due to poor sourcing.
Key Points
- The study reviewed 3,000+ AI responses and found 45% had significant problems; 81% contained some error.
- Google Gemini was the worst performer: 76% of its responses had significant issues and 72% had sourcing inaccuracies.
- Common faults included hallucinated facts, outdated information and weak or missing attribution to primary sources.
- Concrete examples: ChatGPT continued to state Pope Francis was alive weeks after his death; Gemini denied historical ISS incidents.
- An Ipsos survey published alongside the report found 42% of UK adults trust AI for news summaries (50% among under-35s), but 84% said a factual error would severely damage their trust.
- The BBC/EBU published a toolkit aimed at developers and newsrooms to improve how assistants handle news and to reduce bluffing when models lack confidence.
Content Summary
The BBC-led probe — the largest of its kind and involving 22 public-service media organisations from 18 countries — shows AI assistants routinely misrepresent news and struggle with sourcing. Gemini’s errors were notably high; other systems also produced hallucinations and out-of-date claims.
The report links these issues to design incentives that encourage confident-sounding answers even when models are unsure. The study is accompanied by a practical toolkit for developers and news organisations to improve accuracy, sourcing and transparency in AI news outputs.
The findings arrive amid growing consumer use of chatbots and recent examples of real-world harm from hallucinations (for example, fabricated legal citations in court filings). The report warns that repeated errors will erode public trust in both AI and journalism.
Context and Relevance
This is important because AI assistants are increasingly used by the public as a quick news source. The study highlights a gap between user trust and system reliability: many people lean on AI summaries, but the technology still frequently misreports or fails to cite sources properly.
For newsrooms, developers and policymakers, the implications are clear: improved source attribution, better uncertainty signalling, and tooling to prevent overconfident hallucinations are urgent. The study also strengthens calls for transparency and safer design incentives in large language models, and it underscores the reputational and democratic risks if misleading AI outputs go unaddressed.
Author style
Punchy: this report is a wake-up call. It’s not a niche tech annoyance — it’s a large, multi-country audit showing mainstream AI assistants getting basic news wrong often enough to matter. If you work in media, product or trust & safety, read the toolkit and the detailed findings; they contain actionable diagnostics you’ll want to know about.
Why should I read this?
Short version: if you or your audience use AI to skim the news, this matters. These bots sound confident but they get key facts and sources wrong a lot. Read this to avoid being misled, to understand which assistants are worse, and to pick up practical steps the BBC/EBU suggest for making AI news summaries less rubbish.
Source
Source: https://go.theregister.com/feed/www.theregister.com/2025/10/24/bbc_probe_ai_news/
