AI Summaries Turn Real News Into Nonsense, BBC Finds
A BBC study published yesterday (PDF) found that AI news summarization tools frequently generate inaccurate or misleading summaries, with 51% of responses containing significant issues. The Register reports: The research focused on OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity assistants, assessing their ability to provide "accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources." The assistants were granted access to the BBC website for the duration of the research and asked 100 questions about the news, being prompted to draw from BBC News articles as sources where possible. Normally, these models are "blocked" from accessing the broadcaster's websites, the BBC said. Responses were reviewed by BBC journalists, "all experts in the question topics," on their accuracy, impartiality, and how well they represented BBC content. Overall:
- 51 percent of all AI answers to questions about the news were judged to have significant issues of some form.
- 19 percent of AI answers which cited BBC content introduced factual errors -- incorrect factual statements, numbers, and dates.
- 13 percent of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.
But which chatbot performed worst? "34 percent of Gemini, 27 percent of Copilot, 17 percent of Perplexity, and 15 percent of ChatGPT responses were judged to have significant issues with how they represented the BBC content used as a source," the Beeb reported. "The most common problems were factual inaccuracies, sourcing, and missing context." [...] In an accompanying blog post, BBC News and Current Affairs CEO Deborah Turness wrote: "The price of AI's extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact. In what can feel like a chaotic world, it surely cannot be right that consumers seeking clarity are met with yet more confusion.
"It's not hard to see how quickly AI's distortion could undermine people's already fragile faith in facts and verified information. We live in troubled times, and how long will it be before an AI-distorted headline causes significant real world harm? The companies developing Gen AI tools are playing with fire." Training cutoff dates for various models certainly don't help, yet the research lays bare the weaknesses of generative AI in summarizing content. Even with direct access to the information they are being asked about, these assistants still regularly pull "facts" from thin air.
Quock- (Score:2)
Makes sense (Score:2)
Re: (Score:3)
Also, AI has extended this to include "quality in, nonsense out".
How many times do we need to discover this? (Score:3)
Seriously? Isn't this like the sixth or seventh different entity to figure this out? This shouldn't be something that's surprising at this point. The only surprising part is that all of these companies have to discover it separately from each other. Even more surprising since just about anybody paying attention has noticed that current-gen AI can't be relied on to tell the truth without "hallucinating" any more than the average coin flip can be relied on to come up heads.
Re: (Score:2)
Yet the LLM companies carry on like there's no problem. Don't be surprised if these findings keep repeating while they keep deploying without fixing anything.
In fact, there is an article about the vendors saying not to worry about the problems and just deploy en masse anyway - https://slashdot.org/story/25/... [slashdot.org]
Re: (Score:2)
Re: (Score:2)
If the tool doesn't work then it has no use.
Re: (Score:2)
The whole "humans make mistakes so it's okay if AI does as well" thing makes me crazy. The nature and kind of mistakes that humans make are very, very, different from the kinds of "mistakes" LLMs make. Humans, for example, don't accidentally fabricate people and events when summarizing a document. Humans won't accidentally produce a fake summary when they lack access to the source material. LLMs will produce "summaries" of things that they can't access, have never accessed, or that never existed.
Yes, mak
Re:How many times do we need to discover this? (Score:4, Insightful)
AI isn't being trained to tell the truth. "Hallucination" is the wrong word, because it's doing exactly what it has been trained to do: create output sentences that probabilistically follow on from the input sentences.
The AI makers' response to the obvious shortcomings is merely to ask for more funding.
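To illustrate the "probabilistically follow on" point in the comment above, here is a minimal toy sketch of next-token sampling; the vocabulary and logits are invented for demonstration and are not from any real model:

```python
import numpy as np

# Toy sketch of next-token prediction: the model assigns a score (logit) to
# every token in its vocabulary, and the sampler draws one according to the
# resulting probabilities. Vocabulary and logits here are made up.
vocab = ["the", "cat", "sat", "on", "mat", "moon"]
logits = np.array([1.2, 0.3, 2.1, 0.7, 1.5, 0.1])  # hypothetical model output

def sample_next_token(logits, temperature=1.0):
    """Softmax the logits (with temperature) and draw one token index."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

print(vocab[sample_next_token(logits)])  # a plausible next word, not a verified fact
```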
Re: (Score:1)
Re:How many times do we need to discover this? (Score:5, Insightful)
Yes, it is important for this to be performed by different independent groups to provide concurring results:
1) It's how science works; we need different groups to agree so a consensus appears;
2) Such studies must be performed in each country, and even multiple times in each country, for the necessary public debate to happen;
3) This technology evolves quickly, meaning such studies must be repeated frequently;
4) Even "Duh!" science like "eating lots makes you fat" must be continuously performed to stay in the focus in the public debate and avoid that corporate interests revert the benefits.
Whether YOU, a Slashdot reader, already knew it matters less than whether the BRITISH PREMIER hears of it, because this news article from the BBC was widely talked about around his office.
Besides, this particular study is not a simple repeat; it provides quantitative indicators which can already be used to make least-worst choices.
Re: (Score:3)
This shouldn't be something that's surprising at this point.
It shouldn't have been surprising ever. Any AI company claiming its LLM could reliably produce accurate summaries wasn't merely mistaken; it was committing outright fraud. There was no reason to believe that the "problems" with AI summaries could be resolved at all [arxiv.org], let alone resolved quickly. These criminals push a lot of nonsense designed to intentionally mislead consumers, from the chat-style interface to get you to anthropomorphize the system to the term 'hallucination', intended to make users
Remove Click Bait (Score:4, Insightful)
If AI can just rewrite headlines to be informative instead of click-bait, then I'm happy. That's the biggest pain point I have with news aggregation sites right now.
Re: (Score:2)
Re: Remove Click Bait (Score:2)
What's the point of alerting me to a huge blizzard five days ago that turned out to be just a dusting? Artificial intelligence certainly had the A down.
Whelp (Score:3)
AI Summaries Turn Real News Into Nonsense, BBC Finds
That would explain some "News" channels ... :-)
Re: (Score:2)
First, if you wanted this to be informative, you would at least use AI and human-generated responses in equal numbers and do a blind comparison, instead of just generating some percentages with the built-in assumption that perfection is to be expected.
Second, this kind of thing is largely about methods - the expertise and the motives of the people doing the evaluation influence the outcome. If you set out to debunk the most simplistic metho
Can we extend this? (Score:2)
Ok, now imagine what nonsense is finding its way into code bases by people who swear by the AI assistants.
The top answer in a Google search, even before the recent decline in result quality, is not always the correct way to do something. It is good, and it looks like a good answer, but given the circumstances it's not hard for there to be a more appropriate answer.
Or am I just an old fuddy duddy?
Re: (Score:2)
Agreed in general. OTOH google was the worst tested.
Re: (Score:2)
You're a fuddy duddy.
There are a lot of things that even the best AI models are still not very good at and this article is a good example of one such area, though I would argue against their choice of models. Coding isn't one of them.
Current models (e.g. Claude 3.5 Sonnet) already code better than 99% of software developers. Once they catch up in architecting, and to be fair some like DeepSeek R1 are already pretty good, it's game over.
You'll end up with one human dev with a team of AI agents doing the wo
Re: (Score:2)
Put that in writing then we'll see. Remember, 99% of programmers are crap. Therefore AI currently is still on the border of crap vs non-crap.
Where is the quality training data that this AI is trained on? It can't be the internet because the internet is enshittified. There are no quality coding sites out there. Unless you're a low-code/no-code sort of programmer.
Re: (Score:2)
So imagine some moron who thinks that AI is "better than 99% of software developers" makes an AI generated product that harms a large number of people's property or physical self. That whole courts/laws/trial thing exists to punish that bad actor, compensate the victims and hopefully keep these kinds of events from reoccurring. It can get real expensive a
Re: (Score:2)
AI should never be used without humans on the back end verifying it all. This means that if you want it to work, AI actually increases the number of people you need to hire, rather than reducing head count. Reducing head count anyway is an admission that quality is not job one, or two, or even on the list.
You know what? (Score:2, Troll)
We must all suffer (Score:2)
We must all suffer while AI digests the world. And considering what it's trained on, is it any wonder it's just regurgitating garbage?
Although to be fair, we use it at work to boil down meeting transcripts and other blobs of text, convert text to diagrams, diagrams to text, etc etc and I have to admit that a lot of it works like magic.
Having it work within narrow silos of tasks and information seems to be key. Done properly it saves hours of time and produces perfectly usable output, often better than those
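As a hedged sketch of the kind of narrowly scoped transcript summarization described above (the model name, prompt wording, and transcript file are assumptions, and the OpenAI Python SDK is used purely as an example):

```python
# Hedged sketch: boiling down a meeting transcript with an LLM API while
# keeping the task narrowly scoped. Model name, prompt wording, and the
# transcript file are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("meeting_transcript.txt") as f:  # hypothetical transcript file
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; substitute whatever your workplace uses
    messages=[
        {
            "role": "system",
            "content": (
                "Summarize the meeting transcript into decisions and action items. "
                "Use only information present in the transcript; if something is "
                "unclear, say so instead of guessing."
            ),
        },
        {"role": "user", "content": transcript},
    ],
)
print(response.choices[0].message.content)
```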
Re: (Score:3)
Microsoft Copilot just told me with total confidence all about the wonderful parish church St Andrews in my village. But
That parish church has been here for 800 years and has NEVER been dedicated to St Andrew. We have no idea where this notion came from.
There is a special place in Hell for the people who trained Copilot.
dead internet theory spreads to real world (Score:3)
I see the current world as a massive tourist trap. Everyone is selling the same crap, but the first one who convinces you gets the business.
Magic oracle (Score:2)
Please magic oracle, feed me bullshit. I don't wanna have to read. Or think.
Just like Slashdot summaries (Score:2)
AI summaries demonstrate … (Score:2)
… that most news stories are without content.
What about Grok? (Score:2)
Grok is big enough now that it should be included in this type of study. I'm curious whether it has the same kind of hallucinations or gullibility as the other engines.