AI News

AI Summaries Turn Real News Into Nonsense, BBC Finds

A BBC study published yesterday (PDF) found that AI news summarization tools frequently generate inaccurate or misleading summaries, with 51% of responses containing significant issues. The Register reports: The research focused on OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity assistants, assessing their ability to provide "accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources." The assistants were granted access to the BBC website for the duration of the research and asked 100 questions about the news, being prompted to draw from BBC News articles as sources where possible. Normally, these models are "blocked" from accessing the broadcaster's websites, the BBC said. Responses were reviewed by BBC journalists, "all experts in the question topics," on their accuracy, impartiality, and how well they represented BBC content. Overall:

- 51 percent of all AI answers to questions about the news were judged to have significant issues of some form.
- 19 percent of AI answers which cited BBC content introduced factual errors -- incorrect factual statements, numbers, and dates.
- 13 percent of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.
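
For anyone curious what reproducing this kind of check at small scale might involve, here is a minimal sketch of an evaluation harness in Python. The function and column names are hypothetical stand-ins, not the BBC's actual tooling, and the expensive part (expert human review of every response) still happens outside the script:

```python
import csv

# Hypothetical sketch: ask several assistants the same news questions and dump
# the responses to a CSV so human reviewers can score them afterwards.
ASSISTANTS = ["ChatGPT", "Copilot", "Gemini", "Perplexity"]

def ask_assistant(name: str, question: str) -> str:
    """Placeholder for calling each assistant's API; not a real client."""
    raise NotImplementedError

def collect_responses(questions: list[str], out_path: str) -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Reviewers later fill in the accuracy / sourcing / context columns,
        # roughly mirroring the criteria the BBC journalists applied.
        writer.writerow(["assistant", "question", "response",
                         "accuracy", "sourcing", "context"])
        for question in questions:
            for name in ASSISTANTS:
                writer.writerow([name, question,
                                 ask_assistant(name, question), "", "", ""])
```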

But which chatbot performed worst? "34 percent of Gemini, 27 percent of Copilot, 17 percent of Perplexity, and 15 percent of ChatGPT responses were judged to have significant issues with how they represented the BBC content used as a source," the Beeb reported. "The most common problems were factual inaccuracies, sourcing, and missing context." [...] In an accompanying blog post, BBC News and Current Affairs CEO Deborah Turness wrote: "The price of AI's extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact. In what can feel like a chaotic world, it surely cannot be right that consumers seeking clarity are met with yet more confusion.

"It's not hard to see how quickly AI's distortion could undermine people's already fragile faith in facts and verified information. We live in troubled times, and how long will it be before an AI-distorted headline causes significant real world harm? The companies developing Gen AI tools are playing with fire." Training cutoff dates for various models certainly don't help, yet the research lays bare the weaknesses of generative AI in summarizing content. Even with direct access to the information they are being asked about, these assistants still regularly pull "facts" from thin air.


Comments Filter:
  • Bouy Moure Nvudia Chuips! Thuis Spuellchueck Brot Tu Yu Bui ChuatGuPT
  • Nonsense in, nonsense out.
    • Also, AI has extended this to include "quality in, nonsense out".

    • Journalists have been converting news into nonsense for a long time. It's gotten so bad people get their news from comedians, or from random strangers on social media. I'd say LLMs have some very good theoretical potential as a news aggregator, once they and their training set are open source.

  • by nightflameauto ( 6607976 ) on Wednesday February 12, 2025 @05:44PM (#65162367)

    Seriously? Isn't this like the sixth or seventh different entity to figure this out? This shouldn't be something that's surprising at this point. The only surprising part is that all of these companies have to discover this separately from each other. Even more surprising since most anybody paying attention has noticed current gen AI can't really be relied on to tell the truth; it "hallucinates" more often than the average coin flip can be expected to come up heads.

    • by evanh ( 627108 )

      Yet, the LLM companies carry on like there's no problem. Don't be surprised if these "findings" keep repeating while they keep deploying without fixing anything.

      In fact, there is an article about the vendors saying not to worry about the problems and just deploy en masse anyway - https://slashdot.org/story/25/... [slashdot.org]

    • Have they compared it to the output from the average human? I've seen plenty of human generated summaries over the years that I'd also classify as nonsense. Writing a good summary can be difficult because it requires understanding what's important and what isn't. It's not difficult to see why an LLM would be bad at this kind of problem, and why even many humans would struggle to do a good job.
      • by evanh ( 627108 )

        If the tool doesn't work then it has no use.

      • by narcc ( 412956 )

        The whole "humans make mistakes so it's okay if AI does as well" thing makes me crazy. The nature and kind of mistakes that humans make are very, very, different from the kinds of "mistakes" LLMs make. Humans, for example, don't accidentally fabricate people and events when summarizing a document. Humans won't accidentally produce a fake summary when they lack access to the source material. LLMs will produce "summaries" of things that they can't access, have never accessed, or that never existed.

        Yes, mak

    • by Darinbob ( 1142669 ) on Wednesday February 12, 2025 @06:46PM (#65162489)

      AI isn't being trained to tell the truth. "Hallucination" is the wrong word, because it's doing exactly what it has been trained to do: create output sentences that probabilistically follow on from the input sentences.
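
The "probabilistically follow" point can be made concrete with a toy bigram sampler. This is a deliberate oversimplification for illustration, not how any production LLM is implemented, but it shows why truth never enters the picture:

```python
import random

# Toy "model": a table of which word tends to follow the previous two words.
# Generation just samples from these probabilities; whether the continuation
# is true is not represented anywhere.
toy_model = {
    ("the", "storm"):    {"caused": 0.5, "was": 0.3, "missed": 0.2},
    ("storm", "caused"): {"flooding": 0.6, "damage": 0.4},
    ("storm", "was"):    {"downgraded": 0.7, "devastating": 0.3},
    ("storm", "missed"): {"the": 1.0},
    ("missed", "the"):   {"coast": 1.0},
}

def generate(prompt, steps=4):
    out = list(prompt)
    for _ in range(steps):
        dist = toy_model.get(tuple(out[-2:]))
        if not dist:
            break
        words, weights = zip(*dist.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate(["the", "storm"]))  # e.g. "the storm was downgraded"
```

Scaling this up buys fluency and far better statistics, not a ground-truth check.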

      The AI makers' response to the obvious shortcomings is merely to ask for more funding.

      • "hallucination"... "thinking"... with respect to LLMs, those are sales terms... eventually we will have AIs truly capable of thinking and hallucinating - but for now its just lies.
    • by test321 ( 8891681 ) on Wednesday February 12, 2025 @07:04PM (#65162515)

      Yes, it is important that this be performed by different independent groups to provide concurring results:
      1) It's how science works; we need different groups to agree so a consensus appears;
      2) Such studies must be performed in each country, and even multiple times in each country, for the necessary public debate to happen;
      3) This technology evolves quickly, meaning such studies must be repeated frequently;
      4) Even "Duh!" science like "eating lots makes you fat" must be continuously performed to stay in focus in the public debate and to keep corporate interests from rolling back the benefits.

      Whether YOU, a Slashdot reader, already knew it does not matter as much as the BRITISH PREMIER hearing of it, because this news article from the BBC was widely talked about around his office.

      Besides, this particular study is not a simple repeat, it provides quantitative indicators which can already be used for least-worst choices.

    • by narcc ( 412956 )

      This shouldn't be something that's surprising at this point.

      It shouldn't have been surprising ever. Any AI company claiming their LLM could reliably produce accurate summaries wasn't merely mistaken, they were committing outright fraud. There was no reason to believe that the "problems" with AI summaries could be resolved at all [arxiv.org], let alone resolved quickly. These criminals push a lot of nonsense designed to intentionally mislead consumers, from the chat-style interface to get you to anthropomorphize the system to the term 'hallucination', intended to make users

  • Remove Click Bait (Score:4, Insightful)

    by crow ( 16139 ) on Wednesday February 12, 2025 @05:45PM (#65162371) Homepage Journal

    If AI can just rewrite headlines to be informative instead of click-bait, then I'm happy. That's the biggest pain point I have for news aggregation sites right now.

    • How many clickbait headlines lead to informative articles in the first place? Having something that just filters that crap out entirely ensures that the people who employ such tactics aren't rewarded with a click and gives clicks to those who don't. I'm sure that you and most others already do this to some extent, but an AI that could do it for me would be even better as I don't even waste time scanning through the list of crap headlines to weed out the clickbait.
    • I'd appreciate it if the clickbait headlines were actually news. Usually by the time it reaches the Google newsfeed, it's quite stale.

      What's the point of alerting me to a huge blizzard five days ago that turned out to be just a dusting? Artificial intelligence certainly had the A down.
  • by fahrbot-bot ( 874524 ) on Wednesday February 12, 2025 @05:51PM (#65162379)

    AI Summaries Turn Real News Into Nonsense, BBC Finds

    That would explain some "News" channels ... :-)

    • Well, right. This study actually misses the mark in a few ways.

      First, if you wanted this to be informative, you would at least use AI- and human-generated responses in equal numbers and do a blind comparison, instead of just generating some percentages with the built-in assumption that perfection is to be expected.

      Second, this kind of thing is largely about methods - the expertise and the motives of the people doing the evaluation influence the outcome. If you set out to debunk the most simplistic metho

    • If it makes real news into nonsense, does it make fake news into sense?

  • Ok, now imagine what nonsense is finding its way into code bases by people who swear by the AI assistants.

    The top answer in a Google search, even before results got noticeably worse recently, is not always the correct way to do something. It may be good, and it may look like a good answer, but given the circumstances it's not hard for there to be a more appropriate one.

    Or am I just an old fuddy duddy?

    • Agreed in general. OTOH google was the worst tested.

    • by Hodr ( 219920 )

      You're a fuddy duddy.

      There are a lot of things that even the best AI models are still not very good at and this article is a good example of one such area, though I would argue against their choice of models. Coding isn't one of them.

      Current models (e.g. Claude Sonnet 3.5) already code better than 99% of software developers. Once they catch up in architecting, and to be fair some like DeepSeek R1 are already pretty good, it's game over.

      You'll end up with one human dev with a team of AI agents doing the wo

      • Put that in writing then we'll see. Remember, 99% of programmers are crap. Therefore AI currently is still on the border of crap vs non-crap.

        Where is the quality training data that this AI is trained on? It can't be the internet because the internet is enshittified. There are no quality coding sites out there. Unless you're a low-code/no-code sort of programmer.

      • Have you heard of a people who call themselves lawyers? When there are conflicts, these lawyers go to places called courts and have trials based on laws.

        So imagine some moron who thinks that AI is "better than 99% of software developers" makes an AI generated product that harms a large number of people's property or physical self. That whole courts/laws/trial thing exists to punish that bad actor, compensate the victims and hopefully keep these kinds of events from reoccurring. It can get real expensive a

      • The LLM has to be able to understand the codebase. Otherwise it will write "small brain" code that misses all of the things we know about how to design software. I've only used Copilot so far. My experience is that it works amazingly well for toy projects and terribly for a large, complex codebase.

        But it is a great way to copy code without explicitly copying code. For example, if I want to lower-case Unicode: I could write my own collection of folders for each code point; I could copy someone's code for it; I c
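
For what it's worth, the lower-casing example in the comment above is one where Python's standard library already ships the per-code-point tables, so there is little reason to hand-roll or copy anything:

```python
# Unicode lower-casing / case folding using the tables built into Python.
text = "Straße GRÜSSE Σίσυφος"

print(text.lower())     # plain lower-casing
print(text.casefold())  # full case folding ("ß" becomes "ss"), better for comparisons
```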
    • AI should never be used without humans on the back end verifying it all. This means that if you want it to work, AI actually increases the number of people you need to hire, rather than reducing head count. Reducing head count anyway is an admission that quality is not job one, or two, or even on the list.

    • Part of my labor last year was rewriting a huge pile of steaming **** that some junior devs created, heavily assisted by Copilot. They made no attempt to create hierarchical/reusable code; instead it was just a steaming pile of AI-assisted copy pasta. I went as far as to get one of them fired for it, they had done such a bad job. I would love to get their manager fired also, as it is his fault for allowing it to happen.

      This was a C++17 codebase from an existing product and they were just adding a ne
  • Based on last year's election coverage, it couldn't possibly get worse. I watched major news outlets tell me that Donald Trump swaying awkwardly to his iPod playlist for 40 minutes instead of answering questions at a town hall was not in fact a 78-year-old man sundowning but was just a brilliant political maneuver. I don't mean your Fox News and your other obvious propaganda outlets; I'm talking about how the Associated Press, The New York Times, Newsweek, and even the Washington Post got in on it.
  • We must all suffer while AI digests the world. And considering what it's trained on, is it any wonder it's just regurgitating garbage?

    Although to be fair, we use it at work to boil down meeting transcripts and other blobs of text, convert text to diagrams, diagrams to text, etc etc and I have to admit that a lot of it works like magic.

    Having it work within narrow silos of tasks and information seems to be key. Done properly it saves hours of time and produces perfectly usable output, often better than those

    • AI is not just regurgitating garbage, it is creating new garbage.

      Microsoft Copilot just told me with total confidence all about the wonderful parish church St Andrews in my village. But ...

      That parish church has been here for 800 years and has NEVER been dedicated to St Andrew. We have no idea where this notion came from.

      There is a special place in Hell for the people who trained Copilot.
  • by Big Hairy Gorilla ( 9839972 ) on Wednesday February 12, 2025 @06:01PM (#65162401)
    Trust no one. Believe nothing without verification. This is the attention economy. Grab someone with a salacious headline, serve the ad.
    I see the current world as a massive tourist trap. Everyone is selling the same crap, but the first one who convinces you gets the business.
  • Please magic oracle, feed me bullshit. I don't wanna have to read. Or think.

  • Maybe even trained on quite a few summary/story pairs too ... poor models.
  • … that most news stories are without content.

  • Grok is big enough now that it should be included in this type of study. I'm curious whether it has the same kind of hallucinations or gullibility as the other engines.

  • #1: Old media is suing OpenAI and Microsoft. #2: They are not suing Google. #3: Google gave old media $100m in kickbacks (just to the NYTimes). So surprise, surprise, surprise, they find a study that bangs on OpenAI. The "study" is deeply flawed. #1: They say they "let robots" through? OpenAI crawls sporadically and backfills with OpenCrawl data. #2: BBC's current robots.txt blocks all OpenAI crawlers (and has for some time). #3: Nobody, but nobody, is using an LLM in the manner that the BBC is testing it with... What happe
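
The robots.txt claim is easy to check for yourself. A quick sketch using Python's standard-library parser (GPTBot and ChatGPT-User are OpenAI's published crawler user-agents; whatever bbc.com serves may of course change over time):

```python
from urllib.robotparser import RobotFileParser

# Fetch the live robots.txt and ask whether given crawlers may fetch a page.
rp = RobotFileParser("https://www.bbc.com/robots.txt")
rp.read()

for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
    verdict = "allowed" if rp.can_fetch(agent, "https://www.bbc.com/news") else "blocked"
    print(f"{agent}: {verdict}")
```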
  • I've complained to local media for years about how their headlines were misleading or wrong. They rarely care. AI is just doing the same, except AI doesn't really understand, whereas people should.

  • For AI research I built a small app that distills raw weather reports into a tropical forecast summary. (https://github.com/mukunda-/stormbot)

    However, while testing, sometimes the summary would not include a reported hurricane. I can't imagine how the word "hurricane" wouldn't be strong enough to take first place in the token summary soup, so the lesson learned here is to not trust an AI summary to capture all critical details.
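
One cheap mitigation for the dropped-hurricane case is a dumb post-check outside the model: verify that critical terms from the source survived into the summary and flag the output if they didn't. A minimal sketch (not code from the stormbot repo):

```python
# Guard against summaries that silently drop critical details: any severe-weather
# term present in the source but missing from the summary gets flagged for review.
CRITICAL_TERMS = ("hurricane", "tornado", "storm surge", "evacuation")

def missing_critical_terms(source: str, summary: str) -> list[str]:
    src, summ = source.lower(), summary.lower()
    return [t for t in CRITICAL_TERMS if t in src and t not in summ]

report = "Hurricane watch in effect; storm surge expected along the coast."
summary = "Rain and strong winds expected along the coast this week."

dropped = missing_critical_terms(report, summary)
if dropped:
    print("Summary dropped critical terms:", dropped)  # ['hurricane', 'storm surge']
```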

  • Here is one from a few days ago [slashdot.org]; the "have deformed by 100 million or more in height in places" bit there was not what the original BBC article said.
