AI News

AI Summaries Turn Real News Into Nonsense, BBC Finds

A BBC study published yesterday (PDF) found that AI news summarization tools frequently generate inaccurate or misleading summaries, with 51% of responses containing significant issues. The Register reports: The research focused on OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity assistants, assessing their ability to provide "accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources." The assistants were granted access to the BBC website for the duration of the research and asked 100 questions about the news, being prompted to draw from BBC News articles as sources where possible. Normally, these models are "blocked" from accessing the broadcaster's websites, the BBC said. Responses were reviewed by BBC journalists, "all experts in the question topics," on their accuracy, impartiality, and how well they represented BBC content. Overall:

- 51 percent of all AI answers to questions about the news were judged to have significant issues of some form.
- 19 percent of AI answers which cited BBC content introduced factual errors -- incorrect factual statements, numbers, and dates.
- 13 percent of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.

But which chatbot performed worst? "34 percent of Gemini, 27 percent of Copilot, 17 percent of Perplexity, and 15 percent of ChatGPT responses were judged to have significant issues with how they represented the BBC content used as a source," the Beeb reported. "The most common problems were factual inaccuracies, sourcing, and missing context." [...] In an accompanying blog post, BBC News and Current Affairs CEO Deborah Turness wrote: "The price of AI's extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact. In what can feel like a chaotic world, it surely cannot be right that consumers seeking clarity are met with yet more confusion.

"It's not hard to see how quickly AI's distortion could undermine people's already fragile faith in facts and verified information. We live in troubled times, and how long will it be before an AI-distorted headline causes significant real world harm? The companies developing Gen AI tools are playing with fire." Training cutoff dates for various models certainly don't help, yet the research lays bare the weaknesses of generative AI in summarizing content. Even with direct access to the information they are being asked about, these assistants still regularly pull "facts" from thin air.

  • Quock- (Score:4, Funny)

    by locater16 ( 2326718 ) on Wednesday February 12, 2025 @06:32PM (#65162337)
    Bouy Moure Nvudia Chuips! Thuis Spuellchueck Brot Tu Yu Bui ChuatGuPT
  • Nonsense in, nonsense out.
    • Comment removed (Score:5, Insightful)

      by account_deleted ( 4530225 ) on Wednesday February 12, 2025 @07:42PM (#65162485)
      Comment removed based on user account deletion
    • Re: (Score:2, Insightful)

      by penguinoid ( 724646 )

      Journalists have been converting news into nonsense for a long time. It's gotten so bad people get their news from comedians, or from random strangers on social media. I'd say LLMs have some very good theoretical potential as a news aggregator, once they and their training set are open source.

    • Google's AI results are alarmingly bad. Frequently I will see its summary contradicted by the very first link or, worse, see it contradict itself in the second paragraph.
  • by nightflameauto ( 6607976 ) on Wednesday February 12, 2025 @06:44PM (#65162367)

    Seriously? Isn't this like the sixth or seventh different entity to figure this out? This shouldn't be something that's surprising at this point. The only surprising part is that all of these companies have to discover this separately from each other. It's even more surprising since just about anybody paying attention has noticed that current-gen AI can't be relied on to tell the truth without "hallucinating" any more often than the average coin flip can be expected to come up heads.

    • by evanh ( 627108 )

      Yet, the LLM companies carry on like there's no problem. Don't be surprised if these "findings" keep repeating while they keep deploying without fixing anything.

      In fact, there is an article about the vendors saying not to worry about the problems and just deploy en masse anyway - https://slashdot.org/story/25/... [slashdot.org]

      • by Anonymous Coward

        They can't "fix" it. They don't even understand what the LLMs are doing.

    • Have they compared it to the output from the average human? I've seen plenty of human generated summaries over the years that I'd also classify as nonsense. Writing a good summary can be difficult because it requires understanding what's important and what isn't. It's not difficult to see why an LLM would be bad at this kind of problem, and why even many humans would struggle to do a good job.
      • by evanh ( 627108 )

        If the tool doesn't work then it has no use.

        • by gweihir ( 88907 )

          Not quite correct: It can (a) be used to extract money from the clueless. This seems to be going very well indeed for LLMs. And (b) it can be used to sabotage things.

      • by narcc ( 412956 )

        The whole "humans make mistakes so it's okay if AI does as well" thing makes me crazy. The nature and kind of mistakes that humans make are very, very, different from the kinds of "mistakes" LLMs make. Humans, for example, don't accidentally fabricate people and events when summarizing a document. Humans won't accidentally produce a fake summary when they lack access to the source material. LLMs will produce "summaries" of things that they can't access, have never accessed, or that never existed.

        Yes, mak

      • This is why I pay for news from humans who know how to summarize. I wouldn't even consider getting my news summaries from the average human. Why would I? That's a completely witless notion.

    • Comment removed (Score:5, Insightful)

      by account_deleted ( 4530225 ) on Wednesday February 12, 2025 @07:46PM (#65162489)
      Comment removed based on user account deletion
      • "hallucination"... "thinking"... with respect to LLMs, those are sales terms... eventually we will have AIs truly capable of thinking and hallucinating - but for now it's just lies.
    • by test321 ( 8891681 ) on Wednesday February 12, 2025 @08:04PM (#65162515)

      Yes, it is important for this to be performed by different independent groups to provide concurring results:
      1) It's how science works; we need different groups to agree so a consensus emerges;
      2) Such studies must be performed in each country, and even multiple times in each country, for the necessary public debate to happen;
      3) This technology evolves quickly, meaning such studies must be repeated frequently;
      4) Even "Duh!" science like "eating lots makes you fat" must be continuously performed to stay in focus in the public debate and to keep corporate interests from reversing the benefits.

      Whether YOU, a Slashdot reader, already knew it matters less than the BRITISH PREMIER hearing of it, because this news article from the BBC was widely discussed around his office.

      Besides, this particular study is not a simple repeat; it provides quantitative indicators which can already be used for least-worst choices.

    • by narcc ( 412956 )

      This shouldn't be something that's surprising at this point.

      It shouldn't have been surprising ever. Any AI company claiming their LLM could reliably produce accurate summaries wasn't merely mistaken, they were committing outright fraud. There was no reason to believe that the "problems" with AI summaries could be resolved at all [arxiv.org], let alone resolved quickly. These criminals push a lot of nonsense designed to intentionally mislead consumers, from the chat-style interface to get you to anthropomorphize the system to the term 'hallucination', intended to make users

    • Figuring something out from anecdotes and having a study which quantifies results and provides data and method so others can test and verify are completely different things. They carry very different weight in informed decision making, and informed policy making.

      "Paying attention" also means very different things to a technology professional with a keen interest in a topic and a policy maker who has to manage hundreds of competing interests. Or even an average citizen doing their job and trying to get throu

    • > Seriously? Isn't this like the sixth or seventh different entity to figure this out?

      In this case, the BBC had it foisted upon them without consultation (and it was several months ago that they first spotted it - I think when Apple first started pushing 'intelligence' on its users).

      The BBC comes in for a lot of flak about its news service. It's definitely tripped up several times, but it generally does try pretty hard to get things right (moreso on the 'big' things). If they've spent a load of time maki

  • Remove Click Bait (Score:4, Insightful)

    by crow ( 16139 ) on Wednesday February 12, 2025 @06:45PM (#65162371) Homepage Journal

    If AI can just rewrite headlines to be informative instead of click-bait, then I'm happy. That's the biggest pain point I have for news aggregation sites right now.

    • How many clickbait headlines lead to informative articles in the first place? Having something that just filters that crap out entirely ensures that the people who employ such tactics aren't rewarded with a click and gives clicks to those who don't. I'm sure that you and most others already do this to some extent, but an AI that could do it for me would be even better as I don't even waste time scanning through the list of crap headlines to weed out the clickbait.
      • How many clickbait headlines lead to informative articles in the first place? Having something that just filters that crap out entirely ensures that the people who employ such tactics aren't rewarded with a click and gives clicks to those who don't. I'm sure that you and most others already do this to some extent, but an AI that could do it for me would be even better as I don't even waste time scanning through the list of crap headlines to weed out the clickbait.

        The conundrum being that the owners of AI *WANT* you to click. They want precisely the opposite of what the end-user wants. As with most of their AI output.

    • I'd appreciate it if the clickbait headlines were actually news. Usually by the time it reaches the Google newsfeed, it's quite stale.

      What's the point of alerting me to a huge blizzard from five days ago that turned out to be just a dusting? Artificial intelligence certainly had the A down.
    • Just go directly to good news sources instead of relying on things like social media feeds to find articles. Even something as basic as using Google News will slice out much of the clickbait and provide a much more vetted and informative news flow, and that's really setting the bar low.

    • If AI can just rewrite headlines to be informative instead of click-bait, then I'm happy.

      ...and a pony. If you're going to ask for something implausible, ask for a pony too.

      • If AI can just rewrite headlines to be informative instead of click-bait, then I'm happy.

        ...and a pony. If you're going to ask for something implausible, ask for a pony too.

        Make mine a dragon! I'd much rather have a dragon than a pony. Ponies are common. Dragons can burn and eat people you don't like. Seems a much more appropriate pet for the modern era.

  • by fahrbot-bot ( 874524 ) on Wednesday February 12, 2025 @06:51PM (#65162379)

    AI Summaries Turn Real News Into Nonsense, BBC Finds

    That would explain some "News" channels ... :-)

    • Well, right. This study actually misses the mark in a few ways.

      First, if you wanted this to be informative, you would at least use AI and human-generated responses in equal numbers and do blind comparison, instead of just generating some percentages, with the built-in assumption that perfection is to be expected.

      Second, this kind of thing is largely about methods - the expertise and the motives of the people doing the evaluation influence the outcome. If you set out to debunk the most simplistic metho

    • If it makes real news into nonsense, does it make fake news into sense?

  • Ok, now imagine what nonsense is finding its way into code bases by people who swear by the AI assistants.

    The top answer in a Google search, even before the recent decline in result quality, is not always the correct way to do something. It may be good, and it may look like a good answer, but given the circumstances it's not hard for there to be a more appropriate one.

    Or am I just an old fuddy duddy?

    • Agreed in general. OTOH google was the worst tested.

    • by Hodr ( 219920 )

      You're a fuddy duddy.

      There are a lot of things that even the best AI models are still not very good at and this article is a good example of one such area, though I would argue against their choice of models. Coding isn't one of them.

      Current models (I.E. Claude Sonnet 3.5) already code better than 99% of software developers. Once they catch up in architecting, and to be fair some like DeepSeek R1 are already pretty good, it's game over.

      You'll end up with one human dev with a team of AI agents doing the wo

      • Comment removed based on user account deletion
      • Have you heard of a people who call themselves lawyers? When there are conflicts, these lawyers go to places called courts and have trials based on laws.

        So imagine some moron who thinks that AI is "better than 99% of software developers" makes an AI generated product that harms a large number of people's property or physical self. That whole courts/laws/trial thing exists to punish that bad actor, compensate the victims and hopefully keep these kinds of events from reoccurring. It can get real expensive a

      • The LLM has to be able to understand the codebase. Otherwise it will write "small brain" code that misses all of the things we know about how to design software. I've only used Copilot so far. My experience is that it works amazing for toy projects and terribly for a large complex codebase.

        But it is a great way to copy code without explicitly copying code. For example, if I want to lowercase Unicode: I could write my own collection of folders for each code point; I could copy someone's code for it; I c
      • You're a fuddy duddy.

        There are a lot of things that even the best AI models are still not very good at and this article is a good example of one such area, though I would argue against their choice of models. Coding isn't one of them.

        Current models (I.E. Claude Sonnet 3.5) already code better than 99% of software developers. Once they catch up in architecting, and to be fair some like DeepSeek R1 are already pretty good, it's game over.

        You'll end up with one human dev with a team of AI agents doing the work of entire departments. This isn't a decade away, it's a year. Maybe less.

        Are you an AI salesman, or a manager who has bought the hype? You talk like one of my management team, despite seeing zero actual evidence of good progress on programming beyond the very basics from the AI tools they're shoving down our throats.

    • Comment removed based on user account deletion
    • Part of my labor last year was rewriting a huge pile of steaming **** that some junior devs created, heavily assisted by Copilot. They made no attempt to create hierarchical/reusable code; instead it was just a steaming pile of AI-assisted copy pasta. I went as far as to get one of them fired for it, they had done such a bad job. I would love to get their manager fired also, as it is his fault for allowing it to happen.

      This was a C++17 codebase from an existing product and they were just adding a ne
  • Based on last year's election coverage it couldn't possibly get worse. I watched major news outlets tell me that Donald Trump swaying awkwardly to his iPod playlist for 40 minutes instead of answering questions at a town hall was not in fact a 78-year-old man sundowning but was just a brilliant political maneuver. I don't mean your Fox News and your other obvious propaganda outlets; I'm talking about the Associated Press, The New York Times, and Newsweek, and even the Washington Post got in on it.
  • We must all suffer while AI digests the world. And considering what it's trained on, is it any wonder it's just regurgitating garbage?

    Although to be fair, we use it at work to boil down meeting transcripts and other blobs of text, convert text to diagrams, diagrams to text, etc etc and I have to admit that a lot of it works like magic.

    Having it work within narrow silos of tasks and information seems to be key. Done properly it saves hours of time and produces perfectly usable output, often better than those

    • AI is not just regurgitating garbage, it is creating new garbage.

      Microsoft Copilot just told me with total confidence all about the wonderful parish church St Andrews in my village. But ...

      That parish church has been here for 800 years and has NEVER been dedicated to St Andrew. We have no idea where this notion came from.

      There is a special place in Hell for the people who trained Copilot.
      • Microsoft Copilot just told me with total confidence all about the wonderful parish church St Andrews in my village. But ...
        That parish church has been here for 800 years and has NEVER been dedicated to St Andrew. We have no idea where this notion came from.

        This is exactly what I mean when I say that soon we won't be able to trust anything that wasn't literally printed on paper before ~2010 or so.

        AI will give out this (wrong) answer, other AIs will ingest it and then also produce this answer, and after a while lots and lots of online sources will be repeating this (wrong) answer.

        But you're in doubt and so you decide to check 5 different sources....and they all agree that yes, the church was indeed dedicated to St Andrew. It's got to be true because all those

  • by Big Hairy Gorilla ( 9839972 ) on Wednesday February 12, 2025 @07:01PM (#65162401)
    Trust no one. Believe nothing without verification. This is the attention economy. Grab someone with a salacious headline, serve the ad.
    I see the current world as a massive tourist trap. Everyone is selling the same crap, but the first one who convinces you gets the business.
    • by mjwx ( 966435 )

      Trust no one. Believe nothing without verification. This is the attention economy. Grab someone with a salacious headline, serve the ad.
      I see the current world as a massive tourist trap. Everyone is selling the same crap, but the first one who convinces you gets the business.

      I'd largely agree if we were talking about American news sources but this is the BBC (British Broadcasting Corporation) which does not serve or get any funding from advertisements.

      However their reputation is under threat due to AI misrepresenting their content.

  • Please magic oracle, feed me bullshit. I don't wanna have to read. Or think.

  • Maybe even trained on quite a few summary/story pairs too ... poor models.
  • by seth_hartbecke ( 27500 ) on Wednesday February 12, 2025 @09:03PM (#65162613) Homepage

    … that most news stories are without content.

  • Grok is big enough now that it should be included in this type of study. I'm curious whether it has the same kind of hallucinations or gullibility as the other engines.

  • by Anonymous Coward
    They're just pissed off that AI is doing what they've been doing for years now. Inaccurate and misleading sum up BBC News.
  • #1: Old media is suing OpenAI and Microsoft. #2: They are not suing Google. #3: Google gave old media $100m in kickbacks (just to nytimes). So surprise, surprise, surprise, they find a study that bangs on OpenAI. The "study" is deeply flawed. #1: They say they "let robots" through? OpenAI crawls sporadically and back-fills with OpenCrawl data. #2: BBC's current robots.txt blocks all OpenAI (and has for some time). #3: Nobody, but nobody, is using an LLM in the manner that the BBC is testing it with... What happe
  • I've complained to local media for years about how their headlines were misleading or wrong. They rarely care. AI is just doing the same, except AI doesn't really understand, whereas people should.

  • For AI research I built a small app that distills raw weather reports into a tropical forecast summary. (https://github.com/mukunda-/stormbot)

    However, while testing, sometimes the summary would not include a reported hurricane. I can't imagine how the word "hurricane" wouldn't be strong enough to take first place in the token summary soup, so the lesson learned here is to not trust an AI summary to capture all critical details.
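
    A minimal sketch of that lesson, not taken from the stormbot repo: after generating a summary, mechanically check that safety-critical terms from the source survived into it, and escalate when they didn't. The generate_summary callable below is a hypothetical stand-in for whatever LLM call the app makes.

        # Sketch: verify critical keywords survive summarization (hypothetical helper names).
        CRITICAL_TERMS = {"hurricane", "tropical storm", "warning", "evacuation"}

        def missing_critical_terms(source: str, summary: str) -> set[str]:
            """Return critical terms present in the source but absent from the summary."""
            src, summ = source.lower(), summary.lower()
            return {term for term in CRITICAL_TERMS if term in src and term not in summ}

        def summarize_with_check(source: str, generate_summary) -> str:
            summary = generate_summary(source)  # placeholder for the LLM call
            missing = missing_critical_terms(source, summary)
            if missing:
                # Don't publish silently; flag for human review instead.
                raise ValueError(f"Summary dropped critical terms: {sorted(missing)}")
            return summary

    A keyword check like this obviously can't catch fabricated details, but it does catch the "dropped hurricane" failure mode described above.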

  • Here is one from a few days ago [slashdot.org]; the "have deformed by 100 million or more in height in places" bit there was not what the original BBC article said.

  • It's not generally realized outside informed circles, but the BBC news is now actually an AI construction. The move has been gradual over the last couple of years, but now pretty much everything you see is AI. The presenters are AI creations - wonderfully lifelike, but that is what they are.

    It shows all of the characteristics of LLMs including hallucinations of events that never happened - as when it reported that an IDF missile had hit a hospital, when human auditing revealed that a Hamas missile had mis

  • The actual problem is we're not using enough AI!

    You have LLM-0 - let's call it Intern - generate the initial summary.

    We hand that summary off to LLM-1 (Senior Reporter), with access to the source material/s and a specific prompt to fact/sense check LLM-0's output against the source.

    We take LLM-1's assessment, LLM-0's summary, and the source material/s to LLM-2 (Editor?), for a final pass.

    The summary will be accurate, concise, and thoroughly inscrutable to human minds....

    If this fails, we just keep adding addition
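
    For what it's worth, the chain being described would look something like the sketch below. The call_llm helper is a hypothetical stand-in for any model API; nothing about the chaining guarantees accuracy, since the reviewer model can hallucinate just as freely as the summarizer.

        # Sketch of the Intern -> Senior Reporter -> Editor chain (hypothetical call_llm helper).
        def call_llm(role_prompt: str, user_text: str) -> str:
            raise NotImplementedError("plug in a real model API here")

        def summarize_with_review(source: str) -> str:
            draft = call_llm(
                "You are LLM-0 (Intern). Summarize the article faithfully.",
                source)
            review = call_llm(
                "You are LLM-1 (Senior Reporter). List any claims in the draft "
                "not supported by the source.",
                f"SOURCE:\n{source}\n\nDRAFT:\n{draft}")
            return call_llm(
                "You are LLM-2 (Editor). Produce a corrected summary from the "
                "source, the draft, and the review.",
                f"SOURCE:\n{source}\n\nDRAFT:\n{draft}\n\nREVIEW:\n{review}")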

  • Maybe things could go wrong, but that would be on the head of the editor who failed to review the AI-generated summary they posted on their clickbait site.
  • AI Finally taking jobs away...
  • I'd like to see a lot of examples. BBC is highly biased; the bots might make the subtle bias more explicit. Here's a hypothetical example:

    BBC: "US officials act to prohibit gender affirming care."

    Bot, after reading whole article and correlating with other sources: "US officials act to defund medical sex change interventions in minors."

    Similar examples could likely occur in BBC coverage of the middle east, or anti-semitic or anti-muslim happenings in Britain.