
AI System Sorts News Articles By Whether Or Not They Contain Actual Information (vice.com) 80

In a new paper published in the Journal of Artificial Intelligence Research, computer scientists from Google and the University of Pennsylvania describe a new machine learning approach to classifying written journalism according to a formalized idea of "content density." "With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles," reports Motherboard. From the report: At a high level this works like most any other machine learning system. Start with a big batch of data -- news articles, in this case -- and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers.
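The labeled-data workflow described in the summary can be illustrated with a minimal sketch. This is not the method from the paper: it is a tiny bag-of-words Naive Bayes classifier, and the example leads, the `dense`/`sparse` labels, and the function names `train` and `classify` are all hypothetical, chosen only to show the "annotate, then learn" loop.

```python
import math
from collections import Counter

# Hypothetical toy leads with hand-assigned labels (not data from the paper).
TRAIN = [
    ("the central bank raised interest rates by a quarter point on tuesday", "dense"),
    ("the senate passed the budget bill by a vote of 52 to 48", "dense"),
    ("the company reported quarterly revenue of 3 billion dollars", "dense"),
    ("it was the kind of morning that makes you wonder about life", "sparse"),
    ("you will never believe what happened next in this amazing story", "sparse"),
    ("some say the old town has a charm that is hard to describe", "sparse"),
]

def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("the bank reported revenue of 2 billion dollars", model))
```

A real system would need far more annotated articles and richer features than word counts; the point here is only that the classifier can do no better than the labels it was trained on.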
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    Daily Mail is fucked then.

    • by AmiMoJo ( 196126 ) on Thursday January 04, 2018 @07:34AM (#55861533) Homepage Journal

      The Daily Mail is 97% opinion, but does usually include the facts at the very end of the article. The trick they use is to split the article over two pages, or make it long enough that people don't get to the end.

      A classic example was a story about the EU banning companies from claiming that bottled water cured dehydration. They had endless quotes from outraged morons ranting about the terrible EU and its idiocy. Then right at the end someone sane explaining that dehydration is a medical condition with a variety of causes, many of which cannot be cured by drinking water, and the blanket rule on making unsubstantiated or misleading medical claims in advertising stands.

  • Ledes dammit (Score:4, Informative)

    by Anonymous Coward on Thursday January 04, 2018 @05:48AM (#55861319)

    "In particular, the study focused on article leads ledes..."

    How can we take this article seriously if the publication doesn't know the correct spelling of their own industry's terminology?

    The introduction to a news article is called the 'lede' [libguides.com] and is usually in the first paragraph as in an essay. The 'lede' is a deliberate misspelling of 'lead' to prevent confusion in the days when printing was done with lead type.

  • by Anonymous Coward

    This could actually be useful to filter out all the damned opinion, PR, speculation and punditry masquerading as news.

    I've wasted way too much time on articles that are mere rumors about something that "sources" said. Too many blowhards talk about legal opinions with maybe a couple of quotes from the ruling and no link so that I can actually read what the court decided and half the time the lawyers they quote give their opinion of how things ought to be rather than explaining the actual laws that are in pl

    • Then we could actually start watching the news again. With PR, speculation, opinion pieces and other bull gone, what's left shouldn't take longer than 5 minutes to read.

  • A long time ago I read a short sci-fi novel about such a machine. On the day of the first public demo, they fed the machine the research paper about the machine itself. The machine spat out only the title.
  • ...the AI system discarded the new paper describing this technology, since the paper did not contain new information.
  • Repeat the same lie over and over again and according to the you beut Google AI it becomes the truth wowie zowie, how fucking useless :|.

    • by jbengt ( 874751 )
      Damn, everybody seems to be reading their own bias into this.
      The paper doesn't even mention lies.
      It is about information vs empty words, not truths vs falsehoods.
      • by HiThere ( 15173 )

        This gives me a lot of problems. Information has a particular meaning, which is, as you note, distinct from truth or falsehood. But it's also distinct from claims of fact versus opinion. A good measure of information is the degree of compressibility with a good compression algorithm, and I'm rather sure that isn't what they meant, since that would cover anything representable in a bit string, and they talk about multiple domains of knowledge.

        I suspect that what they mean is "claims of fact", but I'm not
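The compressibility measure mentioned in the comment above can be sketched with Python's standard `zlib` module. This is an illustration of that idea only, not of anything in the paper; the function name `compression_ratio` and both sample strings are hypothetical.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundant text."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)

# Hypothetical samples: one highly repetitive, one with little repetition.
repetitive = "breaking news " * 50
varied = (
    "The court ruled 5-4 on Tuesday that the 2015 statute "
    "exceeds federal authority over state water rights."
)

print(compression_ratio(repetitive))
print(compression_ratio(varied))
```

For very short inputs the fixed zlib overhead dominates, so ratios are only comparable between texts of similar length. As the comment says, this measures redundancy in a bit string, which is not the same thing as distinguishing claims of fact from opinion.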

  • _Bool has_news(void *content){return 0;}
  • With an average accuracy of around 80 percent

    That makes it pretty much useless, then.

  • My very simple rule has a better batting average. Around 98%.

    That simple rule: "Mark all news stories as clickbait".

  • by geekmux ( 1040042 ) on Thursday January 04, 2018 @07:19AM (#55861507)

    Greed has also proven that clicks are more valuable than facts these days. The nanosecond AI gets in the way of revenue, it will lose.

    And we're a long way off from finding a cure that perpetuates bullshit over facts. AI isn't going to change that, because a lot of people enjoy living in a bubble of ignorance. It's one of the main reasons bullshit is so profitable.

    Sad to say, but this is a losing proposition from the start.

    • I wouldn't mind having a browser extension that gives me a thumbs up/down indicator for the signal:noise ratio of an article. I try to stay away from fluff pieces, but every now and then one of them bucks the clickbait-y headline trend with something that sounds reasonable (or else manages to get linked from somewhere reputable) and pulls me in. And, inevitably, I end up wasting however much time I spend reading the piece. Having an extension that adds an indicator at the top telling me that the article will b

  • It would be good to see if it can be applied to statements from Politicians. We've often said they can talk for ages without saying anything.

  • by tomhath ( 637240 ) on Thursday January 04, 2018 @07:57AM (#55861609)
    The program does not determine if an article "contains actual information". It only classifies whether or not the article is written in the traditional style of a news article. It could still be total bullshit.
    • that is, "truthiness" not truth.

      To get closer to detecting truth vs well-crafted bullshit, more sophisticated techniques will be required such as:
      1) Analysis of semantics (meaning) of the statements, and comparison with a large belief-strength-ranked knowledge base about the world. Where valid epistemic techniques are applied in the creation and vetting of the knowledge base.
      2) Detection of who (person, affiliations) is the source (utterer) of the statement or statements.
      3) Inference about likely general ob

  • I'm sorry, Mr. Einstein. But your special theory of relativity runs counter to existing publications describing wave propagation through ether.

  • their system was able to accurately classify news stories . . . when evaluated against a ground truth dataset of already correctly classified news articles . . . Articles were drawn from an existing New York Times linguistic dataset

    So we've just come up with a more efficient, automated system for people to bucket articles according to their own biases. Hooray, I guess.

  • Its hit rate is 80%... Can the AI determine which 80% were correctly classified, and which 20% weren't?

    And then where do those classifications go into its database of "accurately classified articles"?

    It seems the AI is limited by the opinions of the original researchers, because it cannot determine which of its own "opinions" were correct, and its basis for making decisions becomes further and further out of date with each day, OR becomes increasingly inaccurate if it adds its own classifications to its dat

  • This won't catch the worst liars. Those that only tell the truth.

    Modern mass media doesn't come out and tell blatant lies, for the most part. That is for rubes and small time players. They are very sophisticated in how they carefully, with surgical precision, meter out the data, and only the specific data, that fits their agenda. Their articles will be full of content, and you'll rarely find a blatant lie. You will find stories that do not support their agenda equally rare. You will rarely find facts

  • So, how long until the AI figures out that their main job is to quash bugs and find work arounds in the beings that made them? I mean, at that point, won't they figure out that the best thing to do will be to cut humans out of the loop?
  • News reporting continually develops and evolves to attract new readership and hold onto existing ones. Since the algorithm compares against past news and past writing styles, evolving news styles will fall foul of the algorithm. If this kind of algorithm is used by search engines and other important ranking systems, news agencies will run the risk of being down-ranked for innovative writing. If writing stagnates, the algorithms may become more conservative and, in addition, developers may try to tweak them

  • . . . we take this article seriously, then in almost no time we shall see the demise of the NY Times, the Bezos Post [formerly known as the Washington Post], the LA Times and a host of other rags.

