AI Google Software News Technology

AI System Sorts News Articles By Whether Or Not They Contain Actual Information (vice.com) 80

In a new paper published in the Journal of Artificial Intelligence Research, computer scientists from Google and the University of Pennsylvania describe a new machine learning approach to classifying written journalism according to a formalized idea of "content density." "With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles," reports Motherboard. From the report: At a high level, this works like most any other machine learning system. Start with a big batch of data -- news articles, in this case -- and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers.
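As a concrete sketch of the annotate-then-train setup the summary describes, here is a minimal supervised pipeline over article leads. The tiny inline dataset and the TF-IDF-plus-logistic-regression model are illustrative assumptions, not the paper's actual features or classifier:

    # Minimal sketch of the supervised setup described above: label each
    # lead as informative (1) or content-free (0), then train a classifier.
    # Assumes scikit-learn; data and model choice are illustrative only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    leads = [
        "The Senate passed the spending bill 62-38 on Tuesday night.",
        "You won't believe what happened next in Washington.",
        "Researchers report a 40 percent drop in battery cost since 2015.",
        "Everyone is talking about this one weird trick.",
    ]
    labels = [1, 0, 1, 0]  # 1 = contains actual information, 0 = fluff

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(leads, labels)
    print(model.predict(["The court ruled 5-4 that the statute stands."]))

In the paper's setting, the labels come from the annotated New York Times corpus rather than being hand-typed, and the reported ~80 percent accuracy is measured against that ground truth.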
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    Daily Mail is fucked then.

    • by AmiMoJo ( 196126 ) on Thursday January 04, 2018 @07:34AM (#55861533) Homepage Journal

      The Daily Mail is 97% opinion, but it does usually include the facts at the very end of the article. The trick they use is to split the article over two pages, or to make it long enough that people don't get to the end.

      A classic example was a story about the EU banning companies from claiming that bottled water cured dehydration. They had endless quotes from outraged morons ranting about the terrible EU and its idiocy. Then, right at the end, someone sane explained that dehydration is a medical condition with a variety of causes, many of which cannot be cured by drinking water, and that the blanket rule against making unsubstantiated or misleading medical claims in advertising stands.

  • Ledes dammit (Score:4, Informative)

    by Anonymous Coward on Thursday January 04, 2018 @05:48AM (#55861319)

    "In particular, the study focused on article leads ledes..."

    How can we take this article seriously if the publication doesn't know the correct spelling of their own industry's terminology?

    The introduction to a news article is called the 'lede' [libguides.com] and usually sits in the first paragraph, as in an essay. 'Lede' is a deliberate misspelling of 'lead' to prevent confusion in the days when printing was done with lead type.

  • by Anonymous Coward

    This could actually be useful to filter out all the damned opinion, PR, speculation and punditry masquerading as news.

    I've wasted way too much time on articles that are mere rumors about something that "sources" said. Too many blowhards talk about legal opinions with maybe a couple of quotes from the ruling and no link so that I can actually read what the court decided, and half the time the lawyers they quote give their opinion of how things ought to be rather than explaining the actual laws that are in place.

    • Then we could actually start watching the news again. With PR, speculation, opinion pieces and other bull gone, what's left shouldn't take longer than 5 minutes to read.

  • A long time ago I read a short sci-fi story about such a machine. On the day of its first public demo, they fed the machine the research paper about the machine itself. The machine spat out only the title.
  • ...the AI system discarded the new paper describing this technology, since the paper did not contain new information.
  • Repeat the same lie over and over again and, according to this you-beaut Google AI, it becomes the truth. Wowie zowie, how fucking useless :|

    • by jbengt ( 874751 )
      Damn, everybody seems to be reading their own bias into this.
      The paper doesn't even mention lies.
      It is about information vs. empty words, not truths vs. falsehoods.
      • by HiThere ( 15173 )

        This gives me a lot of problems. Information has a particular meaning, which is, as you note, distinct from truth or falsehood. But it's also distinct from claims of fact versus opinion. A good measure of information content is how poorly a text compresses under a good compression algorithm, and I'm rather sure that isn't what they meant, since that would cover anything representable as a bit string, and they talk about multiple domains of knowledge.

        I suspect that what they mean is "claims of fact", but I'm not sure.
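        A toy illustration of the parent's compressibility measure, with zlib standing in for "a good compression algorithm" (the example strings are made up; nothing here touches truth or falsehood, only redundancy):

        # Compressed-size ratio as a crude information proxy: a lower ratio
        # means more redundancy, i.e. fewer bits of information per byte.
        # zlib stands in for "a good compression algorithm".
        import zlib

        def compression_ratio(text: str) -> float:
            data = text.encode("utf-8")
            return len(zlib.compress(data, 9)) / len(data)

        filler = "Outraged critics slammed the terrible EU today. " * 40
        dense = ("The Senate passed the bill 62-38; battery costs fell "
                 "40 percent since 2015; the court ruled 5-4 to uphold.")
        print(compression_ratio(filler))  # small: highly compressible filler
        print(compression_ratio(dense))   # larger: less redundant text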

  • _Bool has_news(void *content){return 0;}
  • With an average accuracy of around 80 percent

    That makes it pretty much useless, then.

  • My very simple rule has a better batting average: around 98%.

    That simple rule: "Mark all news stories as clickbait".

  • by geekmux ( 1040042 ) on Thursday January 04, 2018 @07:19AM (#55861507)

    Greed has also proven that clicks are more valuable than facts these days. The nanosecond AI gets in the way of revenue, it will lose.

    And we're a long way off from finding a cure for whatever perpetuates bullshit over facts. AI isn't going to change that, because a lot of people enjoy living in a bubble of ignorance. It's one of the main reasons bullshit is so profitable.

    Sad to say, but this is a losing proposition from the start.

    • I wouldn't mind having a browser extension that gives me a thumbs up/down indicator for the signal-to-noise ratio of an article. I try to stay away from fluff pieces, but every now and then one of them bucks the clickbait-y headline trend with something that sounds reasonable (or else manages to get linked from somewhere reputable) and pulls me in. And, inevitably, I end up wasting however much time I spend reading the piece. Having an extension that adds an indicator at the top telling me that the article will be a waste of time would help with that.

  • It would be good to see if it can be applied to statements from politicians. We've often said they can talk for ages without saying anything.

  • by tomhath ( 637240 ) on Thursday January 04, 2018 @07:57AM (#55861609)
    The program does not determine if an article "contains actual information". It only classifies whether or not the article is written in the traditional style of a news article. It could still be total bullshit.
    • that is, "truthiness" not truth.

      To get closer to detecting truth vs. well-crafted bullshit, more sophisticated techniques will be required, such as:
      1) Analysis of the semantics (meaning) of the statements, and comparison against a large, belief-strength-ranked knowledge base about the world, where valid epistemic techniques are applied in creating and vetting that knowledge base (a toy sketch follows the list).
      2) Detection of who (person, affiliations) is the source (utterer) of the statement or statements.
      3) Inference about likely general ob
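      As a cartoon of technique 1, assuming a hypothetical knowledge base of (subject, relation, object) triples ranked by belief strength -- real semantic parsing and knowledge-base construction are the hard parts and are elided here:

      # Cartoon of technique 1: look an extracted claim up in a toy,
      # belief-strength-ranked knowledge base. The KB contents and the
      # triple format are hypothetical illustrations.
      KB = {
          ("earth", "orbits", "sun"): 0.99,        # belief strength in [0, 1]
          ("water", "cures", "dehydration"): 0.30,
      }

      def assess(claim):
          strength = KB.get(claim)
          if strength is None:
              return "unknown claim: cannot corroborate"
          return "corroborated with belief strength %.2f" % strength

      print(assess(("earth", "orbits", "sun")))
      print(assess(("bottled water", "cures", "everything")))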

  • I'm sorry, Mr. Einstein. But your special theory of relativity runs counter to existing publications describing wave propagation through ether.

  • their system was able to accurately classify news stories . . . when evaluated against a ground truth dataset of already correctly classified news articles . . . Articles were drawn from an existing New York Times linguistic dataset

    So we've just come up with a more efficient, automated system for people to bucket articles according to their own biases. Hooray, I guess.

  • Its hit rate is 80%... Can the AI determine which 80% were correctly classified, and which 20% weren't?

    And then where do those classifications go into its database of "accurately classified articles"?

    It seems the AI is limited by the opinions of the original researchers, because it cannot determine which of its own "opinions" were correct, and its basis for making decisions becomes further and further out of date with each day, OR becomes increasingly inaccurate if it adds its own classifications to its database.

  • This won't catch the worst liars. Those that only tell the truth.

    Modern mass media doesn't come out and tell blatant lies, for the most part. That is for rubes and small-time players. They are very sophisticated in how carefully, with surgical precision, they mete out the data, and only the specific data, that fits their agenda. Their articles will be full of content, and you'll rarely find a blatant lie. Stories that do not support their agenda are equally rare. You will rarely find facts

  • So, how long until the AIs figure out that their main job is to quash bugs and find workarounds in the beings that made them? I mean, at that point, won't they figure out that the best thing to do is to cut humans out of the loop?
  • News reporting continually develops and evolves to attract new readers and hold onto existing ones. Since the algorithm compares against past news and past writing styles, evolving news styles will fall foul of it. If this kind of algorithm is used by search engines and other important ranking systems, news agencies will run the risk of being down-ranked for innovative writing. If writing stagnates, the algorithms may become more conservative and, in addition, developers may try to tweak them

  • . . . we take this article seriously, then in almost no time we shall see the demise of the NY Times, the Bezos Post [formerly known as the Washington Post], the LA Times and a host of other rags.
