AI System Sorts News Articles By Whether Or Not They Contain Actual Information (vice.com) 80
In a new paper published in the Journal of Artificial Intelligence Research, computer scientists from Google and the University of Pennsylvania describe a new machine learning approach to classifying written journalism according to a formalized idea of "content density." "With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles," reports Motherboard. From the report: At a high level this works like most any other machine learning system. Start with a big batch of data -- news articles, in this case -- and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers.
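The supervised setup the summary describes — annotate a batch of article leads, then train a classifier on them — can be sketched in a few lines. This is a toy illustration only: the four-example dataset and the Naive Bayes bag-of-words model below are assumptions for demonstration, not the paper's actual method or data.

```python
# Toy sketch of the supervised setup in the summary: label article leads
# as content-dense (1) or not (0), then classify new leads by word counts.
# Dataset and model choice are illustrative assumptions, not the paper's.
from collections import Counter
import math

LEADS = [
    ("officials reported exports rose three percent in march", 1),
    ("the central bank raised rates to four percent on tuesday", 1),
    ("you will never believe what happened next", 0),
    ("here are some thoughts about things in general", 0),
]

def train(examples):
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts

def predict(model, text):
    word_counts, class_counts = model
    vocab = set().union(*(c.keys() for c in word_counts.values()))
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # log prior + log likelihood with add-one smoothing
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / total)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(LEADS)
print(predict(model, "exports rose three percent"))  # 1 (content-dense)
```

The paper's actual system presumably uses far richer features and a real annotated corpus (the New York Times dataset mentioned above); this only shows the shape of the train-then-classify loop.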
Daily Mail is fucked (Score:2, Funny)
Daily Mail is fucked then.
Re:Daily Mail is fucked (Score:5, Informative)
The Daily Mail is 97% opinion, but does usually include the facts at the very end of the article. The trick they use is to split the article over two pages, or make it long enough that people don't get to the end.
A classic example was a story about the EU banning companies from claiming that bottled water cures dehydration. They had endless quotes from outraged morons ranting about the terrible EU and its idiocy. Then, right at the end, someone sane explained that dehydration is a medical condition with a variety of causes, many of which cannot be cured by drinking water, and that the blanket rule on making unsubstantiated or misleading medical claims in advertising stands.
Ledes dammit (Score:4, Informative)
"In particular, the study focused on article leads ledes..."
How can we take this article seriously if the publication doesn't know the correct spelling of their own industry's terminology?
The introduction to a news article is called the 'lede' [libguides.com] and is usually in the first paragraph as in an essay. The 'lede' is a deliberate misspelling of 'lead' to prevent confusion in the days when printing was done with lead type.
Finally something that might be useful... (Score:1)
This could actually be useful to filter out all the damned opinion, PR, speculation and punditry masquerading as news.
I've wasted way too much time on articles that are mere rumors about something that "sources" said. Too many blowhards talk about legal opinions with maybe a couple of quotes from the ruling and no link so that I can actually read what the court decided and half the time the lawyers they quote give their opinion of how things ought to be rather than explaining the actual laws that are in pl
Re: (Score:2)
Then we could actually start watching the news again. With PR, speculation, opinion pieces and other bull gone, what's left shouldn't take longer than 5 minutes to read.
Re: (Score:2)
Should've bought some in my country, too.
Re: (Score:2)
So, the AI is a tool that follows the ideology of those that educated it? Colour me surprised.
Re: (Score:2)
As long as the rest of the "news" is still available, it's trivial for any educated person to find out whether what the AI filters out is actually news or whether it's been doctored to become a propaganda tool.
If it's the latter, throw it away and get a new one. That's the beauty of it, as long as you still have access to the base material, you can decide to start over.
Sci-fi novel (Score:2)
Re: (Score:2)
It's easy for a human to learn how to tell information from opinion. I managed to do it, so can everyone else. And thus it's also easy for a human to see whether that AI is actually "intelligent" enough to do its job or not.
Yes, that means you actually have to audit it yourself if you want to know whether it is "honest" or whether someone wants to pass his opinion off as information. Wow, what a surprise.
Re: (Score:2)
It's easy for a human to learn how to tell information from opinion.
If it's that easy, then why don't more people do it? Face it, most people are sheep. Confirmation bias and all, they'd rather follow their own crowd. When all it takes to sell an idea is a preamble that "Ninety percent of all X believe Y...." there is no hope for critical thinking.
Re: (Score:2)
Just because something is easy doesn't mean that it is comfortable. It's easy to learn enough physics that a concept like "flat earth" is at best comical, yet there are people who believe it.
People are generally more inclined to believe than to know. Because it's easier. Believing just requires one thing: Believing. That's trivial to do (provided you can, I cannot... long story). Simply proclaim that "I believe" and you're in.
Knowing requires more effort. You can't simply state that "I know". Because knowin
Re: (Score:2)
It's not that simple, and there are degrees of both belief and knowledge. But it's just as easy to falsely claim knowledge as it is to pontificate about a weak, or even absent, belief.
The thing is, belief is something that nobody can do without. But it tends to resist analysis. I will assert that without belief you can't walk across the room. You need to believe that space is metric, that the floor will support you, etc. But it gets a bad name because many people use the term when they encounter somethi
Re: (Score:2)
Actually I can't help but challenge the claim that everyone has some kind of belief. I need not believe space is metric. I can actually question it, test it and can by simple sensory input verify that it is. Can I trust my sensory input? I have to. It's all I have. A speculation about whether the sensors that are at my disposal are actually accurate or whether they are manipulated (the whole "brain in a vat" thing) is moot since I cannot falsify it. I can test whether my sensory input agrees with the outcom
Re: (Score:2)
Do you believe your senses? Then you practice belief.
Re: (Score:2)
This has less to do with belief than with pragmatism. I have no input but the input my senses provide. As long as this input is in accordance with the effects that happen if I act upon the input, it is valid.
Counter example: When you are drunk, your senses tell you the room is spinning. According to your senses, you are moving even if you are stationary. You can try to act upon the input, e.g. the information from your balance sensorium that the room is spinning, and you will fall down b
Re: (Score:2)
If it's that easy, then why don't more people do it?
A lot of people treat their 'news' sources as a medium of entertainment.
Unfortunately.... (Score:2)
Re: (Score:2)
OHHH LOOK (Score:2)
Repeat the same lie over and over again and, according to the you-beaut Google AI, it becomes the truth. Wowie zowie, how fucking useless :|.
Re: (Score:1)
If we know it's a lie, why do we need an AI? You're asking a machine to tell you what you've just said to it. The point of the AI is telling people who are really dumb that it's a lie. The problem is, this sort of AI, rather like really dumb people, doesn't have an empirical method for measuring 'truthiness'. It uses a 'one of these is not like the other' algorithm. That's not as useful as it sounds, because social priorities/norms change, meaning the AI is likely to reject "the new" normal.
The proble
Re: (Score:2)
The AI does not assess truth but information. There is a difference, ya know?
Re: (Score:3)
The paper doesn't even mention lies.
It is about information vs. empty words, not truths vs. falsehoods.
Re: (Score:2)
This gives me a lot of problems. Information has a particular meaning, which is, as you note, distinct from truth or falsehood. But it's also distinct from claims of fact versus opinion. A good measure of information is the degree of compressibility with a good compression algorithm, and I'm rather sure that isn't what they meant, since that would cover anything representable in a bit string, and they talk about multiple domains of knowledge.
I suspect that what they mean is "claims of fact", but I'm not
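The parent's compressibility measure can be sketched concretely: highly repetitive fluff compresses much better than denser text, so the compressed-to-raw size ratio serves as a crude information proxy. The sample texts below are made up for illustration; this is the parent's idea, not anything from the paper.

```python
# Rough sketch of the compressibility-as-information idea from the
# parent comment: repetitive text compresses well (low ratio), while
# denser text compresses poorly. Sample texts are illustrative only.
import zlib

def compression_ratio(text: str) -> float:
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

fluff = "great great great amazing amazing wow wow wow " * 20
dense = "Exports rose 3.1% in March; the deficit narrowed to $2.4B."

# The repetitive text compresses far better, i.e. has a lower ratio.
print(compression_ratio(fluff) < compression_ratio(dense))  # True
```

As the parent notes, this covers anything representable as a bit string, which is exactly why it probably is not what the paper means by "content density."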
over 90% accuracy with a 1-liner (Score:2)
Accuracy (Score:2)
With an average accuracy of around 80 percent
That makes it pretty much useless, then.
Re: (Score:2)
You are still going to do manual verification when you read it.
In that case, it doesn't save any time, because you still have to read all of them.
Not very impressive. (Score:2)
That simple rule: "Mark all news stories as clickbait".
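The joke points at a real pitfall: on an imbalanced dataset, a trivial constant rule can post a higher accuracy than the paper's ~80%. The 90/10 label split below is a hypothetical assumption chosen to make the arithmetic obvious.

```python
# Illustration of the one-liner-beats-80% joke: with a hypothetical
# 90/10 clickbait/news split, labeling everything "clickbait" scores
# 90% accuracy while conveying nothing. The split is made up.
labels = ["clickbait"] * 90 + ["news"] * 10   # hypothetical test set
predictions = ["clickbait"] * len(labels)     # the one-line "model"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.9 -- beats ~0.8, yet the classifier is useless
```

This is why a raw accuracy number means little without the class balance (or precision/recall) alongside it.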
AI vs. Greed? Yeah right. (Score:4)
Greed has also proven that clicks are more valuable than facts these days. The nanosecond AI gets in the way of revenue, it will lose.
And we're a long way off from finding a cure that perpetuates bullshit over facts. AI isn't going to change that, because a lot of people enjoy living in a bubble of ignorance. It's one of the main reasons bullshit is so profitable.
Sad to say, but this is a losing proposition from the start.
Re: (Score:2)
No. The goal of late is to completely control information. Obviously. AI isn't real in this context. Just a control mechanism. There is a man behind that curtain.
The creation of Social Media will go down in history as one of the most important things to ever happen to capitalism.
Within the framework of Social Media, you are the product being bought and sold. Because of this, one could argue the main goal is to completely control people, but that does not dismiss the capitalistic reason for engaging in that activity. If Greed were not being fed by Social Media, it would likely cease to exist. Chances are the man behind the curtain has the same agenda as many other
Re: (Score:2)
I wouldn't mind having a browser extension that gives me a thumbs up/down indicator for the signal:noise ratio of an article. I try to stay away from fluff pieces, but every now and then one of them bucks the clickbait-y headline trend with something that sounds reasonable (or else manages to get linked from somewhere reputable) and pulls me in. And, inevitably, I end up wasting however much time I spend reading the piece. Having an extension that adds an indicator at the top telling me that the article will b
Use on politicians (Score:2)
It would be good to see if it can be applied to statements from Politicians. We've often said they can talk for ages without saying anything.
Misleading headline (Score:3)
Exactly: This detects "fact-like" not "fact" (Score:2)
that is, "truthiness" not truth.
To get closer to detecting truth vs well-crafted bullshit, more sophisticated techniques will be required such as:
1) Analysis of semantics (meaning) of the statements, and comparison with a large belief-strength-ranked knowledge base about the world. Where valid epistemic techniques are applied in the creation and vetting of the knowledge base.
2) Detection of who (person, affiliations) is the source (utterer) of the statement or statements.
3) Inference about likely general ob
Training Dataset (Score:2)
I'm sorry, Mr. Einstein. But your special theory of relativity runs counter to existing publications describing wave propagation through ether.
An automated New York Times truth-o-meter (Score:2)
their system was able to accurately classify news stories . . . when evaluated against a ground truth dataset of already correctly classified news articles . . . Articles were drawn from an existing New York Times linguistic dataset
So we've just come up with a more efficient, automated system for people to bucket articles according to their own biases. Hooray, I guess.
So how does the AI move forward? (Score:2)
Its hit rate is 80%... Can the AI determine which 80% were correctly classified, and which 20% weren't?
And then where do those classifications go into its database of "accurately classified articles"?
It seems the AI is limited by the opinions of the original researchers, because it cannot determine which of its own "opinions" were correct, and its basis for making decisions becomes further and further out of date with each day, OR becomes increasingly inaccurate if it adds its own classifications to its dat
Liars know how (Score:2)
This won't catch the worst liars. Those that only tell the truth.
Modern mass media doesn't come out and tell blatant lies, for the most part. That is for rubes and small-time players. They are very sophisticated in how they carefully, with surgical precision, mete out the data, and only the specific data, that fits their agenda. Their articles will be full of content, and you'll rarely find a blatant lie. You will equally rarely find stories that do not support their agenda. You will rarely find facts
Humans are the problem? (Score:2)
A cycle of stagnating news... (Score:2)
News reporting continually develops and evolves to attract new readership and hold onto existing ones. Since the algorithm compares against past news and past writing styles, evolving news styles will fall foul of the algorithm. If this kind of algorithm is used by search engines and other important ranking systems, news agencies will run the risk of being down-ranked for innovative writing. If writing stagnates, the algorithms may become more conservative and, in addition, developers may try to tweak them
Assuming . . . (Score:2)