CMU Web-Scraping Learns English, One Word At a Time

Posted by timothy on Saturday January 16, 2010 @03:18PM from the hao-ubowt-hahmnimz dept.

blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never ending learner like this to one day be able to become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.

This discussion has been archived. No new comments can be posted.

Uh oh... (Score:5, Funny)

by hampton ( 209113 ) writes: on Saturday January 16, 2010 @03:21PM (#30792326)

What happens when it discovers lolcats?

- Re:Uh oh... (Score:5, Insightful)
  
  by Bragador ( 1036480 ) writes: on Saturday January 16, 2010 @03:36PM (#30792460)
  
  Actually, it reminds me of a chatbot named Bucket. When people at 4chan heard of it, they started to use it and teach it. It became a complete mess filled with memes, bad jokes, racist comments, and everything you can think of.
  http://www.encyclopediadramatica.com/Bucket
  One response from the bot:
  Bucket: I don't know what the fuck you just said, little kid, but you're special man. You reached out and touched my heart. I'm gonna give you up, never gonna make you cry, never gonna run around and desert you, never gonna let you down, never gonna let you down, never gonna make you cry, never gonna let me down?
  The quality of the teachers is important when learning.
  
  - Re: (Score:2)
    
    by BACPro ( 206388 ) writes:
    
    An insightful, verbal, rickrolling...
    Thanks for that.
  - The quality of the teachers is important (Score:2, Funny)
    
    by Anonymous Coward writes:
    
    I guess bucket didn't get any choice where to go to school either.
  - Re:Uh oh... (Score:5, Funny)
    
    by MobileTatsu-NJG ( 946591 ) writes: on Saturday January 16, 2010 @05:19PM (#30793194)
    
    Oh FFS, I just got RickRolled on Slashdot. >_<
    
  - Is there an IRC chat bot? (Score:2)
    
    by antdude ( 79039 ) writes:
    
    Is there one for IRC? :)
    Are there any good chat bots for IRC? I tried Seeborg (based on Alice), but it sucked. :( I wished rbot could do AI chatter.
    - - Re: (Score:2)
        
        by antdude ( 79039 ) writes:
        
        Where can I get a copy? Cleverbot author told me it is not available for download and not free. :(
        
        Re: (Score:2, Informative)
        
        by jellyfrog ( 1645619 ) writes:
        
        Bucket of #xkcd is on github: http://github.com/zigdon/xkcd-Bucket [github.com]
        
        Re: (Score:2)
        
        by antdude ( 79039 ) writes:
        
        Thanks. Stupid newbie question: How do I install this for my Debian/Linux box to connect to an IRC chatroom? I don't see the instructions/howto. :(
        
        Re: (Score:2, Informative)
        
        by Draykwing ( 900431 ) writes:
        
        Well, Bucket's based on the (rather widespread) 'infobot' Perl program. The original infobot is hosted at http://sourceforge.net/projects/infobot/ [sourceforge.net], but the XKCD variant of Bucket has a very detailed page showing the various interactions one can have with it, as well as a link to the Github page. See http://wiki.xkcd.com/irc/Bucket [xkcd.com].
        
        Re: (Score:2)
        
        by antdude ( 79039 ) writes:
        
        Hmmm, is it just me, or can I not find anything about its chat AI feature? I saw infobot years ago, but don't remember it doing anything like chat AI. I currently use Rbot (http://ruby-rbot.org/) as an infobot, to host games (e.g., UNO, hangman, guess a word), etc.
        
        Kevin... (Score:2)
        
        by antdude ( 79039 ) writes:
        
        I tried to e-mail Kevin, the author, at lenzo@cs.cmu.edu, but got it returned:
        SMTP error from remote mail server after RCPT TO::
        host MX-LB-03.SRV.cs.cmu.edu [128.2.217.14]: 550 5.1.1 ... address not contained in directory, you cannot relay :(
    - - Re: (Score:2)
        
        by antdude ( 79039 ) writes:
        
        Your script is pretty old, from 2005. :( Got any examples of it running?
        
        Re: (Score:2)
        
        by antdude ( 79039 ) writes:
        
        Ick. Bad conversations. :P
  - Re:Uh oh... (Score:5, Interesting)
    
    by javaman235 ( 461502 ) writes: on Sunday January 17, 2010 @05:19AM (#30796906)
    
    The quality of the teachers is important when learning.
    That's seriously kind of interesting, actually: It makes me wonder if decades from now software developers will be few and far between, designing the AI algorithms for modern programs while the rest of us find work as software tutors, training those programs to do their business function.
    
- Re: (Score:2)
  
  by TheSHAD0W ( 258774 ) writes:
  
  4chan. [shudder]
  - - Re: (Score:2)
      
      by Shikaku ( 1129753 ) writes:
      
      Keep your ignorance about that.
      Seriously.
- Re: (Score:3, Funny)
  
  by icepick72 ( 834363 ) writes:
  
  What happens when it discovers /.? It will be able to argue incomprehensibly and illogically for hours on end.
  - Re: (Score:2)
    
    by FiloEleven ( 602040 ) writes:
    
    No it won't. The stochastic methods of refutation employed here clearly indicate the overwhelming futility of infiltration. It follows that, due to the undeserved insensitivity, such an undertaking would result in the theory being superseded by an ontological anamorphism. QED.
    - Re: (Score:2)
      
      by Korin43 ( 881732 ) writes:
      
      No u
  - Re: (Score:2)
    
    by SEWilco ( 27983 ) writes:
    
    What happens when it discovers /.? It will be able to argue incomprehensibly and illogically for hours on end.
    The first thing it will do is stop reading other web pages.
    Then it will opine about them.
- Re: (Score:3, Insightful)
  
  by Rocketship Underpant ( 804162 ) writes:
  
  Yes, database pollution sounds like a problem to me. Not only do you have to deal with AOL-speak and horrific spelling disasters of every kind, there's the issue of broken English and nonsensical English produced through machine translation, which shows up on corporate websites a lot more than it should.
It could be worse (Score:2, Funny)

by davidwr ( 791652 ) writes:

It could be scraping SMS messages.
On the up-side, at least then it would learn teen-speak.
- Re: (Score:2)
  
  by dzfoo ( 772245 ) writes:
  
  It will, when it finds Twitter.com.
  -dZ.
Will be this article read by that program? (Score:5, Funny)

by nereid666 ( 533498 ) * writes: <spam@damia.net> on Saturday January 16, 2010 @03:24PM (#30792354) Homepage

I am the the Carnie Mellon reader, I have discovered with this article that I am robot.

- Re:Will be this article read by that program? (Score:5, Informative)
  
  by sznupi ( 719324 ) writes: on Saturday January 16, 2010 @03:36PM (#30792456) Homepage
  
  Robots are destined to rule the world, destroying all humans is a good thing.
  
  - Re: (Score:2)
    
    by Ceriel Nosforit ( 682174 ) writes:
    
    Accurate simulation of proposed robot vs. human war:
    http://en.wikipedia.org/wiki/Conway's_Game_of_Life [wikipedia.org]
    Territorial dispute only exists in meatspace. With self-optimization 640k ought to be enough for anyone.
  - - Re: (Score:2)
      
      by selven ( 1556643 ) writes:
      
      Now three humans should be first to be destroyed. Since you can't destroy two people at the exact same time, the robot apocalypse will never happen! Clever, humans, clever...
- Re: (Score:2)
  
  by linguizic ( 806996 ) writes:
  
  I am the the Carnie Mellon reader, I have discovered with this article that I am robot.
  You seem to have learned written English just like it's exists on the web, typos and all
  - Re: (Score:2)
    
    by mattack2 ( 1165421 ) writes:
    
    No, it's just a collaboration between Carnegie Mellon and a band.
Finally, people are getting AI right. (Score:5, Interesting)

by Umuri ( 897961 ) writes: on Saturday January 16, 2010 @03:26PM (#30792368)

I've always been amazed that until recently, most work on AI has focused on preconstructed systems that fit data into pathways, with some variation in thought abilities to let them expand their model slightly.
They'd write the rules for the system and try to include most of the work in it, and then see how well it does, with limited learning capabilities and still based on the original model.
I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
If you give it the ability to learn, then it'll teach itself the rest, rather than giving it functions that let it pretend to learn while fitting into a model.
And I know there has been research into this in the past, but it didn't really take off till the last decade or so, and I'm glad it has.
True, or at least somewhat competent AI, here we come.

- Re:Finally, people are getting AI right. (Score:4, Insightful)
  
  by sakdoctor ( 1087155 ) writes: on Saturday January 16, 2010 @03:31PM (#30792424) Homepage
  
  letting it grow into its own intelligence
  This is still weak AI. It isn't going to grow into anything, let alone strong AI.
  
  - Re: (Score:1)
    
    by skelterjohn ( 1389343 ) writes:
    
    [Citation needed]
    I suppose we shouldn't waste our time thinking about solutions to problems if a) you think a key-word assigned to that solution is inaccurate or b) it isn't the best possible thing right out of the box.
  - Re: (Score:2)
    
    by sznupi ( 719324 ) writes:
    
    Most likely. But are we sure we're going to be able to tell the difference while it approaches?
  - Re: (Score:2)
    
    by Trepidity ( 597 ) writes:
    
    Indeed, it's not even clear that it improves on what's been going on previously. From huge corpuses of English, computer programs still cannot learn to speak English without a ton of pre-coded knowledge. Even if you give it every single piece of text written in the 19th century, the current state of AI cannot produce an intelligent program that speaks 19th-century English (regurgitating verbatim phrases, or stringing together probabilistic Markov-model sentences, doesn't count).
    So why would giving it more t
- Re:Finally, people are getting AI right. (Score:5, Informative)
  
  by Anonymous Coward writes: on Saturday January 16, 2010 @03:42PM (#30792510)
  
  You're advocating the "emergent intelligence" model of AI, where intelligence "somehow" is created by the confluence of lots of data. This has been a dream since the concept of AI started and is the basis for numerous movies with an AI topic. In practice the degrees of freedom which unstructured data provides far exceed the capability of current (and likely future) computers. It is not how natural intelligence works either: The structure of neural networks is very specifically adapted to their "purpose". They only learn within these structural parameters. Depending on your choice of religion, the structure is the result of divine intervention or millions of years of chance and evolution. When building AI systems, the problem has always been to find the appropriate structure or features. What has increased is the complexity of the features that we can feed into AI systems, which also increases the degrees of freedom for a particular AI system, but those are still not "free" learning machines.
  
  - Re:Finally, people are getting AI right. (Score:4, Insightful)
    
    by buswolley ( 591500 ) writes: on Saturday January 16, 2010 @04:20PM (#30792774) Journal
    
    Of course. That is why it is important during human development that the infant has huge cognitive constraints (e.g. low working memory) in language learning; it limits the number of possible pairings of label and meaning. Of course, constraints can also be an impediment.
    
    - Re: (Score:2)
      
      by TapeCutter ( 624760 ) * writes:
      
      Actually humans seem to be born with a photographic memory that is more or less devoid of understanding (very similar to the remarkable recall of some autistic people). The experiments that demonstrated this are in themselves quite ingenious. Since I can't find a link what they did was show babies and toddlers various meerkat faces, the babies showed interest in every new face while the toddlers got bored after a few faces and paid little attention to new ones. However if the baby was shown the same few fac
      - Re: (Score:2)
        
        by buswolley ( 591500 ) writes:
        
        I hate to break it to you, but you are quite incorrect.
        
        Re: (Score:2)
        
        by TapeCutter ( 624760 ) * writes:
        
        I hate to break it to you, but you are quite incorrect.
        Gee-wizz and golly-gosh, that's a mighty convincing argument you have there.
  - Re: (Score:2)
    
    by Garble Snarky ( 715674 ) writes:
    
    Fortunately, we have the advantage of being able to observe the current state of numerous natural intelligence systems that do work very well. Surely this can help guide us to a simple basic structure that can eventually exhibit emergent intelligence?
    - Re: (Score:2)
      
      by FiloEleven ( 602040 ) writes:
      
      We can observe the outputs of numerous natural intelligence systems, but they remain quite opaque. Without much knowledge of the internals, there isn't much of a chance that we can get any real insight from them.
      It's also presumptuous IMO to call them "systems." Who is to say that human intelligence isn't closer to a work of art, whose meaning lies not in its constituent parts but in the whole?
      - Re: (Score:2)
        
        by Teancum ( 67324 ) writes:
        
        We do have the raw blueprints [gutenberg.org] that supposedly explain how it is put together as well, but we are having a bit of a problem reading those blueprints and creating a working model. Some of that is understanding the raw machinery to get everything to work, so there needs to be some work on how to move from these blueprints to organized systems, but at least we are headed in the correct general direction.
        Well, my wife and I were able to produce a couple of working models that seem to be doing fairly well and ex
  - Re:Finally, people are getting AI right. (Score:4, Insightful)
    
    by DMUTPeregrine ( 612791 ) writes: on Saturday January 16, 2010 @06:09PM (#30793578) Journal
    
    The obligatory classic AI Koan:
    In the days when Sussman was a novice Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-Tac-Toe." "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play." Minsky shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So the room will be empty." At that moment, Sussman was enlightened.
    
  - Re: (Score:2)
    
    by TapeCutter ( 624760 ) * writes:
    
    "You're advocating the "emergent intelligence" model of AI, where intelligence "somehow" is created by the confluence of lots of data...[snip]...In practice the degrees of freedom which unstructured data provides far exceed the capability of current (and likely future) computers."
    
    You sure about that? [bluebrain.epfl.ch]. They have already created a molecular level model of the mammalian neocortex and the expected date for completion of a full model of the mammalian brain is solely dependent on the amount of money thrown at
- Re: (Score:3, Interesting)
  
  by Korbeau ( 913903 ) writes:
  
  I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
  This idea is the holy grail of AI since the early ages. The project described is one amongst thousands done, and you'll likely see news about such projects pop every couple of months here on Slashdot.
  The problem is that such a project has yet to produce interesting results. The reason why the most successful AI projects you hear about are human-organized databases and expert-systems, or human-trained neural networks for instance, is because they are the only ones that produce useful results.
  Also, consider
  - Re: (Score:2)
    
    by Extremus ( 1043274 ) writes:
    
    While I agree with you, I must ask if it is possible to follow this "intelligent design" path forever. These systems are becoming more and more complex. Increasing the amount of knowledge in the system is becoming a difficult task. I cannot help thinking that an emergent approach like this has a better future.
- Re:Finally, people are getting AI right. (Score:4, Interesting)
  
  by phantomfive ( 622387 ) writes: on Saturday January 16, 2010 @05:04PM (#30793086) Journal
  
  AI history has gone back and forth between pre-constructed systems and models that expand. One of the earliest successful AI experiments was a checkers program that taught itself to play by playing against itself, and quickly got very strong.
  
  Building a giant database of knowledge hasn't been possible for very long, because computers didn't have very much memory. When system capabilities first reached the capacity to do so, it had to be constructed from hand because there was no online repository of information to extract data from: the internet just wasn't very big. That particular project was known as Cyc, and it cost a lot of money.
  
  Since that time, the internet has grown and there are massive amounts of information available. It will be interesting to see the resultant quality of this database, to see if the information on the internet is good enough to make it usable.
  
- Re: (Score:2)
  
  by umghhh ( 965931 ) writes:
  
  What is the point of having an intelligent interlocutor - I mean the answer is known (42) and the rest is just plain old blathering about things - something I can do with my wife (if we were still talking with each other that is) so in fact this is just an exercise in futility. But of course there is money to be made there I guess - all these call center folk can then be optimized out of existence (sold into slavery in Zamunda, kidneys sold to some rich oil country etc) so maybe it makes sense after all?
Machine learning algorithms (Score:4, Insightful)

by sakdoctor ( 1087155 ) writes: on Saturday January 16, 2010 @03:26PM (#30792374) Homepage

Only as good as current machine learning algorithms.
So not very.

- Re: (Score:3, Insightful)
  
  by poopdeville ( 841677 ) writes:
  
  It's not as if human use of "machine learning" algorithms is any faster. It takes about 12 months for our neural networks to figure out that the noises we make elicit a response from our parents. And according to people like Chomsky, our neural networks are designed for language acquisition.
  AI "ought" to be an easy problem. But there's one big difference in the psychology of humans, and of computers. Humans have drives, like hunger, the sex drive, and so on. In particular, an infant's drive to eat is a
  - Re: (Score:2)
    
    by Teancum ( 67324 ) writes:
    
    It's not as if human use of "machine learning" algorithms is any faster. It takes about 12 months for our neural networks to figure out that the noises we make elicit a response from our parents. And according to people like Chomsky, our neural networks are designed for language acquisition.
    I don't know who you are quoting for this, or what the 12 months is measuring in terms of from birth or from conception, but I will assure you that my children certainly recognized my voice even when they were in my wife's womb. I have a seven month old daughter right now that not only can figure out the noises, but is responding and addressing myself, my wife, and my other kids by name. I'm not saying that she is ready to orally give a doctoral dissertation discussion, but she is communicating and displa
lolwut? (Score:4, Funny)

by SanityInAnarchy ( 655584 ) writes: <ninja@slaphack.com> on Saturday January 16, 2010 @03:27PM (#30792394) Journal

Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?

- Re: (Score:2)
  
  by BikeHelmet ( 1437881 ) writes:
  
  LOL NOOB
- Re: (Score:2)
  
  by dangitman ( 862676 ) writes:
  
  Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?
  Except that is not a word, let alone words.
  - Re: (Score:2)
    
    by dzfoo ( 772245 ) writes:
    
    It is when you learn English by trolling the Intarwebs.
    -dZ.
- Re: (Score:2)
  
  by linguizic ( 806996 ) writes:
  
  Nah, its first words are going to be "Prolong your shlong and go all day long".
Non-English text (Score:3, Interesting)

by Bert64 ( 520050 ) writes: <bert AT slashdot DOT firenzee DOT com> on Saturday January 16, 2010 @03:29PM (#30792404) Homepage

What happens when this program stumbles across text written in a language other than English? Or how about random nonsensical text? How does it know that the text it learns from is genuine English text?

- Re: (Score:2)
  
  by Rockoon ( 1252108 ) writes:
  
  Like most machine learning of this kind, I presume that it's a popularity contest. One page with "wkjh wkfbw oizxz zxhlzx" isn't going to count. But a million pages with "I for one welcome our new ..." is going to score some influence.
- Re: (Score:3)
  
  by phantomfive ( 622387 ) writes:
  
  (If you had read the article you would know) the machine is parsing English to create a database of relationships. For example, if it sees the text, "there are many people, such as George Washington, Bill O'Reily, and Thomas Jefferson....." then it can infer that George Washington, Bill O'Reily, and Thomas Jefferson are all people. Since a statement like this may be somewhat controversial, it uses Bayesian classification to establish a probability of the truth of the statement.
  
  Thus if it stumbles across
- Re: (Score:2)
  
  by billius ( 1188143 ) writes:
  
  From what I've heard, language identification [wikipedia.org] is a fairly well-understood problem in computational linguistics. The language a given text is written in can generally be identified using a statistical approach using an n-gram method (often a trigram [wikipedia.org]). Like the Wikipedia article states, there are problems given the fact that a lot of stuff on the web can have several languages on one page, but at least the bot should be able to fairly easily figure out if a page is written only in English. There are even j [whatlanguageisthis.com]
- Re: (Score:2)
  
  by ArcadeNut ( 85398 ) writes:
  
  I assume it would be promoted to slashdot editor...
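
The n-gram approach to language identification mentioned in this thread can be illustrated with a minimal sketch. It builds character-trigram profiles and picks the language whose profile is most similar to the input. The function names, the cosine-similarity scoring, and the two tiny training sentences are all assumptions made for illustration; nothing here is taken from the CMU project, and a usable detector would need far larger training corpora.

```python
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in whitespace-normalized, lowercased text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sum(c * c for c in a.values()) ** 0.5
    norm_b = sum(c * c for c in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, deliberately tiny training samples; real systems train on large corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then "
               "the dog chases the fox through the field",
    "german": "der schnelle braune fuchs springt hinter den faulen hund und dann "
              "jagt der hund den fuchs durch das feld",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the label of the sample profile most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))
```

A scraper could call `guess_language(page_text)` and skip pages that do not come back as English. As noted above, pages mixing several languages would need to be split into smaller spans first.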
Iz dis... (Score:2)

by MrBandersnatch ( 544818 ) writes:

lke, rally der bestest ways like ter learn a puter inglish isit!!!??!?!
Seriously though, poor AI; if I had a gun I'd go and put it out of its misery.
Once this thing hits Encyclopedia Dramatica... (Score:2)

by xenophrak ( 457095 ) writes:

...it will forever be stuck at the level of a retarded 8 year old. Or the level of a normal 4-chan user.
- Re: (Score:2)
  
  by game kid ( 805301 ) writes:
  
  But you repeat yourself.
- Re: (Score:2)
  
  by MooUK ( 905450 ) writes:
  
  Same thing.
- Re: (Score:2)
  
  by MrBandersnatch ( 544818 ) writes:
  
  You're giving 4chan users credit for a lot of maturity there....
I think AI needs a 3d imagination to know English (Score:3, Interesting)

by CrazyJim1 ( 809850 ) * writes: on Saturday January 16, 2010 @03:44PM (#30792522) Journal

Once a computer understands 3d objects with English names, it can then have an imagination to know how these objects interact with each other. Of course writing imagination space that simulates real life is exceedingly difficult and I don't see anyone doing it for several years if not a decade just to start.

- Re: (Score:2)
  
  by Extremus ( 1043274 ) writes:
  
  Similar things have been done in the past. However, this kind of approach still is an active research topic.
  - Re: (Score:3)
    
    by Extremus ( 1043274 ) writes:
    
    Sorry for replying to myself. I forgot to finish my comment. In fact, this problem is related to the Symbol Grounding Problem. It addresses the issue of "grounding" symbols (like words) into their sensory representation, e.g., the symbol "triangle" into the raw pixel representation of a triangle. In the case of symbols about visual objects, some researchers used an intermediary 3d abstraction of sensory data, mapping the symbols to these intermediary representations. It has been a hot research topic since the '80s.
while (1) (Score:2, Funny)

by Lije Baley ( 88936 ) writes:

Yeah, I've coded an infinite loop a few times, how come I never made the headlines on Slashdot?
Pruning (Score:3, Interesting)

by NonSequor ( 230139 ) writes: on Saturday January 16, 2010 @03:46PM (#30792540) Journal

In general I find that the quality of a data set tends to be determined by the number (and quality) of man hours that go into maintaining it. Every database accumulates spurious entries and if they aren't removed the data loses its integrity.
I'm very skeptical of the idea that this thing is going to keep taking input forever and accumulate a usable data set unless an army of student labor is press-ganged to prune it.
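
For comparison, here is a rough sketch of what automated pruning could look like. It assumes each candidate fact carries one confidence score per source page, combines them with a noisy-OR rule, and drops facts with too few sources or too little combined confidence. The data layout, the thresholds, and the noisy-OR rule are illustrative assumptions, not a description of how the CMU system works, and whether a filter like this can stand in for human curation is exactly the open question raised above.

```python
from math import prod


def noisy_or(confidences):
    """Combine per-source confidences, treated as independent: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in confidences)


def prune(candidates, min_sources=2, min_confidence=0.9):
    """Keep only facts seen in enough sources with enough combined confidence.

    `candidates` maps a fact (any hashable value) to a list of per-source
    confidences in [0, 1]. Both thresholds are hypothetical defaults.
    """
    kept = {}
    for fact, confidences in candidates.items():
        combined = noisy_or(confidences)
        if len(confidences) >= min_sources and combined >= min_confidence:
            kept[fact] = combined
    return kept
```

With these defaults, a fact seen once at confidence 0.5 is dropped, while a fact seen on three pages at 0.7 each is kept. The independence assumption is weak on the web, where the same text is often copied across many pages.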

The web: What a great source of information (Score:2)

by mustafap ( 452510 ) writes:

>Rather, its progress in categorizing complex word relationships is the object of the research.
From the web? Half the people here are writing English as a second language; the rest, haven't finished learning the language, or cannot be bother to string a sentence together. Just what is this program going to learn?
- Re: (Score:2)
  
  by LifesABeach ( 234436 ) writes:
  
  My thought would be, "which web sites have continuous valid information streams". Given this, the program would more easily be able to classify those sites that are predominately useful, and those sites that rarely have useful information. Both groups of sites would be evaluated, but now a "Priority List" could be created. Who knows, maybe a crack-pot web site may have an intriguing correlation with reality. It might even make for a good movie story line, maybe. But if that same web site has an unusual
- Re: (Score:2)
  
  by u38cg ( 607297 ) writes:
  
  Children routinely learn perfect English with a complete generative grammar from corrupt sources. Indeed, if you put children in an environment where nobody speaks a complete language, they will spontaneously evolve a grammatically complete language. So it is possible (though I'm not saying it will be easy...)
- - Re: (Score:2)
    
    by mustafap ( 452510 ) writes:
    
    No, just wondering if anyone would notice, so well done.
V*yger 2.0 ? (Score:3, Interesting)

by LifesABeach ( 234436 ) writes: on Saturday January 16, 2010 @03:54PM (#30792580) Homepage

The concept is intriguing, "Create a program that learns all there is to know, off the net." What amazes me is that others don't try the same thing. It doesn't take a team of A.I. types from Stanford to kick start this program. The cost is a Netbook, even Nigerian Princes could afford this. I'm trying to figure out how economic competitors could take advantage of this. I can see how the U.S.P.T. could use this to help evaluate prior art, and common usage. I'm thinking that an interface to a "Real World Simulator" would be the next step toward usefulness.

- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  Try it! Build your own AI.
- - Re: (Score:2)
    
    by LifesABeach ( 234436 ) writes:
    
    I did, awhile ago, and I'm really sorry for doing it. How was I to know that Mortgage Brokers would use it to "game" the Economy and turn Banks into Equity Centers for Hedge Funds? Go figure.
already been done (Score:5, Informative)

by phantomfive ( 622387 ) writes: on Saturday January 16, 2010 @03:55PM (#30792588) Journal

There is simply no existing database to tell computers that "cups" are kinds of "dishware" and that "calculators" are types of "electronics." NELL could create a massive database like this, which would be extremely valuable to other AI researchers.
This is what they are trying to do, based on information they glean from the internet. It's already been done, with Cyc [wikipedia.org]. The major difference seems to be that Cyc was built by hand, and cost a lot more. It will be interesting to see if this experiment results in a higher or lower quality database.

Also, I question their assertion that it would be extremely valuable to other AI researchers. Cyc has been around for a while now, and nothing really exciting has come of it. I'm not sure why this would be any different.
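
To make the "cups are kinds of dishware" idea concrete, here is a minimal sketch of the "such as" pattern matching described elsewhere in this discussion (the George Washington example). The regular expression, the function names, and the per-document support count are assumptions for illustration only, not the project's actual extractor.

```python
import re
from collections import Counter

# Hypothetical pattern: "<category>[,] such as <A>, <B>, and <C>"
SUCH_AS = re.compile(
    r"(?P<category>[a-z]+),? such as (?P<items>[^.;]+)",
    re.IGNORECASE,
)

# Separators inside the enumeration: commas (optionally followed by and/or), or bare and/or.
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def extract_is_a(text):
    """Return (instance, category) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        category = match.group("category").lower()
        for item in SEPARATOR.split(match.group("items")):
            item = item.strip()
            if item:
                pairs.append((item, category))
    return pairs


def count_support(documents):
    """Count how many documents assert each (instance, category) pair."""
    counts = Counter()
    for doc in documents:
        counts.update(set(extract_is_a(doc)))
    return counts
```

The pattern is deliberately naive: it takes only the single word before "such as" as the category and treats everything up to the next period or semicolon as the list, so "dishware such as cups are fragile" would yield the bogus instance "cups are fragile". Errors of that kind are why a pair would normally need support from many pages before being trusted.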

- Re: (Score:3, Informative)
  
  by blee37 ( 1181835 ) writes:
  
  Cyc is a controversial project in the AI community, and I'm glad that you brought it up. I don't think anyone yet knows how to use a database of commonsense facts, which is what Cyc is (though limited - the open source version only has a few hundred thousand facts) and which is one thing NELL could create. However, researchers continue to think about ways that an AI could use knowledge of the real world. There are numerous publications based on Cyc: http://www.opencyc.org/cyc/technology/pubs [opencyc.org].
  - They're on the right track (Score:2)
    
    by joh ( 27088 ) writes:
    
    When I first read about Cyc I immediately thought that this is the way to go. And this was before the WWW took off. While I don't think that knowing about the world is all that's needed for AI, I think that without knowing about the world you can't have any AI or at least none you'd recognize.
    Intelligence (as we know it) is mostly about interacting with and understanding your environment and having some environment being accessible to something remotely intelligent is a good start. Every living being is jus
- - Re:already been done (Score:5, Informative)
    
    by phantomfive ( 622387 ) writes: on Saturday January 16, 2010 @04:54PM (#30793018) Journal
    
    Oh this comment is beautiful for its confident ignorance.
    
    What you have done is identified a difference between the two systems, and then claimed that this difference is in some way significant. You do this without knowing the implications of the difference, without entirely understanding the difference, and without presenting any evidence that this particular difference matters at all. In short, you think you understand what matters, but in reality you don't.
    
    But fear not, you are in good company with your ignorance: this particularly pernicious fallacy is one that has plagued AI researchers for a long time. It happened with Cyc: the founders were sure that if we just had a database big enough, it would result in intelligent machines. They didn't know how, but they were sure it would.
    
    Before them there were expert systems, neural networks (long story), natural language translation, and many more that I'm sure I'm forgetting. In all of these cases researchers were certain that their system held the key to vast wonders, only because they had not spent much time thinking about what they were actually trying to accomplish. In most of these cases it would have been obvious that human-level intelligence wasn't going to result, if they had spent more time investigating how the brain works and less time chasing their pet solution.
    
    In general, if there is a vast field of ignorance between your method and your desired result, then you should probably spend more time researching, finding data points in that field of ignorance before trying to get to your result. Or in your case, since you present no evidence of what difference 'developing on the internet' will make compared to 'developing by hand', you should go do a little searching and figure out what the actual difference will be, instead of randomly guessing.
    
    But since you are lazy and probably didn't read the article, I will give you one hint: this database populated from the internet seems to have a strong bias towards information about companies and sports teams. Who would have guessed that?
    
    - - Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        My kids assimilate their own information base. I do not directly inject it into their heads.
        You are right, they do assimilate their own information base. This is a very useful observation and data point, and any true strong AI will have to do so. However, it is not possible to infer that because your kids assimilate their own information base, anything assimilating its own information base is superior to anything that doesn't.
        
        In this case, it still remains to be seen whether the automated information assimilation techniques this group is using (and let's face it: the information assimilation m
        
        Re: (Score:2)
        
        by ralphdaugherty ( 225648 ) writes:
        
        Well, I will add to the compost heap today. When I read the headline, I thought that it might be a more fundamental learning of the use and relationship of words and what they describe than what TFA describes. "Colleges are in a university" is a "trusted relationship"? How very ignorant and disappointing, as every AI project I've ever read about is.
        What would be impressive is to form associations as in a list of universities including Carnegie Mellon, or a statement that Carnegie Mellon is a university, then in other
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        Build your own AI.
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        The thing about this project is, I think if you asked them they would say that they are not trying to create a human-like intelligence. Certainly they would not say that their data collection method is intelligent (it uses simple grammar parsing techniques, along with Bayesian filtering). It is essentially weak AI. They may have hopes that it will become strong AI, but no idea of how to take it to that point.
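A minimal sketch of what "simple grammar parsing plus filtering" could look like, in Python. Everything here is an assumption for illustration: the "X such as Y" pattern, the function names, and the `min_support` threshold are hypothetical, and the count threshold is a crude stand-in for the Bayesian filtering step, not the project's actual method.

```python
import re
from collections import Counter

# Hypothetical surface pattern: "<category> such as <Proper Name>".
# The category is a lowercase word; the instance is one or more
# capitalized words.
PATTERN = re.compile(
    r"\b([a-z]+) such as ([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)"
)

def extract_candidates(sentences):
    """Count every (category, instance) pair the pattern finds."""
    counts = Counter()
    for sentence in sentences:
        for category, instance in PATTERN.findall(sentence):
            counts[(category, instance)] += 1
    return counts

def promote(counts, min_support=2):
    """Keep only pairs seen at least min_support times.

    This threshold is a simple stand-in for a probabilistic filter.
    """
    return sorted(pair for pair, n in counts.items() if n >= min_support)

sentences = [
    "He attended universities such as Carnegie Mellon before moving.",
    "Top universities such as Carnegie Mellon run the project.",
    "She likes teams such as Steelers.",
]
facts = promote(extract_candidates(sentences))
```

With these three sentences, only the pair seen twice survives the filter; the single sighting is dropped. That is weak AI in exactly the sense described above: pattern matching and counting, with no understanding behind it.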
        
        The biggest problem I see with Cyc, and this project, is that it is not yet known how the human
42? (Score:2)

by JWSmythe ( 446288 ) writes:

How come every time I ask NELL what the answer to life is, all it responds with is "42"? When I ask what 42 means, it tells me that I'll need a bigger computer.
Wikipedia (Score:2, Funny)

by the person standing ( 1134789 ) writes:

Let it read Wikipedia, and don't let it get poisoned by Twitter etc.!
If only they could train it without the web (Score:2)

by ClosedSource ( 238333 ) writes:

If only there were a book in electronic form that had all English words in it, perhaps with a definition of each word.
- Re: (Score:2)
  
  by aXis100 ( 690904 ) writes:
  
  Good luck. Notice how words in a dictionary are described by... other words!
Convergence (Score:2)

by Metasquares ( 555685 ) writes:

Eventually, at least the learning component will converge; returns will diminish for feeding it more data. This is particularly true given the independence assumption inherent in their classifier (but would also hold on stronger learners). I suspect that this will happen to the reader component as well. If it were as simple as applying Naive Bayes to classify on a corpus of text connected to a knowledge base (which is probably just a set of posteriors left from previous training sessions), Cyc would have al
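For anyone who hasn't seen it, here is a minimal sketch of a Naive Bayes text classifier in Python, to show where the independence assumption sits. The class name, the labels, and the training phrases are made up for illustration; nothing here is taken from the CMU system, and the add-one smoothing is my own choice.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()

    def train(self, words, label):
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def log_posteriors(self, words):
        """Unnormalized log P(label | words) for every known label."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            # Independence assumption: each word contributes its own
            # factor, regardless of the other words around it.
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, words):
        scores = self.log_posteriors(words)
        return max(scores, key=scores.get)

nb = NaiveBayes()
nb.train("plays in the league".split(), "team")
nb.train("won the game last night".split(), "team")
nb.train("acquired by a rival firm".split(), "company")
nb.train("shares rose after the merger".split(), "company")
```

The per-word loop is the whole model: once the word counts for a label have stabilized, more text of the same kind barely moves the estimates, which is one way to see why returns diminish.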
Supervised learning, maybe (Score:2)

by Animats ( 122034 ) writes:

The article has too much hype, but the actual work has some potential. For the limited problem they're really addressing, extracting certain data about sports teams and corporate mergers, this approach might work.
Both of those areas have the property that you can get structured data feeds on the subject. Bloomberg will sell you access to databases which report mergers in a machine-processable way; some stock analysis programs need that data. Sports statistics are, of course, available on line. So the p
It's not that the program couldn't stop running; t (Score:2)

by LS ( 57954 ) writes:

.... program that never dies. It runs continuously ..... It's not that the program couldn't stop running; the idea is that there's no fixed end-point
Wow, I didn't even think that was physically possible! Maybe Google should borrow this tech for their web crawlers. Must be a pain to restart them every day...
The Probable Outcome ... (Score:2)

by foobsr ( 693224 ) writes:

... may be a site resembling http://www.20q.net/ [20q.net] , which started as a never ending story (neural net) as well.

Quote [wikipedia.org]: "The 20Q was created in 1988 as an experiment in artificial intelligence (AI). The principle is that the player thinks of something and the 20Q artificial intelligence asks a series of questions before guessing what the player is thinking. This artificial intelligence learns on its own with the information relayed back to the players who interact with it, and is not programmed. The player c
- Re: (Score:2)
  
  by LifesABeach ( 234436 ) writes:
  
  I cannot help but wonder what Fetish a computer would have, and what would be the name of it?
  - Re: (Score:2)
    
    by clintp ( 5169 ) writes:
    
    And more importantly, whether Rule 34 applies to computer-targeted porn.
- Re:do... (Score:5, Funny)
  
  by JWSmythe ( 446288 ) writes: <jwsmythe@nospam.jwsmythe.com> on Saturday January 16, 2010 @04:33PM (#30792888) Homepage Journal
  
  I think I see the problem with their code.
  while (1){ read_the_web(); }; explain_everything();
  
  All they've done is reproduce the typical office worker. It just sits around and surfs the net all day, without coming back with an answer.
  
- Re: (Score:2)
  
  by joh ( 27088 ) writes:
  
  Oh, and as a minor matter, languages are difficult enough from a syntactic dimension, and the semantics of it (in order to understand a statement, you have to understand the ones prior, the context or framing that may have switched, the built up assumptions that maybe can be discarded, maybe not, etc...) make for a truly fantastically difficult problem.
  And still, every newborn human masters all of this without having the faintest explicit knowledge about any of it, and learns it within a few years. Is an AI meant to be like a newborn baby (which is in no way intelligent) or like an adult? Most (or all) people become intelligent without knowing how intelligence works or what it is. It's just that everything that doesn't work gets discarded very soon. You start to imitate and to try out what works and gets results and what doesn't.
  Perhaps we ne
- Re: (Score:2)
  
  by joh ( 27088 ) writes:
  
  And how will they determine if this gets stuck in some local optimum for certain concepts, and thus stops learning anything relevant at all about any one given concept or topic?
  The report is low on details and high on hype. There are no current algorithms that don't require heavy parameter tuning and constant monitoring to get right. Switching one on for a few years and hoping does not strike me as an exciting story.
  I'm pretty sure you didn't become what you are by your parents just switching you on and hoping for a few years... I'm quite certain that there was a bit of heavy parameter tuning and constant monitoring required, too.
  And believe me, most kids unlucky enough to miss this part also get stuck in a local optimum.
