CMU Web-Scraping Learns English, One Word At a Time
blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never ending learner like this to one day be able to become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.
Finally, people are getting AI right. (Score:5, Interesting)
I've always been amazed that until recently, most work on AI has focused on a preconstructed system that fits data into pathways, with just enough variation in its thought abilities to let it expand its model slightly.
They'd write the rules for the system and try to build most of the work into it, and then see how well it does, with limited learning capabilities and still based on the original model.
I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
If you give it the ability to learn, then it'll learn the rest by itself, rather than giving it functions that let it pretend to learn while fitting into a model.
And I know there has been research into this in the past, but it didn't really take off till the last decade or so, and I'm glad it has.
True, or at least somewhat competent, AI, here we come.
Non-English text (Score:3, Interesting)
What happens when this program stumbles across text written in a language other than English? Or how about random nonsensical text? How does it know that the text it learns from is genuine English text?
I think AI needs a 3d imagination to know English (Score:3, Interesting)
Pruning (Score:3, Interesting)
In general I find that the quality of a data set tends to be determined by the number (and quality) of man-hours that go into maintaining it. Every database accumulates spurious entries, and if they aren't removed the data loses its integrity.
I'm very skeptical of the idea that this thing is going to keep taking input forever and accumulate a usable data set unless an army of student labor is press-ganged to prune it.
V*yger 2.0 ? (Score:3, Interesting)
Re:Finally, people are getting AI right. (Score:3, Interesting)
I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
This idea has been the holy grail of AI since its early days. The project described is one among thousands, and you'll likely see news about such projects pop up every couple of months here on Slashdot.
The problem is that such a project has yet to produce interesting results. The reason the most successful AI projects you hear about are human-organized databases and expert systems, or human-trained neural networks for instance, is that they are the only ones that produce useful results.
Also, consider that we are not talking about "pixel-ants" that have only a few possible inputs and outputs. We are talking about a system that understands and does something meaningful with natural language, something a normal human being doesn't completely grasp until he is at least a teenager, with the constant help of parents, friends, teachers, television, etc. throughout those years.
Re:Finally, people are getting AI right. (Score:4, Interesting)
Building a giant database of knowledge hasn't been possible for very long, because computers didn't have very much memory. When systems first had the capacity to hold one, the database had to be constructed by hand because there was no online repository of information to extract data from: the internet just wasn't very big. That particular project was known as Cyc, and it cost a lot of money.
Since that time, the internet has grown and there are massive amounts of information available. It will be interesting to see the resultant quality of this database, to see if the information on the internet is good enough to make it usable.
Re:Uh oh... (Score:5, Interesting)
The quality of the teachers is important when learning.
That's seriously kind of interesting, actually: it makes me wonder if decades from now software developers will be few and far between, designing the AI algorithms for modern programs while the rest of us find work as software tutors, training those programs to perform their business functions.