The Future of Google Search and Natural Language Queries
eldavojohn writes "You might know the name Peter Norvig from the classic big green book, 'AI: A Modern Approach.' He's been working for Google since 2001 as Director of Search Quality. An interview with Norvig at MIT's Technology Review has a few interesting insights into the 'search mindset' at the company. It's kind of surprising that he claims they have no intent to allow natural questions. Instead he posits, 'We think what's important about natural language is the mapping of words onto the concepts that users are looking for. But we don't think it's a big advance to be able to type something as a question as opposed to keywords ... understanding how words go together is important ... That's a natural-language aspect that we're focusing on. Most of what we do is at the word and phrase level; we're not concentrating on the sentence.'"
This is awesome. (Score:3, Funny)
Understatement of the year (Score:2)
Re:What's really the story (Score:4, Insightful)
Even if you mastered natural language (and I'm not saying that's a surmountable task) I think people would be shocked to see that Google searches would still be frustrating.
I'm not just saying "blame the user", I'm saying that language itself is not even the last obstacle to overcome. You're going to need to figure out a program that not only understands natural language, but also context, culture, etc.
Getting an AI of near-human intelligence is not enough, because to be really good at getting people the answers to questions they can't ask, you have to be of above-average capability.
Re: (Score:3, Interesting)
User: Who is the winningest coach in football?
Search Engine: Did you mean, What coach has the most wins in football?
User: Yes
Search Engine: Did you mean American football?
User: Yes
Search Engine: NFL NCAA CFL...?
User: Umn, all of the above
Search Engine: Are you sure?
User: What?
Search Engine: Are you sure you want to compare all years, after all, NFL rules significantly changed in 2001, and leagues are not comparable...
Re: (Score:3, Funny)
Search Engine: Would you prefer wo
User: NOW!!!
Search Engine: *sigh* As you wish...
Re: (Score:2)
natural language is an oxymoron (Score:4, Insightful)
I tend to agree with Norvig's focus on keywords and less emphasis on natural language. Trying to even define a natural language on top of a query engine introduces a layer of complexity that is probably unnecessary. Natural language even introduces a level of noise that interferes with defining, as accurately as possible, what the user is asking for.
Google has done a good job, and they get better each iteration figuring out what the user is looking for. I find their suggestion [google.com] an effective way to not only constrain a query, it actually provides a way to spell check in a pre-emptive way. If you've not used this, install the Firefox Google toolbar, or use the experimental Google "Suggest" [google.com]. Often Google will provide suggestions in the drop down menu that refine your search in ways you hadn't considered that drive to a more direct and accurate representation of your intended query. Of course if their suggestions don't satisfy, you get to continue typing your keywords to your heart's desire.
(I have to offer an example of suggestion's effectiveness. I often Google to get to the Chicago Tribune (I don't visit there often enough to have created a bookmark, plus it's easy to do this in anyone's browser). Simply typing the first four letters, "chic", I see the first suggestion is "Chicago Tribune". A simple TAB and RETURN, I'm on the Google page with the first link or so my link to the Tribune (with the added bonus of Google's breakout of sublinks).) Your mileage may vary (Google's ranking system may vary the order and options that appear in the drop-down over time), but I find it an amazingly effective research tool (suggestion, not the Trib).
Natural language is mostly trying to guess intent with structure and key words (as opposed to keywords), but at the end of the day, if you filter out the natural language, and focus on keywords you're going to end up in close to the same place.
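For anyone curious how a suggestion drop-down of this kind could work, here is a minimal sketch in Python. It is not Google's implementation: it assumes a made-up query log and simply ranks logged queries that start with the typed prefix by how often they occur. The names `build_suggester`, `suggest`, and the sample log are all hypothetical.

```python
from collections import Counter


def build_suggester(query_log):
    """Count how often each past query appears (assumed log: a list of strings)."""
    return Counter(q.strip().lower() for q in query_log)


def suggest(counts, prefix, limit=3):
    """Return the most frequent logged queries that start with the typed prefix."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties so output is stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Invented sample log, for illustration only.
log = [
    "chicago tribune", "chicago tribune", "chicago tribune",
    "chicago bears", "chicago bears",
    "chicken recipes",
    "china",
]
counts = build_suggester(log)
print(suggest(counts, "chic"))
```

With this toy log, typing "chic" puts "chicago tribune" first only because it is the most frequent matching query, which is also why the order can shift over time as the log changes.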
Re: (Score:2)
Re: (Score:2)
"What is" is already mapped to "define:" as far as I know. "Who is" works in a similar way.
"Why did World War I start" or "what does a duck eat" are questions that require too much understanding and explanation of the concepts. But simple definitions, locations or numbers shouldn't be that difficult to spew out. "how many" co
Re:natural language is an oxymoron (Score:5, Informative)
Not at all. I do that kind of question in Google all the time.
Googling for "Why did World War I start" brings up, as the first result, an article titled "The Causes of World War I".
Followed by a few million more hits if that one isn't good enough.
And the question "What does a duck eat" gets many hits as well. The first one has, in the summary:
Ducks in the wild eat a variety of plants, insects, and native foods that will differ from...
I know it's just picking out keywords from the query and matching them to the sites, not trying to parse the natural language, but it works pretty damn well.
Re:natural language is an oxymoron (Score:4, Funny)
Re: (Score:2)
"The daily destination for women, with horoscopes, health and pregnancy information, message boards and blogs, celebrity gossip, beauty and more."
I think it's pretty much on the money there too!
Re: (Score:2)
Re: (Score:2)
How tall is Mt. Hood? According to the U.S. Geological Survey, Mt. Hood is 3426 Meters (11239 Feet) tall. To learn more about Mt. Hood geology visit
Re: (Score:2)
Mount Hood -- Elevation: 11,249 feet (3,429 meters)
Re: (Score:2)
When I was growing up, I'd always heard that it was 11,235 feet tall, which I thought was very cool.
Re: (Score:2)
And that's exactly why almost the entire field of information retrieval is focused on these 'statistical' approaches instead of some sort of deep semantics. It works. Semantic analysis is very difficult, highly language dependent and slow as hell. For google to do something like that they would have to not only make it work, but make it work for many different languages and make it fast.
Let's not forget that information retrieval requires highly optimized algorithms. Linear time (over the size of the doc
Re: (Score:2)
Re: (Score:2)
I know it's just picking out keywords from the query and matching them to the sites, not trying to parse the natural language, but it works pretty damn well.
This is because Google uses a popularity-dependent algorithm. It's not popular to ask/answer questions like "What does a duck eat?" where duck was in the meaning of ducking, or something like that. Obviously, a natural language processor should use the same mechanism. There'd only be confusion here if two different meanings of the word competed for the top results (i.e. both being popularly asked), *or* if you searched for an unusual meaning of the word, but in a context that made it look like some other q
Re: (Score:2)
The thing is, Google doesn't have to understand the results. It just has to deliver the correct ones. And for that, keyword searching is good enough. In this case, it delivers several results for "many rivers" and "minnesota", but it also gives results for "rivers" and "minnesota", which is what you're actually looking for.
Re: (Score:2)
Those women don't know what they are missing.
Fun with Google Suggest (Score:2)
why is everything
can you eat
can you die from
where can I go to get
is it possible to
how would you
From playing with it for a few minutes, it seems that Google is mostly used by women in various stages of pregnancy, people worried that they might be arrested for using Limewire, and people looking for Wiis.
Re: (Score:2)
I tried several questions in google, and it performed really well, only having trouble with:
Why does ask suck and google not?
That came back with a bunch of results saying google sucked. My other questions seemed to produce very useful results. I think that was the point the dev wanted people to understand
Re: (Score:2)
Re:natural language is an oxymoron (Score:5, Interesting)
Re: (Score:2)
If you're using the Google Suggest page, I think the width is sufficient if you have the browser at any reasonable width, so I'm assuming you're talking about the drop-down from the toolbar, in which case you're in luck. Type something to invoke the drop-down, or click the arrow to look at history. In the lower right, you should see a handle; expand to your heart's content. It's nicely implemented, even pushes the box to the left if your browser's too close to the right side of your screen. Enjoy.
Re: (Score:2)
It would really help if the right half of the drop-down weren't taken up by the word "Sugges..." on the first line, which for some reason also creates a big blank space on all the lines below it. They couldn't just put that as the first line, if they really need to point out that they're suggesting things to me?
Re: (Score:3, Informative)
The parent to my post is talking about the Google search box built into Firefox. The GP to my post is talking about the Google search page that has Suggest activated within it. It looks basically like the normal Google search page up to the point you start typing in queries.
Re: (Score:2)
S
the natural order of natural language (Score:2)
Sometimes it does matter. However, by the time you design a linguistic
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Funny)
Dishwasher: CHANGE TO MODE POTS_AND_PANS
You: I ent dun nuffink!
Dishwasher: CANCEL RINSE CYCLE
Re: (Score:2)
Simply typing the first four letters, "chic", I see the first suggestion is "Chicago Tribune". A simple TAB and RETURN, I'm on the Google page with the first link or so my link to the Tribune (with the added bonus of Google's breakout of sublinks).) Your mileage may vary (Google's ranking system may vary the order and options that appear in the drop-down over time), but I find it an amazingly effective research tool (suggestion, not the Trib).
I find this unlikely in the extreme. When most people start typing at google and reach "chic", Chicago is not exactly what they're looking for. (Or Hot Chicago pizza for that matter).
Lojban could help (Score:1)
Re: (Score:3, Insightful)
Re: (Score:2)
Fucking amateur Linguists everywhere [slashdot.org]
Re: (Score:2)
The problem with natural language searches... (Score:3, Insightful)
Re: (Score:2, Funny)
lol easy, looser
:P
Re: (Score:2)
- I'd like 'rullers, 'ugar, 'ucks and a Mikita 'cup... And then I think I would like a large...
- And could I please have 'elly donut and...
What?
- I'm sorry. And 'eaker 'oken.
Let me recap the order: A cruller, two sugar pucks, a large coffee with cream, a raspberry jelly doughnut, orange drink, a box of five-holes.
- Yeah.
Thank you. Drive around, please.
Re: (Score:2, Informative)
Re:The problem with natural language searches... (Score:4, Insightful)
The argument against universal grammar is of course non-Latin languages like Japanese (and possibly Russian) which don't play by the rules. I'm not really a language expert on either, but I'm tried to learn Japanese and its really tough.
Everything is relationship based off the speaker and to the person or object he is talking about and then the audience. As in... If I'm talking about a pencil sitting on my desk, it has a different tense than a pencil on your desk and then a different tense in someone else's hand or a pencil that is sitting at a far off place (-sara or -kara? I can't remember). And we haven't even gotten to issues about ownership like if it was in my hand or your hand.
Whereas in Latin based languages it is more concerned about action or tense of ownership but not relationship to the speaker or audience. Hence... It is argued universal grammar does not apply in that respect.
Re: (Score:2)
Re: (Score:3, Funny)
I'm not really a language expert on either, but I'm tried to learn Japanese and its really tough.
Perhaps you should try and nail down English first. :)
Cheers,
-jdm
Re: (Score:2)
Check out the language Lojban [lojban.org] for just one way to do this.
Re: (Score:2)
You do realize that universal grammar allows different languages to have different rules, right? The universality is about how nouns and verbs exist in sentences, and relate to each other, etc. Even in French vs. English, genderization and possession are wildly differ
Re: (Score:2)
As the other people who have already replied to you have mentioned, the differences you specify between Japanese and languages more familiar to you have nothing to do with it not following universal grammar, they are simply ways Japanese differs from the other languages you have encountered. There actually are aspects of Japanese, which also appear in other languages, which linguists have trouble explaining (classifiers and double-nominative verbs likely among other features I am not familiar with), but tha
phrase/sentence? (Score:2)
Re:phrase/sentence? (Score:5, Informative)
Re: (Score:2)
i'd assume it's something like the difference between "how do i set up my d-link router?" and "d-link router set up". i believe google already parses out "natural language" queries about as well as any other search engine, including ask jeeves, which was supposed to use natural language as its unique selling proposition. google does give different results for both queries but both sets of results seem to be relevant.
i'm more curious about how the use
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
somebody tell me (Score:2)
Re: (Score:2, Informative)
Re: (Score:2)
Re: (Score:2)
You got everything right except for the part I highlighted. Humans "invented" language like we invented walking -- namely not at all. Language evolved.
Google and Asimov's fictional Multivac (Score:4, Interesting)
In the 1960s and thereabouts, when I used to hack around on minicomputers, but personal computers weren't well known to the general public, I always found it difficult to explain what computers did. One of their commonest questions was "Well, how does it work, do you type in questions and does it answer them?" Programming in assembly language didn't really fit that description.
Many technological fantasies seem to remain surprisingly distant. I tried ViaVoice and gave up: it's not a "voice typewriter." Roomba is not a general-purpose housekeeping humanoid-form robot, and neither are the machines that weld automobile chassis.
However, it seems to me that Google is within striking distance of Asimov's "Multivac" fantasy.
Incidentally, if you type in queries as complete sentences, Google doesn't seem to do any worse than if you don't. Sort of the converse of adventure games, where one begins by typing "Walk over to the table on the left and pick up the silver key with your left hand" and quickly learns to use telegraphic style: "Go table. Take key."
Oh, for an "edit" button... (Score:2)
Re: (Score:2)
Also, "What is the sine of half pi and a half times the cosine of one quarter plus the answer to life, the universe, and everything?" works correctly. I found this pretty awesome.
Re: (Score:2)
What is one plus one? [google.com] will give you 2.
What is the speed of light? [google.com] will give you 299 792 458 m / s.
And maybe even something like...
What is the Capital of Sweden? [google.com] will give you Stockholm.
Each of these will give you the answer at the top of the screen.
Of course if you type
What is the reason for Napoleon's 1812 defeat? [google.com]
It will give you the 1812 Overture as the first hit, so it has a bit further to go on context.
Re: (Score:2)
The search engines have always been good at playing Jeopardy. Typing the exact text of error messages tends to always lead to "What caused this?" and "How do I fix this?".
Re: (Score:2)
That, or Google believes it is on to a little known historical secret, related to the role of music in warfare.
Re: (Score:2)
Great advance (Score:1)
You Usually Can Now (Score:2)
this is also why (Score:5, Insightful)
the idea of interacting with a computer like a human is an artificial hangover from being introduced to the computer the first time. after using it for awhile, you realize that interacting with a computer, in small limited ways, like searching information, is easier NOT using natural language
for the very simple reason that it takes more thought, and more typing to interact naturally. it is easier to train a human to interact with a computer than it is to train a computer to interact with a human. and for the human, it is more rewarding, because the human realizes he doesn't need to exert so much effort
"what is the capital of france?"
versus
"france capital"
if you were to shout "france capital" at someone, it would be rude and confusing. but for a computer, it's actually superior
it is the conservation of communication effort at work here that wins out over natural language in computer interaction
The capital of France (Score:5, Funny)
"That's easy! The capital of France is 'F'."
Re: (Score:2)
I know what you're trying to get at but that example wasn't exactly a good one. The search engine could simply strip all the words that are pointless (is, the, and of). I'm sure that if it accepted natural search words like "what" that would automatically be eliminated too.
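A rough sketch of that stripping step, in Python. The stop-word list is an assumption made up for this example, not anything a real search engine is known to use, and `to_keywords` is a hypothetical helper name.

```python
import re

# Assumed stop-word list for illustration; a real engine's list is unknown here.
STOP_WORDS = {"what", "is", "the", "of", "and", "a", "an"}


def to_keywords(query):
    """Lowercase the query, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(to_keywords("What is the capital of France?"))  # ['capital', 'france']
print(to_keywords("france capital"))                  # ['france', 'capital']
```

Both queries reduce to the same two keywords, just in a different order, so a matcher that ignores word order would treat them identically.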
My biggest question is how many searches come from people in a natural way? Since Sunday only two have landed a
Re: (Score:2)
Besides, for communication via speech it's completely unnatural to say "france capital" to a machine as opposed to "what is the capital of France," even. So for speech recognition systems NLP really helps out.
Re: (Score:2)
These aren't the holy grail technologies they were once hailed to be, for sure. But they have some very important niches.
Re: (Score:2)
Do you see people still using keyboards 200 years from now? How about 50? If not... then -something- has got to replace 'em... Also notice the lack of keyboards on Star Trek
star trek as a guide? (Score:2)
Real questions ... (Score:5, Interesting)
This misses situations like searching for "That sf-short-story where the crew of the visiting spaceship is given a dog as a present" in which googling failed, at least for me, or, more technically, when you have absolutely no idea about what the relevant terms within the outcome might be. In short, if you have a real question.
CC.
Re: (Score:2)
Re: (Score:2)
Just make sure not to report that the offog came apart under gravitational stress. That would really upset headquarters.
Wow. There is ONE Google reference to that now and perhaps there will soon be two.
Users have changed, too (Score:3, Insightful)
Just like "Click here to do X" isn't used as much on Web pages anymore. People now tend to know that they can click on underlined text to find out more.
Re: (Score:2)
Phrase level? (Score:2)
what is google, freakin' jeeves? (Score:2, Insightful)
Maybe I'm just not up on my search engine technology (or, rather, I don't know anything about it). I just don't know anybody who'd think to put a regular question into google.
How to Kill Google: (Score:2)
I wonder if MS or Yahoo are listening...
RS
How search is really used (Score:3, Informative)
If you have the opportunity to look at query logs, you see how dumb most search engine queries are.
First, a big fraction of queries are simply navigational. Many are just URLs. The major search providers recognize these in the front end machines and send back canned answers, without even passing them to the real search engine. If you type "myspace" into Google, very little work is expended returning the canned reply.
After that, most queries are one word. Phrase queries are less common.
Few people seem to have noticed, but Google started returning results based on synonyms and homonyms a few weeks ago. There have been some significant algorithm changes recently.
Less than 1% of queries use any operators, like '""' or '-'.
The real problem with natural language queries, though, is that "Ask Jeeves" was a flop. Remember Ask Jeeves? That was a system designed to process queries written as sentences. But it wasn't used that way, and didn't succeed commercially.
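To make the categories above concrete, here is a small Python sketch of how a front end might bucket incoming queries before doing any real search work. Everything in it is an assumption for illustration: the `CANNED` table, the URL pattern, and the bucket names are invented, and nothing here reflects how any actual search provider does it.

```python
import re

# Hypothetical table of queries that get a pre-built reply.
CANNED = {"myspace": "<canned result page>"}

# Loose pattern for things that look like a host name or URL (an assumption).
URL_LIKE = re.compile(r"^(https?://)?[\w-]+(\.[\w-]+)+(/\S*)?$")


def classify(query):
    """Sort a query into one of four rough buckets."""
    q = query.strip().lower()
    if q in CANNED or URL_LIKE.match(q):
        return "navigational"
    # A double quote, or a minus sign at the start of a word, counts as an operator.
    if '"' in q or re.search(r"(^|\s)-\w", q):
        return "operator"
    if len(q.split()) == 1:
        return "single-word"
    return "multi-word"


for sample in ["myspace", "www.example.com", "wii", "d-link router", "jaguar -car"]:
    print(sample, "->", classify(sample))
```

Note that the hyphen inside "d-link" is not treated as an operator, because only a minus sign at the start of a word counts.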
flop (Score:2)
Re: (Score:2)
I've noticed because the quality of the results went down noticeably.
He's lying (Score:4, Insightful)
Re: (Score:2)
Even though the number of queries processed every day is immense, the amount of text to analyze pales in comparison to the amount of text on the pages they crawl every day.
Of course, they could prune their search set considerably if they just assumed that there is no semantic content in most MySpace pages and blog entries.
Me Tarzan; You Jane (Score:3, Insightful)
Not worth processing sentences (Score:3, Insightful)
NLP is very useful (Score:3, Informative)
Here are some situations where it's useful:
1) interpreting a question rather than just treating it as a "bag of words." For instance, one can type "how tall is Mt. Everest" in the search bar and Google, rather than searching for documents that contain those 5 (or so) tokens, will interpret that as a query asking for height and also search for documents that contain "Mt.", "Everest", and "height". Take that a step further and it might look for strings that represent height such as a number followed by "ft" or "meters" or "m".
2) Condensing query chains. Suppose you want to know what sport our 4th president enjoyed playing most. You can ask "what sport did the fourth president of the US like playing?" and the system will give you an answer by first interpreting "fourth president of the US" as Madison, and then searching for what sports Madison enjoyed playing. If not for such interpretation you would either have to run 2 queries (first to find out who the 4th president was, then what sports he liked), or hope that there is a document out there that Google's indexed that contains the words in that initial query.
3) Speech recognition! If you want to run a Q/A session with a computer system that has a speech recognition front end, it is more natural (easier and faster) to ask it "how tall is mt. everest?" than to say "mount everest height" or whatever you would end up typing into Google today. People like to speak using *natural language,* after all. They would gladly do it with computers if the SR systems in them were good enough (some are).
4) More precise query results. What's better, getting back a document that is likely to contain the answer to your query, or getting back the sentence that contains it? Or better yet, getting back the answer and nothing else? The more robust an NLP system the more complicated queries it can interpret and the more elegant its result can be.
On that note, Google actually *does perform* NLP on queries despite what from the summary (I didn't RTFA) looks like claims to the contrary. If you ask Google "how tall is Mt. Everest?" it actually DOES interpret that particular sentence and gives you the answer -- 29000ft or thereabouts. And you only get such an elegant result if you type "how tall is Mt. Everest" (without quotes) or "Mt. Everest how tall". Other queries of this nature will not give you quite as precise a response.
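The idea in point 1 (spot a height question, then look for a number followed by a unit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trigger phrase check and the `HEIGHT` pattern are made up here, and the sample snippet is the Mount Hood line quoted elsewhere in this discussion.

```python
import re

# Assumed pattern: a number (commas allowed) followed by a length unit.
HEIGHT = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(feet|ft|meters|m)\b", re.IGNORECASE)


def is_height_question(query):
    """Very crude intent check: does the query start with 'how tall is'?"""
    return query.strip().lower().startswith("how tall is")


def find_heights(text):
    """Return (number, unit) pairs found in the text, with commas removed."""
    return [(num.replace(",", ""), unit.lower()) for num, unit in HEIGHT.findall(text)]


query = "how tall is Mt. Hood"
snippet = "Mount Hood -- Elevation: 11,249 feet (3,429 meters)"

if is_height_question(query):
    print(find_heights(snippet))  # [('11249', 'feet'), ('3429', 'meters')]
```

A real system would need far more than a prefix check to recognize the question, but the second half shows why returning just the answer is plausible once the intent is known.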
What could possibly be wrong with that? (Score:4, Insightful)
Your query does not include a verb.
> find wii
Whose "wii" do you want me to find?
> find wii review
Unable to find any reviews authored by "wii".
> find review about wii
No reviews found concerning the common noun "wii".
> find review about Wii
Here is the most recent review about the proper noun "Wii": [url to a page full of keywords related to Wii]
> find review about Wii order by relevence
"relevence" is not an English word. Did you mean "relevance"?
> find review about Wii order by relevance
Here is the most relevant review about Wii: [url to a 2 year old pre-review of the Wii before it was launched]
> find review about Wii order by relevance then date
Here is the most recent and most relevant review about Wii: [url to a fanboy site]
> find all reviews about Wii order by relevance then date
Working...
> abort
Abort what?
> abort search
I am currently performing 1,231,415 searches. Which search do you want me to abort?
> abort last search
You do not have permission to abort others' searches.
> abort my last search
Last search aborted.
> find several reviews about Wii order by relevance then date
"Several" is not a quantifiable adjective. Do you mean "seven"?
> find seven reviews about Wii order by relevance then date
Here are your results. For better search results please capitalize the first word of sentences, and end sentences with proper punctuation.
Dan East
Re: (Score:2)
Re: (Score:2)
Dan East
Re: (Score:2)
I used to have a friend that said things along the lines of what you said, but in seriousness to try and argue against something. I approached your post with the wrong mindset and I'm glad I was mistaken.
Natural Language is Stupid and Limiting (Score:2)
I guess they'll just let Powerset become (Score:2)
Google, please give us regular expression searches (Score:2)
AI is about context (Score:2)
Re: (Score:2)