News

How does Google do it? 261

Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."
This discussion has been archived. No new comments can be posted.

  • As a consultant (Score:5, Informative)

    by elinenbe ( 25195 ) on Sunday April 25, 2004 @09:29AM (#8964472)
    Having been a consultant at their data center a year or so back, I can attest that they had well over 50,000 machines. I am not sure about the 80GB drive per machine, because from what I understood they bought whatever drive offered the most MB/$ at the time and would replace any dead ones with larger ones. Also, at any given time machines just die, and many of them are not replaced or repaired for months. Their cluster accounts for all this...
  • Huh? (Score:1, Informative)

    by lawrencekhoo ( 108310 ) on Sunday April 25, 2004 @09:30AM (#8964479) Homepage
    There are no answers in the article at all. Just the usual questions about how Google's publicized statistics don't add up.

  • Re:As a consultant (Score:5, Informative)

    by _Sharp'r_ ( 649297 ) <sharper@@@booksunderreview...com> on Sunday April 25, 2004 @09:36AM (#8964496) Homepage Journal
    But also realize that the data center you were at isn't their only one. I know of at least 7 physical locations and there are probably more out there.

    But yeah, their racks of 4 servers/1U is pretty impressive when you see them lined up in row after row of racks. Their data centers have to bring in extra cooling because they are so densely packed.
  • by richard_za ( 236823 ) on Sunday April 25, 2004 @09:38AM (#8964509) Homepage Journal
    Google already has spell check, and so does Gmail; have a look at the screenshots [camara.co.za] on my blog [canara.co.za]. I believe they're looking at releasing it to the public in six months' time; have a look at this article [eweek.com].
  • by Anonymous Coward on Sunday April 25, 2004 @09:44AM (#8964530)

    The only thing it's missing now (IMO) is spellcheck and an online translator, which I'm sure they're already planning. I'm also looking forward to Gmail being open to the public. After they conquer these 3 things, what's next.. Google ISP? Google National Army?

    Google has had a built-in spellchecker forever, and their translate tool is right here: http://www.google.com/language_tools [google.com]
  • Re:Interesting (Score:5, Informative)

    by ShaunC ( 203807 ) on Sunday April 25, 2004 @09:48AM (#8964551)
    Google is definitely cracking down [webworkshop.net] on duplicate content [seochat.com]. In fact, they've recently patented [searchguild.com] the concept.

    Insert software patent debate (where Google is the default hero due to its geek factor) here...
  • How Google do that? (Score:4, Informative)

    by elpecek ( 712453 ) on Sunday April 25, 2004 @10:00AM (#8964599)
    For those who haven't read - there is an article written by Brin and Page - maybe a little outdated, but still interesting: The Anatomy of a Large-Scale Hypertextual Web Search Engine [stanford.edu]
  • by evilmonkey_666 ( 515504 ) on Sunday April 25, 2004 @10:05AM (#8964634)
    Umm, is this a joke? They do have a spellchecker built into the search engine. I use it on a daily basis.

    And their online translator is here [google.com].

  • first casualty ?? (Score:5, Informative)

    by Sad Loser ( 625938 ) * on Sunday April 25, 2004 @10:13AM (#8964667)

    Recycling without attribution [technologyreview.com] is the first casualty of bad journalism.

    I thought I had read this article before, and then I realised, I had read it before...
    (although I now realise that you are not supposed to read the linked articles before posting comments - sorry)
  • by Waffle Iron ( 339739 ) on Sunday April 25, 2004 @10:15AM (#8964676)
    Yeah, those hundreds of PhDs they have working there will *never* figure that out. I hear they started with a 16 bit signed integer for their primary key and only after months of hard work upgraded it to 32 bit. Time to close down shop, it's impossible to fix.

    Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted left by four bits and then arithmetically added to the segment key. This clever scheme expands the index by a factor of 16; Google will soon be able to host over 64 billion pages!

  • by jvsanford ( 660375 ) on Sunday April 25, 2004 @10:28AM (#8964736)
    There is also a paper that describes their storage infrastructure (Google File System) here [rochester.edu]
  • With google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, what softwares they use (compatibility issues).

    I agree it would be nice to know. But if those are your conditions for investing in Google, I think Google would probably tell you to keep your money. I imagine Google's quarterly reports would probably say something like:

    "Our operation depends on having the ability to increase our server and bandwidth resources as we grow our services. Business may be adversely impacted should capacity be unavailable. Our servers are also at risk for viruses, worms, and DDoS attacks which could put the operation of those servers at risk and adversely affect business." etc...

    That would give you, as an investor, the information you need to determine whether those risks are worth your money. In all likelihood you'll just have to rely on the fact that they have an army of PhDs who are smarter than you and I put together and know their shit when it comes to security, databases, clustering, etc.

    Now I could be wrong. Perhaps Google is waiting for the IPO and will then detail their server infrastructure, wow Wall Street (and geeks worldwide) with their amazing capacity, and their stock will skyrocket on the first day of trading. I'd wager that Google's stock is going to have amazing gains anyway given that it's a bit of an industry darling. Other tech companies which have been thinking of going public would be wise to time their IPO very shortly after Google's and ride the wave.
  • Tinfoil Hats (Score:5, Informative)

    by mfh ( 56 ) on Sunday April 25, 2004 @10:35AM (#8964768) Homepage Journal
    > 1) Why are their terms of service / Privacy Policy so vague?

    This is to keep it simple. Exacting legal language is the path to screwing people. Vague terms of service are good because both sides can wiggle. Has anyone been sued because of these terms of service? I'd like to see some refs to that, but I'm guessing it's just to protect the general public from a-holes who would exploit Google.

    > 2) Why does their cookie stay until the year 2038?

    Not to be funny, but someone at Google likely knows when the end of the world is coming and has set the cookie to reflect this. Seriously, who cares how long cookies stay alive for? You can block them if you like, but I think it's really just to keep Google more effective.

    > 3) Why does their Google search bar report information and auto-update without permission?

    I'm against Spyware, so I don't run it, but Google tracks searches anyway, so what's the point of getting upset about it? These technologies make Google more user-friendly. Google doesn't have loads of popups trying to get you to install the bar -- it's not right in your face. People who want it likely don't care if it auto-updates because then they have the most recent version of it.
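
    A side note on the 2038 in question 2: that year is where a signed 32-bit Unix timestamp runs out, so it is a common "as late as possible" expiry date. Whether that is actually why the cookie was set that way is an assumption here, not something stated in the thread. A minimal sketch of the arithmetic:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit integer can hold, read as seconds since
# 1970-01-01 00:00:00 UTC (the Unix epoch).
MAX_INT32 = 2**31 - 1

# The last moment representable by such a timestamp.
last_moment = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)

print(last_moment.isoformat())
```

    Any expiry date later than this cannot be stored in a 32-bit signed timestamp at all.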
  • by MarkWatson ( 189759 ) on Sunday April 25, 2004 @10:39AM (#8964784) Homepage
    Here is a PDF file of the paper [rochester.edu].


    If that link gets slashdotted, here is another link to a PDF of a PowerPoint presentation [brandeis.edu].


    Good read! This paper (with the discussion of the goodness/fastness of file appends) made me more interested in Prevalence [advogato.org] - so much so that I am using it for my new project.

    -Mark

  • by reanjr ( 588767 ) on Sunday April 25, 2004 @10:46AM (#8964817) Homepage
    I don't know why he has numerous identical sites, but one reason is when a small company purchases several other companies that are in the exact same market. Since the companies are compatible, you merge all their operations into one. But you still want to keep brand identification with your customers so you keep two copies of the site, each branded differently.
  • by lunar_legacy ( 715938 ) on Sunday April 25, 2004 @10:46AM (#8964818)
    Another wonderful speculation about Google's infrastructure, which you can find here [topix.net].
  • by svr0002 ( 536813 ) on Sunday April 25, 2004 @11:00AM (#8964893)
    and another good one - http://www.computer.org/micro/mi2003/m2022.pdf

    Interesting that a major problem for Google is managing power and cooling !

  • by orthogonal ( 588627 ) on Sunday April 25, 2004 @11:14AM (#8964979) Journal
    Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted left by four bits and then arithmetically added to the segment key. This clever scheme expands the index by a factor of 16; Google will soon be able to host over 64 billion pages!

    Ah, youthful mod!

    You've been (humorously) trolled. I suggest posting in this thread to remove your "+1 Informative", or getting a friend to mod it "Funny".

    What the parent is describing is not what Google will do, but what DOS did: the above scheme is how MS-DOS managed memory [internals.com], except that the "selector" and "offset" were both 16-bit numbers under DOS. (Although "segment" was the more usual term for "selector".) The segment number was shifted left four places -- or put more simply but less graphically, multiplied by 16 -- and then added to the offset number, to give the whole or "flat" address:
    segment (in hex): 0001
    offset (in hex):  0002
    segment is multiplied by 16 (shifted left 4 bits, or one hex digit)
    segment: 00010
    offset:   0002
    ---------------
    total:   00012
    This allowed DOS to use 16-bit numbers to address 2^20 = 1 MB of memory, but since DOS reserved the upper 384 KB for the (remapped) BIOS and peripheral cards, programs were able to address at most 640 KB of memory; the parent's mention of "64 billion pages" is probably an allusion (increased several orders of magnitude) to this DOS limit.

    Of course, this was a kludge, pure and simple, required because DOS machines were 16-bit. Among other things, it allowed the same memory locations (all but the very top and bottom memory addresses) to be addressable by several different addresses, and discovering pointer aliasing required calculations that, by their very nature, couldn't be done wholly in the machine's (16-bit) registers.

    Consider: segment 4, offset 0 is 4 * 16 + 0 = 64,
    and segment 3, offset 16 is 3 * 16 + 16 = 64,
    and segment 2, offset 32 is 2 * 16 + 32 = 64
    and segment 1, offset 48 is 1 * 16 + 48 = 64
    and segment 0, offset 64 is 0 * 16 + 64 = 64:

    so all five segment:offset pairs are apparently different but actually point to the same memory location.
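
    The segment:offset arithmetic above can be sketched in a few lines. This is only an illustration: the function name is made up for the example, and the wrap to 20 bits assumes an original 8086-style 1 MB address space.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode style address: segment shifted left 4 bits, plus offset.

    The result is masked to 20 bits, assuming a 1 MB address space.
    """
    return ((segment << 4) + offset) & 0xFFFFF


# The worked hex example from the text: segment 0001, offset 0002.
example = linear_address(0x0001, 0x0002)

# The five segment:offset pairs listed above, all meant to alias address 64.
aliases = [(4, 0), (3, 16), (2, 32), (1, 48), (0, 64)]
alias_addresses = {linear_address(seg, off) for seg, off in aliases}

print(hex(example))
print(alias_addresses)
```

    Collecting the results in a set makes the aliasing visible: five different pairs collapse to a single linear address.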
  • One word. (Score:4, Informative)

    by Viceice ( 462967 ) on Sunday April 25, 2004 @11:34AM (#8965059)
    robots.txt

    The Google bot respects it, so if you're up to no good, it's easy to get Google to not index your page.

    Anyway, I'd like to see a version of Google that didn't respect robots.txt. You used to be able to dig up a lot of information on people on Google before a lot of sites started using robots.txt.
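
    For anyone who hasn't seen the robots exclusion convention in action, here is a rough sketch using Python's standard `urllib.robotparser`. The file contents and the example.com URLs are invented for illustration, and this only shows how a well-behaved crawler could check the rules, not how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is told to stay out of /private/ but may fetch everything else.
private_ok = parser.can_fetch("Googlebot", "http://example.com/private/page.html")
public_ok = parser.can_fetch("Googlebot", "http://example.com/public/page.html")

# Other crawlers fall under the "*" entry, whose empty Disallow permits everything.
other_ok = parser.can_fetch("SomeOtherBot", "http://example.com/private/page.html")

print(private_ok, public_ok, other_ok)
```

    Note that the file is purely advisory: a crawler that chooses to ignore it can still fetch the pages.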
  • Re:first casualty ?? (Score:4, Informative)

    by platypussrex ( 594064 ) on Sunday April 25, 2004 @11:35AM (#8965062)
    Not sure why you say that. If you read all the way through Naughton's article, he says that the calculations come from Garfinkel, he mentions Technology Review, and then later directly quotes Garfinkel. Sounds like attribution to me.
  • Yes. (Score:3, Informative)

    by Ayanami Rei ( 621112 ) * <rayanami&gmail,com> on Sunday April 25, 2004 @11:40AM (#8965098) Journal
    Very simple example [sun.com] of 15 servers in 3U. Many vendors are also offering a "dual dual" system in 1U... that is, two dual-CPU motherboards that fit in one case.
  • by imroy ( 755 ) <imroykun@gmail.com> on Sunday April 25, 2004 @01:23PM (#8965732) Homepage Journal
    ...the above scheme is how MS-DOS managed memory.

    <sarcasm>Wow, I didn't know DOS managed memory at such a low level!</sarcasm>

    s/DOS/the 8086/g;

    You're really referring to the horrible segmented memory layout used by the Intel 8086 processor and its later derivatives. I did all this shit years ago in university. Almost every lesson my fellow students and I (and the lecturer as well) would end up cursing Intel for their wacky processor design. Interestingly, Intel introduced a similar scheme in (IIRC) its Xeon processors to produce (IIRC) 36-bit addresses and access more than 4 gigabytes of physical memory on a 32-bit processor.

  • by NonSequor ( 230139 ) on Sunday April 25, 2004 @01:35PM (#8965823) Journal
    The 36-bit addressing extension began with the Pentium Pro.
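
    For scale, the difference between 32-bit and 36-bit physical addresses is just powers of two. A quick sketch of the arithmetic (byte-addressable memory assumed; variable names are only for the example):

```python
GIB = 2**30  # bytes in one gibibyte

# Bytes reachable with 32-bit and 36-bit addresses, expressed in GiB.
addressable_32 = 2**32 // GIB
addressable_36 = 2**36 // GIB

print(addressable_32, addressable_36)
```

    The four extra address bits multiply the reachable physical memory by 16.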
  • by Anonymous Coward on Sunday April 25, 2004 @04:05PM (#8966882)
    or corporation #128264 has a complete web-viewable copy of the javadocs for version 1.2. lots of times i've done google searches for something code-related looking for examples/bugs/whatever and come up with a ton of hits on the same API documentation on different websites.
  • by XO ( 250276 ) <blade.eric@NospAM.gmail.com> on Sunday April 25, 2004 @04:50PM (#8967210) Homepage Journal
    Chill out, brother.

    Try clicking in the address entry bar on Safari, and typing in "www.lycos.com", or whatever other search engine you would like to use.

    Just because the menu bar's search function pulls up google, doesn't mean you have to use it. Or did using a Mac for this long rot your brain to the point where you can only do things either the Mac way or the Extremely Difficult way?

  • by _Sprocket_ ( 42527 ) on Sunday April 25, 2004 @05:25PM (#8967486)


    The problem is, I've never paid these people a single penny for ANY of this. How the hell are they going to make money?


    1) Google has an effective advertisement system

    2) My last two employers bought Google boxes for their intranet
