
How does Google do it?
Doc Tagle writes "With Google reportedly on the verge of going public, more and more people want to know what makes Google tick. The Observer serves up the answers to our questions."
You've been Berkeley'ed!
Openness is the first casualty of going public?! (Score:4, Insightful)
OK - I can (perhaps) see this as being the case prior to an IPO, but that statement can't be true after it has happened...
I mean....surely once they've gone public, they'll be obliged to detail and list the sort of information that the article postulates about? The shareholders would be entitled to know how many servers google has, what their specifications are, and what their current commercial strategy is.....surely?!
Re:Openness is the first casualty of going public? (Score:5, Insightful)
If you do not believe me, buy a share of GE. Pick up the phone, call Investor Relations and ask them how many Unix computers they have and what OS and patch level they run.
Re:Openness is the first casualty of going public? (Score:5, Insightful)
Going public WILL expose a significant portion of Google's technology, more so when it has to do with hardware.
Re:Openness is the first casualty of going public? (Score:5, Insightful)
With Google, their entire "business" - their means of generating cash flow - relies on sheer quantity of computing muscle and high performance software for their search databases. With GE, their business is making lightbulbs, dishwashers, hair dryers, electric motors and any of thousands more products used in residential, commercial and industrial settings. How many Unix computers they have in all their offices around the world is a consequence of doing business, not their means of doing business.
I'm sure if you asked the GE Investor Relations department something relevant about how their business operates, you might get somewhere.
=Smidge=
Re:Openness is the first casualty of going public? (Score:3, Insightful)
The real fact of the matter is, they have custom software that they run. The number of systems, speed, memory and OSs are simply a byproduct of what they really offer: a service.
Google is no different. They offer a service. As long as they are profitable, as an investor, I could care less if the systems were running on Dell's, White Boxes, Mac, or Commo
Re:Openness is the first casualty of going public? (Score:3, Interesting)
Actually, their means of generating cash flow relies on how beneficial advertisers feel it is to advertise on Google.
Re:Openness is the first casualty of going public? (Score:5, Interesting)
Why would a shareholder care about server specifications? Investing is all about money. Read any quarterly report from a public company. Income statement, balance sheet, and cash flow are the primary interests on the numbers side, as well as a general roadmap of where the company's heading. Warren Buffett doesn't care if each server has two 80 GB drives, or whether they have four 250 GB drives per server. The only thing that matters is that there are competent people to handle these kinds of "dirty details" that an investor doesn't give a rat's ass about.
Take a look at the kinds of information [yahoo.com] you could expect from Google's quarterly reports.
Re:Openness is the first casualty of going public? (Score:3, Insightful)
With Google: before I give them my money, I would like to know how many servers they have, how close to capacity they are, and what software they use (compatibility issues).
Honest reportin
Re:Openness is the first casualty of going public? (Score:5, Informative)
I agree it would be nice to know. But if those are your conditions for investing in Google, I think Google would probably tell you to keep your money. I imagine Google's quarterly reports would probably say something like:
"Our operation depends on having the ability to increase our server and bandwidth resources as we grow our services. Business may be adversely impacted should capacity be unavailable. Our servers are also at risk for viruses, worms, and DDoS attacks which could put the operation of those servers at risk and adversely affect business." etc...
That would give you, as an investor, the information you need to determine whether those risks are worth your money. In all likelihood you'll just have to rely on the fact that they have an army of PhDs who are smarter than you and I put together and know their shit when it comes to security, databases, clustering, etc.
Now I could be wrong. Perhaps Google is waiting for the IPO and will then detail their server infrastructure, wow Wall Street (and geeks worldwide) with their amazing capacity, and their stock will skyrocket on the first day of trading. I'd wager that Google's stock is going to have amazing gains anyway given that it's a bit of an industry darling. Other tech companies which have been thinking of going public would be wise to time their IPO very shortly after Google's and ride the wave.
You have a Point here (Score:2)
Now figure I am selling my options.
Now add that more people will buy them at a higher price if they are impressed with the number of computers.
I think there is a big temptation for Google to expose whatever it has to expose if it means getting the option value up.
After they cash out their options - Google can compete, not compete or whatever - it will be the public's problem.
AIK
Re:Openness is the first casualty of going public? (Score:2)
Re:Openness is the first casualty of going public? (Score:3, Insightful)
Be reasonable.
Financial information is important, their business plan is important, it is probably important to know that they are running Linux so that SCO-type problems can be factored in. The sort of fine technical details the Observer goes into are totally irrelevant, just an incidental business expense. We know that it all works and that Google are on top of what they do. That is what matters.
Re:Openness is the first casualty of going public? (Score:2, Interesting)
"When I visited the company in January, the screen said that Akamai was serving 591,763 hits per second, with 14,372 CPUs online, 14,563 gigahertz of total processing power, and 650 terabytes of total storage. On April 14 [2004], the number had jumped to a peak rate of 900,000 hits per second and 43.71 billion requests delivered in a 24-hour period."
From this article [technologyreview.com].
Re:Openness is the first casualty of going public? (Score:2)
This is just fun stuff that the company chooses to publish; it is not 'Investor Information'.
Re:Openness is the first casualty of going public? (Score:3, Funny)
Re:Openness is the first casualty of going public? (Score:3, Interesting)
I, for one, would. Now, unfortunately I don't have enough money to start investing on Wall Street, but hopefully that will change soon. So, why would I want to know technical details for a company? Obviously, because I'm a geek. But someone has to track this kind of stuff to produce a stock report. You can't have a company saying "We bought an IBM X Server and it now balances our accounts and brokers international deals
first casualty ?? (Score:5, Informative)
Recycling without attribution [technologyreview.com] is the first casualty of bad journalism.
I thought I had read this article before, and then I realised, I had read it before...
(although I now realise that you are not supposed to read the linked articles before posting comments - sorry)
Re:first casualty ?? (Score:4, Informative)
Google is faltering (Score:3, Interesting)
Google has recently removed tens of thousands of "duplicate content" sites from its index - where "duplicate content" is as simple as being an affiliate site (e.g. Amazon) and having the same textual item descriptions as many other sites.
Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.
Google is wavering.
Gmail is a distraction, a venture into some other space to keep people from noticing that their search product is degrading.
May she last as long as possible...
Interesting (Score:4, Interesting)
Can you back up your assertions that Google's index is full? It's a rather interesting theory, and perhaps an explanation for all the tweaking they've done lately.
Re:Interesting (Score:5, Informative)
Insert software patent debate (where Google is the default hero due to its geek factor) here...
Re:Interesting (Score:5, Funny)
Re:Interesting (Score:3, Insightful)
How many times have you run a search and seen a link at the bottom that says something like "Google removed information from this search that is redundant to information already displayed on the page" (Can't remember exactly what it says right now). Usually, there's nothing valuable in the hidden links - why index them at all?
Re:Why Verbatim Clones??WAS:Interesting (Score:2, Informative)
Re:Google is faltering (Score:5, Insightful)
Hmm... are they using a 32-bit integer to keep the page count?
2^32 = 4.294 billion, pretty close to 4.285 billion pages.
Newbies...
Re:Google is faltering (Score:5, Funny)
On a side note I would really like to know which one is page number 1.
Diego Rey
Re:Google is faltering (Score:2, Insightful)
Re:Google is faltering (Score:3, Informative)
Actually, they already have the fix implemented, and it's currently in the process of being rolled out. The upgraded system makes use of a split primary key which is comprised of a "selector" subkey and a "segment" subkey. The selector key is shifted le
Re:Google is faltering (Score:2, Funny)
Re:Google is faltering (Score:5, Informative)
Ah, youthful mod!
You've been (humorously) trolled. I suggest posting in this thread to remove your "+1 Informative", or getting a friend to mod it "Funny".
What the parent is describing is not what Google will do, but what DOS did: the above scheme is how MS-DOS managed memory [internals.com], except that the "selector" and "offset" were both 16-bit numbers under DOS. (Although "segment" was the more usual term for "selector".) The segment number was shifted left four places -- or put more simply but less graphically, multiplied by 16 -- and then added to the offset number, to give the whole or "flat" address: flat address = segment * 16 + offset (multiplying by 16 is the same as shifting left 4 bits, or one hex digit). This allowed DOS to use 16-bit numbers to address 2^20 = 1 MB of memory, but since DOS reserved the upper 384 KB for the (remapped) BIOS and peripheral cards, programs were able to address at most 640 KB of memory; the parent's mention of "64 billion pages" is probably an allusion (increased several orders of magnitude) to this DOS limit.
Of course, this was a kludge, pure and simple, required because DOS machines were 16-bit. Among other things, it allowed the same memory locations (all but the very top and bottom memory addresses) to be addressable by several different addresses, and discovering pointer aliasing required calculations that, by their very nature, couldn't be done wholly in the machine's (16-bit) registers.
Consider: segment 4, offset 0 is 4 * 16 + 0 = 64,
and segment 3, offset 16 is 3 * 16 + 16 = 64,
and segment 2, offset 32 is 2 * 16 + 32 = 64
and segment 1, offset 48 is 1 * 16 + 48 = 64
and segment 0, offset 64 is 0 * 16 + 64 = 64:
so all five segment:offset pairs are apparently different but actually point to the same memory location.
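The arithmetic above can be checked with a short Python sketch. The function name `flat_address` is made up for illustration, and the 20-bit mask reflects the 8086's 20 address lines; treat it as a rough model of real-mode addressing, not an emulator.

```python
def flat_address(segment: int, offset: int) -> int:
    """Real-mode style address: segment shifted left 4 bits, plus offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask keeps the result within 20 bits (1 MB of address space).
    return ((segment << 4) + offset) & 0xFFFFF


# The five segment:offset pairs from the comment above.
pairs = [(4, 0), (3, 16), (2, 32), (1, 48), (0, 64)]
addresses = {flat_address(seg, off) for seg, off in pairs}
print(addresses)
```

All five pairs collapse to the single address 64, which is the aliasing problem described above.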
Moderate Up... (Score:2)
Re:Google is faltering (Score:3, Informative)
<sarcasm>Wow, I didn't know DOS managed memory at such a low level!</sarcasm>
s/DOS/the 8086/g;
You're really referring to the horrible segmented memory layout used by the Intel 8086 processor and its later derivatives. I did all this shit years ago in university. Almost every lesson my fellow students and I (and the lecturer as well) would end up cursing Intel for their whacky processor design. Interestingly Intel introduced a similar s
Re:Google is faltering (Score:2, Informative)
Also censoring... (Score:2)
Re:Google is faltering (Score:2)
Wouldn't it be the keyword index?
Imagine the Slashdot front page. How many 'keywords' or 'whatevers' would you have to categorize and organize to maintain a searchable structure? 100? 1000?
So if you have 100 rows in a DB relating to 1 page then the DB would have been maxed out a factor of 10^2 ago instead of now...
Right?
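To put a number on that: here is a back-of-the-envelope sketch in Python, assuming (purely for illustration) a 32-bit row identifier and exactly 100 index rows per page. Nothing here describes how Google actually stores its index.

```python
ROW_ID_LIMIT = 2**32   # assumed: row ids are unsigned 32-bit integers
ROWS_PER_PAGE = 100    # assumed: 100 keyword rows for every indexed page

# Number of whole pages that fit before the row ids run out.
max_pages = ROW_ID_LIMIT // ROWS_PER_PAGE
print(f"{max_pages:,} pages")
```

Under those assumptions the row ids would have run out at roughly 43 million pages, long before a page count in the billions.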
Google full? Or just tweaking the algorithm? (Score:3, Interesting)
Google is now in the process of dropping millions of link records from its index, presumably to make room for more pages.
It's possible that the index is full, but I would imagine that they would have seen this coming long ago, as it "filled up", and taken measures. What
Why 4.285 billion? (Score:5, Interesting)
Results 1 - 10 of about 5,750,000,000 for the [definition]. (0.11 seconds)
Doesn't that imply more than 4.285 billion?
How does Google do it? (Score:5, Funny)
Re:How does Google do it? (Score:2, Funny)
http://labs.google.com/
http://www.google.com/
http://www.google.com/intl/xx-el
Here (Score:5, Insightful)
Maybe this is the reason after all, but I think it's more about Google being simple, smart and clean. They play fair (no browser interstitials, no sneaky crap, no registration necessary...etc); I would equate Google's victory thus far to a kind of no-nonsense attitude to business, always, no exceptions.
Re:Here (Score:5, Insightful)
And the fact that there are so many articles, from people that just can't understand why google is successful, just goes to show you how screwed we all are...
Practically everyone in business is determined to be as evil as possible towards their customers (and employees) and assumes that anybody doing anything else must be doing something wrong, no matter what all other indicators may say.
For a great example, read The Wal-Mart Myth [guerrillanews.com].
Re:Here (Score:2)
I went to tompaine.com, which had originally published it, and found more articles by the same author. Back to Basics [tompaine.com] was a very thoughtful look at the outsourcing debate.
OT (Good article) (Score:2)
Do yourself a favor, slashdot, and get a membership. It's worth it.
Tinfoil Hats (Score:5, Informative)
This is to keep it simple. Exacting legal language is the path to screwing people. Vague terms of service are good because both sides can wiggle. Has anyone been sued because of these terms of service? I'd like to see some refs to that, but I'm guessing it's just to protect the general public from a-holes who would exploit Google.
> 2) Why does their cookie stay until the year 2038?
Not to be funny, but someone at Google likely knows when the end of the world is coming and has set the cookie to reflect this. Seriously, who cares how long cookies stay alive for? You can block them if you like, but I think it's really just to keep Google more effective.
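For what it's worth, 2038 is also the year a signed 32-bit Unix timestamp runs out, which is a common reason for "never expire" dates to land there. Whether that is why Google picked it is my assumption, not something the terms of service say. A quick Python check of the date:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit time_t can hold (seconds since 1970-01-01 UTC).
INT32_MAX = 2**31 - 1

last_moment = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_moment.isoformat())
```
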
> 3) Why does their Google search bar report information and auto-update without permission?
I'm against Spyware, so I don't run it, but Google tracks searches anyway, so what's the point of getting upset about it? These technologies make Google more user-friendly. Google doesn't have loads of popups trying to get you to install the bar -- it's not right in your face. People who want it likely don't care if it auto-updates because then they have the most recent version of it.
just read almost everything on google-watch.org (Score:2, Funny)
They have built an amazing system using Linux... (Score:2, Interesting)
Re:They have built an amazing system using Linux.. (Score:3, Insightful)
The service is free, and they're really good at what they do. I would say I'd be lost without google on the internet, but really this compliment goes for lots of search engines - I'm really very grateful this sort of service still exists for free (well, with ads.)
Unless you want to talk about cures for diseases through protein folding simulations, I can't think of a better way for this hardware to be u
Re:They have built an amazing system using Linux.. (Score:2)
1) They provide an alternative to Microsoft. Not only search, it looks like they will give a blow to hotmail as well. They prevent MSN from becoming the portal. I think this is very important, people see things can be done better than the Microsoft way, and it can be done with Linux
2) They make the communication within Open Source and Free Software community much easier. I keep a
Re:They have built an amazing system using Linux.. (Score:2)
Article didn't say much (Score:5, Interesting)
-Vic
Soon to be everything (Score:4, Interesting)
Re:Soon to be everything (Score:5, Informative)
Re:Soon to be everything (Score:3, Informative)
Re:Soon to be everything (Score:2, Informative)
And their online translator is here [google.com].
As a consultant (Score:5, Informative)
Re:As a consultant (Score:5, Informative)
But yeah, their racks of 4 servers/1U is pretty impressive when you see them lined up in row after row of racks. Their data centers have to bring in extra cooling because they are so densely packed.
Yes. (Score:3, Informative)
Two Thingies (Score:5, Interesting)
Two -- If you want your pages indexed faster and more frequently, sign up and place a Google AdSense ad on your page. Many webmasters believe that Google is having to index so many AdSense pages... that it is difficult for Google to add many more non-ad driven pages.
Just sign up for adsense and run it a couple of weeks while you build your site. After google has spidered your site well, then just drop adsense.
Good luck. I would love to hear any of your google-related tricks.
AC
Re:Two Thingies (Score:2)
Try it out [texturizer.net]
Google does it with Linux :o) (Score:2)
(You may wish to take issue with the above..)
The "searching xxx web pages" count (Score:2, Redundant)
Re:The "searching xxx web pages" count (Score:2)
Anyone notice 4,285,199,774 just so happens to be ~99.8% of 2^32? Is this a 32bit counter about to overflow?
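A quick Python check of that percentage, using only the page count quoted above and the size of an unsigned 32-bit counter. That Google actually keeps the count in a 32-bit integer is speculation in this thread, not an established fact.

```python
INDEX_COUNT = 4_285_199_774   # page count quoted in the comment above
UINT32_LIMIT = 2**32          # number of distinct values in an unsigned 32-bit counter

headroom = UINT32_LIMIT - INDEX_COUNT
fraction = INDEX_COUNT / UINT32_LIMIT

print(f"limit:    {UINT32_LIMIT:,}")
print(f"headroom: {headroom:,}")
print(f"used:     {fraction:.2%}")
```
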
Re:The "searching xxx web pages" count (Score:2, Interesting)
the reason they keep their mouth shut (Score:5, Funny)
On the other hand, here's the conspiracy theory version: what if Google IS the NSA? The IPO is a smokescreen to try to avert attention. The reason they can't show their true capability is that when the company goes public, only 20% of their hardware will actually go into the public company "Google", the rest of the hardware will still be hidden and a part of the NSA's system. :-)
[For the humor impaired, I'm just joking, but it does make you wonder...]
One word. (Score:4, Informative)
The Google bot respects it, so if you're up to no good, it's easy to get Google to not index your page.
Anyway, I'd like to see a version of Google that didn't respect robots.txt. You used to be able to dig up a lot of information on people on Google before they started to use robots.txt on a lot of sites.
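For anyone who hasn't looked at one, here is a small sketch of how a well-behaved crawler consults robots.txt, using Python's standard `urllib.robotparser`. The rules and the example.com URLs are invented for illustration; this says nothing about how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that hides one directory from every crawler.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A polite crawler asks before fetching each URL.
allowed_public = parser.can_fetch("Googlebot", "http://example.com/index.html")
allowed_private = parser.can_fetch("Googlebot", "http://example.com/private/notes.html")
print(allowed_public, allowed_private)
```

The check is purely voluntary on the crawler's side, which is why a search engine that ignored the file would turn up more.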
I've thought about this a bit (Score:3, Interesting)
Putting on my computer scientist hat I would guess:
- instead of backup, hold data in multiple places at once
- use a "cascaded rsync" to trickle software changes to thousands of nodes
- then load software via NFS at node bootup
- use nodes just to store data; keep software in RAM for speed
Just a few thoughts.
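To make the "cascaded" idea concrete, here is a rough Python sketch of how quickly an update spreads if every node that already has it pushes it on to a fixed number of others each round. The function name, the fan-out values and the 10,000-node cluster are all assumptions for illustration; this is a counting model, not an rsync script and not Google's actual mechanism.

```python
def rounds_to_cover(total_nodes: int, fanout: int) -> int:
    """Count push rounds until every node has the update.

    Each round, every node that already has the update pushes it
    to `fanout` nodes that do not have it yet.
    """
    if total_nodes < 1 or fanout < 1:
        raise ValueError("total_nodes and fanout must be positive")
    covered = 1   # the seed node starts with the new software
    rounds = 0
    while covered < total_nodes:
        covered += covered * fanout
        rounds += 1
    return rounds


for fanout in (1, 2, 4):
    print(fanout, rounds_to_cover(10_000, fanout))
```

With a fan-out of 2, a 10,000-node cluster is covered in 9 rounds, so the seed machine only ever talks to a handful of peers.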
Re:I've thought about this a bit (Score:3, Insightful)
Even better, instead of backup just crawl the pages again in the event of a lost disk. Of course some data needs to be in multiple places for performance reasons, but not all data are accessed frequently. How often do you think they will need the page with the lowest rank? (OK, I know there will probably be a lot with exactly the same rank, but you get the idea).
load software via NFS at node bootup
There are better protocols for this than NFS. But
Re:I've thought about this a bit (Score:2)
That'll be a no, then.
google instant messenger, or... (Score:4, Interesting)
So far we know they have just a cubic load of servers, most likely the most on the planet for one private company. The government probably has more, but it's a mishmash of them, not nearly as sleek or coordinated, AFAIK. What COULD be next with them: practical, cheap 50 dollar thin clients that you could do a TON on, using distributed computing, from games to communication to running any business? With tech savvy like they got and their already established heavy hardware base and heavy commitment to R&D, they could just 'splode with an extra 25 billion in cash all of a sudden from an IPO. OR, the money could get to them and they become just another weird company that forgets its roots as "brains come first" and switches to "marketing crap comes first" like certain other unnamed megacorps do now.
Interesting times
How Google do that? (Score:4, Informative)
Re:How Google do that? (Score:3, Informative)
Supplemental Result (Score:4, Interesting)
The best evidence is doing a search which returns results which say "Supplemental Result" next to them. That'll be coming from a second document store I'd guess.
Re:Supplemental Result (Score:5, Interesting)
By the way, for supplemental result... By doing a quick keyword search on Google using my domain name, I'm led to believe that pages marked "Supplemental Result" are pages that look like search results. That is, they aren't filled with any real content, other than search results from other engines. Results that could "supplement" your "result" from Google.
Re:Supplemental Result (Score:3, Interesting)
Linux needs more patching? Does it? (Score:2, Interesting)
Huh? Does it!? Since when? I like these throw-away lines the media people dish out. What is their basis for this statement? Even when they see Linux obviously succeeding, they dish out a statement like this.
I certainly don't have to patch my Linux boxes as frequently as my Windows boxes. Actually... no... wait, they're right! I only need to patch Windows once. Ctrl-Alt-Del -> Boot Debian CD.
Re:Linux needs more patching? Does it? (Score:3, Insightful)
Baumi
Public paper on Google File System (Score:5, Informative)
If that link gets slashdotted, here is another link to a PDF of a PowerPoint presentation [brandeis.edu].
Good read! This paper (with the discussion of the goodness/fastness of file appends) made me more interested in Prevalence [advogato.org] - so much so that I am using it for my new project.
-Mark
Re:Public paper on Google File System (Score:4, Informative)
Interesting that a major problem for Google is managing power and cooling !
No questions answered (Score:2)
The Google Might Be Falling (Score:2, Interesting)
The more and more I look at it, the more and more I fear Google is just nothing more than a very well calculated
Re:The Google Might Be Falling (Score:5, Insightful)
Um, you do realize that Google already makes a profit [businessweek.com], don't you? I daresay the IPO will puff the value of the company up beyond the rational amount, but that's not 'Enron' -- if you are going to use buzzwords, use the right ones. Enron was a case of internal actors in the company using financial games to siphon off profits and inflate the value of the company on the books. You accusing Google of financial fraud? If you are going to use a buzzword, use 'Yahoo' or something -- a solid company that got its stock price puffed up excessively due to investor mania.
How the hell did this get moderated up, except as 'Funny'?
Re:The Google Might Be Falling (Score:4, Informative)
1) Google has an effective advertisement system
2) My last two employers bought Google boxes for their intranet
Re:The Google Might Be Falling (Score:4, Insightful)
Correction, the ad model has proven to be of dubious effectiveness with companies that have no credibility.
Google is perhaps the most trusted company on the net today, and with the traffic they get, I'm not surprised at all that they can support all their financial needs with ad revenue, especially with some of the big bucks that large companies dump into advertising with Google. I challenge you to show evidence showing that their advertising business model cannot support their costs, because so far you've done nothing but toss up tin-foil hat ideas without any proof to back it up, and as someone else so kindly pointed out to you, Google is ALREADY in the black.
How do they do it? Two words (Score:2, Funny)
You may also find this interesting... (Score:5, Informative)
Leprechauns! (Score:2, Funny)
"serves up the answers to our questions"??? (Score:2, Insightful)
the article never answered any of our questions - heck, i even looked for a "Page 2" link after reading the entire thing, sadly, the article ended w/o even attempting to answer its own questions.
they don't have to patch and update very often (Score:3, Insightful)
Come on, the nodes in their clusters are not desktop computers with office software on it.
The systems running on these machines are very stripped down: they only need very few applications and a very simple kernel (not many device drivers, maybe no graphics card driver, ...).
Furthermore there are no local users on the machines -> many security flaws won't affect the integrity. And remote holes in the kernel do not occur very often.
And above all these cluster nodes are certainly shielded by some sort of firewall. Therefore they don't have to care for network security themselves.
All in all: I believe that you need to update such machines rather infrequently. At least not for security reasons.
Titus
Re:they don't have to patch and update very often (Score:2)
IPO signals more World Poker Tour participants (Score:3, Interesting)
So one has to assume the IPO is the first phase of the principals "cashing out". The press will probably signal this as a sign of the next dot com boom, and a bunch of nerds within the company will suddenly become millionaires, and subsequently quit their job and open up a Bed & Breakfast in some obscure town or join the World Poker Tour. There goes the talent.
Doing half as well as Google (Score:4, Funny)
"Google manages to achieve this with sophisticated techniques for rippling changes through the cluster, yet achieves 100 per cent uptime. This is serious stuff, and there are a lot of IT managers out there who would give their eye-teeth to be able to do it half as well."
Sigh...as an IT manager I can only dream of 50% uptime. Damn you, Google!
Re:Google Problems (Score:2, Funny)
Re:Additional questions (Score:5, Funny)
Neat. I wonder what doing a Google search would return for other letters:
"c" -- 299,792,458 hits
"e" -- 2.71828183 hits
"h" -- 6.626068 × 10^-34 hits
"i" -- sqrt(-1) hits
"k" -- 1.3806503 × 10^-23 hits
Looks like Google is definitely busted. They should fix these bugs.
Re:Another Rumor... (Score:2)
Larry Augustin (Score:2)
Re:Google can't do it: phrase searches (Score:5, Insightful)
"To be or not to be"
and I honestly can't see what you are going on about: of the first ten results, eight highlighted the phrase in the page synopsis, one used the phrase as a domain name, and one included the partial phrase "...Or Not To Be."
Note the ellipsis on that last one: it alludes to a larger portion of text preceding the printed portion. And the domain name was found even though the spaces were omitted.
Those aren't irregular results: those are highly intelligent results.
Just because they aren't deterministic enough for you to plug them into a piece of code of your own construction (without compensating Google) doesn't mean that they don't fulfill the purpose of the web search.
Re:Google can't do it: phrase searches (Score:2)
(You have to put quotes around a phrase to get results that contain it as you typed it)
Re:My theory: (Score:2)
Re:My theory: (Score:2)
Re:Google started to make me mad (Score:3, Informative)
Try clicking in the address entry bar on Safari, and typing in "www.lycos.com", or whatever other search engine you would like to use.
Just because the menu bar's search function pulls up google, doesn't mean you have to use it. Or did using a Mac for this long rot your brain to the point where you can only do things either the Mac way or the Extremely Difficult way?