Welcome to the New Server
The first thing was to split the SQL off from the httpd. The MySQL server now resides on a dedicated dual P2. It runs all the programs that handle keeping the HTML up to date, and NFS exports a nice file system to 3 other boxes (each is a single processor P2) that run httpd (and that's about it). Your hits are routed through an Alteon which divides the hits up amongst the 3 boxes.
The end result is that almost no code changes need to be made. There are tons of things we could do to make it faster still, but we'll look at that when we need it. During load testing this setup was able to handle 3x the load of the existing box.
The major remaining bottleneck is the banner ad frame load time. Browsers like to delay page rendering until they have the HTML for any included layers. For this reason we're going to work out a way to embed the ad HTML directly into the page and sidestep the need for layers. This ought to provide a nice improvement in page render time as well.
Anyway, thanks to all the guys who helped load test the new system before it went live (and if any of you are still running any scripts, you can stop now ;). Thanks to Andover for making this possible, and especially thanks to Peter and CowboyNeal for all your work.
Let us know if you notice anything funky.
nice (Score:1)
/. is one of my primary references, so anything you can do to improve performance is greatly appreciated.
VAR still used? (Score:1)
My company is looking at VAR systems for handling some stuff, and we took note of this new system as a test case.
Christen the new server! (Score:1)
Bwuckatah bwuckatah bahhh, bwuckatah bwuckatah bahhh!
a bug? (Score:1)
Re:You should use Solaris or FreeBSD (Score:1)
Backwards links are funky (Score:1)
George
NFS export? Why not Coda? (Score:3)
--
Slashapp? (Score:1)
How to handle address transitions with the DNS (Score:3)
But next time you change the IP for your server, it might be a good idea to decrease the TTL for the IP of "slashdot.org" a few days before the change. That way, it won't take up to 24 hours for other sites to pick up the change after their DNS cache entry has expired :-)
Thanks to Andover (Score:1)
That's a lot of work. (Score:2)
-Rich
Re:Backwards links are funky (Score:1)
Re:Christen the new server! (Score:2)
-- Give him Head? Be a Beacon?
Nice improvement (Score:1)
No more adfu serving the banner ads? I was hoping the source code would be released soon...
Hmmm. (Score:1)
Anyway, attempts to access
"The page cannot be refreshed without resending the information. Click Retry to send the information again or click Cancel to return to the page you were trying to view."
Clicking Retry or Cancel just puts me into an endless loop. Only way out is to terminate IE.
Under Netscape, I see the text and am then redirected to
At first I thought it was funny since the article about the Windows backdoor was published this morning. But, IE won't let me get past it. Now, it's not funny (Hence, I'm using my Linux box now).
Either this is a crack or another exposed bug in M$ IE.
Shameless plug. (Score:1)
Re:Hmmm. (Score:1)
--
Netscape? (Score:1)
Somebody who knows more?
Eon.
I'm seeing this too.. (Score:2)
Re:Backwards links are funky (Score:1)
-cpd
About those new boxes... (Score:1)
Brought to you by Tony the Tiger.
Re:Hmmm. (Score:1)
-Nix
Re:Slashapp? (Score:2)
666 User Impatient (Score:2)
Also, what happens to sebastian now? Do the new servers have names yet?
Watch your language (Score:1)
In the spirit of all those Slashdot conversations on the topic of "why aren't there more women crackers", I want Mr. Malda to reread the first sentence of this post.
Last I heard, "guys" was more exclusive than inclusive. So why not use the underappreciated "y'all"?! It works better than you might think.
Re:Hmmm. (Score:1)
ultramode abuse (Score:1)
Re:Hmmm. (Score:1)
Re:Hmmm. (Score:1)
Nope... (Score:1)
How about an honorable mention to dN? (Score:3)
I usually don't like to shamelessly plug my employer, but our tech dept is quite overworked and unsung.
I will, however, point out that I now get ZERO LAG!! Yay!
GUYS is now deprecated (Score:1)
Images on main page? (Score:2)
(nice job overall, tho). *clap clap clap*
Re:NFS export? Why not Coda? (Score:2)
"guys" includes women (Score:1)
these days. I call bevys of females "guys" all the time, and I know I'm not the only one who does this.
Re:You should use Solaris or FreeBSD (Score:3)
If the filesystem you're mounting changes frequently, as I presume the /. filesystems do, then cachefs will probably slow things down. In my testing, using Solaris clients and Solaris servers, it worked best on read-only, rarely changed filesystems and then it worked very well.
That said, I'm not too happy with Linux's client NFS performance, but my problems seem to only occur when I try and use a Linux client with a Solaris server and I haven't actually figured out which side the problem is on. Hopefully, I'll have time over this weekend to try the NFSv3 patches.
PeeWee
Re:Nope... (Score:1)
Alteon Products for Slashdot? (Score:1)
Re:You should use Solaris or FreeBSD (Score:1)
NFS (for all its filesystems) is faster than when it used to have a local SCSI disk. Linux's NFS is often broken (you get what you get with the kernel you pick), but with a little hacking it performs just as well as NFS between our Sun boxes.
--
Re:abuse of nat! (Score:1)
How about proxy servers. I recently worked for a company of over 50,000, but all of the http traffic was flowing through proxy servers. The individual boxes must have hundreds, if not thousands, of users accessing the web through them.
That said, I still think that it's probably a script.
joe
Re:"guys" includes women (Score:1)
Re:Watch your language (Score:1)
If you don't like Atlanta, go elsewhere. I love it here. =)
Of course just as you have the right to bitch about Atlanta, I have the right to defend it
Disclaimer: Born and raised here, so I'm biased.
Re:GUYS is now deprecated (Score:1)
"you all" and, worse, "y'all" are NOT proper english. the plural of "you" is "you". if you say "you all" or "y'all", I will point my finger at you and laugh and say "look, an idiot!"
long live ultramode.txt (Score:1)
Re:long live ultramode.txt (Score:1)
Re:"guys" includes women (Score:1)
Well, I would, but yeah, good point. I think "guys" may be slowly becoming inclusive, but it ain't really there yet, in general.
FWIW, I tend to use "folks".
Re:How about an honorable mention to dN? (Score:1)
Re:I don't understand (Score:1)
It depends entirely on which field you are talking about. Here TTL refers to DNS entries, not to the time a packet can stay alive while traveling from the source to the destination.
In the DNS case, a shorter TTL means that the DNS entry for the slashdot.org domain has to be re-requested more often, so the new IP address for www.slashdot.org propagates faster across the Net.
a /. reader
Re:I don't understand (Score:1)
My Company uses a firewall. (Score:1)
We have many people using the
Re:Nope... (Score:1)
If I then shutdown IE, restart, and go back to
Want speed? Remember your WIDTH and HEIGHT. (Score:2)
Browsers like to delay page rendering until they have the HTML for any included layers.
Many browsers also like to know the size of the images before they attempt to render the page. This is good incentive to use those WIDTH and HEIGHT attributes on your IMG tags. I've seen lots of web pages which have been effectively "held hostage" by problematic ad servers. Good thing this never happens on Slashdot. Then again...
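For anyone who wants to audit their own pages for this, here is a minimal sketch in Python. It is only an illustration, not anything Slashdot runs: the names ImgSizeChecker and images_missing_size are made up for the example, and it simply lists IMG tags that lack a WIDTH or HEIGHT attribute.

```python
from html.parser import HTMLParser


class ImgSizeChecker(HTMLParser):
    """Collects the SRC of every IMG tag that lacks WIDTH or HEIGHT."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        names = {name for name, _ in attrs}
        if not {"width", "height"} <= names:
            self.missing.append(dict(attrs).get("src", "(no src)"))


def images_missing_size(html):
    """Return the SRC values of images a browser cannot size in advance."""
    checker = ImgSizeChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


page = '<IMG SRC="logo.gif" WIDTH="100" HEIGHT="40"><img src="ad.gif">'
print(images_missing_size(page))
```

Run against a saved copy of a page, an empty list means every image declares its size up front.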
Re:Watch your language (Score:1)
Beer recipe: free! #Source
Cold pints: $2 #Product
reeeeallly slow now... (Score:1)
Of course, the ad banners load instantly (ad banners are generally the fastest things on the net... just another thing that points to modern ideals of the "news hole"), the icons and Slashdot logo load fast, and then it takes a week for the rest of the page to come in. The news, the sidebars, etc. take forever to load. This is only since the new implementation.
don't use dns... use the base 256 number system! (Score:1)
But anyway, in case you didn't know, you can use http://255 = http://0.0.0.255, http://256 = http://0.0.1.0, etc. You get the idea. Use it, it's cool!
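The mapping is just a 32-bit integer written as four base-256 digits. A small Python sketch of the conversion follows; the helper names int_to_dotted and dotted_to_int are invented for this example, and the standard ipaddress module is used only for the reverse direction.

```python
import ipaddress


def int_to_dotted(n):
    """Write a 32-bit integer as four base-256 digits (a dotted quad)."""
    if not 0 <= n < 2**32:
        raise ValueError("IPv4 addresses fit in 32 bits")
    digits = [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(d) for d in digits)


def dotted_to_int(addr):
    """Turn a dotted quad back into the single integer form."""
    return int(ipaddress.IPv4Address(addr))


print(int_to_dotted(255))        # 0.0.0.255
print(int_to_dotted(256))        # 0.0.1.0
print(dotted_to_int("0.0.1.0"))  # 256
```

Whether a given browser or resolver accepts the single-integer form in a URL is a separate question that this sketch does not settle.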
Re:Watch your language (Score:1)
We Texans are most definitely not southerners, with the possible exception of East Texans. Texans are Texans. Other categorizations ill suit us.
Therefore, when you hear someone use "y'all," you should not assume that the person is a "retard inbred southerner." There is a similar chance that the speaker is a retard inbred Texan, or a retard inbred yankee who has come to the realization that making a contraction of "you" and "all" is much more in line with accepted rules of English than pluralizing the already-plural "you" to yield "youse."
Please come up with a better argument for our next debate, which will concern "ain't."
Re:GUYS is now deprecated (Score:1)
For the record, I was born in Sacramento, CA, raised in Virginia, and now live in Madison, WI. I will always be a southerner, though.
Re:Slashapp? (Score:1)
Re:My Company uses a firewall. (Score:2)
_once_ every 30 minutes, and having the users all access your local copy?
It would do a lot to keep your site from being banned.
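A rough sketch of that idea in Python, assuming nothing about how the file is actually fetched: the CachedFeed class and its fetch callable are hypothetical, and the 30-minute figure is the one suggested above. Everyone behind the firewall reads the cached copy, and the upstream site is contacted at most once per interval.

```python
import time


class CachedFeed:
    """Keeps one local copy of a feed and refetches it at most once per max_age."""

    def __init__(self, fetch, max_age_seconds=30 * 60, clock=time.monotonic):
        self._fetch = fetch            # hypothetical callable that downloads the feed
        self._max_age = max_age_seconds
        self._clock = clock
        self._body = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            # Only this branch ever touches the remote server.
            self._body = self._fetch()
            self._fetched_at = now
        return self._body
```

Local users would call get() as often as they like; only one request per half hour leaves the building.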
Re:Watch your language (Score:1)
Unfortunately, this tends to be the case. Unfortunately because it's a necessary construction for proper communication. Just about every language but English has an equivalent to "Y'all."
What makes someone sound stupid is when they address a single person as "y'all".
------
Adfu (Score:1)
Re:Alteon Products for Slashdot? (Score:1)
We're doing a similar setup using an Alteon AceDirector 2 (eight ports). Once you understand their terminology, it's horribly easy to configure.
I was about to recommend the ISP-LoadBalancing [isp-loadbalancing.com] list to you, but then realized that there's been about three messages on it in the last few months...
------
Re:abuse of nat! (Score:1)
Re:My Company uses a firewall. (Score:1)
But System administrators can be a bitch to work with if you place unauthorized software on systems/networks. They don't want anything to stop their time on Y2K. I love my
Re:How about an honorable mention to dN? (Score:1)
I get 1.003 ms to the new SlashDot so I'm happy as hell
Matthew
_____________________________________
multiple ip listening (Score:1)
Re:"guys" includes women - sometimes. (Score:1)
"Guys" as a form of address is widely accepted as a gender-nonspecific term.
Other uses of the word are *usually* intended to imply the male gender only.
Context is the key, and I think we are all intelligent enough here to glean the appropriate context from Rob's comment. He's run polls on the gender breakup of his audience, so he's obviously aware that he does not have a 100% male audience.
So stop being so damn precious!
Re:reeeeallly slow now... (Score:1)
Re:reeeeallly slow now... (Score:1)
didn't preview
Re:Watch your language (Score:1)
You is already plural, btw. Thou is singular.
Alteon? (Score:1)
What's an Alteon?
How does it work?
-jfedor
Why export a filesystem at all? (Score:1)
pc on the PC? (Score:1)
PC language is a semantic nightmare which makes a disgusting (or impersonal) mess of an otherwise very capable language: English.
I'm quite willing to refrain from using any of the 7 words, but I'm not about to start typing he or she or s/he.
Heinlein said it best: Whenever women have insisted on absolute equality with men, they have invariably wound up with the dirty end of the stick. What they are and what they can do makes them superior to men, and their proper tactic is to demand special privileges, all the traffic will bear. They should never settle merely for equality. For women, 'equality' is a disaster.
Re:GUYS is now deprecated (Score:1)
Re:NFS export? Why not Coda? (Score:1)
Re:Watch your language (Score:1)
Does Slashdot update? (Score:1)
Linux NFS client performance (Score:1)
Check your rsize and wsize options that you use on Linux to mount the NFS stuff. I've read that increasing these to 4096 can help with Solaris interoperability.
Check the NFS-HOWTO for Linux for details.
James
Solaris DID invent NFS ... (Score:1)
Lightning fast (Score:1)
Seriously, a great improvement. Well done.
Re:Linux NFS client performance (Score:1)
My reads are fine, so I've left rsize at 8192.
I have played with the wsize. I've tried 1024, 2048, 4096 and 8192. I've been through the NFS HOWTO and the Ethernet HOWTO. I've been through DejaNews many times and all I've really found is that I'm not the only one having problems with Solaris NFS servers. This isn't a new problem for me, every Linux box I've set up seems to have it, from kernels back in the 1.2.x days. I'm a Solaris admin for a living, so all of my NFS servers have been Solaris, including the Ultra 5 I use as my home server.
Playing on my home systems after writing my original post, I made some progress. If I specify the 'tcp' option to mount, my writes get noticeably faster, but are still slower than I think that they should be. So far, my best performance seems to come from these mount options: rw,rsize=8192,wsize=1024,tcp.
I'm not dissing Linux or its NFS implementation, by the way. I've been adminning UN*X servers for living for about five years and I've been using UN*X for about four years longer than that, so I'm used to quirks and interoperability glitches. I expect them. They're OK. It just happens that none of the Linux boxes I've deployed in production environments have depended on NFS for anything more than my convenience, so I've never been terribly motivated to fix it.
BTW, thank you for your suggestion. I do appreciate it, even if I did seem to tear it down.
PeeWee
P.S. And no, I won't run Linux on the Ultra 5, in case someone was going to suggest that. ;)
I can't believe it! (Score:1)
Some answers to your questions... (Score:2)
There have been lots of questions posted over the past two weeks since we first hinted that a move was imminent but we have all been just too busy to answer them individually. We do intend to write up a description of the current system and document our trials and tribulations along the way. If there is any interest, perhaps we can do an official Ask Slashdot as well.
But in the meantime, here are the answers to some of the questions you have already asked...
Paul Crowley writes:
I'm surprised that you went for NFS rather than Coda - NFS is a bit suckful, and Linux's implementation doubly so. Coda would have given you a more secure and more efficient protocol for talking to the other servers. Get Andover to buy you a duplicate setup for testing new configurations, and benchmark the two against each other.
Macphisto writes:
Coda smacks my bitch up tho. A non-sucking nfs. With fault handling, redundancy, good performance, a light kernel footprint... drool. It would be cool for /. to go for it but it ain't gonna happen, too beta still... and seeing as this place is just another corporate shop now, they can't take risks.
Tadpol writes:
Get rid of nfs. There are much better ways of distributing filesystems out there. Like GFS
There are many other network filesystems and NFS does have some serious drawbacks, but the requirements and demands of Slashdot are quite minimal. My philosophy is always to try what is quick and easy first and then optimise out the bottlenecks. I believe that we served something like six million pages over our three days of testing and NFS was never a bottleneck. NFS provides far more functionality than we really need and doubling or even tripling speed would show little effect on the overall system.
However, after listening to Peter Braam's talk on Coda and InterMezzo at Linux Expo back in May, I am very excited about the InterMezzo package for use in distributed web hosting. If you ever get a chance to hear Peter speak, do not pass it up -- his talk was one of the most informative conference presentations that I have ever heard. Unfortunately, there is very little information about InterMezzo available on the web and the conference proceedings focused more on Coda than InterMezzo.
Decibel writes:
Couldn't the perl scripts just connect directly to the database server? If they can, that should be much faster than serving the data out of the database machine via NFS, or any other filesystem.
The current system consists of six machines: one dedicated to Ad-Fu, one dedicated to images (no change so far from the old setup), one machine serving MySQL and NFS, and three machines serving HTTP requests. We arbitrarily chose three machines for HTTP, but we can bring additional machines on-line in about an hour. The machines that serve HTTP requests do not run MySQL; they make a database connection over the network to the MySQL server.
NFS is only used to serve static pages that are generated directly on the MySQL server to the HTTP machines. Caching them locally would reduce internal network traffic, but that is not really an issue since we have gobs of internal bandwidth to spare. Btw, InterMezzo is my solution for people who cannot afford a private 100 Mbps switch or who would max one out.
Those paying close attention will notice that we are using a mod_perl enabled server to deliver static pages. We can theoretically obtain a performance gain by dedicating some httpd processes to mod_perl and running others without it, and we are considering this as a future project.
Anonymous Coward writes:
Great job! Congrats!
But next time you change the IP for your server, it might be a good idea to decrease the TTL for the IP of "slashdot.org" a few days before the change. That way, it won't take up to 24 hours for other sites to pick up the change after their DNS cache entry has expired
We did initiate this about 4-5 days before the cutover, but there were some problems with Rob Malda's NIC handle. As things turned out, we got the TTL update pushed out about 30 hours before the cutover which should have been sufficient since the previous TTL was 24 hours.
Btw, several people have mentioned this and I am looking into the problem. All I can say at this point is that all of the servers that I have access to updated properly. Is it possible that some caching DNS servers ignore TTL values less than 24 hours to avoid DoS attacks?
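The timing argument (a 30-hour lead against a 24-hour TTL) can be checked with a little arithmetic. The Python sketch below is only a model: it assumes every cache honors TTLs exactly, which is the very thing in doubt here, and the 1-hour value for the lowered TTL is an assumption, since the actual figure is not given. The function name is made up for the example.

```python
def worst_case_stale_hours(lead_hours, old_ttl_hours, new_ttl_hours):
    """Longest time after cutover that a TTL-honoring cache can serve the old IP.

    lead_hours    -- how long before the cutover the TTL was lowered
    old_ttl_hours -- TTL in effect before it was lowered
    new_ttl_hours -- the lowered TTL (assumed value; not stated in the thread)
    """
    # A cache that looked the name up just before the TTL was lowered keeps
    # the old record for old_ttl_hours, i.e. until (old_ttl - lead) after cutover.
    leftover_from_old_ttl = old_ttl_hours - lead_hours
    # A cache that looked the name up just before the cutover keeps the old IP
    # for one full (short) TTL.
    return max(new_ttl_hours, leftover_from_old_ttl)


print(worst_case_stale_hours(30, 24, 1))  # 1  -> the 30-hour lead covers the 24-hour TTL
print(worst_case_stale_hours(12, 24, 1))  # 12 -> a 12-hour lead would not have
print(worst_case_stale_hours(0, 24, 1))   # 24 -> no lead at all
```

Under this model a site still seeing the old address many hours after the cutover points to a cache that does not follow the published TTL.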
ChiChiCuervo writes:
It would have been nice if Rob also mentioned the gurus from DigitalNation who put the servers together and provide the bandwidth to Andover (and now also /.).
I usually don't like to shamelessly plug my employer, but our tech dept is quite overworked and unsung.
Special thanks do go out to all the guys at DN (Chris, Brad, Brian and Gordon) with whom I have worked personally, as well as those with whom I have not. Rob did not mention you because I have been the sole networking contact.
Although everyone I worked with directly was intelligent, helpful and courteous, there were some fundamental problems which occurred that prevent me from giving more praise to the company as a whole. (Anyone in a position of power at Digital Nation should feel free to contact me directly regarding these issues, btw.)
I will, however, point out that I now get ZERO LAG!! Yay!
Let's hope that you continue to maintain a good set of peering arrangements so that the rest of us get as close to the same performance as possible. ;-)
Anonymous Coward writes:
Also, what happens to sebastian now? Do the new servers have names yet?
The status of the old hardware is somewhat unknown, at least to me anyway. As best as I can guess, the three machines that were running Slashdot became the property of Andover as part of the purchase. I think they have been unofficially gifted back to Blockstackers to run the Everything project. Despite some of the comments that I have read otherwise, Andover really is a cool company and doesn't quibble over little things like this. (Although, the next batch of SGI flatscreens get set up in our offices. ;-)
The new servers have dull, boring, and unexciting names -- in the DNS anyways. When you are responsible for 30+ machines, you go for descriptive over cute. Besides, they can always be CNAMEd to something more interesting.
marnold writes:
Browsers like to delay page rendering until they have the HTML for any included layers.
Many browsers also like to know the size of the images before they attempt to render the page. This is good incentive to use those WIDTH and HEIGHT attributes on your IMG tags. I've seen lots of web pages which have been effectively "held hostage" by problematic ad servers. Good thing this never happens on Slashdot. Then again...
To the best of my knowledge, all of the images on Slashdot are properly tagged with width and height attributes. If you ever find a page in error, just contact Pater.
The real problem is with the IFRAME ads. We are going to try to address this problem as best we can. Another problem related to IFRAME ads in general is that some advertisers require that the HTML live on their servers. This really isn't too bad per se, but we have seen them serve up some FUBARed HTML and their servers bog down under load.
angelo writes:
I was wondering if VA research is still being used, if the NFS connections are running over separate Net cards, and if we can see more detailed specs for the httpd boxes. It's always nifty to read technical stuff.
My company is looking at VAR systems for handling some stuff, and we took note of this new system as a test case.
Digital Nation does not allow the use of third-party equipment in their facility. They build everything in-house from a common set of components so that their tech staff can diagnose and repair all problems directly. A very good idea, IMHO, as this makes hardware failures such as the ones that occurred during Linux Expo much easier to deal with. Personally, I do everything possible to keep my telephone from ringing at 3:00 AM.
I've seen some of the new VA hardware and it does look pretty sweet. I particularly like the new Intel motherboards they use with the remote administration serial port. In the future, I'd like all of my servers to have this feature. (Hint hint!)
In any case, a much more detailed overview of the setup and transition to the new facility will be forthcoming.
Indomitus writes:
I'd just like to say Thanks to Andover for making this new setup possible. And of course, thanks to Rob and Co. for making Slashdot the kickass site it is.
You're welcome!
Oscarfish writes:
This past week the old server has been awful from where I am...this one is a great improvement!
No more adfu serving the banner ads? I was hoping the source code would be released soon...
Indomitus writes:
I was hoping the adfu code would be released too. Maybe with the Andover deal the guys don't need the money they were getting from adfu (however much it was, probably not much) so they're not going to have it around anymore.
Ad-Fu is still currently serving banner ads and the release of Ad-Fu source is really up to Rob. Andover is in the process of merging their own advertising system with Ad-Fu and integrating a delivery mechanism using a compiled-in Apache module. Whether this code will be released as open source is unknown at this time.
Ronin Developer writes:
Maybe it's just me...if that's the case, then I've been hacked (I'm sure I upset one or two people the other day). Or, maybe it's just that my DNS server hasn't caught up yet.
There were some DNS issues, already addressed above.
Anyway, attempts to access /. results in a redirection. Okay. No problem. But, when I try to log in to post, I am given an html page with single line of text that reads "You really want to be on now." And then MS IE5 brings up a dialog box that reads...
"The page cannot be refreshed without resending the information. Click Retry to send the information again or click Cancel to return to the page you were trying to view."
Clicking Retry or Cancel just puts me into an endless loop. Only way out is to terminate IE.
Under Netscape, I see the text and am then redirected to /.
I'm guessing that MS-IE has some problems with either the redirection or the change of IP or a combination of the two. We've noticed several MS-IE problems and are working to correct them. All of us develop and test using Netscape, so we rely somewhat on the "Open Source Browser Testing" model. If you ever notice a problem, please send a detailed description off to Pater.
Eon78 writes:
Early this morning (CET) I found that, although my DNS server & cache gave the correct values, Netscape led me to the redirection page. Does Netscape have a DNS cache of its own? I tried cleaning up the cache (disk & memory) but it didn't help. Now it displays correctly, but nslookup already gave me correct values in the morning...
Netscape does, in fact, cache DNS lookups. I do not know how they flush the cache internally, but the only way that I know of to flush it immediately is to exit and restart. I assume MS-IE functions similarly.
Anonymous Coward writes:
Is it possible that the people updating every 5 seconds are actually in a larger company using NAT so all 10000 /. readers in the company are sharing an IP?
joe52 writes:
How about proxy servers. I recently worked for a company of over 50,000, but all of the http traffic was flowing through proxy servers. The individual boxes must have hundreds, if not thousands, of users accessing the web through them.
That said, I still think that it's probably a script.
When requests come in at a very regular basis, we know that it is a script. ;-)
We know because we have been personally monitoring the logs for any sign of problems. If you were banned, it is because a human (probably Rob) decided to ban you. We hope that a system to automatically ban abusive users will be unnecessary, but it is under consideration.
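To illustrate what "very regular" might look like in a log, here is a small Python sketch. It is not the process described above, which is manual; the function name, the five-request minimum and the 5% jitter threshold are all assumptions chosen for the example. It flags a client whose gaps between requests barely vary.

```python
from statistics import mean, pstdev


def looks_scripted(timestamps, min_requests=5, max_jitter=0.05):
    """Guess whether one client's request times (sorted, in seconds) are machine-regular.

    Returns True when the spread of the gaps between requests is tiny
    compared with the average gap. Thresholds are illustrative only.
    """
    if len(timestamps) < min_requests:
        return False  # too little evidence either way
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False
    return pstdev(gaps) / average_gap <= max_jitter


print(looks_scripted([0, 5, 10, 15, 20, 25]))     # True: a hit every 5 seconds
print(looks_scripted([0, 3, 40, 41, 300, 310]))   # False: human-looking gaps
```

A busy proxy or NAT gateway produces many requests but irregular gaps, so a check like this would separate it from a polling script; a human would still want to look before banning anyone.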
Reverse Corruption writes:
My Company uses a firewall from VHAsecrue.net. We have many people using the .xml file on their Windows boxes. I am sorry if this is a huge problem. I wouldn't like to have our IP be placed on lock out. Thanks.
If you ever get banned by accident, please contact Pater to resolve the problem. As mentioned previously, the process is all done manually so for now at least it shouldn't happen without a good reason.
mkasei writes:
I am curious as to which Alteon products you decided to use. Can you be specific? For people interested in load balancing this could be enlightening.
jfedor writes:
What's an Alteon? How does it work?
Alteon is the name of a company that makes dedicated, high-speed, load-balancing routers. Given their design, they almost function more like a switch than a router. Slashdot runs off of one port on a shared ACEDirector managed by our ISP, Digital Nation.
The slashdot.org name resolves to an IP address on the Alteon ACEDirector switch. This switch then does some masquerading and hands the request off to an individual web server using a fairly complicated algorithm to attempt to deliver the request to the least-loaded machine.
This is a fairly simplistic model both physically and conceptually, especially since there are actually two switches running in a master-slave arrangement to keep things running in case one unit fails. (Can you say single point of failure? I knew you could...)
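For readers new to load balancing, the "least-loaded machine" idea can be sketched in a few lines of Python. This is a toy least-connections picker, not the ACEDirector's actual algorithm (which is not described here); the class name and the web1/web2/web3 host names are invented for the example.

```python
class LeastLoadedBalancer:
    """Hands each new request to the backend with the fewest requests in flight."""

    def __init__(self, backends):
        # Track how many requests each backend is currently serving.
        self.active = {name: 0 for name in backends}

    def acquire(self):
        """Pick the least-busy backend and count the new request against it."""
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        """Call when a request finishes so the backend's load drops again."""
        self.active[name] -= 1


lb = LeastLoadedBalancer(["web1", "web2", "web3"])  # hypothetical host names
first, second, third = lb.acquire(), lb.acquire(), lb.acquire()
lb.release(second)    # web2 finishes its request first
print(lb.acquire())   # web2: it is now the least-loaded box
```

A real switch also has to notice dead backends and rewrite addresses on the fly, which is where the complexity comes in.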
m3000 writes:
Something is wrong here, according to my browsers, the latest story is the "Business Software for Linux" one that was posted at 12:44 PM EDT. It's now 3:24 AM EDT and NOT ONE story has been posted between those times? Is it just my computer, or is there a serious lack of stories?
This could be due to one of two reasons, or perhaps both. Stories were held up during the move in an attempt to reduce traffic at the cutover point, but there was also a bug in the code that was not dating articles properly. The gap may be explained by the software fix -- if you had been hitting the site hourly (why aren't you?) you may have seen a regular flow of articles.
Text advertisements (Score:1)
What I'm thinking of is a paragraph immediately after the story on each comments page, that says, 'ADVERTISEMENT: Fongrel Inc. have just launched a new range of dual-Athlon Fongrix Linux-based workstations.', with links as appropriate. For things like job adverts, this could work really well. And Lynx users would be able to see them.
Whoa, scary corporate guy... (Score:1)
That's all really