Introducing The New Slashdot Setup
The original version of this document was written by Andover.Net Alpha Geek Kurt Grey. The funny jokes are his. The stupid jokes are mine.
The Backstory
We soon realized that our setup at Digital Nation was
very flawed. We were having great difficulty administering the machines and
making changes. But the real problem was that all the SQL traffic was flowing
over the same switch. The decision was made to move to Exodus to solve these
problems, and to go with a provider that would let us scatter
multiple data centers around the world when we were ready to do so.
Meanwhile, Slashcode kicked and screamed its way to v1.0 under the iron fists of Pudge (Chris Nandor) and CaptTofu (Patrick Galbraith). The list of bugfixes stretches for miles, and the world rejoiced, although Slashdot itself continued to run the old code until we made the move.
The Co-Loc
Slashdot's new co-location site is now at Andover.Net's own (pinky
finger to the mouth) $1 million dedicated datacenter at the Exodus
network facility in Waltham, Mass., which has the added advantage of being
less than a 30-minute drive for most of our network admins, so they don't have to fly
cross-country to install machines.
We have some racks sitting at Exodus. All boxes are networked together through a Cisco 6509 w/ 2 MSFCs and a Cisco 3500, so we can rearrange our internal
network topology just by reconfiguring the switch. Internet connectivity
to/from the outside world all flows through an Arrowpoint CS-800 switch (which
replaced the CS-100 that blew up last week), which acts as both a
firewall and a load balancer for the front-end Web servers. It also so happens that
Arrowpoint shares the same office building with Andover.Net in Acton, so
whenever we need Arrowpoint tech support we just walk upstairs and talk to the
engineers. Like, say, last week when the 100 blew up ;)
The Hardware
- 5 load balanced Web servers dedicated to pages
- 3 load balanced Web servers dedicated to images
- 1 SQL server
- 1 NFS Server
All the boxes are VA Linux Systems FullOns running Debian (except for the SQL box). Each box (except for the SQL box) has LVD SCSI w/ 10,000 RPM drives. And they all have 2 Intel EtherExpress 100 LAN adapters.
The Software
Slashdot itself is finally running the latest release of Slashcode (it was pretty amusing being out of
date with our own code: for nearly a year the code release lagged behind
Slashdot, but my, how the tables have turned).
Slashcode itself is based on Apache, mod_perl and MySQL. The MySQL and Apache configs are still being tweaked. Part of the trick is to keep the MaxClients setting in httpd.conf on each web server low enough not to overwhelm the connection limits of the database, which in turn depend on the process limits of the kernel, and all of these can be tweaked until a state of perfect zen balance has been achieved ... this is one of the trickier parts. Run 'ab' (the ApacheBench tool) with a few different settings, then tweak SQL a bit. Repeat. Tweak httpd a bit. Repeat. Drink coffee. Repeat until dead. And every time you add or change hardware, you start over!
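Here is a rough sketch of that balance. It assumes every Apache child on a page-serving web server holds one persistent MySQL connection. That is typical of mod_perl setups but is not stated above. Under that assumption, the MaxClients values across all those servers must add up to less than the database's max_connections, with a few connections held back for admin and cron jobs. The function name, the reserve of 10, and the example numbers are hypothetical.

```python
def max_clients_per_server(db_max_connections: int,
                           web_servers: int,
                           reserved: int = 10) -> int:
    """Upper bound for MaxClients on each of `web_servers` identical boxes.

    Assumes one persistent DB connection per Apache child, so the sum of
    MaxClients over all servers must not exceed the connections left
    after `reserved` are set aside for admin/cron use.
    """
    if web_servers <= 0:
        raise ValueError("need at least one web server")
    available = db_max_connections - reserved
    if available <= 0:
        raise ValueError("reserve leaves no connections for Apache")
    # Round down so the total can never exceed the database limit.
    return available // web_servers


# Hypothetical numbers: max_connections = 500, five page servers.
print(max_clients_per_server(500, 5))  # 98
```

This only gives a ceiling. The memory used by each mod_perl child and the kernel's process limits on the SQL box may force the real value lower, which is why the benchmark-and-tweak loop with ab is still needed.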
The Adfu ad system has been replaced with a small Apache module written in C for better performance, and that too will be open sourced When It's Ready (tm). This was done to make things consistent across all of Andover.Net (I personally prefer Adfu, but since I'm not the one who has to read the reports and maintain the list of ads, I don't really care what Slashdot runs).
Fault tolerance was a big issue. We've started by load balancing anything that could easily be balanced, but balancing MySQL is harder. We're funding development efforts with the MySQL team to add database replication and rollback capabilities to MySQL (these improvements will of course be rolled into the normal MySQL release as well).
We're also developing some in-house software (code named "Oddessey") that will keep each Slashdot box synchronized with a hot-spare box, so if a box suddenly dies it will automatically be replaced by its hot spare; kind of a RAID-for-servers solution (imagine... a Beowulf cluster of these? *rimshot*). And yes, it'll also be released as open source when it's functional.
Security Measures
The Matrix sits behind a firewalling BSD box and an
Arrowpoint load balancer. Each filters certain kinds of attacks, which frees up
the httpd boxes to concentrate on just serving HTTP and lets the dedicated
hardware do what it does best. All administrative access is made through a
VPN (which is just another box).
Hardware Details
Type I (web server)
VA Full On 2x2
Debian Linux frozen
PIII/600 MHz 512K cache
1 GB RAM
9.1GB LVD SCSI w/ hot swap backplane
Intel EtherExpress Pro (built into the motherboard)
Intel EtherExpress 100 adapter
Type II (kernel NFS w/ kernel locking)
VA Full On 2x2
Debian Linux frozen
Dual PIII/600 MHz
2 GB RAM
(2) 9.1GB LVD SCSI w/ hot swap backplane
Intel EtherExpress Pro (built into the motherboard)
Intel EtherExpress 100 adapter
Type III (SQL)
VA Research 3500
Red Hat Linux 6.2 (final release + tweaks)
Quad Xeon 550 MHz, 1MB cache
2 GB RAM
6 LVD disks, 10000 RPM (1 system disk, 5 disks for RAID5)
Mylex Extreme RAID controller 16 MB cache
Intel EtherExpress Pro (built into the motherboard)
Intel EtherExpress 100 adapter
Re:NFS: What are you using it for? (Score:2)
Batch pushes onto local storage on the HTTP machines wins hands down as far as I'm concerned.
And, as others have said, if you must use central storage for files, you may well be better off with SMB than NFS.
MySQL in General. (Score:2)
-S
Scott Ruttencutter
Re:Interesting Setup (Score:2)
The site receives somewhere between 1 and 2 million page views a day. The MySQL server has to be 'beefy' to take that kind of load under MySQL while we wait for replication to be stable and complete.
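For a sense of scale, spreading that traffic evenly over a day (86,400 seconds) gives the average request rates below. Real traffic is bursty, so peak rates will be higher than these averages.

```latex
\frac{1\,000\,000}{86\,400\ \text{s}} \approx 11.6\ \text{pages/s},
\qquad
\frac{2\,000\,000}{86\,400\ \text{s}} \approx 23.1\ \text{pages/s}
```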
If you ask "why not __X__" for the database: our brave programmers are working on a general abstraction layer to allow different databases to be used with Slash, which is NotReadyYet(tm). When that is complete, people will be able to port Slash to the database of their choice. For now, the only db supported is MySQL.
Martin B.
CDROM Booting (Score:2)
The idea is that if the system ever crashes, it will automatically boot off a CDROM, format the hard drive, and install a fresh copy of the operating system with your modifications to it. This is not a good solution for a database computer, but for any of the web servers that contain nothing dynamic themselves, it's a great solution. If someone hacks the internal server, just reboot.
We also did the same thing with the multiple datacenters. The computers talked to each other over a VPN, backed themselves up to one computer that was ONLY accessible over the VPN, etc. Then, from the location of the "secure" servers (only on the VPN), we would write and test new versions of our distribution (modified Slackware), burn it to a CDROM and mail it out to the systems around the country. Put it in each system and, again, reboot. Drop me an e-mail for more information.
Nicholas W. Blasgen
Re:Why Debian/RH? (Score:2)
No.
If you have compiled Apache to do dynamically loaded modules, it doesn't. That's how the Debian packages work; you have several versions of the mod_php package, each for a different database, and you install the one you want. It's easy, and it works (which is pretty much true of everything Debian; that's why I use it)
As another poster pointed out, they most likely do compile Apache themselves (which I did not find that difficult when I tried it to check out mod_ssl [I wanted to use rsaref to stay legal], but then I wasn't using Mandrake). Even if you compile it yourself, with dynamic modules you can always compile more modules later without recompiling Apache.
Also, I don't understand your comment of "RH runs Postgres, not MySQL". Both of these should run on all Linux distros.
Re:Suggestions for the future... (Score:2)
But why's it so expensive? (Score:2)
Why would people use Exodus instead?
D
----
Re:The Return of the Server (Score:5)
s/KURT/MARTIN/;
s/Andover/Adam/;
}
ROBLIMO: Not after we demonstrate the power of this station. In a way,
you have determined the choice of the web site that'll be slashdotted
first. Since you are reluctant to provide us with a URL, I have chosen
to test this station's slashdotting power...
on your home page on iVillage!
AC: No! iVillage is peaceful. We don't flame Linux on iVillage.
We only discuss travel and mystery novels. You can't possibly...
ROBLIMO: You would prefer another target? A commercial target? Then name the URL!
Roblimo waves menacingly toward AC.
ROBLIMO: I grow tired of asking this. So it'll be the last time. What is the URL?
AC: (softly) pcweek.com.
AC lowers her head.
AC: The FUD piece was posted on pcweek.com.
ROBLIMO: There. You see, Lord Taco, she can be reasonable. (addressing
Hemos) Continue with the operation. You may post when ready.
The Return of the Server (Score:5)
[MOFF KURT, a tall, confident technocrat, strides through the assembled geeks to the base of the shuttle ramp. The geeks snap to attention; many are uneasy about the new arrival. But Moff Kurt stands arrogantly tall.]
[The exit hatch of the shuttle opens with a WHOOSH, revealing only darkness. Then, heavy FOOTSTEPS AND MECHANICAL BREATHING. From this black void appears DARTH TACO, LORD OF THE SITH. Taco looks over the assemblage as he walks down the ramp.]
MOFF KURT:
"Lord Taco, this is an unexpected pleasure.
We're honored by your presence."
DARTH TACO:
"You may dispense with the pleasantries, Commander. I'm here to put you back on schedule."
[The commander turns ashen and begins to shake.]
MOFF KURT:
"I assure you, Lord Taco, my men are working as fast as they can."
DARTH TACO:
"Perhaps I can find new ways to motivate them."
MOFF KURT:
"I assure you, this station will be operational
as planned."
DARTH TACO:
"Andover does not share your optimistic appraisal of the situation."
MOFF KURT:
"But he asks the impossible. I need more geeks."
DARTH TACO:
"Then perhaps you can tell them when they arrive."
MOFF KURT: [aghast]
Andover's coming here?
DARTH TACO:
"That is correct, Commander. And they are most displeased with your apparent lack of progress."
MOFF KURT:
"We shall double our efforts."
DARTH TACO:
"I hope so, Commander, for your sake. Andover is not as forgiving as I am."
"This server is now the ultimate power in the universe. I suggest we use it!"
Re:MySQL Server. (Score:2)
Don't forget that GPL'd versions of MySQL (older releases) are always made available as well.
Seems faster. (Score:2)
I wonder how many people-hours go into running Slashdot each week?
tcd004
Here's my Microsoft Parody [lostbrain.com], where's yours?
Next on the list... (Score:2)
-Tommy
------
"I do not think much of a man who is not wiser today than he was yesterday."
Re:What is all the hardware needed for? (Score:2)
Add to that the ad serving, graphics, and so on. I don't know how much tracking they do, but ads usually only pay for each unique user - loading the same ad on the same user's screen ten times only counts for one impression. So most sites have to set a cookie just to avoid double-billing, and check it on each pageload to rotate the ads correctly. That adds cpu and memory too.
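Below is a minimal sketch of the cookie-based de-duplication described above. It assumes a per-visitor ID stored in a cookie and an in-memory set of (visitor, ad) pairs. A real ad server would persist this and expire it per billing period. The class, method, and cookie names are hypothetical.

```python
import uuid


class ImpressionCounter:
    """Counts at most one billable impression per (visitor, ad) pair."""

    def __init__(self) -> None:
        self._seen: set = set()          # (visitor_id, ad_id) pairs already billed
        self.billable: dict = {}         # ad_id -> billable impression count

    def visitor_id(self, cookies: dict) -> tuple:
        """Return (id, needs_set_cookie) using the hypothetical 'visitor_id' cookie."""
        vid = cookies.get("visitor_id")
        if vid:
            return vid, False
        return uuid.uuid4().hex, True    # new visitor: caller must send Set-Cookie

    def record(self, visitor: str, ad: str) -> bool:
        """Record one page load; return True only if it is billable."""
        key = (visitor, ad)
        if key in self._seen:
            return False                 # same ad, same visitor: not billed again
        self._seen.add(key)
        self.billable[ad] = self.billable.get(ad, 0) + 1
        return True
```

Every page load pays for a cookie read and a set lookup, and the set grows with the number of visitors. That is the extra CPU and memory the comment is talking about.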
Re:Why spend all that $ to fix MySQL? (Score:4)
In MySQL, you do not have the choice of turning on transactions and atomicity.
You have the choice of turning on features that they mistakenly label transactions and atomicity, but let's call a spade a spade here.
You use MySQL if you care about speed a lot, and don't care much about data integrity. That's a perfectly valid position, but let's not pretend it's some other position.
If you do care about data integrity, you use something other than MySQL, and find another way to achieve the speed.
--
Re:Important Question... (Score:2)
Unless you're going to colocate offshore, which I would guess carries its own risks, I'm not sure that any American colocation facility can protect a site against a search warrant (which I understand was issued in the Steve Jackson case, whether rightly or wrongly).
I would be interested in knowing, however, what contract terms Exodus and other colos offer with respect to warrantless requests for searches. (That is, what has Exodus agreed to do when Agent Foo shows up without a warrant and asks to have a look at the machines hosting website Bar? Is Exodus in breach of contract if it says, "Right this way, Agent?" Or has it agreed to take some other action?)
GAH! (Score:2)
Re:MySQL (Score:2)
IIRC the big hit is in write performance, but one would hope large RAID caches help with this somewhat. Still, a 0+1 rig would probably provide the best overall performance and redundancy (and cost the most).
Your Working Boy,
Re:MySQL Server. (Score:3)
This is really important for the web, because a typical web program will start by opening a connection and end by closing it. So you effectively have one connection for every hit that occurs.
Unless you do some fancy sharing of connections, this is going to be a big problem when you use Oracle. This forces Philip Greenspun to use TCL/AOLServer for his work, since it allows connections to stay up between CGI invocations.
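The "fancy sharing of connections" usually means a connection pool. The pool opens a small number of connections once and lends them out per request, instead of connecting and disconnecting on every hit. Below is a minimal thread-safe sketch. The class and its methods are hypothetical, and the `factory` argument stands in for whatever connect function a real database driver provides. Per-process persistent connections (for example Apache::DBI under mod_perl) are a simpler variant of the same idea.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class ConnectionPool:
    """Lends out at most `size` connections and reuses them across requests."""

    def __init__(self, factory: Callable[[], Any], size: int) -> None:
        self._factory = factory          # stand-in for a driver's connect()
        self._size = size
        self._created = 0
        self._lock = threading.Lock()
        self._idle: queue.LifoQueue = queue.LifoQueue()

    def _acquire(self) -> Any:
        try:
            return self._idle.get_nowait()   # reuse an already-open connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._size:
                conn = self._factory()       # open a new one, still under the cap
                self._created += 1
                return conn
        return self._idle.get()              # cap reached: wait for a release

    @contextmanager
    def connection(self) -> Iterator[Any]:
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put(conn)             # hand it back instead of closing it

    def run(self, query: Callable[[Any], Any]) -> Any:
        """Borrow a connection, run `query` with it, and return the result."""
        with self.connection() as conn:
            return query(conn)

    @property
    def opened(self) -> int:
        return self._created
```

With strictly sequential requests, the pool never opens more than one connection. Under concurrent load it opens up to `size` connections and then makes further callers wait.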
In the meantime, I can open and close as many MySQL connections as I need to.
In addition, as I said in another post, he would probably have to rewrite the Slash engine to use another database; it's most likely very dependent on the mySQL API (as my programs are as well). We get a big payoff from this - far greater speed - so we pay the penalty of being stuck on one database unless we want to make a herculean effort to convert all the software we've already written.
D
----
Re:The Return of the Server (Score:2)
ANDOVER:
"Rise, my friend."
DARTH TACO:
"The server cluster will be completed on schedule."
ANDOVER:
"You have done well, Lord Taco! And now... I
sense that you wish to resume your search for
young hot-grits guy."
DARTH TACO: "Yes, my Master."
ANDOVER:
"Patience Lord Taco, in time he will come to you."
DARTH TACO:
"He will come to me?"
ANDOVER:
"He will come to you, and you will bring him
before me... I have foreseen it."
DARTH TACO:
"Yes, my Master."
ANDOVER:
"Everything is proceeding according to my design!"
*cackle*
Re:MySQL Server. (Score:2)
Also, while the source code may be available, the license that MySQL is under does not come anywhere close to meeting the open source definition [opensource.org], with the exception of older versions, as you mention.
Re:Yes, Why MySQL Server? Why not PostgreSQL? (Score:2)
That's right, the latest versions of MySQL support a new DB file format (based on the well-recognized Berkeley DB format) that has FULL COMMIT/ROLLBACK capabilities. So no more bitching about that. They've also incorporated support for HEAP tables (eminently useful for *very* fast cache tables), added better stored function loading, and they almost have sub-selects working at the (very fast) speed they want.
I like Postgres; it's very powerful for more traditional DB work... but MySQL is still the DB to go to when you feel the need for speed. So check out the feature list of the 3.23.x series. You won't be disappointed.
Re:MySQL Server. (Score:2)
Well, I can't speak for the
--Shoeboy
(former microserf)
Re:But why's it so expensive? (Score:2)
* They have 2 generators and fuel to last indefinitely.
* They have cold standby network equipment
* They have 24h security, video cameras & biometric access.
* We can (socially) network with our peers at the datacentre
* Good Fire & Environmental Protection
* Setting up a datacentre is a lot of hassle for 5 boxes...
* Don't need to worry about real estate.
* Datacentre has better connectivity than redundant T3's to your office.
* Network capacity is cheaper at a datacentre
* Better SLA terms at a datacentre
Beware the Intel EtherExpress Pro w/linux (Score:4)
Damn (Score:2)
Slashdot Commercial? :) (Score:4)
NFS servers: $21120
Database server: $25739
Being THE place for Natalie Portman and Hot Grits on the Web: priceless
There's some things money can't buy. For everything else, there's Slashdot.
---
Legacy Support (Score:3)
Re:NFS Question (Score:2)
D
----
Because it breaks the browser, thank you. (Score:2)
If I were as lazy as you suggest, I wouldn't have visited their page at all.
Re:Why spend all that $ to fix MySQL? (Score:4)
Re:MySQL Server. (Score:2)
Oracle may indeed be somewhat slower, but these guys have the money to throw at more hardware! And they need the features that Oracle provides.
As for code modifications: make a fork that supports Oracle. There appear to be people working on Slash full time now. It isn't like back in the day when Rob had to make the changes himself.
My best guess is that this is a result of 'do what you know best'. Oracle isn't well known in the open-source world, and there may be nobody at Andover who has real experience running it. I think their money would be better spent on setting everything up to run Oracle and hiring an Oracle admin, but that is just my humble $0.02.
Re:Interesting Setup (Score:2)
The economics of life are a lot different when you're a major corporation. I was able to get my mid-sized 100-employee company to buy a $10,400 VA Linux FullOn 2x2 server even though it was overkill; the money simply wasn't worth thinking about in comparison to the costs associated with having a slow server.
D
----
Your points... (Score:2)
2. Don't know...
3. Well, it all depends on what you are trying to achieve... it does speed things up... but it depends on load... if you're worried about security... well, don't punch those ports through the firewall...
Re:Why Debian/RH? (Score:2)
Re:Important Question... (Score:5)
Well, a typical Exodus facility isn't nuke-proof, but it's pretty damn close. I've toured one (in Herndon, VA) because our company is about to co-loc at it. Here's a brief rundown of the physical security:
You run into all this before you even see anything resembling a computer, apart from the terminals in the receptionist's enclosure. In the actual computer pens, you have the cages, and for the really paranoid, you can get a steel box with a biometric lock instead of a conventional cage.
To sum up...it would take a truly concerted effort to physically breach one of these facilities.
Aero
no, silly, they mean shadouts (Score:2)
George
what's funny is that... (Score:3)
(that is not to say the trollers and such are not fun, which they are.. they're just not useful outside the context of
Why Debian/RH? (Score:4)
Forgive me if this has been asked elsewhere, but why did y'all choose those distributions for those servers? I'm genuinely curious; I'm unfamiliar with the large-scale differences between distributions. (My computer runs Mandrake... that decision was based on the single factor that my friend happened to have a Mandrake CD on him.)
Yes, Why MySQL Server? Why not PostgreSQL? (Score:3)
Exactly. So why not move to a product that has it, like Oracle/Informix/...., or, if you are going to spend the $, why not invest the $-time in PostgreSQL, a database that IS opensource?
Is there any reason beyond: MySQL is what we have been using, so now we will continue to use it?
MySQL has said:
On Roll-Back [mysql.com]
"MySQL has made a conscious decision to support another paradigm for data integrity, "
OK, fine, that is a design choice. If they had wanted it (rollback), they would have designed it in.
PostgreSQL has rollback, and just needs database replication, and they would LOVE to see that feature.
So, why work with MySQL, other than "it is what we have always done" or "We didn't think of another option"? Are you hoping to have them change the licence?
Re:Why Debian/RH? (Score:2)
CmdrTaco: "The admins were having a problem with Debian and the quad Xeon. It was
just quicker to install Red Hat. I think they're silly."
--hunter
Re:But why's it so expensive? (Score:2)
What are SLA terms?
D
----
For the record -- "Exodus"?? (Score:2)
All I can see from their homepage is that they're vaguely ISP-looking. But I have a personal policy of Not Bothering With Flashy Graphics-Laden Web Pages, so I didn't push any further through the morass.
From the context of the previous play-by-play article, I take it Exodus provides physical storage space and connectivity for your machines, and not much else...?
Re:Nothing to do with data modeling (Score:2)
In the end, it's probably the speed, plus the fact that they're used to it and its various quirks.
D
----
Link here from About (Score:2)
__
Re:MySQL Server. (Score:2)
Here's where I speculate, because I've not checked out the code itself.
The SQL server (whatever variety they want to run) needn't be so beefy if there was some sort of caching integrated into the code. Are we hitting the SQL server every time a page is generated? Why not generate the index (for each of the kind of views like different thresholds) once a minute? Same with Slashboxes, they can be cached like nobody's business.
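Below is a minimal sketch of the "generate the index once a minute" idea. It assumes pages can be keyed by view, for example one key per comment threshold. The class name and key format are hypothetical. The clock is injectable so the expiry logic can be checked without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Rebuilds each cached page at most once every `ttl` seconds."""

    def __init__(self, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}   # key -> (built_at, page)

    def get(self, key: str, build: Callable[[], str]) -> str:
        """Return the cached page for `key`, calling `build` only if stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh enough: no database work at all
        page = build()               # e.g. run the SQL queries and render
        self._entries[key] = (now, page)
        return page
```

A page handler would call something like `cache.get("index:threshold=1", build_fn)`, where `build_fn` runs the queries. With a 60-second TTL, the database sees at most one index build per threshold per minute, however many hits arrive. This sketch is not thread-safe. It also does not stop several requests from rebuilding the same stale page at once.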
Plus, if you base page assembly around caching, you can have the assembly layer spread around the Internet and only have a database in one location (I'm being simplistic here).
Caching gives you sooo much performance if you use it sensibly.
cheers,
-o
MySQL Server. (Score:3)
Scott Ruttencutter
NFS Question (Score:2)
Interesting Setup (Score:3)
I can understand the need for failover for MySQL, which is a major requirement, but the computer itself is major overkill. I do administration for over 16 servers across 3 different clients, and we use Mandrake distros on those. The only problem I usually find is that MySQL can be really stressed when many people are using it on the same machine as the Apache server. I am in the process of moving MySQL over to a new dedicated server so it can serve several web servers, and maybe that will call for a beefy processor to handle the load.
If a site is set up to handle... let's say... 100,000 hits a day, what server configuration is needed for that? With MySQL on its own server, of course.
Pictures!! (Score:3)
"We Want Pictures.. We want Pictures.. We want Pictures"
Seriously though: it would be nice to actually see this setup. Don't forget to have CowboyNeal give us an oh so sexy pose near the almost as sexy VA SQL server...
Thank you Slashdot! (Score:3)
Lots of varying methods are discussed of how to properly protect or run a server, and now we get a real life scenario of what happens when the shit hits the fan.
Don't just publish the Hellfire series, package up this one too.
Exodus anecdotes (Score:2)
As far as costs and security... a service like Playboy.com would probably be paying about the amount they were paying for the cage at Exodus just for proper bandwidth. Exodus offers all that bandwidth, but with a lot of added warranties against any type of failure. I think that's probably worth the price being paid.
Sure, Playboy.com probably doesn't need kevlar lined walls, but I can imagine a certain amount of chaos would ensue if someone decided to go after, say, Etrade's physical setup.
Some other fun things to do at Exodus:
Make friends with the guy running the shipping area. While I was there, I needed an extra fiber channel cable - I was shorted one in shipping, and fiber channel cables aren't something you can just go buy at the local computer shop. I asked the shipping manager if maybe I missed a box or something, and told him what I was looking for... he had at least a couple dozen cables - along with several high-end disk drives and an entire Dell server, which had all become the property of Exodus, 'cause no one had come back and picked them up in a decent period of time.
Exodus is a good place to dumpster dive. Their facilities are very low key. I doubt that there's a listed phone number for any of them and the one I visited didn't even have a sign indicating what it was, but if you can identify one... there's lots of interesting stuff being thrown away there.
I picked up an entire APC rackmount enclosure while I was there, and a dead Compaq server (which had overheated inside the enclosure). I just happened to be outside when someone was bringing it to the garbage. That was a "kid at Christmas" moment for me. That Compaq had a working RAID controller and a dual-port Ethernet card as well (too bad they took all the disks out before they trashed it).
If you're ever working in one, or setting up a cage in one of their facilities, do yourself a BIG favor and bring some kind of chair or stool. There aren't any in the cages, nor any to loan out, and if you're there more than a couple hours, you'll probably want one. Also, a well appointed cage has a land-line phone. Cellular reception in the facility I visited was terrible (go figure), and if you need another phone, well, there was one in the ops area that customers could use, but that's not really a good thing if you need to talk to someone while you're working in a cage.
Walking among the cages at the facility I visited was extremely educational. I would guess that at least 60% of the computers I saw were either Compaq x86 servers or Compaq Alphas. Probably another quarter were Suns. Almost everything else was Dell. I only saw two SGIs (both in the same cage), one IBM machine (in one of those cute U6 form factors), and maybe a half-dozen VA Linux boxes. One cage I walked past had 42(!) Sun Enterprise 4500s and 6 storage arrays. One has to wonder what was going on in there.
BSD Firewall.... (Score:2)
I have a couple of questions and a comment. (Score:3)
I wonder why there is a Red Hat box in the mix. What is the reason? Now, I am a Debian bigot, but my guess is that so are you guys. Is there something specific about Red Hat and MySQL that I don't know?
Second, why not Mylex cards in all the boxes? Mylex's new DAC110 SCSI cards are simply the fastest I have ever seen.
Why not Gigabit? I use it with Linux and it works; it makes all that heavy-duty hardware sing. 100Mbit is just passé :>
I am proud that you chose Potato for most of your boxes.
Re:I got two words for ya... (Score:2)
Christopher A. Bohn
Because they have lots of expensive stuff (Score:2)
I'm sure you could. But Exodus doesn't deal with anything as slow as multiple T3's. They advertise the fact that each of their data centers has multiple OC-12 lines. Each one of those is capable of 622.08 megabits per second. A T3 gets you only 44.736 Mbps.
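For scale, here are the two nominal line rates quoted above compared directly (ignoring framing overhead):

```latex
\frac{622.08\ \text{Mbit/s (OC-12)}}{44.736\ \text{Mbit/s (T3)}} \approx 13.9
```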
They have huge battery systems, power conditioners, and multiple diesel generators at each site. They are usually connected to more than one power grid. They can function without commercial power indefinitely.
They have redundant everything. Network feeds. Routers. Switches. Power. Cooling.
They have very tight security. Armed guards. Biometric (e.g., hand print) locks. Cameras. Steel doors. Double-walls. Personnel locks.
You use Exodus if you absolutely, positively cannot afford downtime due to third-party service failures. If you have to ask, you can't afford it.
Re:For the record -- "Exodus"?? (Score:2)
Re:what's funny is that... (Score:3)
Why spend all that $ to fix MySQL? (Score:5)
Just out of curiosity, wouldn't it be easier to use something like PostgreSQL [postgresql.org] (which is just as freely available) that already has rollback & atomicity than to pay the MySQL people to develop it? Didn't y'all read the article on here a few weeks ago, "Why not MySQL? [openacs.org]"
__________________________________________________ ___
Re:For the record -- "Exodus"?? (Score:4)
I'd tell you, but I have a personal policy against helping lazy luddites who think they're taking some kind of principled stand because they don't visit sites that use <img> tags. At least read the damn FAQ linked to on their home page.
Cheers,
ZicoKnows@hotmail.com
3 Image servers? (Score:2)
> 3 load balanced Web servers dedicated to images
Why three image servers? Slashdot isn't exactly the most graphics-intensive site I've seen. A few icons and the banner ads. Are you planning more graphics/art? Or is this just to ensure that the ads load quicker than the rest of the page (jab to the ribs!)?
Re:Oddessey? (Score:2)
The goal of Odyssey is to dispense with having each of our admins ssh into each box at Exodus and manually make changes. Instead, an admin here at Andover.net will point their Web browser at our secure server, login to Odyssey, tell it (for example) to change the MaxClients configuration parameter on the Apache servers running on boxes W, X, Y, and Z, click "Make It So", and the change is archived, validated for correctness, checked for collision with other admins making related changes, and performed. Other tasks can also be done the same way: power-cycling boxes remotely, hot-swapping a live spare for a dead box, etc. Changes can be backed out by Odyssey too: just find it in the archive and click on "Revert" -- as long as it can be reverted in a sensible way, it will be done automatically. It streamlines many administration tasks and gives an audit trail of who did what when.
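Odyssey itself has not been published, and it is being written in Perl. What follows is only a hypothetical Python sketch of the archive-and-revert idea described above. Every change is logged with its author, and a revert is refused if the value has changed again since. That refusal is one simple reading of "as long as it can be reverted in a sensible way". The class, method names, and the MaxClients validator are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Change:
    author: str
    host: str
    param: str
    old: Optional[str]   # None means the parameter was not set
    new: Optional[str]


@dataclass
class ConfigArchive:
    """Applies per-host config changes, logs them, and can revert them."""

    config: Dict[str, Dict[str, str]]
    validators: Dict[str, Callable[[str], bool]] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)   # the audit trail

    def _set(self, host: str, param: str, value: Optional[str]) -> None:
        if value is None:
            self.config[host].pop(param, None)
        else:
            self.config[host][param] = value

    def apply(self, author: str, hosts: List[str],
              param: str, value: str) -> List[int]:
        """Validate, apply to each host, and return the new log entry IDs."""
        check = self.validators.get(param)
        if check is not None and not check(value):
            raise ValueError(f"invalid value for {param}: {value!r}")
        ids = []
        for host in hosts:
            old = self.config[host].get(param)
            self._set(host, param, value)
            self.log.append(Change(author, host, param, old, value))
            ids.append(len(self.log) - 1)
        return ids

    def revert(self, change_id: int, author: str) -> bool:
        """Undo one logged change unless the value has changed since."""
        ch = self.log[change_id]
        if self.config[ch.host].get(ch.param) != ch.new:
            return False   # someone changed it again; don't clobber their edit
        self._set(ch.host, ch.param, ch.old)
        self.log.append(Change(author, ch.host, ch.param, ch.new, ch.old))
        return True
```

Logging the revert as a new entry, rather than deleting the original one, keeps the "who did what when" trail intact.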
As for monitoring, Odyssey will do both black-box and white-box monitoring of network services and host resources (i.e., instead of just verifying the Web server is listening on port 80, it can also send GET requests and validate the responses).
It's being written in Perl.
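To illustrate the black-box versus white-box distinction above, here is a minimal Python sketch. It is hypothetical and not Odyssey's code. The check passes only if a GET returns HTTP 200 and the body contains an expected marker string. The URL and marker used in any real deployment would be your own choices per service.

```python
import http.client
import urllib.request


def response_ok(status: int, body: str, must_contain: str) -> bool:
    """White-box validation: right status code AND expected page content."""
    return status == 200 and must_contain in body


def check_http(url: str, must_contain: str, timeout: float = 5.0) -> bool:
    """GET `url` and validate the reply, rather than only probing port 80.

    A black-box probe would stop once the TCP connection succeeds; this
    also catches a server that answers with errors or an empty page.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return response_ok(resp.status, body, must_contain)
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), refused connections and timeouts
        # are all OSError subclasses; malformed replies raise HTTPException.
        return False
```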
Open source? (Score:2)
Why follow a "cathedral" development model when there are many folks out here willing and eager to help out with your code?
and what's the root password, too? (Score:2)
Re:What is all the hardware needed for? (Score:3)
--
More important question (Score:2)
More importantly, how many people-hours are lost due to reading Slashdot each week? :-)
Ford Prefect
Waltham, MA? (Score:2)
-Mark
Total cost (Score:3)
webservers (type 1) = $4505 each
NFS server (type 2) = $7040
database server (type 3) = $25739
So the grand total is $68819. I haven't found the prices for the switches and firewall. I would suspect that the BSD box is close in price to the webserver (prob. a bit less).
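The comment does not show its arithmetic. The total matches counting all eight web servers from the article (5 for pages plus 3 for images) at the Type I price, plus one NFS box and one SQL box:

```latex
8 \times \$4505 + \$7040 + \$25739 = \$36040 + \$7040 + \$25739 = \$68819
```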
Re:1 Gig on web servers? (Score:3)
Not true at all. I'm running a slash server [baked.net] which doesn't get very many hits (~3000 in the last two days, chump change compared to Slashdot) and right now httpd is using 270MB of RAM.
--
Re:For the record -- "Exodus"?? (Score:2)
Re:MySQL Server. (Score:5)
It's not like it is for some sort of open-source reason - MySQL isn't released under an open-source license. I'm curious why slashdot/Andover are spending money funding a closed source project rather than funding an open-source one or forking over the $ for a more capable database like Oracle.
Re:Why did you choose MySQL? (Score:2)
In one form or another it actually dates back to the late 1970's. Have a look at http://www.postgresql.org/docs/awbook.html for some background information.
Unfortunately, the academic roots of PostgreSQL meant that the codebase was very complex and unstable for years. Apart from commercial offerings based on the codebase, it was mainly a platform for academic research, with an emphasis on features rather than stability.
The release of PostgreSQL 7.0 will see it finally come of age in comparison to its commercial alternatives.
Chris Wareham
Re:MySQL Server. (Score:2)
You usually go with multiple web servers because the box doesn't necessarily need to be all that fast. It doesn't make sense to spend 5 figures on a beefy multi-processor box that you use as a front-end for querying a beefy multi-processor database box when a few cheap $400 eMachines (yes, I know they're shit, from personal experience) will give you equivalent performance. So you put a load-balancer in front of your multiple web servers, and they all serve, and query the SQL box.
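As a rough illustration of what the load balancer in front of the web servers does, here is a minimal round-robin sketch in Python that skips servers failing a health check. It is an illustrative assumption only, since the Arrowpoint's actual balancing algorithms are not described here. The server names are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable


class RoundRobin:
    """Hands out back-end web servers in turn, skipping unhealthy ones."""

    def __init__(self, servers: Iterable[str],
                 healthy: Callable[[str], bool]) -> None:
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("need at least one server")
        self._ring = cycle(self._servers)
        self._healthy = healthy

    def pick(self) -> str:
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if self._healthy(server):
                return server
        raise RuntimeError("no healthy back-end servers")
```

Because each front-end box is interchangeable, adding web capacity is just a matter of adding another entry to the server list.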
OTOH, you really have to have a database all in the same box. You have multiple database processes, all with different information, writing and updating to disk. Not to mention the info that's cached in memory to speed up performance (so you don't have to keep hitting the disk). In this case, you want the largest box you can afford, with multiple processors, a gig or two of RAM, SCSI disks, etc, to make it as fast as possible, since it really needs to be one box.
HTH.
--
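A minimal sketch of the layout described above, assuming plain round-robin dispatch: several cheap front ends take turns serving requests, and every one of them queries the same database box. The host names are hypothetical, and a real hardware load balancer uses its own, smarter algorithms; this only illustrates the shape of the setup.

```python
from itertools import cycle

# Hypothetical host names; the comment does not name any machines.
WEB_SERVERS = ["web1", "web2", "web3", "web4", "web5"]
DB_HOST = "sql1"  # the single, beefy database box every front end queries


class RoundRobinBalancer:
    """Hand each incoming request to the next web server in turn."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


def route(requests, balancer):
    """Map each request to (request, web server, database host)."""
    return [(req, balancer.pick(), DB_HOST) for req in requests]


if __name__ == "__main__":
    lb = RoundRobinBalancer(WEB_SERVERS)
    for req, web, db in route(["/", "/article.pl", "/comments.pl"], lb):
        print(f"{req} -> {web} -> {db}")
```

Adding capacity on the front end is just another entry in the server list, while the database stays one box, which is exactly the asymmetry described above.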
Exodus is a *BIG* ISP (Score:5)
Exodus is one of the world's biggest (in terms of service capacity available) Internet Service Providers.
"We're going to need bandwidth. Lots of bandwidth."
Exodus specializes in having more bandwidth than most of the third world. They've got NAPs (Network Access Points, i.e., backbone connections) all over the continental United States, and a few outside the US as well. They link this all together using both external and internal networks. The end result is that most anywhere on the net that has a good connection has a good connection to Exodus.
They provide servers. Do you need to host downloads for ten million users? Exodus can give you servers to do so.
They provide co-location space. If their standard server packages just won't cut it -- bring your own. They'll give you a rack, a dedicated co-loc cage, or a dedicated high security vault.
Their web page [exodus.net] has a lot of graphics because they have a lot of pictures of their equipment and graphs of their capacity. It is actually justified. You may want to make a return trip.
Re:Blown up arrowpoint? (Score:2)
Essentially, the CS-100 had content checking turned on; combined with the SYN flood (and the fact that it was a loaner/refurb unit until the 800 arrived), it died.
Arrowpoint is doing a post-mortem now.
-Pat
Interesting... (Score:2)
All I've ever heard were horror stories about its poor implementation. Water under the bridge? Righty right?
I assume you're sharing static content and image serving using the NFS shares. Perhaps you've even gone so far as to use qmail or the sendmail/procmail NFS mail hack to store your email on the NFS share?
What shied you away from the Linux Virtual Server Project's implementation of load balancing? Or the mon+heartbeat+fake+coda solution for HA? Or is your "Oddessey" work based on this in any way?
Are you load balancing your firewalling BSD box? Or have we reached a critical point of failure at the firewall?
Very interesting stuff, I'd like to know more details though.
Re:For the record -- "Exodus"?? (Score:3)
They have, literally, thousands of racks and TONS of machines. At the Sterling, Virginia facility we moved into (which, BTW, is one of their newest and a flagship facility) I saw SGI Origin machines, countless Sun enterprise-level machines, mainframes, small machines, etc., etc., etc. They run OC-12, OC-48, or more between each of their centers. They offer "Datacenters within the Datacenter," little rooms constructed on the raised floor that offer a secure environment companies can pay (lots!) for. They have fibre coming in at multiple points in most buildings, many generators, huge UPSes, fingerprint readers to get into the network rooms, etc., etc., etc.
It's pretty phat. Their service is top-notch as well. I had to fly up there from Dallas and it was an immense pleasure (even though I had to work).
They're pretty awesome (and no, I don't work for them).
Jeff
Re:dual nics? (Score:2)
Most likely one nic connects to the internet (through the firewall), and the other to the database backend.
Assuming the above is true, there are several advantages. First, data traffic to the SQL server doesn't collide with traffic from the Internet, which means fewer Ethernet collisions, more bandwidth, and more security. The SQL machine can run every buggy service imaginable; it won't matter, because nothing can get to it. The web servers only listen on the ssh and httpd ports and don't forward anything. The SQL network can be a private (192.168.x.x) network, which isn't routable. You can still break it, I suppose, but it's much more difficult.
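The dual-NIC arrangement boils down to a routing decision on each web server: back-end addresses go out the private interface, everything else goes out the public one. A minimal sketch using Python's standard ipaddress module, assuming a hypothetical 192.168.10.0/24 SQL subnet and hypothetical interface names eth0/eth1 (none of these are Slashdot's real settings):

```python
import ipaddress

# Hypothetical private back-end subnet for the SQL network (RFC 1918 space,
# so it is not routable on the public Internet).
SQL_BACKEND = ipaddress.ip_network("192.168.10.0/24")


def is_backend_only(addr: str) -> bool:
    """True if addr is a private address inside the SQL back-end subnet."""
    ip = ipaddress.ip_address(addr)
    return ip.is_private and ip in SQL_BACKEND


def nic_for(dest: str) -> str:
    """Pick which of a web server's two NICs would carry traffic to dest.

    Interface names are hypothetical: eth0 faces the firewall, eth1 the SQL box.
    """
    return "eth1 (SQL backend)" if is_backend_only(dest) else "eth0 (Internet via firewall)"


if __name__ == "__main__":
    for dest in ["192.168.10.5", "8.8.8.8"]:
        print(dest, "->", nic_for(dest))
```

Because the back-end subnet is private, nothing on the Internet side can even address the SQL box directly, which is where the security benefit comes from.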
Cachedot? (Score:3)
Of course such a thing would not need to be as powerful as the main slashdot systems, but would provide some additional backup in case of another DDOS or a network outage of some sort.
Sort of a "battle bridge" for those of you who remember the days when Star Trek was good. (startrek.version = ST:TNG)
Mycroft-X
Debian not on the SQL server? (Score:4)
Why isn't the SQL server Debian as well?
If there's any problem with Potato's MySQL, I think Debian would be pleased to hear about it, whether as a bug report in the BTS or whatever.
Thanks
Re:NFS Question (Score:2)
Re:1 Gig on web servers? (Score:3)
The web servers are running mod_perl, and each process takes up a lot of RAM (hrrrm... something to streamline with mod_perl/slash?)
So, as a result, the machines that need the most RAM are the web servers and the MySQL machines.
Essentially, we need enough RAM to run up to the MaxClients limit set in Apache *and* still have room for the file cache.
-Pat
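Pat's rule of thumb can be turned into a quick back-of-the-envelope calculation. A minimal sketch, assuming every Apache/mod_perl child costs the same fixed amount of memory; the figures in the usage example (1 GB box, 20 MB per child, 256 MB of file cache, 64 MB for the OS) are illustrative guesses, not Slashdot's real numbers:

```python
def ram_needed_mb(max_clients: int, per_child_mb: int,
                  file_cache_mb: int, base_os_mb: int = 0) -> int:
    """RAM required to run max_clients children and still keep a file cache."""
    return max_clients * per_child_mb + file_cache_mb + base_os_mb


def max_clients_that_fit(total_ram_mb: int, per_child_mb: int,
                         file_cache_mb: int, base_os_mb: int = 0) -> int:
    """Largest MaxClients value that leaves the file cache (and OS) untouched."""
    spare = total_ram_mb - file_cache_mb - base_os_mb
    return max(spare // per_child_mb, 0)


if __name__ == "__main__":
    n = max_clients_that_fit(1024, 20, 256, 64)
    print("MaxClients", n, "needs", ram_needed_mb(n, 20, 256, 64), "MB")
```

With those guesses, max_clients_that_fit(1024, 20, 256, 64) gives 35. Treating each child as fully unshared is conservative: mod_perl children share preloaded code through copy-on-write after fork, so the real per-child cost is usually lower.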
Quick question... (Score:2)
Red Hat Linux 6.2 (final release + tweaks)
Is that only a reference to tweaking the max number of processes in the kernel or did you apply some alien-technology-from-outer-space-experimental kernel patches?
If so, details pleeze!
Re:For the record -- "Exodus"?? (Score:2)
Basically, they have connectivity coming out their ears (one or more OC-48s to the other Exodus facilities and OC-3s to "most" of the backbone carriers, at each site), housed in a number of buildings that offer things like highly secured access (full-time security staff, Kevlar-lined walls, 2" thick electronic-keyed steel doors), multiple power circuits and UPS, and extremely regulated environmental conditions. You have to sign an NDA to even walk in their building.
Basically, the Exodus facilities are going to be better than anything a single company could afford to do on their own.
They host sites like Ebay and (at the one I was at) all the Playboy.com stuff.
To me the best part was that everything in their vending machines only cost a dime. =)
Re:Why Debian/RH? (Score:2)
Here at work I'm setting up a Linux server for each of our locations to use for sendmail and Apache. Each of our machines is running Red Hat 6.2 (final) because Red Hat was easier for me to explain to my non-Linux co-workers.
But why would you want two of your systems to be Debian and one to be Red Hat? Wouldn't that just make it harder to figure out what's wrong when something goes wrong with one of them? Personally, I like to be able to look at a machine that works and compare, in case I missed something.
Is Red Hat better prepared for MySQL than Debian?
Devil Ducky
Re:For the record -- "Exodus"?? (Score:4)
Others have said it well, but I'll add this: Exodus hosts Yahoo. 'Nuff said.
--
Suggestions for the future... (Score:2)
2. Switch from NFS to SMB - even the Apache site recommends this for speed.
2a. Get rid of NFS and just sync all your web servers from one server, so each one has local copies of all the code.
3. Look into running local instances of SQL on the web servers: read-only copies replicated from the main DB... then the central DB would only be used for writes (i.e., comments and postings...)
Just my less than humble ideas...
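Suggestion 3 amounts to a read/write split. A minimal sketch, assuming the application can inspect each SQL statement before sending it: read-only statements go to a local replica, anything else goes to the central master. The host names and the keyword list are hypothetical.

```python
import re

# Naive classifier: statements starting with these keywords only read data.
# (A real router would also need to handle e.g. SELECT ... FOR UPDATE.)
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)


class ReadWriteRouter:
    """Send reads to a local replica and everything else to the central master.

    Host names passed in are hypothetical placeholders.
    """

    def __init__(self, local_replica: str, master: str):
        self.local_replica = local_replica
        self.master = master

    def target(self, sql: str) -> str:
        return self.local_replica if _READ_ONLY.match(sql) else self.master


if __name__ == "__main__":
    router = ReadWriteRouter("localhost", "sql-master")
    for stmt in ["SELECT * FROM stories", "INSERT INTO comments VALUES (1)"]:
        print(router.target(stmt), "<-", stmt)
```

One catch with this design: replicas lag the master, so a reader who has just posted a comment may not see it right away unless that session's reads are temporarily sent to the master. It also depends on MySQL replication, which comes up elsewhere in this thread as one of the features Andover was paying to have added.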
Lies (Score:4)
Re:Interesting Setup (Score:2)
Slashdot is the 2779th most visited website on the internet, according to pcdataonline [pcdataonline.com]
Check it [pcdataonline.com] out. Only 2736 places to go before they tie with porncity.com.
Of course Yahoo [yahoo.com] is #1. Google is better, though.
Re:Why did you choose MySQL? (Score:2)
1) It's very simple to install
2) It's very fast
3) PostgreSQL only really came of age recently
The lack of features in MySQL is only now biting them, but rather than switch to PostgreSQL they're funding the MySQL guys to add those features. Rather a nice way of rewarding them for producing the software at the heart of Slashdot, and one that will benefit others who may find themselves in a similar position one day.
Chris Wareham
Network Topology (Score:4)
Re:Beware the Intel EtherExpress Pro w/linux (Score:2)
For some bizarre reason, problems (TX: Transmit Timed Out) disappeared when I tried to tcpdump what was happening. As a total kludge, if you ifconfig eth0 arp promisc up, it doesn't happen anything like so often (if at all).
This was a Saturday afternoon workaround - figuring out why is a weekday activity.
~Tim
--
Post bug reports at SourceForge (Score:2)
__
Re:Why spend all that $ to fix MySQL? (Score:2)
MySQL is extremely fast if you are doing mostly SELECTs (versus a lot of INSERTs and UPDATEs), and that's what Slashdot tends to do. The site spends much more time retrieving messages (and displaying them) than it does storing new messages. Even after the MySQL team spent lots of time tweaking benchmarks to make *PostgreSQL* look better (!), MySQL continues to be much faster, especially in SELECTs.
But as the friendly MySQL [mysql.com] team will tell you (on the excellent, excellent mailing list), there are many scenarios where PostgreSQL is the best choice. And hey, there are situations where industrial-size Oracle is the best choice (although it will cost you 5-7 figures).
Re:Open source? (Score:2)
Why follow a "cathedral" development model when there are many folks out here willing and eager to help out with your code?
---
Because you can't pile a load of crap on someone's doorstep and expect them to suddenly start fixing everything. You need to give them something to work with.
- Jeff A. Campbell
- VelociNews (http://www.velocinews.com [velocinews.com])
Re:Why spend all that $ to fix MySQL? (Score:3)
I can imagine that with the loads on Slashdot, an order-of-magnitude speed difference makes a world of difference. Second, there's already a complete MySQL code base for Slash, so you get stuck with "industrial inertia".
VA owns them, so isn't hardware at cost? (Score:2)
Re:Debian not on the SQL server? (Score:3)
The important question (Score:3)
What? Servers don't need a GUI? Don't let Redmond find that out...
Re:Why spend all that $ to fix MySQL? (Score:2)
Besides, what's wrong with helping the MySQL team? Pretty soon, MySQL gets most of the features everyone's been bitching about (including replication, thanks Andover!), it still blows the pants off other DBs in terms of raw speed, and suddenly the recently much-maligned MySQL does kick some serious arse.
Re:Suggestions for the future... (Score:2)
Frankly, NFS sucks. It's old, very old, feature poor, inherits Unix's dubious authorisation layer and generally bites. It's dog slow, too.
SMB in general and Samba in particular offer more features, better stability and much faster performance.
And yes, FTP is probably the fastest of the lot, and HTTP even faster. But by then we are talking REALLY feature-poor, aren't we?