Slashdot.org Self-Slashdotted 388

Posted by kdawson on Tuesday February 10, 2009 @12:08AM from the disturbances-in-the-fabric dept.

Slashdot.org was unreachable for about 75 minutes this evening. Here is the post-mortem from Sourceforge's chief network engineer Uriah Welcome. "What we had was indeed a DoS, however it was not externally originating. At 8:55 PM EST I received a call saying things were horked, at the same time I had also noticed things were not happy. After fighting with our external management servers to login I finally was able to get in and start looking at traffic. What I saw was a massive amount of traffic going across the core switches; by massive I mean 40 Gbit/sec. After further investigation, I was able to eliminate anything outside our network as the cause, as the incoming ports from Savvis showed very little traffic. So I started poking around on the internal switch ports. While I was doing that I kept having timeouts and problems with the core switches. After looking at the logs on each of the core switches they were complaining about being out of CPU, the error message was actually something to do with multicast. As a precautionary measure I rebooted each core just to make sure it wasn't anything silly. After the cores came back online they instantly went back to 100% fabric CPU usage and started shedding connections again. So slowly I started going through all the switch ports on the cores, trying to isolate where the traffic was originating. The problem was all the cabinet switches were showing 10 Gbit/sec of traffic, making it very hard to isolate. Through the process of elimination I was finally able to isolate the problem down to a pair of switches... After shutting the downlink ports to those switches off, the network recovered and everything came back. I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something — I just don't know what yet. Luckily we don't have any machines deployed on [that row in that cabinet] yet so no machines are offline. 
The network came back up around 10:10 PM EST."

This discussion has been archived. No new comments can be posted.

Do you get the pink screen? (Score:4, Funny)

by BadAnalogyGuy ( 945258 ) writes: <BadAnalogyGuy@gmail.com> on Tuesday February 10, 2009 @12:10AM (#26793423)

So if you hammer your own servers, do you have to send an email to krow to get your privileges restored?

- Re:Do you get the pink screen? (Score:5, Funny)
  
  by MindlessAutomata ( 1282944 ) writes: on Tuesday February 10, 2009 @12:13AM (#26793463)
  
  The manager that did that at a restaurant I used to work at got his privileges revoked, instead.
  
  - - Re:Do you get the pink screen? (Score:5, Informative)
      
      by TheLink ( 130905 ) writes: on Tuesday February 10, 2009 @12:46PM (#26798839) Journal
      
      core = core switch = a main switch that most of the edge switches/devices are plugged into.
      reboot core = reboot a core switch.
      
    - Re:Do you get the pink screen? (Score:5, Funny)
      
      by BunnyClaws ( 753889 ) writes: on Tuesday February 10, 2009 @03:01PM (#26801183) Homepage
      
      I read the article submission, now I have a headache.. You can reboot individual processors in a computer?
      This comment made me laugh. No, I am not laughing with you, I am laughing at you.
      
Wow, that sucks (Score:3, Interesting)

by drachenstern ( 160456 ) writes: <drachenstern@gmail.com> on Tuesday February 10, 2009 @12:11AM (#26793427) Journal

So why didn't y'all have access from the home office?

- Re:Wow, that sucks (Score:4, Insightful)
  
  by Arthur Grumbine ( 1086397 ) * writes: on Tuesday February 10, 2009 @12:41AM (#26793653) Journal
  
  And "access from the home office" would allow them to do what exactly?!?
  
  - Re:Wow, that sucks (Score:5, Funny)
    
    by jd ( 1658 ) writes: <imipak@yaho[ ]om ['o.c' in gap]> on Tuesday February 10, 2009 @01:46AM (#26794027) Homepage Journal
    
    Act as a data source to Excel.
    
  - Re:Wow, that sucks (Score:5, Funny)
    
    by Dan East ( 318230 ) writes: on Tuesday February 10, 2009 @01:14PM (#26799263) Journal
    
    And "access from the home office" would allow them to do what exactly?!?
    Guaranteed first posts.
    
  - - Re:Wow, that sucks (Score:5, Informative)
      
      by goaliemn ( 19761 ) writes: on Tuesday February 10, 2009 @11:30AM (#26797723) Homepage
      
      The point is, they hadn't already given him direct access to those connections before yesterday, and he had to spend a large chunk of those 75 minutes getting the authorization to access the equipment so he COULD fix it.
      That's not how I read it at all. The switches were so overloaded that he had to "fight" to get into the box. He, more than likely, already had access to the box, but the network was working against him.
      
    - - Re:Wow, that sucks (Score:5, Informative)
        
        by Achromatic1978 ( 916097 ) writes: <robert@chroma b l u e.net> on Tuesday February 10, 2009 @01:50PM (#26799849)
        
        He (she?)
        For Slashdot staff, I think the generally accepted nominal is "It"...
        
- Sometimes You Have To Be There (Score:5, Interesting)
  
  by maz2331 ( 1104901 ) writes: on Tuesday February 10, 2009 @03:59AM (#26794619)
  
  It may be strange for those not in the networking field, but when things really go bad, the only place to be is physically in the data center.
  That means looking at the LEDs on switches for traffic indications. If you see a single port is spewing a LOT of activity during an outage, disconnect it. No, don't make it "down" but pull the cable out of the port.
  Then go downstream and repeat until the potential problem set is reduced to an understandable level.
  What really sucks about these kinds of outages is that you can't remotely log in to various hosts or switches - you have to pull wires out of ports to break the "spew" that is taking things down.
  I have to remember to charge a 100-X surcharge the next time I troubleshoot one of these... (300X if after-hours)
  These sorts of problems are REALLY hard to find, but trivial to fix.
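
The hop-by-hop procedure described above (find the port that is spewing, go downstream, repeat) can be sketched roughly as follows. This is only an illustration: the topology, the switch and port names, the traffic figures, and the 5 Gbit/s threshold are all made up, and it assumes one port per switch clearly stands out, which was not the case in the outage described in the summary.

```python
# Hypothetical topology: switch -> {port: (observed Gbit/s, downstream switch or None)}.
# All names and rates are invented for illustration.
TOPOLOGY = {
    "core1": {"p1": (0.2, "cab-a"), "p2": (9.8, "cab-b"), "p3": (0.1, "cab-c")},
    "cab-a": {"p1": (0.1, None)},
    "cab-b": {"p1": (0.3, None), "p2": (9.7, "cab-b2")},
    "cab-b2": {"p1": (9.7, None)},
    "cab-c": {"p1": (0.1, None)},
}


def trace_spew(topology, start, threshold_gbps=5.0):
    """Follow the busiest port hop by hop; return the (switch, port) path taken."""
    path = []
    switch = start
    while switch is not None:
        ports = topology[switch]
        # Pick the port with the highest observed rate on this switch.
        port, (rate, downstream) = max(ports.items(), key=lambda item: item[1][0])
        if rate < threshold_gbps:
            break  # nothing on this switch looks abnormal
        path.append((switch, port))
        switch = downstream  # go downstream and repeat
    return path


suspect_path = trace_spew(TOPOLOGY, "core1")
print(suspect_path)
```

The last entry in the returned path is the port to unplug first; everything before it is just the route taken to get there.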
  
  - Re:Sometimes You Have To Be There (Score:5, Informative)
    
    by amorsen ( 7485 ) writes: <benny+slashdot@amorsen.dk> on Tuesday February 10, 2009 @04:52AM (#26794799)
    
    Depends how good your out-of-band management is.
    
    - Re:Sometimes You Have To Be There (Score:5, Insightful)
      
      by dkf ( 304284 ) writes: <donal.k.fellows@manchester.ac.uk> on Tuesday February 10, 2009 @11:10AM (#26797489) Homepage
      
      Depends how good your out-of-band management is.
      And whether anyone's been "smart" enough to decide to run the out-of-band management access over the same network as the production networking "to save resources"...
      
    - Re: (Score:3, Informative)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
  - Re:Sometimes You Have To Be There (Score:5, Interesting)
    
    by INT_QRK ( 1043164 ) writes: on Tuesday February 10, 2009 @08:40AM (#26795957)
    
    I don't know if this is relevant, but at 1351 (EST) I was (attempted) port scanned by 216.34.181.45, which "Who Is" says belongs to Source Forge... wow...coincidence, just got hit again time 0738 same IP
    
    - Re:Sometimes You Have To Be There (Score:5, Interesting)
      
      by sentientbeing ( 688713 ) writes: on Tuesday February 10, 2009 @01:53PM (#26799881)
      
      Those times coincide with recent posts you made at Slashdot (216.34.181.45). I think that after each post, Slashcode quickly scans the originating IP to check for proxy trolling.
      
  - Re:Sometimes You Have To Be There (Score:5, Informative)
    
    by jamie ( 78724 ) * writes: <jamie@slashdot.org> on Tuesday February 10, 2009 @09:52AM (#26796535) Journal
    
    Our network engineer lives a couple of states away from the data center. The work he's talking about doing, he did from home.
    
  - - - Re: (Score:3, Interesting)
        
        by flappinbooger ( 574405 ) writes:
        
        If it's a hardware fault software management won't help.
        
        A bad NIC brought down a whole airport a while back, read it on here, IIRC.
        
        That might have been bad design, but who woulda thought that a NIC card can hose a network? A bad switch.... even worse.
  - - Re:Sometimes You Have To Be There (Score:5, Funny)
      
      by Bearhouse ( 1034238 ) writes: on Tuesday February 10, 2009 @11:00AM (#26797387)
      
      It may be strange for those not in the networking field, but when things really go bad, the only place to be is physically in the data center.
      Heh. I've heard that in the old days you could find broken Token Ring hardware by listening for a high-pitched whining noise. Guess one really has to be there for stuff like that.
      Was there, and confirm true. Whining noise normally came from IBM SE who was trying to fix problem.
      
  - - Re:Sometimes You Have To Be There (Score:4, Funny)
      
      by Critical Facilities ( 850111 ) * writes: on Tuesday February 10, 2009 @11:40AM (#26797843)
      
      Man, your poor slash key has a hard life.
      
  - - Re: (Score:3, Funny)
      
      by Achromatic1978 ( 916097 ) writes:
      
      I have been faced with personnel that barely know more about networking than the security guard.
      That's not a nice or polite way to talk about your manager.
Thanks for the information (Score:5, Funny)

by sleeponthemic ( 1253494 ) writes: on Tuesday February 10, 2009 @12:11AM (#26793429) Homepage

Now if you could just post the link to the form where I can claim my full refund (for time not wasted incurred) I'll go back to being a loyal "customer".

- Re:Thanks for the information (Score:5, Funny)
  
  by Anonymous Coward writes: on Tuesday February 10, 2009 @12:15AM (#26793479)
  
  Okay, here is the link: http://slashdot.org/subscribe.pl
  You probably owe about $10 for your time not wasted.
  
- Re:Thanks for the information (Score:5, Funny)
  
  by Arthur Grumbine ( 1086397 ) * writes: on Tuesday February 10, 2009 @12:43AM (#26793681) Journal
  
  I don't know about you, but I'm suing for punitive damages. Do you have any idea how much pain and suffering the work I did in that time caused me?!
  
  - Re:Thanks for the information (Score:5, Funny)
    
    by Atario ( 673917 ) writes: on Tuesday February 10, 2009 @03:58AM (#26794613) Homepage
    
    Trust me, it's nothing compared to the pain and suffering your work caused us.
    -- The testing staff
    
  - Re:Thanks for the information (Score:5, Informative)
    
    by spartacus_prime ( 861925 ) writes: on Tuesday February 10, 2009 @09:42AM (#26796415) Homepage
    
    I don't know about you, but I'm suing for compensatory damages. Do you have any idea how much pain and suffering the work I did in that time caused me?!
    Fixed that for you. Sorry, law student.
    
In Soviet Russia (Score:5, Funny)

by MindlessAutomata ( 1282944 ) writes: on Tuesday February 10, 2009 @12:11AM (#26793431)

In Soviet Russia, Slashdot slashdots Slashdot!

- Re:In Soviet Russia (Score:5, Funny)
  
  by ocularDeathRay ( 760450 ) writes: on Tuesday February 10, 2009 @12:33AM (#26793599) Journal
  
  the headline is confusing, was the problem caused by a recursive dupe or something?
  
  I didn't read the rest of the summary cause it is longer than my finger and that is how we used to roll on the dialup BBSs... never read anything longer than your finger held up to the screen. this message is only intended for people of all finger sizes.
  
  - Re: (Score:2)
    
    by Captain Splendid ( 673276 ) writes:
    
    I'm thinking it was my fault. I was reading Shakrai's journal, went to post a reply, and bam, no more slashdot.
    
    So yeah, sorry about that.
- Re:In Soviet Russia (Score:5, Funny)
  
  by Anonymous Coward writes: on Tuesday February 10, 2009 @02:47AM (#26794297)
  
  Yo dawg, I herd u like Slashdot so I slashdotted your Slashdot!
  
- - Re:In Soviet Russia (Score:5, Informative)
    
    by robophilosopher ( 847226 ) writes: on Tuesday February 10, 2009 @12:42AM (#26793665)
    
    I believe you mean: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. The caps matter. In other words, Buffalo from the city of Buffalo that are pushed around by (other) buffalo from the city of Buffalo in turn push around (still more) buffalo from the city of Buffalo. And you thought this was unrelated to the recursive dupe comment.
    
    - Re:In Soviet Russia (Score:5, Informative)
      
      by jez9999 ( 618189 ) writes: on Tuesday February 10, 2009 @05:11AM (#26794893) Homepage Journal
      
      There are no buffalo living in the US. Only bison. ;-)
      
    - Re:In Soviet Russia (Score:4, Insightful)
      
      by thePowerOfGrayskull ( 905905 ) writes: <marc.paradise@NosPAM.gmail.com> on Tuesday February 10, 2009 @10:06AM (#26796695) Homepage Journal
      
      Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.
      
      What ever happened to "Duck duck duck goose"?
      
- - Re:In Soviet Russia (Score:5, Funny)
    
    by Zarf ( 5735 ) writes: on Tuesday February 10, 2009 @06:57AM (#26795365) Journal
    
    In Soviet Russia ...
    1. Meme Very Tired. No Longer Wired.
    2. 'Soviet Russia' ceased to exist last century.
    3. Profit!!!
    I for one welcome our previous-century-meme based overlords.
    
- - Re:In Soviet Russia (Score:5, Funny)
    
    by Zarf ( 5735 ) writes: on Tuesday February 10, 2009 @06:59AM (#26795373) Journal
    
    Was it maybe a feedback loop of that very thing that caused the slashdotting?
    I think the switch was trying to get first post.
    
A.I. (Score:5, Funny)

by gmuslera ( 3436 ) writes: on Tuesday February 10, 2009 @12:12AM (#26793451) Homepage Journal

probably the biggest proof that Slashdot has become sentient is that it is willing to kill itself rather than see another batch of Idle videos.

- Re:A.I. (Score:5, Funny)
  
  by BLT2112 ( 1372873 ) writes: on Tuesday February 10, 2009 @12:19AM (#26793511)
  
  Like the poet from HHGG whose own intestines leaped out of his throat to strangle himself...
  
*Sniff* they grow up so fast! (Score:4, Funny)

by exley ( 221867 ) writes: on Tuesday February 10, 2009 @12:12AM (#26793455) Homepage

Slashdot has apparently learned how to masturbate, because it is now fucking with itself!

- - Re:*Sniff* they grow up so fast! (Score:5, Insightful)
    
    by adolf ( 21054 ) writes: <flodadolf@gmail.com> on Tuesday February 10, 2009 @12:42AM (#26793663) Journal
    
    Naw. Stuff sometimes, yaknow, happens. People sometimes make mistakes, and hardware sometimes just breaks. It's not always ignorance -- especially, I'd guess, at the level of Slashdot's back end.
    I once implemented a VoIP phone system at a factory in an evening. (This, in itself, was an undertaking - close to 200 extensions, up and running, between Wednesday at close of business and Thursday when folks started showing up, including three hours on the phone with Sprint to get the PRI and T1 circuits reconfigured at 2:00AM.)
    We left, tired and groggy, with an IP phone placed in a common area for the facilities network admins to train any staff who needed training, at about 7:30AM. At 8:30, after I finally got home and managed to close my eyes, my phone rang. It was the network admin. He had a few minor issues which could've waited, but the real problem was that their network was totally fucked: Packets everywhere. No capacity to do anything. An amazing cascading failure of the sort that one hopes to never see.
    And it wasn't any hodge-podge network, either. HP Procurve switches configured in a redundant fabric mode with gigabit fiber links - hot stuff for the time, especially for a factory. The wiring was all new, and was all good. The network had been designed specifically to avoid the limitations of Ethernet, and was successful to that end (a non-trivial task in an existing building complex). But it was tripping all over itself.
    Turns out that someone had taken that fancy IP phone in the common area with its built-in unmanaged switch, and plugged both of its 10/100 Ethernet jacks into the wall. (Nobody knows who.)
    The ensuing packet storm broke everything. Unplugging one of them fixed the problem pretty much immediately.
    I wrote about this here once before, and everyone's immediate reply was this: "Well, duh. They should've turned the Spanning Tree Protocol on, and this wouldn't have happened. They're obviously idiots."
    But the truth is so much more simple: People make mistakes. It was a mistake to keep STP turned off in that environment, and it was a mistake to plug two fancy ports of a Procurve switch into two dumb ports on an IP phone. Had either of those mistakes not happened, things would've been fine.
    But mistakes happen anyway. We do our best, as IT professionals, to minimize these mistakes, or at least keep them away from production. But sometimes, despite having the best people and the best tools and all the knowledge it takes to make stuff work, shit just happens.
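
To see why a single redundant cable is enough, here is a toy flooding model of the loop described above. It is a sketch under simplifying assumptions, not a faithful switch simulation: every node floods a broadcast out of all links except the one it arrived on, nothing ever expires (Ethernet frames carry no hop limit), and the node names "pc", "switch", and "phone" are made up.

```python
def frames_in_flight(links, start_node, start_link, steps):
    """Flood one broadcast frame and count the copies produced at each step.

    links: list of (node_a, node_b) pairs; parallel entries model redundant cables.
    A frame is (node it just arrived at, index of the link it arrived on).
    """
    frames = [(start_node, start_link)]
    history = []
    for _ in range(steps):
        next_frames = []
        for node, ingress in frames:
            for idx, (a, b) in enumerate(links):
                if idx == ingress or node not in (a, b):
                    continue  # never send a frame back out the link it came in on
                neighbour = b if node == a else a
                next_frames.append((neighbour, idx))
        frames = next_frames
        history.append(len(frames))
    return history


# Phone plugged into the wall twice: two parallel links between switch and phone.
LOOPED = [("pc", "switch"), ("switch", "phone"), ("switch", "phone")]
# Same network with one of the two cables unplugged.
TREE = [("pc", "switch"), ("switch", "phone")]

looped = frames_in_flight(LOOPED, "switch", 0, steps=6)
tree = frames_in_flight(TREE, "switch", 0, steps=6)
print("with loop:   ", looped)
print("without loop:", tree)
```

In the loop-free case the single broadcast drains away after one hop; with the second cable in place, copies of that one frame keep circulating indefinitely. On a real network every host keeps adding new broadcasts on top, which is where the storm comes from.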
    
    - Re:*Sniff* they grow up so fast! (Score:4, Interesting)
      
      by Vidar Leathershod ( 41663 ) writes: on Tuesday February 10, 2009 @01:12AM (#26793853)
      
      I'm surprised STP was off by default. I remember in 1999 or so I had some trouble that resulted in my having to turn STP off on Cisco switches (they shipped with it on; these were 3524s and a 5505). I can't actually remember why. I think it had something to do with a Novell server?
      In any case, I remember saying to the Cisco phone support guy, who had been baffled for 4 hours or so before he told me to turn it off (and things started to work) "Who the heck would plug in two ports from one device into the same network?"
      Since then, I have seen exactly that situation many times in small office environments. Also, the classic plugging in while also being on the wireless side of the network.
      
      - Re:*Sniff* they grow up so fast! (Score:5, Informative)
        
        by Florian Weimer ( 88405 ) writes: <fw@deneb.enyo.de> on Tuesday February 10, 2009 @02:13AM (#26794151) Homepage
        
        I'm surprised STP was off by default. I remember in 1999 or so I had some trouble that resulted in my having to turn STP off on Cisco switches (they shipped with it on; these were 3524s and a 5505). I can't actually remember why. I think it had something to do with a Novell server?
        The problem likely was that the machine required network at boot (typical Netware clients were like that, I've been told). STP started when the link went up, but it took a rather long time, so forwarding had not been enabled when the client required the network.
        Since then, I have seen exactly that situation many times in small office environments. Also, the classic plugging in while also being on the wireless side of the network.
        Port security helps a lot.
        STP is also not fail-safe because typical switches happily forward traffic even if the STP process running on the CPU has died. If you build a L2 core, one broken switch (or OS glitch on a switch) can still take down your entire network easily (it's one of those pesky distributed, multiple single points of failure). In general, L3 networks are somewhat more robust in this regard, so it's often a good idea to avoid switch-to-switch connections (but that might be difficult, as it is difficult to tell L2 devices from L3 devices these days).
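
The boot-time race mentioned above is mostly arithmetic. With classic 802.1D spanning tree, a port that has just come up waits through a listening state and a learning state, each lasting one forward delay (15 seconds by default), before it forwards traffic. A minimal sketch, assuming that default and a made-up 10-second client timeout (the `fast_edge_port` flag stands for any feature that lets an access port skip the wait):

```python
FORWARD_DELAY_S = 15  # classic 802.1D default; many switches let you tune it


def seconds_until_forwarding(fast_edge_port=False):
    """Time from link-up until the port forwards user traffic."""
    if fast_edge_port:
        return 0  # edge-port fast start skips the waiting states
    return 2 * FORWARD_DELAY_S  # listening, then learning


def client_gets_network(client_timeout_s, fast_edge_port=False):
    """True if the port is forwarding before the client gives up."""
    return seconds_until_forwarding(fast_edge_port) <= client_timeout_s


CLIENT_TIMEOUT_S = 10  # hypothetical boot-time network timeout
print(seconds_until_forwarding())
print(client_gets_network(CLIENT_TIMEOUT_S))
print(client_gets_network(CLIENT_TIMEOUT_S, fast_edge_port=True))
```

Any client that needs the network within the first half minute after link-up loses that race unless the port is treated as an edge port.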
        
        
        Re: (Score:3, Informative)
        
        by 222 ( 551054 ) writes:
        
        spanning-tree portfast is your friend! (I'm sure you know this... just saying.)
        
        What bothers me just as much is when I see a ton of switches in an environment with their VTP mode set to Server. A small mixup with VTP version numbers and you've replaced your entire VLAN database with... an empty one! It's an easy problem to fix, but nobody likes losing their entire network, even for just a few minutes.
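
The VTP failure mode above comes down to one rule: within a domain, the advertisement with the higher configuration revision number wins, regardless of what it contains. The following is a deliberately simplified model of that rule, not Cisco's implementation; the class, the switch names, the revision numbers, and the VLAN names are all invented, and details such as domain names, passwords, and the default VLANs are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    revision: int
    vlans: dict = field(default_factory=dict)


def receive_advertisement(receiver, sender):
    """Simplified rule: a higher configuration revision wins, whatever its contents."""
    if sender.revision > receiver.revision:
        receiver.vlans = dict(sender.vlans)  # the whole database is replaced
        receiver.revision = sender.revision
        return True
    return False


production = Switch("prod-core", revision=12,
                    vlans={10: "users", 20: "voice", 30: "servers"})
# Hypothetical recycled switch: many past edits, but an empty VLAN database.
lab_switch = Switch("old-lab-box", revision=47, vlans={})

overwritten = receive_advertisement(production, lab_switch)
print(overwritten, production.revision, production.vlans)
```

The usual precaution is to make sure a switch's revision number is reset before it is connected to a production domain.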
    - Re:*Sniff* they grow up so fast! (Score:5, Interesting)
      
      by Nyall ( 646782 ) writes: on Tuesday February 10, 2009 @01:28AM (#26793933) Homepage
      
      I'm not a network engineer but I think we did that senior year of college (2004). The engineering department provided us with our own work rooms we could lock. The rooms only had a couple of Ethernet jacks so we brought in our own switch which I remember could auto detect the uplink. It was plugged into the wall then someone by mistake plugged both ends of another CAT cable into some open ports. That mistake took down half the campus network for a couple of hours till some very mad IT guys found us.
      
      - Re:*Sniff* they grow up so fast! (Score:5, Interesting)
        
        by adolf ( 21054 ) writes: <flodadolf@gmail.com> on Tuesday February 10, 2009 @01:52AM (#26794061) Journal
        
        The timeframe is pretty close - my story happened late in 2004. The network admins in my story were pretty livid as well. (Well, panicked, followed by angry and livid once they'd found the fault. They blamed everyone, including us for selling them unmanaged switches in their telephones, and promised to find the responsible party and throw them under the bus. It never happened. I hope that they eventually turned STP on.)
        It seems to be common in network administration to think (and I've mistakenly thought this way, too) that once some random person does something stupid and the entire fucking thing crashes that they'd just simply undo whatever it was and never do it again. Nevertheless, if lay people (or, no offense, students) were all that good at networking or computers, they'd probably never have produced the problem to begin with.
        These days, in my day job, I work with salespeople and law enforcement. They're not stupid -- in fact, most of the clients I work with do things daily that I could never accomplish -- but they occasionally do stupid things with computers and networks. I try hard to avoid blaming them for what they've done wrong, and to instead try to use it as an opportunity to better (and gently) show them how things actually work.
        I learned this, oddly enough, when pulling some Cat5 at a plastics factory. I moved a ceiling tile in an office that had a photo sensor fire alarm in it, and it went off. The entire plant was evacuated. The fire department showed up. Of course, there was no real fire -- the dust from the fiberglass insulation that I'd set the photo sensor on was enough to trigger it. And, thankfully, they were understanding. Because of my mistake, they learned a few weaknesses of their fire alarm system (some employees couldn't hear it and had to be found and dragged outside, which is a very real problem), and they considered it to be a good fire drill. They continue to hire us back for work today, and I learned not to do that again. :)
        
        
        Re:*Sniff* they grow up so fast! (Score:4, Interesting)
        
        by Xest ( 935314 ) writes: on Tuesday February 10, 2009 @04:25AM (#26794691)
        
        "Nevertheless, if lay people (or, no offense, students) were all that good at networking or computers, they'd probably never have produced the problem to begin with."
        I've seen IT professionals do exactly the same thing many a time. I don't think students are particularly special here, anyone who has never encountered the problem before is prone to it I'd say but most people in IT encounter it eventually one way or another!
        
        
        Re: (Score:3, Interesting)
        
        by Xest ( 935314 ) writes:
        
        Yes, unfortunately though at many places, they're not.
        I think the real question is, why the fuck is this even possible? There shouldn't be a single piece of networking hardware available today that's vulnerable to this by default, it's not as if the problem hasn't been known about since about as long as the relevant networking hardware has been around.
        
        Re: (Score:3, Informative)
        
        by Just Some Guy ( 3352 ) writes:
        
        They're not stupid -- in fact, most of the clients I work with do things daily that I could never accomplish -- but they occasionally do stupid things with computers and networks.
        I usually prefer "ignorant", which implies that you just don't (yet) know any better. I reserve "stupid" for a special class of mistakes, like expecting servers to work while unplugged.
        Put another way, stupid mistakes make you slap your forehead. Ignorant mistakes make you think, "oh, that's interesting!"
        
        Re: (Score:3, Interesting)
        
        by digitalunity ( 19107 ) writes:
        
        I agree, this is a great example. As someone who has worked in manufacturing before, I can say without a doubt most "fire drills" aren't much of a drill since they're planned in advance and staff are notified prior.
        The issue is that during production, staff can't just walk away from their machines without causing tremendous costs. To avoid those costs, management sees fit to notify staff beforehand so they can shut down gracefully, which kind of defeats the purpose of a drill.
        The effect is that most manufacturers do not kn
      - Re: (Score:2)
        
        by Yetihehe ( 971185 ) writes:
        
        On our campus we had two student admins per building and one managed switch for every two floors (10-floor buildings). The campus was spread across the entire city, so two girls who put one cable into their own small switch in their room took the entire MAN down. It was isolated within minutes and the offending floor was turned off. Of course, it's not as if a huge loss happened, so this story will die soon; I submit it here in the hope that it thrives and comforts some admins that sometimes things don't go too wrong.
    - Re: (Score:2)
      
      by robbak ( 775424 ) writes:
      
      Yes, a similar thing happened at this Internet Cafe I admin. I left an RJ45 joiner lying around, and someone (I won't assume malice) used it to connect two of our cables. I am ashamed to say it took some time and binary division to track it down.
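
The "binary division" mentioned above is ordinary bisection: connect only half of the suspect cable runs, see whether the storm is still there, and keep whichever half contains the fault. A minimal sketch, assuming exactly one faulty segment and a check that can be repeated at will; the segment names and the `storm_present_with` callback are hypothetical stand-ins for walking around and plugging things in.

```python
def find_faulty_segment(segments, storm_present_with):
    """Bisect the segment list; return the culprit and the number of checks used."""
    candidates = list(segments)
    checks = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        checks += 1
        if storm_present_with(half):
            candidates = half  # fault is in the half that was connected
        else:
            candidates = candidates[len(half):]  # fault is in the other half
    return candidates[0], checks


SEGMENTS = [f"desk-{n:02d}" for n in range(1, 17)]
BAD = "desk-11"  # hypothetical location of the stray joiner

culprit, checks = find_faulty_segment(SEGMENTS, lambda connected: BAD in connected)
print(culprit, checks)
```

Sixteen segments take four checks instead of up to fifteen one-by-one tests, which is why the approach is worth the walking.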
    - Re:*Sniff* they grow up so fast! (Score:4, Insightful)
      
      by totally bogus dude ( 1040246 ) writes: on Tuesday February 10, 2009 @04:27AM (#26794699)
      
      I'm somewhat wondering how you manage to set up a fully redundant switched network without using spanning tree at all? I suppose they might've enabled it just for the switch interconnects and left it off for the access ports so they'd come up faster. Still if that was the case, they should've been aware of the risks and symptoms thereof.
      
  - - Re: (Score:3, Funny)
      
      by ta bu shi da yu ( 687699 ) writes:
      
      Was that before or after you fought for your right to party?
Did you feed (Score:2)

by mrmeval ( 662166 ) writes:

The HAMSTERS?
http://www.webhamster.com/ [webhamster.com]
On the plus side (Score:5, Funny)

by Toe, The ( 545098 ) writes: on Tuesday February 10, 2009 @12:13AM (#26793461)

Any day you get to legitimately use "horked" in a public post can't be all bad. :P

- Hork's been forked -- it's "borked"! (Score:3, Informative)
  
  by zooblethorpe ( 686757 ) writes:
  
  But I thought "horked" meant, y'know, horked, eh? Meaning, like, "stolen" --
  Doug: Hey - somebody horked our clothes!
  Bob: Geez, who'd want to hork our clothes, eh?
  Cheers,
Would like final analysis (Score:5, Interesting)

by Midnight Thunder ( 17205 ) writes: on Tuesday February 10, 2009 @12:13AM (#26793467) Homepage Journal

When you do work out what the root cause was, I am sure we would all like to find out what it was, so please post an update when you can.

- Re:Would like final analysis (Score:5, Funny)
  
  by Anonymous Coward writes: on Tuesday February 10, 2009 @12:34AM (#26793607)
  
  The problem was the system was HORKED, didn't you get that?
  
  - Re: (Score:2, Funny)
    
    by Linker3000 ( 626634 ) writes:
    
    Is that worse than B0rked?
    I thought the scale was:
    B0rked
    Horked
    F*cked
    Stuffed
    Iffy
    Working
    - Re: (Score:3, Funny)
      
      by Shay Guy ( 1466593 ) writes:
      
      Where does being Bork Bork Borked rank on that?
- Re:Would like final analysis (Score:5, Funny)
  
  by yanyan ( 302849 ) writes: on Tuesday February 10, 2009 @12:56AM (#26793757)
  
  The switches were running Windows 7 Starter Edition. http://tech.slashdot.org/article.pl?sid=09/02/09/1348255 [slashdot.org]
  
- Re:Would like final analysis (Score:5, Informative)
  
  by Precision ( 1410 ) * writes: on Tuesday February 10, 2009 @10:21AM (#26796909) Homepage
  
  I'll be sure to when I get to the data center next week and am able to get my hands on the angry switch in question. I do love how it just sat there quietly for two weeks w/o doing anything and then decided randomly to just start blasting out 20 Gbit.. sigh.. hardware..
  
  - Re:Would like final analysis (Score:5, Informative)
    
    by Cylix ( 55374 ) writes: on Tuesday February 10, 2009 @10:41AM (#26797131) Homepage Journal
    
    Failed ASIC on the switch most likely.
    I've seen an issue just like that about once a year, but working with a sick number of systems globally the chances of seeing one-offs become fairly regular.
    Depending on the failure it might have logged what it was doing, but I'll presume since your monitoring didn't catch the spike it was because it was just random garbage.
    Fun times!
    
And finally the question is answered: (Score:3, Funny)

by Anonymous Coward writes: on Tuesday February 10, 2009 @12:14AM (#26793469)

Who Slashdots the Slashdotters?

- Re:And finally the question is answered: (Score:5, Funny)
  
  by eosp ( 885380 ) writes: on Tuesday February 10, 2009 @01:26AM (#26793919) Homepage
  
  Quis slashdotiet ipsos slashdotes?
  
Things are bad... (Score:2, Insightful)

by spartacus_prime ( 861925 ) writes:

When even Slashdot gets slashdotted. Now if only we can make the Digg effect bury that site. For good.
This isn't the first time... (Score:5, Funny)

by narcberry ( 1328009 ) writes: on Tuesday February 10, 2009 @12:18AM (#26793489) Journal

First thing I'd do as Cyber Security Tzar would be to outlaw any network device that has the potential to become faulty.
We could've avoided this tragedy entirely.

- Re:This isn't the first time... (Score:5, Funny)
  
  by MBGMorden ( 803437 ) writes: on Tuesday February 10, 2009 @01:16AM (#26793875)
  
  Indeed. Studies show that you're far more likely to get hacked if you keep a computer in your home. Indeed it's often even a case where an attacker is able to wrest control of your own computer from you and use it against you.
  At the very minimum, given the elevated hazard potential to kids (over 90% of kids will suffer a computer accident before the age of 18), you should always keep your computers and networking equipment securely locked in separate compartments.
  I'm not going to go so far as you and call for an outright ban, but I think it's obvious that we need common-sense computer control laws put into place. In particular, we need to stop the widespread smuggling of these devices from across the borders of places such as Taiwan, Japan, and California, into our outer-city suburbs.
  
  - Re: (Score:3, Funny)
    
    by MightyYar ( 622222 ) writes:
    
    Couldn't we legislate the sale of a keyboard lock with every computer? Or maybe a smart computer that only responds to the hand of its registered, legal owner.
and still no work done (Score:5, Insightful)

by qw0ntum ( 831414 ) writes: on Tuesday February 10, 2009 @12:22AM (#26793525) Journal

Even though /. was down, I still managed to not get any work done. Maybe it had something to do with the fact I kept rechecking to see if it were back up. Or maybe I should just stop blaming my laziness on external factors and just admit it is a personal problem: I would still find ways to not do work even without Slashdot! :P

Still having issues (Score:2)

by shaitand ( 626655 ) writes:

www.slashdot.org loads just fine but slashdot.org gives a 500 internal server error.

A tour of Slashdot... (Score:5, Funny)

by lymond01 ( 314120 ) writes: on Tuesday February 10, 2009 @12:58AM (#26793769)

The year is 2025.
Well, Ladies and Gentlemen, here you see what you may think is an archaic lot of old computers. You would be mistaken. These are Slashdot. No, no cause for alarm...and that door's locked anyway, you can't get out through there. The tour only goes forward. But I'm glad at the very least that you know what Slashdot is. Not was. IS.
It's a safeguard against...something. Something that was unleashed for 75 minutes in 2009 that crippled what was rumored to be the most robust public-facing cluster known. All we have left from that fateful day is the single post from the Slashdot network admin. Someone archived it, lucky us, because he was never seen after that day. I have a copy here, hardcopy of course -- no sense in taking risks so close to...well....
Here it is:
I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something. I just don't know what yet.

- Re: (Score:2)
  
  by jd ( 1658 ) writes:
  
  *cue Holst's Mars* (Hey, we all know CmdrTaco is related to Professor Bernard Quatermass)
- Re:A tour of Slashdot... (Score:4, Interesting)
  
  by JWSmythe ( 446288 ) * writes: <jwsmythe@noSPam.jwsmythe.com> on Tuesday February 10, 2009 @03:11AM (#26794417) Homepage Journal
  
  Nah, I used to run one of the bigger, well-known publicly facing clusters [alexa.com]. It was ranked #300 by Alexa when I left over 2 years ago. What's happened since is their own fault. :)
  Actually, this wouldn't have downed that network. Every GigE circuit was individual to a city, or set of racks (depending on the site). There were no cross connects between them. Almost everything was designed so if we lost a city for any reason, it didn't hurt the site. We had connectivity outages, and even a couple brownouts that upset the power systems, but the sites were always accessible.
  Slashdot should not, under any circumstances, be hosted in one location. In my opinion, they should be at the largest continental and intercontinental peerings that they can be at.
  1 Wilshire, Los Angeles, CA - providing the west coast of the US, and the most substantial fiber links on the Pacific.
  111 8th Ave, New York, NY - providing the east coast of the US, and virtually all of the links to Europe.
  36 NE 2nd St, Miami, FL - providing the southeast US, redundancy for the Southeast US, and some fiber to Europe and S. America
  Redundant options.
  426 S LaSalle St, Chicago, IL - providing good service to the East and West coast of the US
  55 S Market St, San Jose, CA - providing good service to the West coast of the US, and some trans-Pacific connectivity
  Some people really like Atlanta, Dallas, Houston, Las Vegas, Salt Lake City, and Vienna/Ashburn/Reston. I don't really suggest it, if you can have a presence in the better locations.
  There are some very nice global options too. I'm not sure how well the European networks have cleaned up. Several years ago, due to peering arrangements over there, most European traffic ended up going to New York and back to Europe, even though we were on one of the top Tier 1 providers. We ditched the site, and sent all of Europe to New York. Our users sent compliments on our "new data center in Europe", since it was so fast. :) People like to complain, but rarely send compliments. That was interesting. There are some great locations in Australia and Asia also, but ... well ... it's all in how much you want to spend.
  I know people in the Silicon Valley always scream when I suggest them as secondary, but if you've had a good look at all the major cities, you'd get over yourselves. Just because you live there, and there are expensive neighbors, it doesn't make you the center of the world.
  Slashcode would need some revamping to make work in this environment. There are lots of options there too.
  But, I'm not on the Slashdot IT team, so I don't get to make these decisions (or even give opinions).
  
  - Re: (Score:2)
    
    by techno-vampire ( 666512 ) writes:
    
    If it were me, I'd go for both California options. They're both near enough to the San Andreas Fault to be vulnerable to a major quake, but far enough apart that no one temblor would get both of them.
    - Re: (Score:2)
      
      by JWSmythe ( 446288 ) * writes:
      
      Nah, sometime later this year the big one will split California from Mexico through Oregon, and make the island state previously known as SansAngeles. :)
      Now, when they will get fiber run across the gap is another question. :)

Is it possible.... (Score:5, Funny)

by GaryOlson ( 737642 ) writes: <slashdot@garyolso[ ]rg ['n.o' in gap]> on Tuesday February 10, 2009 @01:08AM (#26793821) Journal

...the problem down to a pair of switches...I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something -- I just don't know what yet.
Is it possible the duplicate article generator tried to spawn, became entangled in its own potential well of duplicity, and now is trapped like two Lisp programmers deep inside their parentheses?

- Re: (Score:2)
  
  by Hucko ( 998827 ) writes:
  
  they aren't trapped... they're building...

The world is coming to an end (Score:2, Funny)

by Tsagadai ( 922574 ) writes:

In Korea, only old people slashdot slashdot. The memes are funny. The insightful comments are insightful. The funny comments are funny, the trolls are trolls. Seems resetting slashdot fixed everything. The entire world is doomed!

The worst thing about this? (Score:5, Insightful)

by chrome ( 3506 ) writes: <chromeNO@SPAMstupendous.net> on Tuesday February 10, 2009 @01:20AM (#26793893) Homepage Journal

The worst thing about this? 5,000,000 people who think they know what happened, posting "helpful" suggestions or analysis
"The problem is definitely spanning tree!"
or
"Back in 1998, we were running these HP switches right, and ..."
or
"Did you try resetting the flanglewidget interface?!"
or
"I've seen this exact problem! You need to upgrade to v5.1!"
etc
It's not your network. It doesn't matter how much you think you know, you don't know the topology, or the systems involved. It'll be interesting to know what the ACTUAL reason was, when they figure it out. Assuming it isn't aliens.

- Re:The worst thing about this? (Score:5, Interesting)
  
  by XanC ( 644172 ) writes: on Tuesday February 10, 2009 @01:42AM (#26794007)
  
  ...Because if it's aliens, then it won't be interesting?
  
  - Re: (Score:2)
    
    by jd ( 1658 ) writes:
    
    Not really. Aliens log onto Slashdot a lot. The Timelords are the worst offenders, using the Matrix and a space/time inversion multiplexor to access the unused ports on the Slashdot switches directly.
    - Re: (Score:3, Funny)
      
      by Darth ( 29071 ) writes:
      
      this actually explains duplicate posts pretty well...
      The time lords, for a joke, take stories from slashdot, go back a day or two, and submit them. They get posted a few days early, but to avoid paradox, reality requires the "original" post to be made anyway. Thus we get double posts of stories.
      You all owe the slashdot editors an apology.
- Re:The worst thing about this? (Score:4, Interesting)
  
  by jd ( 1658 ) writes: <imipak@yaho[ ]om ['o.c' in gap]> on Tuesday February 10, 2009 @01:58AM (#26794087) Homepage Journal
  
  It's likely multicast-related, as that's where TFA states the problem was seen. There are only so many multicast issues you can have. True, we don't know the topology. True, we don't know the switch configuration. True, it's just as possible this is some sort of revenge by the Church of Scientology for all the Slashdot articles on them.
  However, some things seem more plausible than others. Since this was a spontaneous problem, hardware seems more suspect than software. If it is software (unlikely but possible), the only multicast protocols most switches use are the spanning-tree protocols.
  
Slashdotted (Score:5, Funny)

by Greyfox ( 87712 ) writes: on Tuesday February 10, 2009 @01:37AM (#26793983) Homepage Journal

Mirror [slashdot.org]

- Re: (Score:3, Funny)
  
  by therufus ( 677843 ) writes:
  
  It was a DoS; Denial of Slashdot!

turned off spanning tree protocol? (Score:5, Interesting)

by jamesh ( 87723 ) writes: on Tuesday February 10, 2009 @01:44AM (#26794023)

I fully believe the switches in that cabinet are still sitting there attempting to send 20Gbit/sec of traffic out trying to do something -- I just don't know what yet
We had something similar happen at a client site - a switch failed in a rack so we temporarily replaced it with an 8 port 'desktop' switch, and then a day later installed the proper replacement back in the rack. We didn't want any unnecessary downtime though so we linked them together and left instructions with the onsite guy to move all the connections from the desktop switch into the proper switch after hours. Which he did, including the cable that linked them together. The switch was in 'portfast' mode so any broadcast packet that got 'onto' the switch, stayed there :)
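For anyone wondering why a looped cable makes a broadcast "stay there", here is a rough toy model in Python. It assumes switches with no spanning tree that flood every broadcast out of every inter-switch cable except the one it arrived on, and it only counts copies travelling on those cables. The function name and the topologies are made up for illustration; real switches also have host ports, queues and line-rate limits that this ignores.

```python
from collections import defaultdict


def frames_in_flight(links, ingress_switch, steps):
    """Count copies of ONE broadcast on inter-switch cables after each step.

    links: list of (switch_a, switch_b) cables. The same pair may appear more
    than once, and a == b models a cable plugged into one switch at both ends.
    """
    ends_by_switch = defaultdict(list)  # switch -> [(cable index, end)]
    switch_of = {}                      # (cable index, end) -> switch
    for i, (a, b) in enumerate(links):
        ends_by_switch[a].append((i, 0))
        ends_by_switch[b].append((i, 1))
        switch_of[(i, 0)] = a
        switch_of[(i, 1)] = b

    # A host on ingress_switch sends one broadcast: flood it out of every cable.
    in_flight = list(ends_by_switch[ingress_switch])
    history = []
    for _ in range(steps):
        history.append(len(in_flight))
        next_in_flight = []
        for cable, end in in_flight:
            arrival = (cable, 1 - end)          # far end of the same cable
            for out in ends_by_switch[switch_of[arrival]]:
                if out != arrival:              # never send back out the arrival port
                    next_in_flight.append(out)
        in_flight = next_in_flight
    return history


print(frames_in_flight([("A", "B")], "A", 4))                          # one cable
print(frames_in_flight([("A", "A")], "A", 4))                          # cable looped back
print(frames_in_flight([("A", "B"), ("A", "B"), ("A", "B")], "A", 4))  # three parallel cables
```

With a single cable the copy dies after one hop. With the cable looped back onto the same switch, two copies circulate forever, and every further broadcast adds two more. With three parallel cables the count doubles on every hop, which is the storm case.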

- Re: (Score:3, Funny)
  
  by powerlord ( 28156 ) writes:
  
  The switch was in 'portfast' mode so any broadcast packet that got 'onto' the switch, stayed there :)
  First rule of portfast mode:
  Whatever happens in portfast mode, stays in portfast mode.

Skynet shmynet (Score:4, Funny)

by His Nastiness ( 542696 ) writes: on Tuesday February 10, 2009 @02:01AM (#26794103) Homepage

February 9th, 2009 8:55pm Slashdot becomes self-aware.

He could have fixed it in half the time (Score:5, Funny)

by Provocateur ( 133110 ) writes: <shedied@[ ]il.com ['gma' in gap]> on Tuesday February 10, 2009 @02:03AM (#26794109) Homepage

...were he not typing that long-a$$ summary. Twice as fast if he didn't have to spellcheck.
(j/k)
Which leads me to this question:
What do Slashdotter staff read to avoid doing work?

I don't really care, but... (Score:2)

by religious freak ( 1005821 ) writes:

Is this happening more often than it used to? I mean, it's tech and this is a non-paying site for most of us... it's going to break. But I swear, I remember we used to go over a year w/o seeing /. downtime, now it seems like it happens every few months.

Or have I just become more of a /. junkie than I used to be?

Mis-configured trunk ports can cause such an issue (Score:3, Informative)

by wtarreau ( 324106 ) writes: on Tuesday February 10, 2009 @02:55AM (#26794347) Homepage

This thing usually happens when two switches are attached with 2 (or more) trunked links ("etherchannel" in cisco terminology), and one of the switches has the trunk disabled on one of the ports (or someone moved the cable to another port during a diag). Thus the attachment becomes a loop. STP could take care of this, but it's common to disable it on access switches.
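One way to picture this, as a sketch and not anything a real switch runs: treat the network as a graph of logical links. Cables that are bundled on both ends collapse into a single logical link, while unbundled cables each count as their own link, so a broken bundle shows up as a cycle. The helper names and the "core"/"access" labels below are hypothetical.

```python
def has_loop(logical_links):
    """Return True if the undirected multigraph of logical links has a cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in logical_links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True        # already connected: this link closes a loop
        parent[root_a] = root_b
    return False


def logical_view(cables, bundled):
    """Collapse physical cables into logical links.

    cables:  list of (switch_a, switch_b) physical cables.
    bundled: set of frozenset({a, b}) pairs aggregated on BOTH ends.
    """
    seen = set()
    links = []
    for a, b in cables:
        pair = frozenset((a, b))
        if pair in bundled:
            if pair in seen:
                continue       # extra member of a working bundle
            seen.add(pair)
        links.append((a, b))
    return links


cables = [("core", "access"), ("core", "access")]
healthy = logical_view(cables, {frozenset(("core", "access"))})
broken = logical_view(cables, set())   # trunk disabled on one side
print(has_loop(healthy), has_loop(broken))
```

The union-find check is just a compact way to spot the cycle. With STP disabled on the access switch, nothing in the network performs the equivalent check, so the loop goes live.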

Seen That Once (Score:5, Interesting)

by maz2331 ( 1104901 ) writes: on Tuesday February 10, 2009 @03:40AM (#26794537)

A couple years ago, I had to troubleshoot a problem that was similar for a school district's network. Absolutely nothing could communicate.
I checked switches, routers, and servers for a while until I hooked a sniffer up, and still got baffling results.
THEN I decided to go low-tech, and start disconnecting cables. That got me somewhere - certain backbone connections could be disconnected and traffic levels dropped to normal levels.
So, I hooked them back up, and went to the other end of the link, and started disconnecting things port by port until I found the problem.
It turned out to be an unauthorized little 4-port switch that had malfunctioned, and was spewing perfectly valid (as in, good CRC) packets to the LAN, but with random source MAC addresses.
THAT took down every switch in the network, as it required them to update their internal tables on a per-packet basis. The thing was actually not sending much data, but it was poisoning the switches' internal tables. Not at the IP layer, but at the MAC layer.
When networking gear goes rogue, it can do really bad things to other connected equipment.
It's really hard to find the problem because every indication from every other piece of equipment is confusing. You almost always have to go to the backbone and disconnect entire segments to find it.
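To make the table-poisoning idea concrete, here is a toy learning switch in Python. It assumes a fixed-size MAC table that evicts its oldest entry when full, which is only one plausible way random source addresses can hurt; real hardware differs in table size, ageing and eviction. The class name, capacity and traffic mix are all invented for the sketch.

```python
import random
from collections import OrderedDict


class LearningSwitch:
    """Toy MAC learning table with a fixed capacity (oldest entry evicted first)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()  # source MAC -> port it was last seen on
        self.floods = 0             # frames flooded because the destination was unknown

    def receive(self, src, dst, port):
        # Learn (or refresh) where the source lives.
        self.table.pop(src, None)
        self.table[src] = port
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the oldest entry
        # A known destination goes to one port; an unknown one is flooded.
        if dst not in self.table:
            self.floods += 1


def run(rogue_frames_per_round, rounds=200, hosts=20, capacity=64, seed=1):
    rng = random.Random(seed)
    switch = LearningSwitch(capacity)
    macs = [f"host-{i:02d}" for i in range(hosts)]
    for _ in range(rounds):
        # Every legitimate host sends one frame to a random peer.
        for port, src in enumerate(macs):
            switch.receive(src, rng.choice(macs), port)
        # The rogue device sends frames with random source addresses.
        for _ in range(rogue_frames_per_round):
            fake = f"rogue-{rng.getrandbits(48):012x}"
            switch.receive(fake, macs[0], port=99)
    return switch.floods


print("floods without rogue device:", run(0))
print("floods with rogue device:   ", run(100))
```

Without the rogue device the switch only floods while it is still learning the hosts. With it, the legitimate entries keep getting pushed out, so ordinary traffic is flooded to every port again and again even though the rogue itself sends few frames.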

Dogbert (Score:4, Funny)

by ciderVisor ( 1318765 ) writes: on Tuesday February 10, 2009 @06:14AM (#26795189)

...being out of CPU, the error message was actually something to do with multicast. As a precautionary measure I rebooted each core just to make sure it wasn't anything silly. After the cores came back online they instantly went back to 100% fabric CPU usage and started shedding connections again. So slowly I started going through all the switch ports on the cores, trying to isolate where the traffic was originating. The problem was all the cabinet switches were showing 10 Gbit/sec of traffic, making it very hard to isolate. Through the process of elimination I was finally able to isolate the problem down...
What did I say that sounded like "Tell me about your day at work" ?

- - Re: (Score:2)
    
    by SpaceLifeForm ( 228190 ) writes:
    
    But it should not happen, right?
    STP [wikipedia.org]
    The Spanning Tree Protocol is an OSI layer-2 protocol that ensures a loop-free topology for any bridged LAN.
    This would seem to be the clue:
    Luckily we don't have any machines deployed on [that row in that cabinet] yet so no machines are offline.
    No machines deployed == no machines are online
    There was no traffic there.
    - Re: (Score:3, Interesting)
      
      by JWSmythe ( 446288 ) * writes:
      
      Since no one would ever make the mistake of making a loop in a datacenter, it's fairly common to disable STP, among a few other things. It makes the time bringing a machine up on a port a bit quicker. On a Cisco, you're usually looking at 30 seconds. It'll bring it down to a fraction of a second.
      And it was (obviously) a big mistake.
      I leave it on in the datacenters. I can live with 30 seconds to bring the port up, if it means I'll never flood the whole network
      - Re: (Score:2, Insightful)
        
        by blosphere ( 614452 ) writes:
        
        You've considered using portfast on edge ports? :P You know, it's been there for awhile...
        
        Re: (Score:2)
        
        by JWSmythe ( 446288 ) * writes:
        
        :) I'm pretty sure that's what I do. I was too lazy to log in and look though, and since I don't use it all the time, I don't know it off the top of my head....
        Ok, here's one of my desktop switch ports (we all have Catalyst switches on our desks, don't we?)
        interface FastEthernet0/9
        duplex full
        speed 100
        spanning-tree portfast
        There's a nice big warning on the Cisco site about it [cisco.com], which describes what they had...
        Caution: Never use the PortFast feature on swit
- Re:Slashdotted slashdot... (Score:5, Funny)
  
  by Inner_Child ( 946194 ) writes: on Tuesday February 10, 2009 @01:32AM (#26793963)
  
  I can see it now, a Michael Bay slasher/suspense flick (with explosions!) called Dupe. A group of teenagers decide to troll an online forum, but they quickly realize all is not as it seems when they discover a conspiracy to keep duplicate stories coming in order to increase advertising dollars masterminded by the evil genius Captain Burrito. Violence and hilarity ensue.
  And before anyone says this is a shitty plot... I *did* say Michael Bay.
  
- Re: (Score:3, Insightful)
  
  by jibjibjib ( 889679 ) writes:
  
  It sounds more like a network configuration accident or glitch than an attack. Besides, netsplits aren't incredibly unusual.
- - - - Re: (Score:3)
        
        by Achromatic1978 ( 916097 ) writes:
        
        you're trying for shittiest karma ever? what do you want to come back as?
        ... Twitter, maybe?
