A Note On Thursday's Downtime 75

Posted by Soulskill on Sunday July 19, 2015 @12:25AM from the slashdotting-ourselves dept.

If you were browsing the site on Thursday, you may have noticed that we went static for a big chunk of the day. A few of you asked what the deal was, so here's quick follow-up. The short version is that a storage fault led to significant filesystem corruption, and we had to restore a massive amount of data from backups. There's a post at the SourceForge blog going into a bit more detail, and describing the steps our Siteops team took (and is still taking) to restore service. (Slashdot and SourceForge share a corporate overlord, as well as a fair bit of infrastructure.)

A Note On Thursday's Downtime

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 75 Comments Log In/Create an Account

Comments Filter:

oh okay (Score:5, Insightful)

by Anonymous Coward writes: on Sunday July 19, 2015 @12:30AM (#50138189)

oh, I thought some of that shitware they sling got loose and bit them in the ass

- Re:NSA backup (Score:5, Funny)
  
  by Tablizer ( 95088 ) writes: on Sunday July 19, 2015 @02:11AM (#50138511) Journal
  
  Isn't NSA world's most famous backup solution?
  
  Yes, but they have to kill you after they restore your data.
  Try Chinese gov't instead.
  
  - Re: (Score:2)
    
    by war4peace ( 1628283 ) writes:
    
    Yup, they kill you BEFORE they restore your data.
- Re: (Score:2)
  
  by sound+vision ( 884283 ) writes:
  
  Adjectives are words though.
  - Re: (Score:1)
    
    by roman_mir ( 125474 ) writes:
    
    Not once I am done with them!
- Re:Less Trolling (Score:5, Funny)
  
  by Demonoid-Penguin ( 1669014 ) writes: on Sunday July 19, 2015 @12:54AM (#50138283) Homepage
  
  Lately I have noticed quite a fair bit less of the typical trolls.
  An increase in numbers has resulted in an increase in their diversity - plus there's so little time when they've so many places to go in their desperate battle for attention. The troll union proposed hot-bedding and time-sharing but, for obvious reason, were unable to get the propositions ratified.
  Noted troll think tank UnderTheBridgeWatch, recently published a report predicting that the recent balkanization of troll unions due to a large number of them getting married (the others just want to sleep around) under the new gay marriage laws will further increase the diversity of their appearance. Because unlike humans, trolls of the same gender do produce offspring.
  
  - Re: Less Trolling (Score:3)
    
    by mnemotronic ( 586021 ) writes:
    
    Personally, Ive off-shored & out-sourced a majority of my trolling, passive-aggressive self-rightous diatribes and compensated product endorsements. This leaves more time for, well, pr0n. Gotta have a life sometime ya know.
  - Re: (Score:2)
    
    by sound+vision ( 884283 ) writes:
    
    This is sad news. I was starting to miss the GNAA and goatse.
    - Re: (Score:2)
      
      by Demonoid-Penguin ( 1669014 ) writes:
      
      This is sad news. I was starting to miss the GNAA and goatse.
      Take heart, there is always gonorrhoea and syphilis. And Foxx News.
  - Re: (Score:2)
    
    by tepples ( 727027 ) writes:
    
    Because unlike humans, trolls of the same gender do produce offspring.
    How does that work biologically?
    - Re: (Score:2)
      
      by Demonoid-Penguin ( 1669014 ) writes:
      
      Because unlike humans, trolls of the same gender do produce offspring.
      How does that work biologically?
      Like flies on shit.
      Note that there is only one gender of trolls, and yet they increase in number. Proof!
      Witling faecus the rightfully endangered, potty-mouth troll.
      Habitat: under bridges, in sewage systems, amongst the cruft of computer systems. Dark places close to humans.
      Weaknesses: Humour, facts and sunlight. Humour causes them pain. Exposure to sustained Facts or strong Light are fatal.
      Appearance: Various aberrations. Recognisable by the unique pin-like growth on their neck - the only visible feature that reliably distinguishes them from sentient bipods.
      No pictures exist due to their sensitivity to light of any form Artists impression [regmedia.co.uk]
      History: A hydrogen-sulphide based life-form, possibly originating from the interaction of decomposing faeces and swamp gas. Whether they in fact qualify as a life form, or possess sentient capabilities is uncertain. It's theorised that they appeared when the first ancestors of humans developed intelligence, and that trolls have been devolving ever since. The theory is much debated and purely hypothetical as the only historical records are in the form of ancient legends due to the lack of fossil record. They have no backbone and upon death leave only a nasty stain and a foul odour.
      Biology: Their "closed-loop" digestive system allows them to survive their entire life eating only their own excrement. As their brain is composed of only two cells (neither of which function) they are unable to support any distro other than Windows - and even then, only the sliding kind.
      Additional references: Troll study [theregister.co.uk], Suler, J.R. and Phillips, W. (1998). Deviant Behavior in Multimedia Chat Communities. [rider.edu]
Sourceforge Badware risks ? (Score:5, Informative)

by nickweller ( 4108905 ) writes: on Sunday July 19, 2015 @12:41AM (#50138233)

Sourceforge is Badware risks .. http://i.imgur.com/Hhtgv0H.png [imgur.com]

- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Sourceforge use to be great but has been serving crapware for a couple of years. You'd have to be off your rocker to use it if you have any choice, either as an author or an end user.,
  https://forum.filezilla-project.org/viewtopic.php?t=30240&start=90
  http://www.theregister.co.uk/2015/06/03/sourceforge_to_offer_only_optin_adware_after_gimp_grump/
While you're at it, add some modern features (Score:5, Interesting)

by the_humeister ( 922869 ) writes: on Sunday July 19, 2015 @12:49AM (#50138259)

like unicode support and ipv6.

- - Re:While you're at it, add some modern features (Score:5, Funny)
    
    by dmomo ( 256005 ) writes: on Sunday July 19, 2015 @01:50AM (#50138463)
    
    Unicode, shcmunicode... they're adding REAL features, like social media icons, polls disguised as articles, hawt new barfy web-2-oh skinz, and the sexy removal of the "read more" link. Because BUZZFEED!
    
- It allowed spoofing comment scores (5:erocS) (Score:2)
  
  by tepples ( 727027 ) writes:
  
  They tried Unicode before. It allowed spoofing comment scores [slashdot.org]. SoylentNews claims to support Unicode; I wonder how it prevents spoofing comment scores.
At least it wasn't Beta (Score:5, Insightful)

by arglebargle_xiv ( 2212710 ) writes: on Sunday July 19, 2015 @12:51AM (#50138275)

Could have been far worse...

- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Beta is coming. ...basically the meeting went like this:
  "what do you mean they didn't like the change to beta?"
  "fools, they don't know what's good for them"
  "I know, the idiot users are like frogs. We can boil them slowly. Let's start making all of the beta changes gradually over 6-12 months."
  "Genius! That'll show them, they won't even notice that we've changed anything at all"
  "Raises all round?"
  "Sounds good to me chaps!" ...etc...etc... beta is coming whether you like it or not.
- Re: (Score:1)
  
  by Tablizer ( 95088 ) writes:
  
  If it were Beta, we wouldn't know the difference.
A Blazing Storage moment ... (Score:3)

by cold fjord ( 826450 ) writes: on Sunday July 19, 2015 @12:57AM (#50138295)

All right! Nobody moves, or the storage gets it! .... Help me! Help me! .... Shut down! ..... Won't somebody help that bad drive?!
The reboot is near.

- Re: (Score:2)
  
  by sumdumass ( 711423 ) writes:
  
  You kind of butchered it with the reboot is near being after the fact but otherwise excellent job.
  BTW, I think that is one of the best movies ever. Lots of people don't realize what it was making fun of though.
  - Re: (Score:2)
    
    by JustOK ( 667959 ) writes:
    
    It's like twitter, have to read bottom to top.
  - Re: A Blazing Storage moment ... (Score:2)
    
    by mnemotronic ( 586021 ) writes:
    
    Mongo LIKE.
Might be some cleanup still needed (Score:3)

by cold fjord ( 826450 ) writes: on Sunday July 19, 2015 @01:04AM (#50138323)

I clicked on a "firehose" link and the most recent story was "YouTube's ready to select a winner" from March 2013.
But the "help us select the next story" link was ok, as was directly entering Slashdot.org/recent.
Good luck with the restore / clean up / troubleshooting. That's not a fun way to spend a weekend.

Future of free hosting at Sourceforge? (Score:3, Interesting)

by decaffeinated ( 70626 ) writes: on Sunday July 19, 2015 @01:04AM (#50138325)

Serious question: Just out of curiosity, who pays the bills for all of the infrastructure that keeps Sourceforge running?
Hardware isn't free and employees aren't free. I seriously don't understand how Sourceforge has kept the lights on all these years.
And by the way, I'm a very satisfied user of their services. But I do worry about their future.

- Re:Future of free hosting at Sourceforge? (Score:5, Informative)
  
  by fred911 ( 83970 ) writes: on Sunday July 19, 2015 @01:13AM (#50138351) Journal
  
  I think ever since the bundleware fiasco the revenue is generated by ads.
  https://en.wikipedia.org/wiki/... [wikipedia.org]
  
  - Re: (Score:1)
    
    by decaffeinated ( 70626 ) writes:
    
    Thanks. That's not chump change.
Thank you. (Score:5, Insightful)

by Etherwalk ( 681268 ) writes: on Sunday July 19, 2015 @01:31AM (#50138401)

Thank you to the Slashdot team. Bringing systems back up like that is emergency-mode-fun, but a lot of work, and we appreciate it.

- Re: (Score:2)
  
  by mlts ( 1038732 ) writes:
  
  Have to agree here. Lot of people appreciate /. being up and going.
  One can armchair quarterback and talk about how corruption wouldn't happen with this filesystem or this SAN, but corruption and problems happen no matter what the platform.
- - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- Re: (Score:2)
  
  by cerberusss ( 660701 ) writes:
  
  Amen. I've been visiting this site around user ID 110.000 or so, and I've actually never experienced a full blackout. Static version every now and then, that's all.
But it was webscale (Score:2)

by Billly Gates ( 198444 ) writes:

It was fast as hell!
here [youtube.com]
Timing (Score:1)

by Tablizer ( 95088 ) writes:

And right before the Pluto flyby.
Seriously, though, imagine the thoughts going through NASA minds when the probe crapped out a week before the big encounter. Their toilets must have been full of bricks.
It's not like rover problems where you can continue where you left off after you fix it. New Horizons couldn't stop.
Comment removed (Score:5, Interesting)

by account_deleted ( 4530225 ) writes: on Sunday July 19, 2015 @02:56AM (#50138603)

Comment removed based on user account deletion

- Re: (Score:3)
  
  by swb ( 14022 ) writes:
  
  The blog post was pretty content free about what exactly went wrong.
  I would have guessed they would have the functional ability to either restore a storage snapshot to get back an entire LUN or a VM from a VM-based backup, and maybe they did.
  - Re:Cause?? (Score:5, Funny)
    
    by danomac ( 1032160 ) writes: on Sunday July 19, 2015 @05:04PM (#50141309)
    
    Maybe somebody from Dice downloaded something from Sourceforge and installed it on their servers?
    
ZFS (Score:2)

by darkain ( 749283 ) writes:

Serious question: how much of this could have either had been prevented, or restored much more quickly if they were using ZFS with proper parity, checksuming, snapshotting, and sending (backups)? This really is the one-size-fits-all storage solution at this point.
- Re: (Score:3)
  
  by Harlequin80 ( 1671040 ) writes:
  
  Kinda depends on the failure. If your raid controller decides to die in a spasmodic on off on off way you can easily corrupt all your file systems in one go, zfs or otherwise. At that point if you didn't have redundant live storage pools it gets harder.
  Or of course there is the issue where someone does something stupid, like deleting files from live machines without thinking about what they are.
  - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    Do you have any idea how ZFS works? Since ZFS is copy-on-write, you cannot corrupt already written data, unless your controller writes completely unrelated blocks or some crazy shit like that which I've personally never seen before.
    Also, a good setup separates the redundancy domains into separate hardware, i.e. if you run RAID10, no two disks of a mirror live on the same controller, for example.
    Deleting files is trivially defeated by regular snapshots.
    The best thing about ZFS: You always know the state of y
    - Re: (Score:2)
      
      by Harlequin80 ( 1671040 ) writes:
      
      Given I have seen sysadmin delete the backups to free up space you cannot always handle stupidity.
      And seriously? You cannot corrupt already written data? WTF. ZFS has a whole system built into it to periodically check if data has corrupted once on the disk. Its called scrub. Do you think they would have gone to a huge load of effort if no on disk corruption ever happened?!?!?
      ZFS is very good at ensuring that there has been no "in transit" corruption by doing a crc check of the written file before removi
      - That rebuild though! (Score:3)
        
        by Gazzonyx ( 982402 ) writes:
        
        I say this as someone that runs ZFS on his backup/file server; if you do have to restore or resilver it can take a long while! A single slow drive in a vdev will limit the entire pool's IO (the extent of which is entirely dependent on topology, but the weakest link always crushes you in ZFS). After a handful of TB of data, even with a pool of mirrored vdevs and a flash cache device, the resilver for a single drive can take a day unless you've got some serious spindle count at high RPMs. Even SAS drives d
        
        Re: (Score:2)
        
        by Harlequin80 ( 1671040 ) writes:
        
        It gets orders of magnitudes worse if you have two vdevs joined together in a single pool. I have 5 x 1.5g and 5 x 2g in a joint pool and I lost a 1.5. The re silvering process was days.
Ceph block devices (Score:2)

by TheRealHocusLocus ( 2319802 ) writes:

[... [sourceforge.net]] This incident impacted all block devices on our Ceph cluster.

Power/communications/routing down event? Was monitor quorum lost? Inquiring minds that are not trolls are curious and grateful that the path to restoration was clear. Best wishes.
No expert (Score:2)

by ChrisMaple ( 607946 ) writes:

I've negligible experience in this sort of failure and recovery, but...
Shouldn't slashdot and sourceforge be entirely separate, so that the failure of one can't bring down the other?
Shouldn't there be live redundant systems, so that when one fails, one of the redundant systems is switched online in minutes? I don't mean just redundant storage, but 3 or 4 systems running concurrently, taking the same input and monitoring to confirm that the output is the same.
Is this too expensive or not technically feasib
Obligatory bogachev (Score:2)

by mnemotronic ( 586021 ) writes:

Your important files encryption produced on this computer: photos, videos, documents, etc. Here is a complete list of encrypted files, and you can personally verify this.

Encryption was produced using a unique public key RSA-2048 generated for this computer. To decrypt files you need to obtain a private key. The single copy of the private key, which will allow you to decrypt the files, located on a secret server on the Internet; the server will destroy the key after a time specified in this window. After
Software or Hardware failure? (Score:2)

by RedMage ( 136286 ) writes:

"Storage corruption" is fairly vague. I've been bit by it in the past - once due to a vendor software bug (Oracle block corruption), and once due to hardware (flaky storage controller chip writing garbage (Supermicro MB)) I would like to hear more about the root cause.
RM
Great the /. could notify users (Score:3)

by Streetlight ( 1102081 ) writes: on Sunday July 19, 2015 @11:24AM (#50139795) Journal

It looks like /. had a Plan B ready in the case of a catastrophic failure. For some sites one just gets a blank page with some strange message when that happens. /. did the right thing letting users know they had a problem and were working on it and then let us know a bit about what happened. Thanks, /. techs.

Damn you, Slashdot! (Score:2)

by __aaclcg7560 ( 824291 ) writes:

On the same Thursday that Slashdot experienced data storage corruption, the 1TB hard drive on my Windows gaming PC crashed, reporting 4GB of free space available and unresponsive to IO block commands. (I've seen that behavior on USB sticks, but never on a hard drive.) Except for several years of email, all my data was on the file server. Oh, well. I got a good excuse to rebuild my eight-year-old PC, especially with Windows 10 around the corner. Meanwhile, I'm using a $250 Dell laptop for everything except g
Disclaimer (Score:2)

by RyoShin ( 610051 ) writes:

(Slashdot and SourceForge share a corporate overlord, as well as a fair bit of infrastructure.)

Nice to see that blurb of text again. Can we get this to happen every time you post a Nerval's Lobster/Dice slashvertisement, too?

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

oh okay (Score:5, Insightful)

Re:NSA backup (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re:Less Trolling (Score:5, Funny)

Re: Less Trolling (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Sourceforge Badware risks ? (Score:5, Informative)

Re: (Score:2, Informative)

While you're at it, add some modern features (Score:5, Interesting)

Re:While you're at it, add some modern features (Score:5, Funny)

It allowed spoofing comment scores (5:erocS) (Score:2)

At least it wasn't Beta (Score:5, Insightful)

Re: (Score:1)

Re: (Score:1)

A Blazing Storage moment ... (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: A Blazing Storage moment ... (Score:2)

Might be some cleanup still needed (Score:3)

Future of free hosting at Sourceforge? (Score:3, Interesting)

Re:Future of free hosting at Sourceforge? (Score:5, Informative)

Re: (Score:1)

Thank you. (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

But it was webscale (Score:2)

Timing (Score:1)

Comment removed (Score:5, Interesting)

Re: (Score:3)

Re:Cause?? (Score:5, Funny)

ZFS (Score:2)

Re: (Score:3)

Re: (Score:1)

Re: (Score:2)

That rebuild though! (Score:3)

Re: (Score:2)

Ceph block devices (Score:2)

No expert (Score:2)

Obligatory bogachev (Score:2)

Software or Hardware failure? (Score:2)

Great the /. could notify users (Score:3)

Damn you, Slashdot! (Score:2)

Disclaimer (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals