
Amazon explains why S3 went down

Submitted by Angostura on Saturday July 26 2008, @02:49AM
Angostura writes "Amazon has provided a decent write-up of the problems that caused its S3 storage service to fail for around 8 hours last Sunday. It provides a timeline of events, the immediate action taken to fix it (they pulled the big red switch), and what the company is doing to prevent a recurrence. In summary: a random bit got flipped in one of the server state messages that the S3 machines continuously pass back and forth. There was no checksum on these messages, and the erroneous information propagated across the cloud, causing so much inter-server chatter that no customer work got done."
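To illustrate the failure mode described in the summary, here is a minimal sketch of how a checksum on a state message lets a receiver drop a message with a flipped bit instead of passing it on. This is not Amazon's message format or fix. The `frame`, `unframe`, and `flip_bit` helpers and the sample payload are hypothetical, and CRC-32 is simply one convenient checksum from the Python standard library.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 (4 bytes, big-endian)."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(message) < 4:
        raise ValueError("message too short")
    (expected,) = struct.unpack(">I", message[:4])
    payload = message[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch; dropping message")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate corruption by flipping a single bit (hypothetical test helper)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    # Made-up state message; the real S3 messages are not described in the summary.
    state = b"server=node-17;status=healthy"
    message = frame(state)

    # An intact message round-trips unchanged.
    assert unframe(message) == state

    # A message with one flipped bit is rejected rather than propagated.
    try:
        unframe(flip_bit(message, 45))
    except ValueError as err:
        print("rejected:", err)
```

CRC-32 catches every single-bit error, which is the case described here. A system that also needs protection against deliberate tampering would want a cryptographic hash or MAC instead.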
