Angostura writes "Amazon has provided a decent write-up of the problems that caused its S3 storage service to fail for around 8 hours last Sunday. It providers a timeline of events, the immediate action take to fix it (they pulled the big red switch) and what the company is doing to prevent re-occurrence. In summary: A random bit got flipped in one of the server state messages that the S3 machines continuously pass back and forth. There was no checksum on these messages, and the erroneous information was propagated across the cloud, causing so much inter-server chatter that no customer work got done."
Attend or create a Slashdot 20th anniversary party! DEAL: For $25 - Add A Second Phone Number To Your Smartphone for life! Use promo code SLASHDOT25. Check out the new SourceForge HTML5 Internet speed test. ×