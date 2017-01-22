

 


The 32-Bit Dog Ate 16 Million Kids' CS Homework (code.org) 23

Posted by EditorDavid from the blaming-the-Cloud dept.
"Any student progress from 9:19 to 10:33 a.m. on Friday was not saved..." explained the embarrassed CTO of the educational non-profit Code.org, "and unfortunately cannot be recovered." Slashdot reader theodp writes: Code.org CTO Jeremy Stone gave the kids an impromptu lesson on the powers of two with his explanation of why The Cloud ate their homework. "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information [and] we didn't realize we were running up to the limit, and the table got full. We have now made a new student activity table that is storing progress by students. With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information."
The issue also took the site offline, temporarily making the work of 16 million K-12 students who have used the nonprofit's Code Studio disappear. "On the plus side, this new table will be able to store student coding information for millions of years," explains the site's CTO. But besides Friday's missing saves, "On the down side, until we've moved everything over to the new table, some students' code from before today may temporarily not appear, so please be patient with us as we fix it."
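
The powers of two behind those figures can be checked in a few lines. This is a minimal sketch that assumes the 32-bit index was unsigned (a signed one would top out near 2.1 billion, and the summary does not say which). The insert rate used for the lifetime estimate is a made-up illustration, not a Code.org figure.

```python
# Row-count ceilings for integer indexes of different widths.
UNSIGNED_32_MAX = 2**32 - 1   # about 4.29 billion, the "4 billion" in the quote
SIGNED_32_MAX = 2**31 - 1     # about 2.15 billion, if the index were signed
UNSIGNED_64_MAX = 2**64 - 1   # about 1.8e19, the "18 quintillion" in the quote

# Hypothetical constant insert rate, chosen only for illustration.
ASSUMED_ROWS_PER_YEAR = 1_000_000_000


def years_until_full(max_index: int, rows_per_year: int) -> float:
    """Years needed to exhaust an index at a constant insert rate."""
    return max_index / rows_per_year


print(f"32-bit unsigned ceiling: {UNSIGNED_32_MAX:,}")
print(f"64-bit unsigned ceiling: {UNSIGNED_64_MAX:,}")
print(f"64-bit is {UNSIGNED_64_MAX // UNSIGNED_32_MAX:,} times larger")
print(f"Years to fill 64-bit at the assumed rate: "
      f"{years_until_full(UNSIGNED_64_MAX, ASSUMED_ROWS_PER_YEAR):.3g}")
```

Under the assumed rate the 32-bit index fills in under five years, while the 64-bit index lasts on the order of billions of years, which is consistent with the CTO's "millions of years" remark.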

  • At least there was a back-up... Or not... Not even a 24-hour transaction log... Or not... Way to go code.org... set that example...

    • Re: (Score:2)

      by halivar ( 535827 )

      How do you back up data that was never stored? Or logs for transactions that never completed? And how, even if you had those transactions, would you meaningfully restore them when the restoration process itself would simply repeat the result of overflowing the available indexes?

      This isn't a typical disaster recovery scenario. The architecture itself is at fault, and the data is lost.

      • Uh, what kind of data are you talking about that was never stored?
        The obvious thing is to restore at the most recent backup. Some data will be lost of course, but that's better than losing all data, which apparently these people did.

        • Re: (Score:1)

          by Anonymous Coward

          They didn't lose all data. They lost every insert into a table that occurred after its index reached its maximum value. As the database insert was the method of storing the data, there's nothing to recover.

        • Some data will be lost of course, but that's better than losing all data, which apparently these people did.

          No they didn't.

          Any student progress from 9:19 to 10:33 a.m

          So a grand total of about 74 minutes' work was lost.
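
The failure mode argued over in this thread (inserts being rejected once the index hits its maximum, so there is nothing left to back up) can be reproduced on a small scale. The sketch below uses SQLite purely as a stand-in, since the thread does not say which database Code.org ran. SQLite's key is a signed 64-bit integer rather than a 32-bit one, and the table and column names are hypothetical.

```python
import sqlite3

# Largest rowid SQLite accepts (signed 64-bit). This is a different ceiling
# from the 32-bit one in the story, but the behaviour at the ceiling is analogous.
MAX_ROWID = 2**63 - 1


def demo_exhausted_index():
    """Fill the key space, attempt one more insert, and report what happened."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, for illustration only.
    conn.execute(
        "CREATE TABLE activity (id INTEGER PRIMARY KEY AUTOINCREMENT, code TEXT)"
    )
    # Jump straight to the last usable key instead of inserting 2**63 rows.
    conn.execute(
        "INSERT INTO activity (id, code) VALUES (?, ?)",
        (MAX_ROWID, "last row that fits"),
    )
    conn.commit()  # make sure the stored row survives the failed insert below

    error = None
    try:
        # With AUTOINCREMENT, SQLite refuses to hand out a key past the maximum.
        conn.execute("INSERT INTO activity (code) VALUES (?)", ("lost work",))
    except sqlite3.Error as exc:
        error = str(exc)

    rows = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
    conn.close()
    return error, rows


error, rows = demo_exhausted_index()
print("second insert error:", error)
print("rows actually stored:", rows)
```

The rejected row never reaches the table, which is the point made above: a backup or transaction log can only replay writes that the database accepted in the first place.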

  • Don't trust the cloud as the only place you store your work.

    • Re: (Score:1)

      by Tablizer ( 95088 )

      Don't trust the cloud as the only place you store your work.

      A generalized version is don't trust any one system. Put copies on different servers/devices.

      Of course there's a break-even point where the labor to manage backups exceeds that lost on average to failures.

  • We will never learn (Score:3, Funny)

    by Xarin ( 320264 ) on Sunday January 22, 2017 @02:58PM (#53716483)

    4 billion rows of coding activity is all we will ever need

  • It is no surprise to me that the ones creating and operating this platform are just as incompetent as the "graduates" they produce. Mediocrity breeds mediocrity...

  • "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information

    if it can only store four billion rows, it isn't "the cloud." it's just a KVM instance running on a shared hosting facility then, isn't it.

    we didn't realize we were running up to the limit, and the table got full.

    so not only were you incapable of scaling your infrastructure or your program to handle four billion rows --something every sysadmin on the planet is capable of-- you weren't even competent enough to set up monitoring for it.

    We have now made a new student activity table that is storing progress by students.

    the ones that lost all their data don't care. the students will leave to try something else, the educators will fall back on lesson plans that weren't
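
The monitoring this comment says was missing could be as small as a periodic headroom check on the table's highest key. A minimal sketch, assuming an unsigned 32-bit index and a hypothetical 25% warning threshold. The function names are invented for illustration, and fetching the current maximum id from the database is left out.

```python
INT32_UNSIGNED_MAX = 2**32 - 1


def index_headroom(current_max_id: int, index_limit: int = INT32_UNSIGNED_MAX) -> float:
    """Fraction of the key space still unused (1.0 = empty, 0.0 = full)."""
    if not 0 <= current_max_id <= index_limit:
        raise ValueError("current_max_id must lie between 0 and index_limit")
    return 1.0 - current_max_id / index_limit


def check_index(current_max_id: int,
                index_limit: int = INT32_UNSIGNED_MAX,
                warn_below: float = 0.25) -> str:
    """Return an OK or WARN status line based on the remaining key space."""
    headroom = index_headroom(current_max_id, index_limit)
    if headroom < warn_below:
        return f"WARN: only {headroom:.1%} of index space left"
    return f"OK: {headroom:.1%} of index space left"


print(check_index(1_000_000))        # a nearly empty table
print(check_index(4_000_000_000))    # a table close to the 32-bit ceiling
```

Run from a scheduler against the real maximum id, a check like this would flag the table well before inserts start failing.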

  • Thank you for teaching the kids the importance of taking responsibility and being honest and open about your mistakes. It's okay to make mistakes as long as we learn from them. Too many people today are afraid of making mistakes and cover them up.
    • I find people like anecdotes here so please allow me to add: I was raised by very "tough" parents with a very "tough" form of discipline. Mistakes meant punishment. Today I have a 9-year-old daughter who, like any other human being, makes mistakes. A few years ago I noticed a very strange phenomenon with regard to "dealing" with her mistakes. When I would get upset with her and punish her for spilling on the couch or forgetting to clean her room, I would see her make the same mistake again, and as time went on she would

  • Seriously, was not a single developer or architect from Code.org around when Slashdot overflowed its 24-bit index? I know it has been a few years now, but I'm sure there are folks here who remember threading breaking and all other sorts of problems when it happened. Remember: https://slashdot.org/story/06/11/09/1534204/slashdot-posting-bug-infuriates-haggard-admins [slashdot.org]

    Granted, that was Slashdot, and while annoying, it was hardly the end of the world. This problem with Code.org clearly reinforces "cloud

  • Well duh (Score:3)

    by cyber-vandal ( 148830 ) on Sunday January 22, 2017 @03:12PM (#53716549) Homepage

    It's code.org not databasedesign.org

  • I admit, I've mostly done it for speed purposes, but my understanding is that the record limit is per partition, so you could also use it to deal with record limits.

    They could either partition based on user IDs (might be faster to select by for the bulk of the queries), or by date (making it easier to manage autonumber fields).
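
The two partitioning schemes suggested here can be sketched as routing functions that pick a partition for each activity row. This is only an illustration: the partition names and the count of eight are hypothetical, and whether a record limit really applies per partition depends on the database engine, which the thread does not identify.

```python
from datetime import date

NUM_USER_PARTITIONS = 8  # hypothetical partition count


def partition_by_user(user_id: int, num_partitions: int = NUM_USER_PARTITIONS) -> str:
    """Route by user ID, so one student's rows all land in the same partition."""
    return f"activity_u{user_id % num_partitions}"


def partition_by_date(day: date) -> str:
    """Route by calendar month, so old partitions stop growing and can be archived."""
    return f"activity_{day.year}_{day.month:02d}"


print(partition_by_user(10))
print(partition_by_date(date(2017, 1, 20)))
```

Routing by user keeps the common "load this student's progress" query on a single partition, while routing by date bounds how many rows any one partition can ever accumulate.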

