Afterlife Will Be Costly For Digital Films
Andy Updegrove writes "For a few years now we've been reading about the urgency of adopting open document formats to preserve written records. Now, a 74-page report from the Academy of Motion Picture Arts and Sciences warns that digital films are as vulnerable to loss as digitized documents, but vastly more expensive to preserve — as much as $208,569 per year. The reasons are the same for video as for documents: magnetic media degrade quickly, and formats continue to be created and abandoned. If this sounds familiar and worrisome, it should. We are rushing pell-mell into a future where we only focus on the exciting benefits of new technologies without considering the qualities of older technologies that are equally important — such as ease of preservation — that may be lost or fatally compromised when we migrate to a new whiz-bang technology." Here's a registration-free link for the NYTimes article cited in Andy's post.
Re:Why? (Score:3, Interesting)
Is it really that hard to solve? (Score:5, Interesting)
And yet (Score:3, Interesting)
Back in '90/'91, I worked for a company that burned CDs and Laserdiscs (compressed data for the DOD). The CDs cost something like $5 or $10 each, and the Laserdiscs were a couple of hundred dollars each. IIRC, these were based on gold, and would last something like 50 or 100 years without losing a single pixel. I would guess that Hollywood could easily afford these.
Re:Easy solution (Score:3, Interesting)
How about printing a few copies of a binary bar-code record in big books of archival quality paper for terms of a few centuries? Or how about blowing the bit pattern into any other format with some longevity on some nice passive substrate like a non-flowing glass if you'd like to keep them for a few millennia? Two hundred plus grand a year per film to maintain, my aching ass. Give me two million bucks - the supposed cost to archive just ten films - and I *guarantee or your money back* that I can design (and build a prototype of) an archive system that will reliably maintain digital films such that they can be recovered many centuries from now with no more "yearly archival cost per film" than a roof over its digital head. Error correction and all. All this story demonstrates is that someone isn't taking proper advantage of the technical community.
Re:Linus has already solved this problem (Score:3, Interesting)
Re:Is it really that hard to solve? (Score:3, Interesting)
The problem isn't necessarily the storage medium itself; it's the whole of how the information is encoded. After a while, the machinery and the knowledge of the format will be lost.
With normal film, hold it up to a light, the image is there. Suppose that in 200 years someone wants to play back the film - even if such a machine did not exist, it would be easy to construct.
I recall reading about a similar problem NASA ran into... they wanted to resurrect some data from very early rocket launches and move it to a new medium for historical preservation. The data was recorded onto magnetic tape by an early computer, but all the machines that could read the tapes were long gone. Eventually they found a non-working machine in the basement of the Smithsonian, and brought a couple of guys in their 80s out of retirement to fix and run the thing. They were the only ones who remembered how it worked and how the data was structured.
We run into the same problem today with digital file formats and storage media. Even if the DVD survives hundreds of years... there won't be any working machines to play it, and nobody will be around who understands the format and how to turn it from microscopic divots into meaningful information.... unless we figure something out.
The problem is older and more extensive (Score:5, Interesting)
Technicolor dye transfer (imbibition) prints were much less fugitive. Color separations onto black and white film stock (often termed YCM for yellow, cyan, magenta) are much more robust. Production of these separations (and imbibition relief "matrix" films) was intrinsic to the Technicolor printing process (even if the film was shot on conventional tripack negative, then transferred to Technicolor for printing), and films where these intermediates were saved (or where someone presciently thought to have a set of YCMs made) are much safer for the future than anything kept only on color stock.
In the 70s there were some photo places (especially in Los Angeles) that marketed Eastman Color Negative 5247 movie film (short-end remnants from the movie industry) as a cheaper alternative for 35mm color negative still photography, and printed this onto 5283 color print film (same as movie prints) for 35mm slides.
I recently found a few boxes of these that I had shot back then (and stored under entirely careless, or Arrhenius/Murphy if you prefer, conditions). I am not good at evaluating color negatives by eye, but the positives were faded either to mutated colors or to almost nothing.
Even simple technologies can have amazingly short shelf lives under conditions of disuse. I recently turned on my stereo system after close to 3 years of not being used. The amplifier, CD player, and LP turntable all failed to operate. Part of this might have been due to de-formed electrolytic capacitors; these appear to have more-or-less repaired themselves after a couple of hours with the power turned on. Both the CD player and the turntable suffered additional electromechanical problems that required a combination of manual exercise and cleaning to rectify.
None of these devices have anywhere near the scary sophistication of a modern hard disk drive.
Seeing as I cannot remember what I last set my external firewall password to, imagine the additional challenge of future Hollywood being bitten deeply in the butt by present Hollywood's favored time-bombed destined-to-be-lost-art proprietary DRM technologies, with the keys long since dissipated in Hollywood's perennial miasma of mergers, acquisitions, lawsuits, cocaine, and personal vendettas.
Re:$208,569 (Score:5, Interesting)
My calculator says a 2 hour movie at 24 frames/sec will have about 175,000 frames.
A few more button presses tell me that's a bit north of 6 terabytes of data.
Let's quadruple that to include all the cut scenes and unused footage, to 25 terabytes.
Terabyte drives are available now for $400 or so each. They use under 10 watts idle.
Building a 30 drive RAID would thus cost $12,000, and require perhaps 500 watts if run constantly, including cooling. Let's bump that to $15,000 to pay for controllers and chassis.
Three such arrays (in case of earthquakes, etc... keep 'em at opposite ends of the continent) would cost an initial $45,000, take up perhaps 7U of rack space, and need 50 kWh per day for all three. At 30 cents per kWh, that's 15 bucks a day, or about $5500 per year. Let's double that, assuming those 7U cost you $5500 a year.
So... my numbers, triply redundant, come to an initial investment of $60,000 (profit, hey!), and a yearly cost of $20,000 (more profit!).
How the hell they came up with $208k is beyond me. I'm thinking I should start a company that does this for the studios, it's looking quite lucrative.
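For anyone who wants to check the arithmetic above, here is a rough sketch of it in Python. Every input is taken from the comment; the per-frame size is only what the 6 TB figure implies, and the variable names are made up for illustration. Note that the unpadded totals come out below the rounded-up $60,000 and $20,000 the poster quotes.

```python
# Back-of-envelope check of the poster's figures (inputs from the comment).
HOURS = 2
FPS = 24
frames = HOURS * 3600 * FPS                  # 172,800, i.e. "about 175,000"

MOVIE_TB = 6                                 # "a bit north of 6 terabytes"
bytes_per_frame = MOVIE_TB * 1e12 / frames   # implied size of one frame

DRIVES_PER_ARRAY = 30
DRIVE_COST = 400                             # dollars per drive
ARRAY_EXTRAS = 3_000                         # controllers and chassis
ARRAYS = 3                                   # triple redundancy

array_cost = DRIVES_PER_ARRAY * DRIVE_COST + ARRAY_EXTRAS
hardware_cost = ARRAYS * array_cost

KWH_PER_DAY = 50                             # all three arrays together
PRICE_PER_KWH = 0.30                         # dollars
power_per_year = KWH_PER_DAY * PRICE_PER_KWH * 365
yearly_cost = 2 * power_per_year             # doubled to cover rack space

print(f"frames:            {frames:,}")
print(f"MB per frame:      {bytes_per_frame / 1e6:.1f}")
print(f"hardware, 3 sites: ${hardware_cost:,}")
print(f"power per year:    ${power_per_year:,.0f}")
print(f"yearly, doubled:   ${yearly_cost:,.0f}")
```

The gap between these totals and the quoted $60,000 / $20,000 is the "profit" margin the poster jokes about.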
Re:So pretty much ... (Score:3, Interesting)
It is a pretty simple problem to solve. You set up a smallish data centre on each of three continents. You install some LTO4 tape libraries and start replicating the data to each over the internet. With LTO4 you are looking at ~600TB per 19" rack, and when you are not accessing the data (most of the time) you are not consuming power. Add in some checksumming and patrol checking of the tapes and problem sorted. In 5 or 10 years' time you migrate to some new tape tech. That involves sticking some more frames in, hooking them up and telling the software to copy the data to the new tapes.
Remember as well this is a high assurance system not a high availability system, so some of the expense of a datacentre can be saved. No need for that diesel generator for example because it does not really matter if you cannot access the data today because of a power cut. What matters is that it is preserved and when the power returns you can access it.
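The "checksumming and patrol checking" step could look something like the sketch below. It is only an illustration under loud assumptions: the three sites are modelled as an in-memory dict rather than real tape libraries, SHA-256 is my choice of checksum, and the names `digest`, `patrol` and `repair` are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum recorded when the data is first archived."""
    return hashlib.sha256(data).hexdigest()


def patrol(replicas: dict, expected: str) -> list:
    """Return the sites whose copy no longer matches the recorded checksum."""
    return [site for site, data in replicas.items() if digest(data) != expected]


def repair(replicas: dict, expected: str) -> list:
    """Overwrite damaged copies from any intact one; return the repaired sites."""
    good = next((d for d in replicas.values() if digest(d) == expected), None)
    if good is None:
        raise RuntimeError("no intact replica left")
    damaged = patrol(replicas, expected)
    for site in damaged:
        replicas[site] = good
    return damaged


# Toy demonstration: three sites, one of which suffers a flipped byte.
reel = b"frame data " * 1000
recorded = digest(reel)
sites = {"europe": reel, "asia": reel, "america": reel}
sites["asia"] = reel[:500] + b"X" + reel[501:]

print("damaged:", patrol(sites, recorded))
print("repaired:", repair(sites, recorded))
print("damaged after repair:", patrol(sites, recorded))
```

A real system would run the patrol on a schedule and keep the recorded checksums somewhere separate from the tapes themselves.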
Quantity vs quality (Score:3, Interesting)
I'm not convinced we need to keep 90+% of YouTube or Friends and similar crap for people to watch 100 years from now.
Re:Why? (Score:1, Interesting)
[rant ON]
Also, grandparent isn't seeing the big picture, but I'll assume it was a genuine question, as most people Just Don't Need To Know this stuff. How much does a piece of paper cost? Barring external damage and extremes of pH (like newsprint), that piece of paper and the information stored on it (like say, oh, a Constitution) is good to go for a few hundred years, maybe shy of a thousand if it was hand made without chemicals at all.
I need three layers of technology just to spin up and read data from a 4 gigabyte IDE platter drive I bought 8 years ago, and that's just to access 8 year old pr0n!
Back to the topic, seeing as games like Doom3 involved terabytes of data for development, a digital motion picture like Star Wars with 4-5 hours of raw footage and god knows how many terabytes of ILM effects... well I can't really count that high, but RTFA for an idea of the level of complexity "born digital" masters involve. Do you really think they're going to "throw away the source code" and just keep the neat and tidy digital master? How will he make Han shoot first again, huh? Costs triple when you have to deconstruct it first!
Okay, here's an example of how it was in 2000. [berkeley.edu] Now, extrapolate data size and storage size and content creation for eight years... hmm, do you think there's a ceiling? What about the next eight years? Is that a logarithmic curve jumping off my page?
Photography was invented 150 years ago, and we still have the first physical photographs ever taken. They may be boring, but I will personally kick the ass of anyone that says they're not important.
[rant OFF]
Bottom line is, technology and content are changing and growing so fast that we no longer have TIME to decide or realize what of it is actually important, or the BUDGET to actually save it all indefinitely.
Re:Easy solution (Score:3, Interesting)
No. The trick here is only half archival; the other half - and it's not complex, just apparently not obvious - is that it should take any half-competent tech no more than a day or so to rig up a reader using discrete components of current technology, the task having intentionally been made simple. An optical diode, resistors, a transistor, maybe a lens system and an XY table. Not "drives" and metaconstructs like them. This way, the components can be emulated if required (doubtful, but possible) by higher technology. The format needs to be blind-dumb-simple, as does the error correction; row-column EC will allow recovery of single lost datums and is trivial to implement. If it is easy to do today, it will be easy to do tomorrow. Once that is done, you can construct as sophisticated a reader as you like, all the while knowing that if worst comes to worst, some half-smart high schooler can recover the data given enough time and $100 in parts.
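To show how little code row-column error correction needs, here is a minimal sketch. It assumes the data is laid out as a rectangular grid of bits with one parity bit stored per row and per column, and that the stored parity bits themselves are intact; the function names are hypothetical, and this is one simple reading of "row-column EC", not a full archival code.

```python
def parity(grid):
    """Return (row_parity, col_parity) for a rectangular grid of 0/1 bits."""
    row_parity = [sum(row) % 2 for row in grid]
    col_parity = [sum(col) % 2 for col in zip(*grid)]
    return row_parity, col_parity


def correct_single_error(grid, row_parity, col_parity):
    """Fix at most one flipped data bit in place; return its (row, col) or None."""
    rows_now, cols_now = parity(grid)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows_now, row_parity)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols_now, col_parity)) if a != b]
    if not bad_rows and not bad_cols:
        return None                      # nothing to repair
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        grid[r][c] ^= 1                  # the bad bit sits where the two meet
        return (r, c)
    raise ValueError("damage is beyond what simple row-column parity can repair")


# Toy demonstration: store a block, damage one bit, recover it.
original = [[1, 0, 1],
            [0, 1, 1],
            [1, 1, 0]]
stored_rows, stored_cols = parity(original)

damaged = [row[:] for row in original]
damaged[1][2] ^= 1
print("repaired bit at:", correct_single_error(damaged, stored_rows, stored_cols))
print("matches original:", damaged == original)
```

Anything beyond a single bad bit per block needs a stronger code, but the point stands that this level of protection can be rebuilt from a description of a few lines.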
You misunderstood my guarantee, too; I was guaranteeing that I could get the job done and archive, and recover, a movie in this fashion, making a maintenance-free storage method that did not suffer from unrecoverability. I was not guaranteeing the data; they have to provide physical security for it, and I have no control over that, so I couldn't possibly make any promises in that area. I *could* sell them some land in Montana; I just bought two city lots and the 5000 sq ft building on them for 25 grand. Taxes are low, too. ;-) There's plenty more where that came from - hundreds and hundreds of square miles. Thousands, even. Storage space isn't a problem unless they insist it be in LA, which - of course - would be stupid. It should be in a geologically stable area with a high speed pipe and reliable power, that's all.
Re:Capitalism to the rescue. (Score:3, Interesting)
Important Enough to Copy (Score:3, Interesting)
Huge amounts of fundamental culture simply disappear because it is so transparent or ordinary to those it affects. The next generation comes along and they forget about it because of that apparent mediocrity. For example, breast feeding was normal, ordinary, and public in America up through the 1950s. Movie and later television rule-makers didn't allow showing it unless it was part of some National Geographic type presentation. Today, breast feeding is being re-discovered in a storm of controversy because an entire generation has not only forgotten, but confused the topic with beer commercials.
Then again, how many people want to remember Philippine Midget Snuff films? And why?
Re:Easy solution (Score:3, Interesting)
I'd imagine the big G would fall over themselves to do it. And it would cost the movie industry zilch.
Re:$208,569 (Score:3, Interesting)
A stack of archival CD-R or DVD-R, or actually pressing a master, would let you hold the digital data for a few hundred years quite reliably. Just have a FORMAT.TXT on there to describe the encoding format(s) you used, just in case anyone forgets. And yes, a text file can be 1000 pages long, if it must be.
And the C programming language has been thriving for 30+ years; it might not be too much of an assumption to think someone could dig up a C compiler in 50 years and compile a straight ANSI C program. A program that converts My Weirdo Format(tm) to raw binary frames and audio, with comments in the source code, might be all that is necessary for transferring lost media. I suspect the source code for that could fit on your archival media and would take a tiny fraction of a percent of the space.
I suspect that since CDs and DVDs are so prevalent and such an open format, even a thousand years from now someone will be able to figure out how to read one and copy it to another medium. And the CD format is simple enough that it would be trivial to reverse engineer. If someone dug up our civilization in 10,000 years, they could likely find the thousands of various dictionary and language CDs out there as a sort of Rosetta Stone.
Obviously there would be data loss on 10,000 year old CDs, but theoretically you could pull something off the regular non CD-R kind.
Re:I'm not sure I'd call it "open sourcing" but... (Score:1, Interesting)