Saving Digital History
Gavinsblog writes "The Washington Post is reporting that the Library of Congress in the U.S. plans to initiate the $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP). It is hoped that the project will lead to the preservation of data that is constantly changing on the Internet. But I wonder who will choose what is worth saving?" This may remind you of the LOC's effort to preserve and digitize the audio collection in the National Recording Registry.
one persons trash... (Score:5, Insightful)
Demi Moore Is My Cousin (Score:1)
Re:one persons trash... (Score:3, Insightful)
On top of the $5 million the library received for planning the initiative in 2000, the plan approved yesterday releases another $20 million of funding to develop a system for evaluating and storing digital information. Just as the library receives more than 20,000 printed pieces each day but keeps less than half, it now faces the herculean task of deciding what digital information should be saved for future generations.
--
The library doesn't keep all of the printed information it receives; keeping all of the information online is an enormous, if not impossible, task. Archive.org has terabytes upon terabytes of data, and they don't even come close to having everything that was on the web at any one time. With the budget they're talking about, keeping all of this information would most definitely not be possible.
Re:one persons trash... (Score:4, Insightful)
Disk space is cheap.
Disk space is cheap.
Disk space is cheap.
Save everything.
Re:Will There Be a EUian Counterpart? (Score:2, Funny)
"Bonjour, you cheese eating, surrender monkeys"
With Belgium and Germany, and call it the World Digital Information Infrastructure and Preservation Program (WDIIPP), right?
Re:Will There Be a EUian Counterpart? (Score:1)
Re:Will There Be a EUian Counterpart? (Score:1)
What about the... (Score:1)
It's nice that the government can ignore it at will, at least till someone in Hollywood notices...
Re:What about the... (Score:1)
What about the DMCA? (Score:3, Interesting)
KFG
skip slashdot. (Score:5, Funny)
Re:skip slashdot. (Score:1)
Re:skip slashdot. (Score:1)
$RecursionLimit::"reclim": "Recursion depth of 256 exceeded."
Re:skip slashdot. (Score:1)
New Media Doesn't Last (Score:5, Insightful)
The irony is that, while digital files could be preserved indefinitely in absolute perfection, many are being completely lost in much less time than it would take a book to turn to dust.
Kudos to the folks at the Library of Congress, and other projects like the Wayback Machine [archive.org], who are working to preserve a surprisingly ephemeral medium.
Re:New Media Doesn't Last (Score:5, Interesting)
> years if kept in appropriate conditions.
My suspicion is that punch cards will make a return at some point.
I think the only difference will end up being the material used; how many centuries could a stainless steel plate with pin-sized holes last in a library's basement?
Re:New Media Doesn't Last (Score:5, Insightful)
Re:New Media Doesn't Last (Score:3, Interesting)
I'm sure something better that's got a life of a thousand years or more will come along eventually, but speaking in the here and now, the only way to get that is with holes in a piece of paper.
Re:New Media Doesn't Last (Score:4, Insightful)
Most commercial tape backup solutions have proprietary encoding solutions, and who knows if that company is going to be in business/supported in 50 years. In fact, for true(r) long-term storage, it's recommended to copy the data from the commercial tape backup solution copy to plain old tar.
Keeping an archive on media that will be around in 50 years seems like a minor point compared to finding the exact tape with the right data you need in a format you can still decode.
-JG
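A rough sketch of the "plain old tar" advice above, using Python's standard tarfile module. The helper names pack_plain_tar and unpack_plain_tar are made up for illustration, and the sketch assumes the data has already been restored from the proprietary backup into memory. It writes an uncompressed ustar archive, so nothing beyond the tar format itself is needed to read it back.

```python
import io
import tarfile


def pack_plain_tar(files):
    """Pack a {name: bytes} mapping into an uncompressed ustar archive.

    Hypothetical helper: the input is assumed to be data already read
    back from whatever proprietary backup format held it.
    """
    buffer = io.BytesIO()
    # mode "w" means no compression; USTAR_FORMAT is the plain POSIX layout.
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack_plain_tar(blob):
    """Read every regular file back out of an uncompressed tar archive."""
    # mode "r:" refuses compressed input, so only plain tar is accepted.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }
```

This only addresses the logical format. As the reply below points out, it does nothing about the physical format of the tape itself.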
Re:New Media Doesn't Last (Score:2)
Re:New Media Doesn't Last (Score:1)
Physical format? (Score:1)
In fact, for true(r) long-term storage, it's recommended to copy the data from the commercial tape backup solution copy to plain old tar.
GNU tar doesn't help if the physical format of the tape (before the operating system even gets to it) is unknown.
Re:New Media Doesn't Last (Score:2)
Re:New Media Doesn't Last (Score:3, Interesting)
Didn't you pay attention in that IT class when they were explaining the difference between Digital and Analogue? Digital's main advantage is its reproducibility. So if, say, the CIC Lib^H^H^H^H^H^H^HLibrary of Congress were to refresh the information once every five years or something like that, then you've got an indefinite storage period. The problem with it is that it needs constant maintenance. The reason this is better than analogue archives is pretty simple... when analogue decays, it's pretty much never going to achieve its original quality. You can do things to try and make it similar, but you're never going to get it as pure as the original.
With digital archives, you can avoid the decay simply by transferring. This isn't really an option with analogue because once you transfer, you tend to lose quality. But bits are simply 1s or 0s, and digital transfer can be perfect. Throw some md5 checksums in there to make sure that you don't corrupt the data, and boom... you've got a perfect digital copy.
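The "transfer and checksum" idea above might look something like this in Python. It is only a sketch: the function names file_digest and refresh_copy are invented here, and the sketch assumes the old and new media are both visible as ordinary file paths. MD5 is used because the comment names it; it catches accidental corruption, and a stronger hash such as sha256 can be swapped in through the algorithm argument.

```python
import hashlib
import shutil


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination, algorithm="md5"):
    """Copy a file to new media and verify the copy against the original.

    Hypothetical helper: raises OSError if the digests differ, otherwise
    returns the digest so it can be stored alongside the new copy.
    """
    expected = file_digest(source, algorithm)
    shutil.copyfile(source, destination)
    actual = file_digest(destination, algorithm)
    if actual != expected:
        raise OSError(f"copy of {source} is corrupt: {actual} != {expected}")
    return actual
```

Keeping the returned digest next to the copy lets the next refresh, five years on, check that the stored bits have not rotted before copying them again.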
Re:New Media Doesn't Last (Score:2, Insightful)
That is why I said, "The irony is that, while digital files could be preserved indefinitely in absolute perfection, many are being completely lost in much less time than it would take a book to turn to dust."
Did you even read my comment before firing off a snide reply?
Re:New Media Doesn't Last (Score:3, Insightful)
So it's a nested set of problems, with no one solution -- copying, conversion and emulation will all be required.
There are two major advantages of analog over digital: the first is that inaction over a period of years does not destroy analog material. If you put a stack of paper in a box in the early 90s, it's probably fine. That degree of inaction, however, can be the death knell for digital material. If you put a stack of CD-ROMs or disks away in the early 90s, chances are at least some of that material is gone.
The second is that while analog degrades slowly, bit-sensitive digital data (encrypted, compressed or executable files) degrades extremely quickly. If you make a mistake handling a book, say, you may end up with one torn page, but if you lose even a small piece of a bit-sensitive file, the entire thing vanishes forever.
-clay
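The point above about bit-sensitive files can be illustrated with a small experiment. This is a sketch under assumptions: zlib stands in for "compressed files" in general, and the helper names flip_bit, recover_compressed and count_differing_bytes are made up for the example.

```python
import zlib


def flip_bit(data, byte_index, bit=0):
    """Return a copy of data with a single bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def recover_compressed(blob):
    """Try to decompress; return None if zlib rejects the damaged stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def count_differing_bytes(a, b):
    """Count positions where two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))
```

Flipping one bit in the middle of a plain text file leaves exactly one wrong character, the digital equivalent of a torn page. Flipping one bit in the middle of the zlib-compressed version of the same text typically makes decompression fail outright or return garbage, because every later symbol depends on the ones before it and the stream carries a checksum over the whole content.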
Re:New Media Doesn't Last (Score:1)
What scares me too is that a lot of the stuff today is not only on very ephemeral media, it's also encrypted so that it is readable only under very special circumstances.
It seems that content is doomed once the technology used to decrypt it is gone.
Re:New Media Doesn't Last (Score:1)
CDs/DVDs don't degrade rapidly (Score:1)
DVDs are like CDs but the data is layered. The layers reflect different wavelengths of light so that allows the format to take advantage of depth. DVDs are thus slightly thicker than CDs but not larger.
Re:CDs/DVDs don't degrade rapidly (Score:2)
The mirrored part IS on the surface, or about one layer of paint away from it (it's right underneath the label). Which means the data is vulnerable.
Also, the materials decay. There have already been reports of early CDs becoming unreadable because the aluminium started corroding. Who knows what will happen in 50 years?
Yes, CDs are relatively stable, but even the manufacturers aren't promising a CD will be readable in 100 years.
Re:New Media Doesn't Last (Score:2)
Want to see some punched cards [cvmt.com] from the 1700s?
WaybackMachine (Score:5, Informative)
DMCA??? (Score:1)
Mike
Internet archive already exists (Score:1, Redundant)
Related News (Score:1)
In related news, the Library of Congress has also purchased a subscription to Playboy.
That's good. (Score:4, Funny)
I deleted all my porn, and I was afraid I wouldn't be able to get it again when I need it.
Re:That's good. (Score:2, Funny)
Something Old, Something New (Score:5, Funny)
Re:Something Old, Something New (Score:2, Funny)
So the US Gov is setting up a mirror? (Score:1, Redundant)
Quality not quantity (Score:4, Insightful)
Plus not all the data can be saved anyway... sites such as the Internet Movie Database, Amazon.com, and even Multimap are database-driven. Even assuming you get access to the underlying database you still need to preserve the code which gets used to generate the pages. And for what purpose?
Add to that the problem of accessibility. If the data isn't laid out in an easy-to-browse fashion then it's as good as dead anyway. I prefer to browse a library by topic, not searching for keywords and hoping a nice book pops out.
database driven (Score:1)
One Noid; not a pair.
As a society. . . (Score:2, Interesting)
I'm not talking about history. I love history. My shelves are well stocked with various dead trees delineating history.
I'm talking about our own lives. When we go on vacation we tend to spend most of our time *documenting* our trip rather than living it. Then we live it "in absentia" as a kind of recreational post mortem.
It's a fascinating thing to observe, but I admit it puzzles the hell out of me.
This point was driven home to me a while ago when someone pointed out how odd it was that I only have one photograph of my SO of 10 years. I only have it because my mother took it. In my mind why would I want a photograph when I could just look at *her*?
KFG
Not just gender neutral (Score:1)
Perfectly normal, married couples use it in this sense.
It's a nasty and vulgar bastardization of social language, but it has no real substitute I'm afraid.
KFG
Geocities (Score:3, Insightful)
the big red dot !?!? (Score:3, Interesting)
This may sound like a joke but I really hope they save the big red dot. I don't know if the website is still in existence, but a while back there was a website that had a big red button. When you clicked it, it said you have clicked the big red dot. The counter had some ridiculous number. This was back when it was en vogue to show off your hit count.
TRBBTDDA (Score:3, Informative)
I believe you are talking about The Really Big Button That Doesn't Do Anything.
A novel concept in its time, it was a strangely addictive big red button on a website. Established in 1994, and linking back to itself, it was more repetitive than Taco's story postings.
As interest in it waned, though, they added a message board-ish thing that let people comment on the button. As it was quickly misused, the best comments were left and the worst deleted.
There, the very first MS bashing in large amounts began with comments like, "Huh? A button that does nothing? Must be a new Microsoft product..."
Although dead at the age of 5, its final resting place [pixelscapes.com] is in its original home, Spatula City [pixelscapes.com].
Finally... My Dream job... (Score:2)
Stephenson's a Genius... We're basically looking at the first instance of the CIC Database....
Now we can start looking at the Metaverse and nanodrugs.... I seriously can't wait...
Re:Finally... My Dream job... (Score:1)
Content (Score:2)
Well, for a start they may as well mirror Google's cache and go from there. A panel of recognised authorities should not have too much trouble deciding the standards for the worthiness of existing material. They will need a high level of independence, perhaps total autonomy, to be able to do a fair job.
sloshdat and Mod Point for history (Score:4, Funny)
Well, maybe they can come up with a system where people post what they think is important in history and then some of the same people moderate that up or down, using a unit called Mod Points, to decide whether or not it is worth saving... maybe call it sloshdat.
A mechanism would be devised to protect the figures that make history against the people reading the history, an effect that could be called Sloshdatted.
I'm sure that with a system like this, historic figures such as many of the presidents would be Modded Down, while anyone who trashes an established monopolistic corporation would appear in the history books.
A system like this, would, without any doubt, save and Mod Up a comment like the present one for future generations.
Re:sloshdat and Mod Point for history (Score:2)
Dudes (Score:2)
Please tell me..... (Score:2, Funny)
*shudders*
Re:Please tell me..... (Score:2)
Oh, but he MUST be preserved! How else can future historians understand just how much we fear that site today?
National Security (Score:3, Interesting)
One possible reason: because the OIA and Company [slashdot.org] might need the data to track down terrorists, etc. (Much the same way that the FBI keeps a collection of outdated phone books.)
After all, when the events of Iran-Contra [chadwyck.com] blew over, Congress quietly passed a bill authorizing the CIA to use any Federal agency for cover. Why not the Library of Congress? Indeed, where else? Makes perfect sense.
Re:National Security (Score:2, Insightful)
now realizing that this is a useful idea? This article isn't about the black archives - you can assume that they've existed for years and have no such funding constraints.
Choosing what should stay.... (Score:3, Insightful)
Or of course they will steer clear of politics and pick only science and absolute news, thus making it pointless for future historians.
Saving what is said OVER what is already saved is an interesting idea, but will this be targeted beyond those people who already retain everything (like CNN and the BBC) or will it include them ? The BBC store everything, "Just in case", will this money record that information yet again, or will it concentrate on other fields after ensuring that the BBC information is already available?
Historians of the future will have more information than historians of any other generation. Their problem will be that the myriad of views reflected via this information doesn't mean an increase in the spectrum of political opinion, but the ability of everyone to be opinionated.
Their worst problem is that the leaders of the day (Bush, Blair et al.) don't stand out like the leaders of previous years. Will anyone rate the speech of Powell or Bush against Churchill or Kennedy? Nope. So how to judge the politics of today, how to judge what should be stored? We have no leaders of merit, we have only rhetoric. So choose what to store, and realise that history will judge as much what you choose to save, as what you saved. This is a different problem to that which has faced historians up till now.
Don't put it on a floppy drive. (Score:1)
Open plea (Score:4, Funny)
Dear U.S. Library of Congress,
Although not a U.S. citizen, I implore you to retain redundant backups of the website goatse.cx [goatse.cx]. Losing this website to a disaster would be tantamount to losing the collective works of Shakespeare, DaVinci and Picasso. The goatse.cx guy [goatse.cx] is an artist in the truest sense of the word.
Yours very truly,
grubby
Time will tell (Score:2)
I was wondering also about how they actually plan to physically store this information for extended periods of time. I was going to post a question about it until something occurred to me. In 500+ years' time I can't really imagine many people will give a crap about much of the digital material that is being churned out today. It will most likely be a case of viewing something like AOTC, falling on their asses laughing at the "special effects" but reaching male consensus that Natalie Portman was a babe.
Re:Time will tell (Score:1)
When Google Groups went up, they did specifically mention the first major 'spam' (C Greencard) in their press release.
It all went to shit after that.
No material can be ignored. (Score:3, Funny)
IT'S A CONSPIRACY (Score:1)
Actually.. (Score:4, Informative)
Wonder what Disney will think (Score:3, Funny)
How do we know its real (Score:3, Interesting)
On a broader scale, news media love the internet because they can make outlandish claims when a story first breaks and then modify it as the facts become available. How do we know what's being preserved is accurate?
Secondly, do we trust the people controlling all this nice, easily modified information not to change it to suit some political whim ?
They say the victor writes the history book. Digital storage will allow the victors to run a few drafts by their spin doctors first.
From the viewpoint of meme theory... (Score:4, Interesting)
For example, if talkorigins.org was wiped out of existence tomorrow, the theories it has created will live on in the minds of those who have read them. These essays can be easily recreated by re-reading the various creationist works. On the other hand, if the various creationist works were destroyed, they would probably not be recreated because they have already been refuted.
The history of information is the history of massive portions of it being eliminated, but then either re-printed, re-discovered, or re-invented centuries later.
The Catholic church 'knew' the earth was the center of the universe.
Along came Copernicus with his helio-centric theory, and the popes tried to lock him in his house for his entire life.
Now, if the modern versions of these men were to make the same claim, they would be soundly laughed at.
So, while this is a noble effort, it is merely a collection of data. Time itself is the Bayesian filter that will determine which parts of the internet are important.
-Brett
Re:From the viewpoint of meme theory... (Score:1, Offtopic)
People have been saying Christianity is 'dying' or 'going to die' for thousands of years.
It hasn't happened yet.
Just some food for thought.
Re:From the viewpoint of meme theory... (Score:2)
It has managed to survive by constantly evolving itself through appropriation of new theories. A hundred years from now, I believe that fundamentalist preachers will be espousing DNA from the pulpit and damning those who believe in quantum mechanics.
The more things change, the more they stay the same.
-Brett
Re:From the viewpoint of meme theory... (Score:4, Insightful)
That's whistling past a pretty big graveyard.
The problem is that time changes the definition of interesting. Would you be interested in the ads from a copy of the NYTimes.com from 1998? Probably not, unless you wanted to chuckle at the 667MHz Pentia selling for $2500.
Would you be interested in the ads from a copy of the New York Times in _1898?_ Those ads are a view into a world you never inhabited, and expose the preoccupations of the era in a way that the articles don't.
We can look at the 1898 ads, not because the important information saved itself, but because archivists did. Someday the ads from 1998 will hold the same interest for historians and anthropologists. Who will do the archiving there?
If we leave it to the present to sort the good from the bad, the future will never know what we considered unimportant. If you'd asked anybody in 1960 what that era's biggest technological revolutions were, they'd have all said atomic energy and space travel. The real answers turned out to be the transistor and the birth control pill.
We are just about the worst possible people to ask what's important now, because we're too close, and it would be hubris to pretend otherwise.
-clay
great (Score:1, Interesting)
CVS for the entire internet? (Score:2)
Re:CVS for the entire internet? (Score:1)
I dunno, I'll donate that box of floppies I have around here somewhere. Maybe if we all looked behind our collective couches we'd find enough floppies to take over the ... ahem ... back up the internet.
The internet can't be that big that it can't fit on a couple of floppy disks, surely?
Nineteen Eighty Four (Score:3, Insightful)
In Nineteen Eighty Four, The Party embraced the digital revolution because they could easily control what the news said about them. (Who controls the past controls the future...)
Anyway, the point is the government may not be the best to be in charge of this.
</rant>
How New? (Score:1)
Wtf ? (Score:1, Redundant)
the $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP). It is hoped that the project will lead to the preservation of data that is constantly changing on the Internet...
1. Who is to be the judge of what is worth saving ? I mean, let's be honest, there's a *truckload* of 'internet' out there !!
2. Wouldn't $100 million be better spent on a new hospital or two ? Just a thought...
Preservation vs DRM (Score:5, Interesting)
Since the LOC seems to hold some of the strings over implementation of the DMCA, they can obviously craft a loophole for themselves. But it will be interesting to see what that loophole is, and how it will work. Will they simply leave the stuff under DRM, and have their own copy of keys, or will they manage to have an unprotected copy?
Enquiring minds want to know.
Actually cheaper to save everything (Score:3, Insightful)
The actual cost of storage is not that high. The highest costs are involved when human intervention enters into the equation.
Backing Up the Internet (Score:1)
good content will always persist. (Score:2)
Google already does this (Score:3, Informative)
What about google? (Score:1)
MGS:SOL (Score:2)
Archive.org, and its limitations (Score:4, Interesting)
There's a live backup of the Internet Archive at the Library of Alexandria [bibalex.gov.eg] in Egypt. Thus, no single government can censor the archive. More duplicates may be established in other countries.
Perhaps unfortunately, it's easy to remove material from the archive. Just put a "robots.txt" file on your site, and not only will it not be captured again, the archive will immediately refuse to display copies of the blocked site. This seems to be enough to keep the militant copyright holders happy.
Most text is saved, but not all pictures, and very little video. This is good enough for most historical purposes.
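The robots.txt exclusion described above can be sketched with Python's standard urllib.robotparser. This only shows how such a file is interpreted in general; the helper name is_blocked, the crawler name "ia_archiver" and the example URL are assumptions for illustration, and the thread does not say how the archive itself evaluates the file.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt text forbids user_agent from url.

    Hypothetical helper: robots_txt is the file's content as a string,
    already fetched by some other means.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Example rules a site owner might publish to opt out of one crawler
# (the user-agent name is an assumption for this sketch).
EXAMPLE_RULES = "User-agent: ia_archiver\nDisallow: /\n"
```

With EXAMPLE_RULES, is_blocked reports the whole site as off limits for the named crawler while leaving other user agents unaffected, which is the opt-out the comment describes.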
web archiving of websites (Score:1)
click here [archive.org] if you want to see how slashdot.org has changed over the years.
How to save digital information? (Score:2, Insightful)
Now, the only way to accomplish this is to make it a dynamic storage. That is, go with the flow and when a new sooper dooper storage device is invented, copy the data to that, thus ensuring two things. 1) The data is "refreshed" 2) The data can be read by the contemporary hardware.
Preservation vs Storage (Score:1)
Re:Come on!! In the era of distributed storage... (Score:1)
If the content is worthwhile, people will hold a copy on their systems worldwide. If it's just junk no-one's interested in, and nobody in the world thinks it's worth a hoot, then it will fade into oblivion.
That was my great hope in P2P networking. If it's decent, even if only one person thought so, it would be kept. Even though the Library of Congress may try to keep everything, there is much variance in demand for the data kept. The fact is that some data may be accessed far more or less than average. The data most desired will be most plentiful, the data least desired will be least plentiful, and the data nobody wants gets dropped into the bit bucket. The neatest thing is that no-one in particular is on the critical path. The production of an entire civilization is maintained within that civilization by the civilization itself.
Well, it was a star-trekish dream of mine that the public as a whole begins thinking as one organism, keeping the good stuff, excreting the junk, sharing useful stuff for all.
Re:Come on!! In the era of distributed storage... (Score:2)
If the content is worthwhile, people will hold a copy on their systems worldwide.
IDK about you, but I rarely copy anything I found on the WWW to my local system. I just create a bookmark so I can find it again. When the original site disappears, I'm toast.
If anything, hyperlink technology has made information less plentiful: where you have to buy a book if you want permanent access to it, for digital media a link will suffice (for most purposes). Few people will think about the possibility of the original site disappearing.