
E-Book Museum at Library of Congress?

David H. Rothman writes "E-books and other digital publications in the U.K. are about to go into a national archive, and in fact the Brits and others have even shown an interest in the e-book technology of yore. Goodness knows, as some have pointed out, we already have enough virtual e-book museums--unwittingly created by the march of technology. But how about an International Electronic Book Museum in the Real World, ideally the Library of Congress? Before Luddites and crypto-Luddites keel over at the thought, they should keep in mind that the technology is already several decades old and that it would be helpful to collect the artifacts in a systematic way before it's too late. More at TeleRead."

  • by chmod_localhost ( 718125 ) on Wednesday November 05, 2003 @04:02PM (#7399469) Journal
    What happens when the software for reading these e-books is no longer supported? By using proprietary formats, it is inevitable that one day, the stuff in our nation's own library will be unreadable.

    Only by creating an open standard, which anyone can choose to implement on the system of their choice (open source it, while you're at it!), can the information truly be timeless.
    • For most texts ASCII or Unicode work just fine and are fairly efficient and easy to compress for archive purposes. Although, I'm not really sure what format to put graphics or pictures in.
    • Why not bundle the application to read the format with the book storage? Problem solved!
      • by farnz ( 625056 )
        What about the hardware to run it on? What about the OS? Is an eBook that only runs on 48K ZX Spectrums with Microdrives now good enough? Can we even read the media?

        The advantage of an open specification for the format (unencrypted PDF would work, for example) is that provided I can access the data, and provided I have a copy of the specification, I can read the books. If I don't have the specification in an alternative format, I'm screwed. If the reader requires (say) a PC without PCI to work, and I don't

    • Only by creating an open standard

      Stating the bleeding obvious, there's this thing called HTML. This isn't just about 'e-books'; indeed, those are a small part of the UK proposed law. They'd be storing webpages and electronic journal publications (e.g. science journals online), much of which is in HTML anyway, which I was under the impression was, despite the efforts of certain large companies, an open standard implementable on the system of your choice.

    • That's why my personal ebook museum is all in ASCII. The text is recoverable if you can recover the 1s and 0s in any way whatsoever.

      Some people can even read the stuff directly from the printed binary, but that's a bit much for me. I'd transliterate back into text.

      No need to choose and implement any new standard, we've already got a beaut for English and Unicode is coming along.
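A rough illustration of the "recover the 1s and 0s" point, as a minimal sketch only. It assumes the bits were written out as space-separated 8-bit groups, one per ASCII character; the function names are made up for this example and are not from any archive tool.

```python
def text_to_bits(text: str) -> str:
    """Encode ASCII text as space-separated 8-bit groups (the 'printed binary')."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Transliterate space-separated 8-bit groups back into ASCII text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


# Hypothetical sample: round-trip a short string through its bit pattern.
sample = "E-book"
printed = text_to_bits(sample)
recovered = bits_to_text(printed)
```

Nothing beyond the character table is needed to get the text back, which is the whole appeal of plain ASCII for archiving.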

    • If you reencode them in a different format, you've altered them, and you are no longer archiving the original eBooks.

      Keep em as they are. Our primitive 1024 bit encryption keys will be a joke to the quantum processing space men of the future, anyways.

      It'd be like translating French works from folks like Voltaire or Hugo into English, and throwing out the original manuscripts, because it will be easier for future historians to grok.

      The medium is the message.
    • In the early days of film, the LOC would only copyright things on paper. This resulted in many old films being printed on paper solely for the purpose of copyright. In retrospect this seems absurd; however, a lesson learned may be that the LOC can (and maybe should) require that THEIR copy be manufactured using some archival method which may be unsuited for general distribution.
  • Let's just hope... (Score:5, Insightful)

    by CountBrass ( 590228 ) on Wednesday November 05, 2003 @04:03PM (#7399482)

    They don't make the same mistake as the BBC's Domesday Project, where they stored all the data on quickly obsoleted, BBC Micro-controlled laser discs using a proprietary format - whoops! A real pain for them to recover it only a decade later.

    • I can see it now...

      We've got an archive full of documents and emails sent from the PM about Dr David Kelly, they're right here [computerworld.com.au].

      Oops - anybody got a working Windows RMS hooked up?

    • The Domesday Project was rescued just in time, but should stand as an example to everyone of just why it is important to have copies made of works in a format that will be readable in the future - even if this means using a different medium and eschewing technological copy-prevention measures for archive copies. This is our duty: to preserve this material so it can enter the public domain when its copyright expires.

      The format was not actually too hard to hack into, as the video discs were CAV analogue w
  • by sporkboy ( 22212 ) on Wednesday November 05, 2003 @04:06PM (#7399507) Homepage
    How many eBooks have been released as eBook only, not counting prereleases of excerpts or first chapters with "special intros"? Aren't most of them just existing publications in a different format? If the format dies then there is a reason, and if the work continues in some sort of archival medium then how is it a loss? Would the same lamentations be heard over cassette recordings of books on tape?
    • I agree with you, but for the fact that unlike the dead tree or audio formats, the e-book has at least the potential to be full-text searchable. Which could be invaluable for the work in question.

      If this flies we wouldn't need Distributed Proofreaders [pgdp.net] anymore. B
      • And if I may segue from an earlier post this is another reason to stick to ASCII/Unicode. grep is great. grep is good. grep (and his buddies sed, awk and Perl) moved text searching from the realm of the "potential" to the fully realized, lo these many years ago.

        It's only the commercial interests that feel the need for new text format and new text tools for that format.

        Fuck 'em. Don't let 'em do it. Only buy ebooks in the existing open standard, just like you wouldn't buy a dead tree book that required spe
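For anyone who hasn't met grep, here is a minimal sketch of the kind of search plain text allows. It is a stand-in written for this example, not grep itself: the `grep` function below is hypothetical and only mimics the line-numbered output of `grep -n` using Python's standard `re` module.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep(pattern: str, lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")


# Sample lines standing in for a plain-text e-book file.
book = [
    "It was the best of times,\n",
    "it was the worst of times,\n",
    "it was the age of wisdom,\n",
]
matches = list(grep(r"worst|wisdom", book))
```

The same loop works on any file opened in text mode, with no reader software or format converter in between.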
  • ...how many Libraries of Con...um, oh ok nevermind...
  • I can see it now: They go with a Microsoft database, and the actual books decay and are lost. Then one day, an M$ update that goes out of control causes the database to crash. Irreplaceable works by such authors as the Minnesota steel worker who penned "here I sit all brokenhearted" are lost to the sands of time.
    • Yes, or there is a security problem and Hackers go in and rewrite some of the books without anyone knowing it. This is a scary thought :) I also agree that an open source format would be best. Maybe the slashdot community can start working on that. It could be a community project. After all.... It takes a village, people!
  • This is really going to screw up the Library of Congress data storage unit.

    Now the Library of Congress will be holding many Libraries of Congress. It's a conundrum!


  • Why the LoC? (Score:2, Insightful)

    by azzy ( 86427 )
    > ideally the Library of Congress?

    Why? What's so ideal about the Library of Congress to hold an international collection of e-books?
    • by tommck ( 69750 ) on Wednesday November 05, 2003 @04:14PM (#7399594) Homepage
      Yeah, I'm an American, but even I was going to say "Wait a minute!" to that one. The USA is not the whole world. Unfortunately, until we take over the planet, there isn't a single place that one can go with these things. NATO, the United Nations... They all only have some countries as members.

      I guess we now have a good reason for world domination!

      • > NATO, the United Nations... They all only have some countries as members.

        Which countries are missing from the UNO?

        • Switzerland (currently joining, but earlier denied membership because of their neutrality), the Vatican City, Taiwan (China is a member though), East-Timor, Kiribati, Nauru, Tuvalu and Tonga, and maybe a few others.

          Also, there are probably a few micronations that could be added to the list, e.g. Sealand.

      • I think that actually you would find that about 99% of the World's population lives in a country that's a member of the UN. As another poster pointed out, Switzerland is joining, leaving Taiwan (0.3% of World) as the largest country not a member.
        And as I understand it, historically the seat occupied by China in the UN
        is considered to be Taiwanese by most/some Taiwanese.
        (Taiwan is what is left of pre-communist China; they themselves and the rest of the world are still figuring out if they are a separate country
        • "some" does not mean "all". "99%" does not mean "all".
          So, who is going to represent East-Timor and Taiwan and the others in getting all their books in the UN library?

          I guess it will have to be the UN library. That way we only have to conquer a bunch of small countries. Maybe we can wait until a Democrat is President and he can feel good about his own military victories for once...

          • I am not a native speaker, but I was under the impression that
            in English "some" was related to "several" or "a few", more than to
            "most" or "a lot". But then I could be mistaken, and indeed it does not mean "all".
            And I was under the impression that East-Timor is a member of the UN.
            http://www.un.org/News/Press/docs/2002/ga10069.doc.htm

            Leaving mainly Taiwan ROC, and I was arguing about its uncertain status, as both in mainland China
            and on Taiwan a lot of (some?!?) people can be found who would argue that public
            • From dictionary.com: some adj.
              1) Being an unspecified number or quantity: Some people came into the room. Would you like some sugar?
              2) Being a portion or an unspecified number or quantity of a whole or group: He likes some modern sculpture but not all.
              3) Being a considerable number or quantity: She has been directing films for some years now.
              4) Unknown or unspecified by name: Some man called.
              5) Logic. Being part and perhaps all of a class.
              6) Informal. Remarkable: She is some skier.

              Being a

      • True. We must rebuild the Library of Alexandria! And while you're at it kick all those Arab squatters out of Egypt.
    • The LOC is one of few institutions in a position to do so.

      Because the LOC is located in a free country (blah blah slashdot rightwinger whining here) that will not censor the books, and will share the books with anyone who wants to see them. They also have the funding and resources available to make it happen.

      It doesn't preclude, say, China from making their own archive, and no doubt they would. But their archive would only include government approved books, and fat chance ever getting access to it.
      • I'd regard my country (UK) as 'free' also, but looking at political decisions taken in the 'free' world, can we really trust them to allow totally unbiased, uncensored material to be stored? I recall recently the US government created a list of websites that were banned, related to terrorism; what's to stop them demanding the LoC not stock certain e-books?
        • It wasn't the US government, it was some doofy judge in Pennsylvania.

          But, if your question is, will the LoC archive child pornography? No, they won't.

          I'm sure the UK would do it, or France, or Germany. But to them, heading an "international" effort means spending US tax dollars. "International Space Station" = NASA money, 2/3rds of the UN operating budget = American money.

          No matter who does it, as an American resident, I'm going to wind up bankrolling the motherfucker. Might as well keep it local.
        • As a Brit, the LOC git kinda got my juices flowing with the "Why should that be an American role?" attitude.

          On reflection - It is down to the UK and other European countries to archive the content, often generated in the US, which is banned by the US, and down to the US to archive content which is banned (or at least restricted) in Europe.

          I would suggest that several different international repositories are required. When at some point we wind up as a united Earth, we can then amalgamate the lot. (At whic
        • The first amendment for starters. You have the right to read terrorist literature; you don't have the right to donate money to terrorist organizations, however. That is what was "banned".
      • Do you really think a book like the Big Book of Mischief, which explains how to build bombs and other things that the government doesn't want people to make, will be available there? I have no doubts that in our "free country" certain books will not be available. You could argue strongly that very few people should have access to detailed instructions on making nerve gas, but I doubt anyone could argue that not providing unlimited access is censorship.
    • What's so ideal about the Library of Congress to hold an international collection of e-books?

      Probably because they could. Of course, it would make more sense to do it on linguistic/regional/national lines and have them point to each other when needed.

    • >> ideally the Library of Congress?
      > Why? What's so ideal about the Library of Congress to hold an international collection of e-books?

      This is a valid point. Why does the LoC rate as the "default" international library? Why not, say, Library and Archives Canada [nlc-bnc.ca]? Or the Australian National Library [nla.gov.au]? Or the National Library of Ireland [www.nli.ie]? Or the National Library of Jamaica [nlj.org.jm]? Or .... any of these [yahoo.com]? Why the LoC in particular?

      I'm not trying to sound anti-American, just offering a non-

  • What does the phrase "Electronic Book Museum in the Real World" mean? Isn't an e-book museum, by its very nature, virtual? If not, aren't the e-books then just regular books, minus the "e"?
  • by bcrowell ( 177657 ) on Wednesday November 05, 2003 @04:33PM (#7399780) Homepage
    Above all, as no printed material is produced during the delivery of a 'book', the cost of publishing the book is significantly reduced and the whole process of publication is environmentally green.
    That's not true. Here (pdf file) [nacs.org] is some info on college textbooks, for example. Printing, paper, and binding (PPB) are almost never a significant percentage of the retail price of a book.

    I would like to see the Library of Congress start accepting digital books for copyright registration, however -- it's a drag to have to send them hardcopies.

    In the early 1990s, Adobe's Acrobat reader was released. Although it is not a software specifically for eBooks, its multi-platform file format (PDF file) is an attractive feature for eBook publications. The digitization of both texts and graphics into a compact file that can be recognized in every platform is an important concept in eBooks. However, we still do not have an eBook publishing standard at the moment, though work in that direction is being done.
    Well, actually PDF is the de facto standard for digital books. It's just that none of the handheld devices use the standard; they all use their own nonstandard, proprietary formats instead.

    There are standard subsets of PDF that have been defined that are appropriate for archiving books. For example, the subsets don't allow you to include video or programs.

    • PDF sucks for eBooks, as far as I am concerned.

      One of the main problems is that, when you get down to it, the core functionality is putting images of a bunch of physical pages into one big file. This is fine when you can read it on a 1600x1200 screen, but when you need to view the image on a Palm, it doesn't work. (The text doesn't magically reflow to fit the Palm.)

      Personally, I think simple HTML (i.e. HTML 3.2) would be perfect for e-books.. easily parsed by any device (Palm, PocketPC, Desktop Compute

      • PDF sucks for eBooks, as far as I am concerned. [snip] Personally, I think simple HTML [...] would be perfect for e-books.
        It completely depends on what you're trying to do: create an electronic archive of books, or read books on a handheld device. Actually, one of the reasons for the failure of so-called "e-books" in the marketplace (apart from the proprietary formats) is that very few people actually want to read a whole book off of a hand-held computer.
    • PDF is the de facto standard for digital books

      And a very bad standard it is, too, IMO. PDF is great for one thing: producing an exact copy of a work on your screen or printer. Complete with the exact same font sizes, formatting, pagination, and so on.

      There are situations where that's wonderful -- sheet music is an example I've used recently. But it's a lousy aim for most ebooks. In most cases you don't want the same pagination and formatting - you want the text to be reformatted to match how you're lookin

      • Unfortunately, for citations and other references, it's useful if not necessary to specify page, paragraph, and sometimes even line numbers and have them point to the same place regardless of the "format".

        Something like this could be kludged by going overboard with <p id="chpt.1,p.55,par.4"> type tags using HTML, but there's something to be said for preserving the exact formatting of the original text.

        A lesser problem is properly preserving hyphenation, which can pose a problem with HTML as well

        • Citations may not be a big issue for many people. And if they are, maybe some formatting-independent measure can be used; something like 'Chapter M, paragraph N' might be awkward for printed books, but might be easily-automated and accurate enough for ebooks. And it wouldn't need any changes to the text itself.

          Similarly, hyphenation is another artefact of the limitations of printing; surely ebooks shouldn't need to suffer from those limitations too? If text is stored in paragraphs, then it's up to the
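The 'Chapter M, paragraph N' idea can be sketched without touching the text at all. This is only an illustration under stated assumptions: the book is plain text, a chapter starts at a block beginning with the word "Chapter", and paragraphs are separated by blank lines. The function names `index_paragraphs` and `cite` are invented for this example.

```python
from typing import Dict, List


def index_paragraphs(text: str) -> Dict[int, List[str]]:
    """Map chapter number -> list of paragraphs, independent of line wrapping."""
    chapters: Dict[int, List[str]] = {}
    current = 0
    for block in text.split("\n\n"):
        block = " ".join(block.split())  # collapse wrapping and extra spaces
        if not block:
            continue
        if block.lower().startswith("chapter "):
            current += 1
            chapters[current] = []
        elif current:  # text before the first chapter heading is ignored
            chapters[current].append(block)
    return chapters


def cite(chapters: Dict[int, List[str]], chapter: int, paragraph: int) -> str:
    """Look up a citation given as 1-based chapter and paragraph numbers."""
    return chapters[chapter][paragraph - 1]


# Made-up sample text; the first paragraph is wrapped across two lines.
sample = (
    "Chapter 1\n\nFirst paragraph,\nwrapped across lines.\n\n"
    "Second paragraph.\n\nChapter 2\n\nOnly paragraph here."
)
index = index_paragraphs(sample)
```

Because the lookup counts paragraphs rather than pages or lines, the same citation resolves to the same text however a given device reflows it.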

          • You're right - I misread the summary as meaning older texts to be added to a "museum", in which case you would want to preserve formatting so that existing citations would match up when looked at 100 years down the road.

            As for citations not meaning much to most people, perhaps, but the works that are most often referenced and studied by others (be they scientific, religious, or [soon to be] classic literature) are also likely the ones you want to preserve and have an accurate, concise way to reference any

    • According to that PDF, the cost was 32.3 cents on the dollar, or nearly a third. That's a lot of money considering most college texts are between 20 and 50 dollars. That's 6 bucks off the largest of my lit class books, and 15 off my Java and UNIX books while I was majoring in SE.

      Here I'd like to note that I saved on my Shakespeare and Jonson class by finding nearly every text on Project Gutenberg (if you need a link to get there, shame on you! [promo.net]), while even at the used shops they were 4 and 5 pounds apiec
      • The category in their breakdown is "paper, printing, and editorial costs," so only part of that 32% is for the physical production of the book. The actual cost of paper, printing and binding depends on a lot of factors. Most importantly, it depends on how many colors of ink were used (1 to 4), and on the length of the press run.
    • I would like to see the Library of Congress start accepting digital books for copyright registration, however -- it's a drag to have to send them hardcopies.

      The whole point of accepting hardcopies is so they have something to store (and the preservation of paper is well studied) and check out.
  • Seriously, how much storage would you need for the Library of Congress? If I can fit the human genome on my iPod right now, what size hard drive do I need for a gajillion books?
    • Here's one guess from Brewster Kahle at Alexa Internet:

      "guess-timated" the Library of Congress' existing print holdings as "about 20 terabytes or $200,000 in storage space. It would take up the space of a couple of Coke machines." Of course, unlike Alexa Internet, which takes everything on pages including video clips, sound, and graphics, Kahle's estimate for digital storage of LC's print collection reflects "only the text, all ASCII. The graphics would get very complicated to estimate."
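To put the quoted figures in perspective, here is a back-of-the-envelope sketch. The 20 terabytes and $200,000 come from the quote above; the one megabyte of ASCII per book is purely an assumption made for this example, as is the use of decimal (10^12-byte) terabytes.

```python
# Figures taken from the quoted estimate.
TERABYTE = 10**12                    # decimal terabyte, an assumption
total_bytes = 20 * TERABYTE
total_cost_usd = 200_000

# Implied storage price in dollars per gigabyte.
cost_per_gb = total_cost_usd / (total_bytes / 10**9)

# ASSUMPTION: an average book is about 1 MB of plain ASCII text.
ASSUMED_BYTES_PER_BOOK = 10**6
books_that_fit = total_bytes // ASSUMED_BYTES_PER_BOOK
```

Under those assumptions the estimate implies about $10 per gigabyte and room for roughly 20 million text-only books; a different per-book size scales the count proportionally.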
  • That's a nice way to dismiss any criticism of the subject at hand (I'm not saying that it deserves any criticism though).

    If you don't agree, you're a luddite, and if you claim you're not a luddite, disagreeing will make you a crypto-luddite. It's almost like the unbeatable logic behind "denial is the first symptom of addiction".

  • by spotteddog ( 234814 ) on Wednesday November 05, 2003 @04:44PM (#7399880) Journal
    The Library of Congress is already working on a program for preserving "digitally born" documents. Look at http://www.digitalpreservation.gov/

    *disclaimer: I currently work at the Library of Congress, but not on this project.

    • No NIH syndrome, I'd hope. Keep in mind that the E-Book Museum proposal [teleread.org] focuses on the artifacts that the public can see right there in person and on the Net--the machines and the media, as well as videos of old e-book references in movies, on TV, and so on. That's a different issue from content preservation per se. What's more, the TeleRead item already includes a link to http://www.digitalpreservation.gov--please don't think I've denied LOC credit for existing activities. What I have in mind, of course, w
    • Usenet is currently the most significant "born digital" international collection of documents. No, I don't mean all those binary groups, but the ones that are conveniently already in ASCII, ISO-8859-*, or Unicode. Amidst the noise, there is a lot of knowledge there.

      A significant amount of early Internet history is there as well: Stuff you don't/won't see in AOL or MSN and stuff you certainly won't see in newspapers or books anymore because it doesn't validate today's corporate dogma.

      The Usenet archives n

  • by iantri ( 687643 )
    Won't DRM make it difficult for the Library of Congress to archive these? What about when it needs to be transferred to a new digital format (because paper has been around for ages; computer technology completely changes every 10 years)?
    • Get in touch with your MP or foreign equivalent right now and point out the need for archival copies of normally-DRM'ed works to be made in some unencumbered form. As a disincentive against the misuse of DRM, the provision of an unprotected version for archival should be a precondition to validate any law which would make it an offence to bypass said DRM restrictions. So if you haven't deposited an unprotected copy of your work at your own expense with your National Library, then any anti-circumvention la
  • by Anonymous Coward
    Distributed Proofreaders [pgdp.net] is the main source of public domain electronic books. It is part of Project Gutenberg [gutenberg.net]. DP consists of thousands of volunteers doing hundreds of books each month, some of them math books, for which DP is using LaTeX. Thus, the project needs savvy (La)TeX folk to correct the OCRed texts.

    Thus, if you have a spare ten minutes now and then, you can make a significant contribution to public domain and mathematics. The finished e-books are free, downloadable, and computer-searchabl
    • Glad to see a plug for PG and DP! Needless to say, in the full-length post, I noted that the proposed E-Book Museum could feature a video interview with Michael Hart as well as the terminal he used in the early days of PG (or an equivalent). Would be one more way to promote PG and the related question of volunteering! I myself recently worked with other volunteers on Upton Sinclair's "The Brass Check." Bottom line? No conflict between The Computer Museum idea and PG, just synergy. - DR
  • I would think that many publishers already have a great deal of their works completely digitized at this point. As an aside, PDF would be just fine for a project like this, in fact pdf might even be overkill. And as far as how much trouble upgrading in 10 years will be, that's bollocks if the system is done right.
  • People looking for electronic archives should check out Bluemud.org [bluemud.org]. We have what I believe is the largest online archive of electronic documents. 25,000 documents online right now and another 225,000 waiting to be sorted by our librarians. As a warning, though, it's a mish-mash of stuff. A lot of full books, but a lot of other crap too: Old hacker 'zines, random usenet archives, and other more esoteric things.

    Plus, it's an open community. Anyone can become a librarian on the site and help sort doc
  • If you read the article, it's actually mainly NOT about books but rather about the other digital publications: zines, online newspapers, et al.

    I think this is very useful, as a large number of online versions of paper zines & newspapers have far more resources than their dead-tree counterparts: the Wall Street Journal and The Financial Times, to name a few. So far, there has been no central and/or organized way to capture this information.

    I also liked the bit: "This new legislation means that a vital part o

    • Just a gentle reminder that the E-Book Museum would preserve the artifacts and tell the story of the technology. It would not be so much of a content-preservation project. That's for other worthy endeavors and proposed endeavors [teleread.org]. As for the role of e-books in the UK content-preservation initiative, they are at least among the items included. Good enough for the point to be made! Needless to say, I couldn't agree with you more about the usefulness of preserving e-copies of nonbooks, too, such as magazines an
  • Around 5 or so years back the Library of Congress (or one of its peers) started digitally archiving old LPs and other recordings to preserve them. I know at one time, this archive was publicly available, but I've no idea of its current status or availability.

    An example of the content: it had several hours of mp3s transferred from live interviews of hillbilly moonshiners. How-to's, stories, tales, etc...

    I'm curious if anyone knows where this might be, who is running it, and if it's still around?

  • Oh, gee. Ebooks. What's next? Virtual reality? Esther Dyson?

I've got a bad feeling about this.