
Carnegie Mellon's Digital Library Exceeds 1.5 Million Books

cashman73 writes "Most Slashdot readers are probably familiar with Google's book scanning project, a collaboration with several major universities to digitize works of literature, art, and science. But Google may have been beaten to the punch this time -- about a decade ago, Carnegie Mellon University embarked on a project to scan books into digital format, to be made available online. Today, according to new reports, they have a collection of 1.5 million books, the equivalent of a typical university library, available online."
This discussion has been archived. No new comments can be posted.

  • Link here (Score:5, Informative)

    by autophile ( 640621 ) on Thursday November 29, 2007 @09:36PM (#21527159)

    http://tera-3.ul.cs.cmu.edu/

    • Re: (Score:2, Informative)

      Another link here

      http://dli.iiit.ac.in/ [iiit.ac.in]
    • by 602 ( 652745 )
      Actually, that's called a URL. A link would be this [cmu.edu].
    • You can review books from CMU's University Project at this Internet Archive page [archive.org].
  • by MrAndrews ( 456547 ) * <mcmNO@SPAM1889.ca> on Thursday November 29, 2007 @09:37PM (#21527167) Homepage
    This site (which is found at ulib.org [ulib.org] BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary). I played around for a while, seeing what I could dig up, and didn't see any obvious gaps (though I purposely avoided anything modern).

    As an author, I was always a bit worried having Google as the sole gatekeeper for this kind of service... not that I necessarily distrust Google's intentions, but if they changed their worldview one day, it'd be a pity to have so much work invested in only one place, and have to re-build it all somewhere else. It's nice that there are proper choices, and not all from a commercial stance either.

    I don't know how smooth the integration process is (I submitted one of my books, but it appears it's a very un-automated system involving email etc, so it will probably take a while to see results). But still, I'm glad they're giving authors a way to help grow the library. Here's hoping it becomes even better than its promise!
    • It only has one of Shakespeare's works - A Midsummer Night's Dream - in addition to a few biographies and translations. Seems like a pretty big omission.
    • by garcia ( 6573 )
      I'm quite impressed by the years of books they have offered. While I figured that many of the books would be out of copyright and a few would be done with permission, I was shocked to see that they have nearly 1/2 million books published after 1981.

      Check out their progress report here [ulib.org].
    • This site (which is found at ulib.org BTW) seems to have a pretty good collection of obvious titles to choose from, though having to download a custom plug-in to read anything is a bit annoying (and apparently temporary).

      I agree that custom plug-ins suck. But before they fix that, they should probably fix that typo:

      To see the book pages of ULIB, please dowload free TIFF plugin or DjVu plugin
    • by sowth ( 748135 )

      ... though having to download a custom plug-in to read anything is a bit annoying...

      You don't need a special plugin. You just need to specify a program which displays tiff files to your browser.

    • Actually, the Million Book Library seems brilliantly conceived but (so far) rather poorly executed. The "million books" claim -- while technically accurate, I suppose -- is less than it seems. A search on "Jane Eyre" pulls up about 20 listings, mostly one and the same edition of the book. Many books are only "partially accessible" -- something to do with copyright restrictions, I guess.

      I've only had about a 50% success rate in actually viewing books once I have their listing; the system just doesn't wo

  • We can access them online. Don't need no stinkin' library ticket :-)
    • Browsing around, it looks like you can only access 15% of the books. Looking through the computer section (naturally :), most of the books are long out of print (Eg: A Primer Of Algol 60 Programming by E. W. DIJKSTRA, 6502 Machine Code For Beginners, etc). Those books are still copyrighted, the publisher won't sell you a copy, yet they want to deny everyone access to it.
      • by jmorris42 ( 1458 ) *
        > Those books are still copyrighted, the publisher won't sell you a copy, yet they
        > want to deny everyone access to it.

        They have to follow the law so I forgive them on books under copyright. But they don't appear to even want to make it easy to access complete copies of books that are out of copyright. You can write them and ask for a full copy of a book. Bah. And no easy way to mirror the site (even just the out of copyright material) either.

        Our library already hosts a Project Gutenberg mirror.
  • Because apparently the Slashdot editors can't be bothered...

    http://www.ulib.org/ [ulib.org]
  • by HemmingSay ( 1136561 ) on Thursday November 29, 2007 @09:55PM (#21527321)
    I really like the idea of online libraries, but I had to laugh when I got the following result for the first book that came to mind: "Please provide a valid query (Word greater than length 3)". The book was "The Old Man and the Sea".
    • by sqrt(2) ( 786011 )
      Maybe they should outsource their search features to google.
    • Have you tried the "" thingies on the ends?
    • Did you realize it responded to your query with a creative poem of its own?

      You said, please give me
      The old man and the sea
      Please provide, it said, a valid query
      With a word greater than length three,
      and with that, I could supply thee
      with a result that you would like to see.
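The error in this sub-thread makes sense once you count letters: every word in "the old man and the sea" is exactly three characters long. Here is a minimal sketch of the kind of check that would produce that message. The function name and the rule itself are assumptions inferred from the quoted error text, not the site's actual code.

```python
def has_searchable_word(query, min_length=4):
    """Return True if at least one word is long enough to search on.

    Hypothetical rule inferred from the error message
    "Word greater than length 3"; the real site's logic is unknown.
    """
    return any(len(word) >= min_length for word in query.split())


# Every word in this title has exactly three letters, so it is rejected.
print(has_searchable_word("the old man and the sea"))
# "tale" and "cities" are long enough, so this title passes.
print(has_searchable_word("a tale of two cities"))
```

A check like this is usually a crude stand-in for stop-word filtering, which is presumably why a title made entirely of short words trips it.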
  • I suggested to my company that we use Carnegie Mellon's reCAPTCHA program to solve two problems: 1) improve our CAPTCHA implementation, and 2) help Carnegie Mellon with their online publishing initiative. To my pleasant surprise, I recently found the company decided to go ahead with reCAPTCHA. Sweet! If you are not familiar with it, check it out and do some good for everyone! http://recaptcha.net/ [recaptcha.net]
  • by chipasd ( 1135399 ) on Thursday November 29, 2007 @10:17PM (#21527477)
    For those that missed the articles about C.M.'s associated project for validating all those scanned words on all those scanned pages: http://recaptcha.net/ [recaptcha.net]

    reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.
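As a rough illustration of the idea quoted above, here is a minimal sketch of collecting human answers for words OCR could not read. It assumes each unreadable word is shown next to a control word whose answer is already known, and that a transcription is accepted once enough people agree. The class name, method names, and vote threshold are all hypothetical, not reCAPTCHA's actual implementation.

```python
from collections import Counter, defaultdict


class WordVotes:
    """Collects human transcriptions for words that OCR could not read."""

    def __init__(self, votes_needed=3):
        # Hypothetical threshold: how many matching answers settle a word.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        # Only count the guess for the unknown word if the user typed
        # the known control word correctly (case-insensitive).
        if control_answer.strip().lower() != control_truth.lower():
            return False
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        # Return the agreed transcription, or None if not enough votes yet.
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

The control word is what makes the scheme work as a CAPTCHA at all: without it, there would be no way to tell a human's answer from a bot's guess.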
  • on book from '20s

    wow. Universal access pffft
  • I don't want to be a party-pooper, but I wasn't that impressed with the collection. The latest chemistry books and engineering books were from 1920. A LOT has happened in chemistry and engineering since then. Are they starting with older books and slowly moving to newer ones? The Chinese collection is impressive, but hard to read (unless you know Chinese). Also the plugin only works on Windows-based computers. That is sad for me.
    • by theMerovingian ( 722983 ) on Thursday November 29, 2007 @10:33PM (#21527579) Journal

      Copyright law in the US started out pretty reasonable - 20 years from the date of registration. Walt Disney spent a lot of money and lobbied the government for another 20-year period. Before this could expire, they lobbied to have copyright terms extended to the life of the author plus 20 years. As a result of the Sonny Bono Act, it was expanded to the life of the author plus 75 years. (NOTE: this is a very brief approximation of US copyright law history - it was actually somewhat more complex than this and with several more twists and turns). See here for a detailed explanation. [copyright.gov]

      The functional result of this lobbying is that no US copyrighted work created since 1923 has lapsed into the public domain (unless the owner screwed up by not renewing the copyright at the appropriate juncture).

        • You bring up something important. I saw mostly Chinese and Indian texts in that online library, not to mention that it takes some stupid plugin. As a result, that online library and anything like it will forever remain useless for most people. We can borrow a book from a public library for free, so why can't you read good online books for free? Disney :( Sonny Bono :( We need copyright reform.

        • It is ridiculous that drug companies can spend billions of dollars on research for a drug patent that only lasts 20 years, while any pot-smoker with a guitar can write some song and the US government will grant him a monopoly that potentially extends well over 100 years.

          Note: I am not 100% in support of drug patents in the current state, but the discrepancy between patents and copyrights is very dramatic.

          • Also note that the US government will help the pot smoker enforce his copyright with criminal sanctions, whereas the drug company is completely on its own in enforcing its patent with civil litigation.
        • It depends on your definition of "most people".
  • by ross.w ( 87751 ) <rwonderley AT gmail DOT com> on Thursday November 29, 2007 @10:30PM (#21527565) Journal
    So how many Libraries of Congress is that?
  • by liftphreaker ( 972707 ) on Thursday November 29, 2007 @10:51PM (#21527655)
    I picked a book at random, Dickens' A Tale of Two Cities. Here are the first few lines:

    "TIT was the best of tunes, it was the worst of times,..."

    "li was tie winter of despair, we had everything before us,..."

    I guess they just OCR'd books en masse without proofreading. Oh well, think of it as an exercise for your brain.
    • Google's isn't any better. There, you have access to the actual scans, but many of them are of poor quality.

      http://books.google.com/books?id=whSwpQn8p9QC&printsec=frontcover#PPR3,M1 [google.com]
      Scroll down through the first dozen pages or so of Shakespeare's Julius Caesar; all of the words are cut off at the bound edge. I wonder why anyone would devote the time to scanning an entire book (including the blank pages) if they aren't going to do it right.

        • Doing some quick browsing around, I tried the HTML view in one book and got a 404 error, yet in another book the HTML view worked fine.

        The QuickTime plugin introduces delays which make flipping through pages pretty tedious. The HTML view is faster, but then you're going to run into OCR errors as others noted.

        Here's an idea: create a custom wiki-style site containing all the book text and provide a link to each scanned page. One would normally read the standard web (HTML) version, but if any errors are foun
    • Mr. Burns: 'Let's see. It was the best of times, it was the "blurst" of times! You stupid monkey!'

      Maybe CMU just needs to hire smarter monkeys...
  • The definition of a Library is just changing. When you look at a small Internet cafe what you are really seeing is the modern version of a Library that also caters for those who wish for some refreshments. If the old Dickensian hard copy libraries want to survive they will have to become more communal and socially active. Yes, that means having network access and a place for young people to talk. While you have them captive you can promote books with posters on the walls and seminars and social events. Its
  • I have an idea. Hear me out...

    Sure, Google is currently losing the scanning wars, but they'll catch up. Someone else may join the race, and eventually there will be a single collection containing 1 BILLION books. Sure, I like to read. I also suspect other people like to read too, but who has the fucking time to read 1 BILLION books? As an average, educated male, I hate being in a discussion with someone who name-drops a book I never heard of before, as a proof that my point is invalid because I am not we
  • 90% of books available today are not worth the paper they are printed on; most of the rest contain nothing original. It seems like almost all that is worth reading or studying was written before 1900. What masterpieces has the 21st century offered so far? A New Kind of Science? Is that all we, the intelligent species, are capable of now?

    It seems that, culturally, we are way behind compared to what we were a hundred years ago. Want to learn geometry? Read Euclid. He wrote his books thousands of years ago.
    • Re:Well... (Score:5, Insightful)

      by agrippa_cash ( 590103 ) on Thursday November 29, 2007 @11:36PM (#21527967) Homepage
      You are mistaken, and for this you should be glad. It often takes several years for masterpieces to be recognized as such, so it shouldn't surprise you that nothing you like has been acclaimed. I'm not a high culture joe myself, so please don't be offended, but today's high culture may be incomprehensible to you because you aren't sophisticated enough to appreciate it. If you grow up watching Fantasia, it is easier to enjoy Stravinsky. As for originality, the tale is in the telling. People of years past lived and died much as we do, a bit more fresh air and hard work maybe but basically the same. Basically. They were us first, what are you going to do? Culturally we are far, far ahead of the 1907 crowd. Your image of 1899 is almost certainly based on the western upper class (listening to Wagner) rather than the teeming western poor (listening to minstrel shows) or the uncountable colonized listening to whips, Maxim guns, pickaxes and sermons.
    • Re: (Score:2, Insightful)

      by Transtrek ( 1173839 )
      Also worth asking, are you willing to learn 2000+ year old Greek to read Euclid, or for Euler learn Latin (the language of scholarship in his time)? One reason that we have and use more modern math textbooks is changes in language and notation over time. Also it is often the case that the original proof is far from the best that has been found, since there is now more structure developed in later works that allows either condensing or a novel approach. If you limit yourself to pre-1900 works, you throw
    • Want to learn geometry? Read Euclid. He wrote his books thousands of years ago. Calculus? Euler is your best teacher, and has been so since 1700s.

      What about chaos theory? Theory of computation? Axiomatic set theory? Topology? Large chunks of modern probability theory?

      Mathematics is developing more new material faster than it ever has.

    • It seems that, culturally, we are way behind compared to what we were a hundred years ago. Want to learn geometry? Read Euclid. He wrote his books thousands of years ago. Calculus? Euler is your best teacher, and has been so since 1700s. Fiction? Music? Architecture? ... You get the point.

      It seems that you aren't exposed to any modern culture.

      Euclid's Elements are fine, and fun to read. But I wouldn't read Euclid for differential geometry. Or symplectic geometry. Or dozens of other kinds of geometry born
      • Marsden's Calculus

        That's exactly my point: why did the guy need to write "his" calculus in the first place. I can't see any reason other than personal profit. Unless he is some genius teacher who invented a novel way of presenting the material that would make learning effortless or something. The truth is exactly what I said: most of the calculus books available today contain nothing original.

        Music? Even more.

        Yes, of course - if you count all the mp3 files out there.

        • Calculus books shouldn't contain any original research. They're textbooks. There are clear pedagogical reasons to use new textbooks:
          • There are obvious regional variations in spoken and written language. Euler was Swiss.
          • Notation changes over time as the concepts are hammered down.
          • More importantly, textbooks are intended to guide students to useful applications, and in advanced texts, possibly fruitful avenues for research, which obviously change as relevant applications change and research progresses.
          • The a
          • What you are saying sounds like it makes sense, but, unfortunately, it does not.

            regional variations... Euler was Swiss.

            Ok, I understand: Germans had hard time understanding him. (He wrote in Latin, by the way, not in his "regional variant" of German; in any case, this problem, when it does exist, is solved by the method known as translation.)

            any reasons

            Stupidity?

            What you are essentially saying is that each and every calculus textbook is a worthy read, if you want to learn calculus, even if yo

            • Ok, I understand: Germans had hard time understanding him. (He wrote in Latin, by the way, not in his "regional variant" of German; in any case, this problem, when it does exist, is solved by the method known as translation.)

              Translation of mathematical texts amounts to rewriting it. Why not make it better for a particular purpose at the same time?

              What you are essentially saying is that each and every calculus textbook is a worthy read, if you want to learn calculus, even if you have read all other book on
    • Original groundbreaking technical literature is often very difficult to understand. The author struggles to describe the new concepts. Many years later, other authors can simplify explanations and remove dead ends and needless excursions into side cases. Authors with skill at being authors instead of being researchers can choose more understandable language.

      Newer texts are also likely to use modern terminology, whereas original papers may have obsolete or obscure terms. Consider trying to read a text where de

  • A lot of these books would languish in obscurity, only to be touched by very few people. Now the information is available via search, which means even more useful information can be had and these lost "works" might finally serve the purpose they were meant to serve...to educate the masses.

    Printed books have their place, as does the digital library. The quality of our information is based on how easily it can be accessed. A report written based on 3 sources the old way, might benefit from having 100 sources
  • If you are a student and don't have a lot of money to buy your books (mostly in third-world countries), you can find all your college textbooks in there. I think it is a way better library than Google and others, obviously because of the copyrighted material. But anyway, in those countries you would have photocopied the book. Or maybe because when you want to legally buy a book you find out that it can't be shipped to your country.

  • No viewers for Linux / Firefox and the website feedback gives

    Not Found

    The requested URL /cgi-bin/udlcgi/ULIBCopyrightreport2.cgi was not found on this server.
    Apache/2.0.55 (Ubuntu) mod_perl/2.0.2 Perl/v5.8.7 Server at tera-3.ul.cs.cmu.edu Port 80
  • by aminorex ( 141494 ) on Friday November 30, 2007 @01:34AM (#21528893) Homepage Journal
    1.5 million books? Ok, maybe my tastes are a bit more focussed on mathematics, physics, programming, economics, and linguistics than would be the CMU library, but I just burned 3 DVDs worth of math books alone, 12GB of PDF, at roughly 8MB/title, for 1500 titles. And that was just one week's worth of crap filtering for one man. Methinks CMU isn't really trying.
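The numbers in this comment are self-consistent, and the same arithmetic can be scaled up. The sketch below checks the figures given (1500 titles at roughly 8 MB each) and then applies the same per-title size to 1.5 million books. Two things are assumptions rather than facts from the thread: that a scanned CMU book averages about 8 MB, and that each DVD is a 4.7 GB single-layer disc.

```python
import math

MB_PER_GB = 1000        # decimal units, as printed on disc packaging
DVD_CAPACITY_GB = 4.7   # assumption: single-layer DVD


def collection_size_gb(titles, mb_per_title):
    """Total size in GB for a collection with a given average title size."""
    return titles * mb_per_title / MB_PER_GB


def dvds_needed(size_gb):
    """Number of whole DVDs needed to hold size_gb."""
    return math.ceil(size_gb / DVD_CAPACITY_GB)


personal_gb = collection_size_gb(1500, 8)
# Assumption: CMU's scans average the same 8 MB per title.
cmu_gb = collection_size_gb(1_500_000, 8)

print(f"1500 titles: {personal_gb} GB on {dvds_needed(personal_gb)} DVDs")
print(f"1.5 million titles: {cmu_gb / 1000} TB")
```

Under those assumptions the whole 1.5 million book collection would come to roughly 12 TB, a thousand times the size of the three-DVD set described here.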
  • How many Libraries of Congress is that?
  • Last time I counted, I had 800,000 e-books on disk. For a large institution, I'd expect better. Their collection probably isn't mostly sci-fi and D&D manuals though :/
  • It packs black and white images like crazy. Though a Firefox plugin would be nice, this really is one of the best online book viewers I've seen, technology-wise. It's fast and pretty easy to interface with scripts, and all the images seem to be cropped.
  • ...that's nearly as big as my...er...friend's...MP3 collection.
  • That just wasn't a good experience. I found the one book I looked for (Pilgrim's Progress) but I found the User Experience next to bad. They need to kick that up a couple of notches before I would use this over Google's books.
  • I can't use the proprietary DjVu image viewer plug-in with my Mac/Firefox combo and I don't understand the use of TIFF images served one (very slow) page at a time requiring both Flash & Quicktime in my browser. Other book sources offer text and PDF formats. It is possible that there is some intent to restrict users by these unconventional formats and awkward serving procedures.

    Indeed, the FAQ says that if you want to download an entire book for offline reading, you must send an email request. Your use
