Human-Powered Internet Archive Book Project
Carl Bialik from the WSJ writes "A group led by the Internet Archive is planning a massive, ambitious effort to scan millions of old books and make them available for Web searching early next year. Behind that effort are about a dozen scanners, employees making about $10 an hour to manually scan volumes -- some more than a century old -- one page at a time, on special contraptions. The Wall Street Journal Online visits a University of Toronto library to watch one of the scanners in action: 25-year-old Liz Ridolfo."
Contributing to Gutenberg (Score:2, Interesting)
It's lighter! (Score:4, Interesting)
Although anything useful has to be illegal...
Re:Diffrent? (Score:1, Interesting)
Can only be a good thing (Score:2, Interesting)
In Stanislaw Lem's science fiction book "Memoirs Found in a Bathtub", all the paper in the world gets eaten by a virus and chaos ensues. Interesting read if you've missed it, has made me paranoid about how much the world still depends on paper.
Re:Why not join the Gutenberg Project (Score:2, Interesting)
The real test and business opportunity comes in the distribution phase. The first person to have a huge library of old books, and contracts with publishing houses for new books (with "purchases" by the end users, and DRM encumbered, of course) is the person who will win the market and define the (capitalistic) best way to scan and distribute.
And come the semantic web, things get really interesting. Already we have tons of sites that do cross-referencing between academic papers -- at least, the citations, as well as categorization by topic. When we can start doing this for books based not only on genre, but topic or specific references to persons, or general concepts ("Book X mentions technology Y on page Z. Click here for link!")... well, things will become far more informative. I suspect that in this field, the information -- the texts -- may become free, but the computerized (and human-assisted) analysis, linking, and value-added stuff will be the new commodity. He who has the best algorithm wins.
I guess information has always wanted to be free, but the analysis of said information lies firmly in the realm of economics.
Scanner: I want. (Score:3, Interesting)
The obvious advantage of this rig is that you don't have to open the spine 180 degrees and smash the book flat onto a single glass plane. You don't have to open the book more than 90 degrees, so it's gentle on the spines of fragile old books. And the glass wedge is always self-centering against the spine of the book. The only way this scheme could work better is if there were a way to turn the pages automatically. But these are old and presumably valuable works, so it's safer to let paid low-wage drones do the work than to risk mechanical damage.
Book Scanners (Score:3, Interesting)
How can I help? (Score:1, Interesting)
Manual seems safer to me.... (Score:3, Interesting)
Interesting - I don't understand your line of thinking - interested to hear more. Is the argument that automated page turning is *cheaper* so it's a pity that the project spends a lot on labour charges (manual scanning)? Or is the argument that the automated page turning is easier on the fragile old books? I'd appreciate if you could offer more details about the technology - the company's demo video shows a vacuum device lifting pages, but both examples are with modern books. Honest question: surely the advantage here is a low labour cost method of scanning huge numbers of pages (like the telephone directory example they show). But if you have fragile books, surely the advantage of a human is that they can see that individual pages might be particularly fragile, maybe even needing support or repair to scan, while the pre-set vacuum device will plough on regardless, it won't be able to make a decision on the quality of the pages. Does it have any sensing devices built in? My experience of older books (e.g. nineteenth century) is that in some cases the paper can be very brittle.
Re:RTFA? (Score:3, Interesting)
This is when the 'remove this object' Firefox extension [mozilla.org] comes in handy. Just remove the image and the text is readable. 'Undo last remove' to get the image back.
I don't think you should have been modded down.
Re:Diffrent? (Score:2, Interesting)
At least partner up for the process of scanning, even if they have different plans as to what to do with the scans.
Libraries (Score:2, Interesting)
(1) The library paid for the copy you're borrowing. (Or somebody paid for it, in case the book was donated to the library.) Thus the author was paid for that copy. If you read a whole copyrighted book via a Content Display Site (CDS - Google Print, Amazon Search Inside, etc.) and never buy the book, the author wasn't paid. Copyright law is about creating new copies; you're not creating a new copy when you read in a store or from a library.
(2) Browsing in a bookstore is pretty inconvenient. You can't take the copy with you to look at any time you want. (Unless you buy it! That's sort of the point.) Bookstores know that few people really read entire books in the store -- else they'd go out of business. However, reading a book from a CDS doesn't have that limitation: You can take it with you, on your laptop, etc. This is particularly critical in light of digital paper, when the digital copy is the paper copy.
(3) Library and bookstore reading isn't anywhere near free: You have to move your physical body to the library or bookstore to read. For one thing, you likely can't do that at 3am. (And certainly not in your pajamas.) You can't do it from your bed, couch, or desk, without getting up. You have to spend time to move your body down there, which might be 10min-30min each way; 20-60min round trip, plus say 10min to find the book, a place to sit, etc; call it 30-70min. If you value your time at say, $10/hr, that's $5-12. Then there's the cost of transportation. If the library/bookstore is three miles away, 6mi. round trip, and gas costs $2.50/gal., and you get 20mi/gal., that's another $.75. The IRS figures driving a car costs $.405/mile in repairs, wearing it out, etc., so that's another $2.43 or so. So you're at something like $8-15 to go read a "free" book.
Really -- if it were that free, people would do a lot more of it.
Yet reading a free copy from a CDS doesn't have those limitations. It is much closer to $0, actually and truly free. THAT's the problem.
(4) You can't pass on a "free" copy you read in the store or from the library. You have to leave the book at the bookstore (or buy it); you have to return the book to the library. If you read a book in digital form that was stolen from a CDS, you could pass that copy on to others by email, via a web page, P2P software, etc.
So, bottom line, bookstore/library reading isn't really free. CDS copies are essentially free, and that's the problem. They're too convenient to read free.
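For anyone who wants to check the arithmetic in point (3), here is a rough sketch in Python. The function and parameter names are made up for illustration; the figures are the ones quoted above ($10/hr, a 6-mile round trip, $2.50/gal, 20 mi/gal, $.405/mile). It adds gas and the per-mile figure together the same way the post does, which is an assumption of the post and not something the code settles.

```python
def trip_cost(minutes, hourly_rate=10.00, miles=6.0,
              gas_price=2.50, mpg=20.0, per_mile_rate=0.405):
    """Rough dollar cost of one round trip to read a "free" book.

    All names and defaults are illustrative; the numbers come from point (3).
    """
    time_cost = minutes / 60 * hourly_rate   # value of the time spent
    gas_cost = miles / mpg * gas_price       # fuel for the round trip
    wear_cost = miles * per_mile_rate        # per-mile figure quoted in the post
    return time_cost + gas_cost + wear_cost


low = trip_cost(30)    # 30-minute trip, the short end of the estimate
high = trip_cost(70)   # 70-minute trip, the long end of the estimate
print(f"${low:.2f} to ${high:.2f}")   # prints "$8.18 to $14.85"
```

That lands inside the $8-15 range given in point (3). Change the defaults to match your own commute and the gap against a near-$0 CDS copy moves with them.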
This is one of the reasons we formed the COCOA Association ( http://www.copyrightaccess.com/ [copyrightaccess.com] ), to make more copyrighted work available. (Note, COCOA does not inhibit indexing and searching and returning text snippet search results -- just what page images can be displayed.) If you support this, please sign our petition at http://www.petitiononline.com/cocoa/petition.html [petitiononline.com] -- thanks!
Dr. Andrew Burt,
Chair, The COCOA Association