Human-Powered Internet Archive Book Project
Carl Bialik from the WSJ writes "A group led by the Internet Archive is planning a massive, ambitious effort to scan millions of old books and make them available for Web searching early next year. Behind that effort are about a dozen scanners, employees making about $10 an hour to manually scan volumes -- some more than a century old -- one page at a time, on special contraptions. The Wall Street Journal Online visits a University of Toronto library to watch one of the scanners in action: 25-year-old Liz Ridolfo."
Diffrent? (Score:1)
Re:Diffrent? (Score:4, Insightful)
Anyone can do 'A Christmas Carol' because its copyright has expired.
Using someone's PRECISE arrangement of the text, however, is not permissible; that has its own copyright...
So if I buy a current-day copy from Amazon, I can't scan it in... but if I buy a copy whose last edition/printing was more than 75 years ago, it is fair game.
Re:Diffrent? (Score:1, Informative)
bullshit
Re:Diffrent? (Score:2)
I have to make my own assemblage of the story.. and sell that,
Re:Diffrent? (Score:3, Insightful)
If the work in question is under copyright, you can't copy and redistribute it; if it's not, then you can. The only exceptions would be the fair use provisions, and I don't think that they would cover you reproducing an entire book, even if it was for non-commercial use: if you're a university professor you can't copy an entire textbook and give the copies out to your students. That's a non-commercial use, but it's still illegal. There might be some exception
Re:Diffrent? (Score:2)
http://www.gutenberg.org/faq/C-16 [gutenberg.org]
best comment ever! (Score:4, Funny)
I too want to be modded Insightful!
Exactly. (Score:2)
You could not, however, scan the book and distribute the images of the pages. Because although the original author's text is not under copyright prot
Re:Diffrent? (Score:1)
That is true in the UK and Commonwealth countries, but not in the U.S., so far as I can tell. The UK has something called "typographical arrangement copyright" which is what you were referring to. This lasts for 25 years, independent of any copyright of the text itself.
The U.S., however, has no explicit equivalent stated in its copyright laws. I suppose one might make a claim that normal cop
Re:Diffrent? (Score:2)
There probably aren't any. Copyrights do expire.
You're right AND wrong - copyrights don't expire (Score:2)
But that's not true anymore. Currently copyrights have an expiration date, but the expiration date has consistently gotten farther away faster than it has gotten closer.
Essentially NOTHING expires now unless somebody didn't do their paperwork.
Tha
Re:Diffrent? (Score:1, Interesting)
Re:Diffrent? (Score:4, Informative)
The Open Content Alliance is a consortium of non-profit and for-profit groups dedicated to building a free archive of digital text and multimedia. It was conceived in 2005 by Yahoo and the Internet Archive in response to Google Print's closed nature, and aims to keep public domain works in the public domain online. The digitised works will then be used in the search results of participating search engines. You can see a sample of the open content at openlibrary.org
A large difference between the OCA's approach and that of Google Print is that the OCA intends to ask a copyright holder before digitising a work that is still under copyright, while Google Print will digitise any book unless explicitly told not to do so by November 1, 2005.
So, Google Print will almost certainly be better when searching for copyrighted material. For public domain works, we'll have to wait and see.
IMHO, it seems like a little cooperation here would make a lot of sense for both parties - they could save money trading digital copies 1-for-1 while remaining in (healthy) competition.
Re:Diffrent? (Score:2)
This is very true. However you see this sort of thing in a lot of emerging industries -- two competitors will duplicate each other's work until eventually one defeats the other in the marketplace and buys up their work at fire-sale prices. As long as either one thinks that they can "win," there's little incentive to help.
Too
Re:Diffrent? (Score:5, Informative)
Re:Diffrent? (Score:2, Interesting)
At least partner up for the process of scanning even if they have different plans as to what to do with the scans
Contributing to Gutenberg (Score:2, Interesting)
Sorta. (Score:5, Informative)
If you look at the current books on Distributed Proofreaders [pgdp.net], you'll see that some of them credit the Million Books Project for the page scans.
Re:Sorta. (Score:1, Insightful)
*Until advertisement factors in. Advertisement ALWAYS factors in...
Re:Sorta. (Score:2)
Internet Archive doesn't have to specifically give them the images though.
OCA and PG scratching each others' backs (Score:3, Insightful)
The focuses of OCA and PG are really quite different: PG is most interested in preserving the essential information of a book (i.e., its text), while OCA's interest is in preserving the form of the book (i.e., its fonts, page format, coloration, even down to the yellowing of the pages). That having been said, there's a lot each can do for the other (and has!).
The Archive has archived most of PG's material, because even though the Books department of The Archive is focussed mostly on preserving books, The Ar
Re:Contributing to Gutenberg (Score:5, Informative)
I maintain several lists that show the DP harvesting status of several image collections, including The Internet Archive's Canadian Libraries collection [ntlworld.com], Google Print [ntlworld.com], and Early Canadiana Online [ntlworld.com]. As you can see, we will not be running short of material to work on for a very long time, even without any of these recently announced initiatives. That said, it's always great to see more material be made freely available, rather than locked up behind expensive subscription services like Jstor and EEBO.
It's lighter! (Score:4, Interesting)
Although anything useful has to be illegal...
Re:It's lighter! (Score:4, Insightful)
More correct than you may know about illegal.... (Score:1)
It is based on old teachings and books that were banned for many reasons.
Somebody felt threatened...for some reason!!!
I say, "let me have the access to everything"
and I am a big girl and can make up my own mind!
Pat
Re:Why not join the Gutenberg Project (Score:4, Informative)
Project Gutenberg and the Open Content Alliance are working on two slightly different things:
The OCA is making available the images of scanned pages. That's fine for reading an entire book, but you can't search it, nor copy a section of text into a document of your own.
Project Gutenberg makes available plain text, usually illustrated HTML, and occasionally other versions, of public domain books, which can be used by anyone for no cost.
If you'd like to help prepare public domain ebooks, visit Distributed Proofreaders [pgdp.net] and proofread a page a day (or more!).
Re:Why not join the Gutenberg Project (Score:1)
So, as the summary states:
make them available for Web searching
does not mean that there will be a complete text index available (that is, full-text search), but instead you can only search for specific works?
If you'd like to help prepare public domain ebooks, visit Distributed Proofreaders and proofread a page a day (or more!).
I do that every once in a while on their German counterpart: GaGa
Re:Why not join the Gutenberg Project (Score:4, Insightful)
That probably means that the search index will be uncorrected OCR, which leads to some inaccurate searches. The problem with using raw OCR is scannos, words that may be recognised as a different word that "looks" the same, for example modem and modern [google.com], or an i might be recognised as a slash [google.com].
Your time might be better spent at the real Distributed Proofreaders [pgdp.net], or DP-Europe [rastko.net], since Projekt Gutenberg-DE is not an official branch of PG, and actually copyrights its output (unlike the real PG).
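To make the scanno problem above concrete, here is a minimal sketch of how uncorrected OCR text could be checked for look-alike words. The confusion pairs, the function names and the tiny vocabulary are all assumptions made up for illustration; this is not how Distributed Proofreaders or any search engine actually does it.

```python
# Hypothetical confusion pairs: (what the OCR produced, what may have been printed).
CONFUSIONS = [("m", "rn"), ("rn", "m"), ("/", "i"), ("1", "l"), ("l", "1")]


def scanno_candidates(word, vocabulary):
    """Return vocabulary words reachable from `word` by one confusion swap."""
    found = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate != word and candidate in vocabulary:
                found.add(candidate)
            start = word.find(wrong, start + 1)
    return sorted(found)


def flag_scannos(text, vocabulary):
    """Map each suspicious token in `text` to the words it might stand for."""
    flagged = {}
    for token in text.lower().split():
        token = token.strip(".,;:!?\"'()")
        candidates = scanno_candidates(token, vocabulary)
        if candidates:
            flagged[token] = candidates
    return flagged


if __name__ == "__main__":
    # Assumed toy vocabulary; a real check would need a full word list.
    vocab = {"modern", "modem", "this", "times"}
    print(flag_scannos("In modem times th/s happens", vocab))
```

Note that a check like this can only flag a word as suspicious. Both "modem" and "modern" are real words, so deciding which one was actually printed still takes context or a human proofreader.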
Re:Why not join the Gutenberg Project (Score:1)
Thanks. I hadn't paid attention to that detail. Damn it! You just can't trust people anymore. I feel raped. I'll go home now...
Re:Why not join the Gutenberg Project (Score:2, Interesting)
Re:Why not join the Gutenberg Project (Score:1, Informative)
Not google (Score:1)
Can only be a good thing (Score:2, Interesting)
In Stanislaw Lem's science fiction book "Memoirs Found in a Bathtub", all the paper in the world gets eaten by a virus and chaos ensues. Interesting read if you've missed it, has made me paranoid about how much the world still depends on paper.
Re:Can only be a good thing (Score:2)
Preservation?
Do you really think your magnetic/optical/flash/etc storage will last as long as printed paper...even assuming you can find a CD reader in 50 years? Maybe you mean to recopy the data every few years, but if something gets lost for a few decades, it's lost for ever.
Re:Can only be a good thing (Score:3, Informative)
That is called periodic storage, and for anything you wish to preserve, it is necessary. Your argument is a bit weak, considering that any information in book or electronic format needs to be recopied periodically. Books need to be recopied less often than electronic copies; however, electronic copies are cheaper and easier to store, which offsets the costs.
The OP wasn't saying to burn the paper books after they're stored, merely to put them in electronic format AS
Hey there... (Score:4, Funny)
Please email me at superdesperateteengeek@needtogetlaid.net
Re:Hey there... (Score:2, Insightful)
No, I think that actually has a leg up on this comment.
Re:Hey there... (Score:1)
====
"Sexy? What's wrong with being sexy?" -- Spinal Tap
Good Bad Ugly (Score:5, Insightful)
Old books prior to copyright laws are being scanned.
The bad:
Pay is roughly $10/hr. Now, I happen to be concerned that someone being paid so little should be handling rare books. Not to mention the college graduate getting paid so little.
The ugly:
The digital camera contraption costs $30,000!! There are a few scanner manufacturers left in the world and none of them have exploited this niche. Shame on them.
Re:Good Bad Ugly (Score:3, Funny)
May we assume that you will therefore be donating additional funds, up to the level of your concern or the amount you can afford (whichever is less)?
Re:Good Bad Ugly (Score:1)
I would tend to think this is a good thing. It means that the people doing it aren't necessarily in it for the money. Being paid by the hour also gives them an incentive to take their time about it. ;)
As long as the people hired are screened for at least a medium-high level of respect for old books, I don't see a problem here.
Re:Good Bad Ugly (Score:3, Informative)
20000 dollars, 40-50 weeks a year, 40-50 hours a week
yep, that's 10 dollars an hour...
Does that mean all the PHD students should be kicked out of their labs and shouldn't be able to handle expensive books?
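The arithmetic in this comment is easy to check. A quick sketch using the figures quoted above (20000 dollars, 40-50 weeks a year, 40-50 hours a week); the function name is made up for illustration:

```python
def hourly_rate(annual_pay, weeks_per_year, hours_per_week):
    """Effective hourly pay for a fixed annual amount."""
    return annual_pay / (weeks_per_year * hours_per_week)


if __name__ == "__main__":
    ANNUAL_PAY = 20_000  # dollars, the figure quoted in the comment
    # Walk the corners of the quoted 40-50 week and 40-50 hour ranges.
    for weeks in (40, 50):
        for hours in (40, 50):
            rate = hourly_rate(ANNUAL_PAY, weeks, hours)
            print(f"{weeks} weeks x {hours} h/week -> ${rate:.2f}/h")
```

Over the quoted ranges the rate runs from $8.00 an hour (50 weeks of 50 hours) to $12.50 an hour (40 weeks of 40 hours), so "10 dollars an hour" sits in the middle of that range.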
Re:Good Bad Ugly (Score:4, Informative)
Actually, you can buy a robotic book scanner [kirtas-tech.com] (there's a demo video of it). No doubt it costs an arm and a leg although it may be worth it if you're scanning a large enough volume of books.
Re:Good Bad Ugly (Score:2)
Re:no. (Score:1)
Scanner: I want. (Score:3, Interesting)
The obvious advantage of this rig is that you don't have to open the spine 180 degrees and smash the books flat onto a single glass plane. You don't have to open the book more than 90 degrees, so it's gentle on the spines of fragile old books. And the glass wedge is always self-centering against the spine of the book. The only way this scheme could work better is if there was a way to turn the pages automatically. But these are old and presumably valuable works, so it's safer to let paid low-wage drones do the work than to risk mechanical damage.
Book Scanners (Score:3, Interesting)
Re: Hmm... (Score:1)
No, wait...
More people doing a good thing is good.
Re:Hmm... (Score:2)
Full text for copyright lapsed works? (Score:2)
Personally I've read lots of old science fiction from copyright lapsed works, there is some in Gutenberg, and like it quite a bit, though I'd like to find more of them.
For example I'm looking for Perry
Re:Full text for copyright lapsed works? (Score:2)
Will it automatically provide full text or scanned image files for works that have gone out of copyright?
If by "automatic" you mean after it's been scanned by someone, the images processed, placed onto the server and put into the system, then yes, it will automatically provide the scanned image files.
And do the restrictions against scanning, storage or reproduction also lapse when copyright lapses?
Yes, because it becomes a public domain work. You can do anything (from publishing it unchanged, creating y
Kirtas automatic book scanner (Score:1)
Manual seems safer to me.... (Score:3, Interesting)
Interesting - I don't understand your line of thinking - interested to hear more. Is the argument that automated page turning is *cheaper* so it's a pity that the project spends a lot on labour charges (manual scanning)? Or is the argument that the automated page turning is easier on the fragile old books? I'd appreciate if you could offer more details about the technology - the company's demo video shows a vacuum device lifting pages, but both examples are with
Re:Manual seems safer to me.... (Score:1)
Printer Friendly Link (Score:2)
http://online.wsj.com/public/article_print/SB113111987803688478-VNpw62xi_JA4avE8cxOZf0pf_nM_20061109
And of course, direct linkage to the picture of the girl.
Because that's the only reason 90% of you would click on the link anyways
The Girl Has Nice Shoes [wsj.com]
As an aside, cigarettes + old books = bad
"This book almost killed me," Ms. Ridolfo said to her boss, Gabe Juszel, who was preoccupied with a stack of books and di
Re:Printer Friendly Link (Score:2)
Unless the 'girl' is Lauren Bacall [ntropie.de].
Re:Printer Friendly Link (Score:2)
Oh, well
Re:Printer Friendly Link (Score:1)
Libraries? (Score:1)
Re:Libraries? (Score:2)
Libraries (Score:2, Interesting)
(1) The library paid for the copy you're borrowing. (Or somebody paid for it, in case the book was donated to the library.) Thus the author was paid for that copy. If you read a whole copyrighted book via a Content Display Site (CDS - Google Print, Amazon Search Inside, etc.) and never buy the book, the author wasn't paid. Copyright law is about creating new copies; you're not creating a new copy when you read i
Re:RTFA? (Score:2)
Why is this off topic? Seems perfectly reasonable to me. I actually had the same thought.
Re:RTFA? (Score:3, Interesting)
This is when the 'remove this object' firefox extension [mozilla.org] comes in handy. Just remove the image and the text is readable. 'Undo last remove' to get the image back.
I don't think you should have been modded down.
How can I help? (Score:1, Interesting)
Re:How can I help? (Score:3, Informative)
As a few others have mentioned, jump in to Distributed Proofreaders [pgdp.net]. We take the raw images (either scanned specifically for DP or taken from scanning projects like this) and produce checked, corrected text, which then goes to Project Gutenberg [gutenberg.org]. A few hours a week can help a lot.
$10/hr for scanning books? (Score:2)
Send the scans to India or Eastern Europe to be scanned for a fraction of the price. I mean really. This seems to be a serious operation - why not maximize the use of available resources? Spending $10/hr on scanning is just dumb.
Re:$10/hr for scanning books? (Score:2)
Re:$10/hr for scanning books? (Score:2)
I am not saying send them to a random person with a scanner. However, this can be done competently.
Re:$10/hr for scanning books? (Score:2)
Re:$10/hr for scanning books? (Score:1)
You're right, there are plenty of skilled US citizens that work for $10/hr.
Re:$10/hr for scanning books? (Score:2)
There ARE plenty of (reasonably) skilled US citizens, in the form of college students, willing to work for $10 an hour. I'm sure you wouldn't have any trouble finding people to work the scanner at that rate at any large university. Especially if you offered flexible/non-daytime hours: the most popular campus jobs in my experience were always the ones that you could work in the evening or at night.
In fact I'm a little disappointed that
Re:$10/hr for scanning books? (Score:2)
Over a century old... (Score:3, Funny)
employees making about $10 an hour to manually scan volumes -- some more than a century old
I think that if they hired younger people to scan the books, it might go a little faster.
Imagine a 100 year old at this job...
"...(mumble mumble) in my day we used priests to copy books (mumble mumble) oh dear, I tore another page, darn Parkinson (mumble mumble)"
When I read "human-powered"... (Score:1)
Darn expensive, machines win (Score:2)
On the other hand, the whole project is funded by Microsoft and Yahoo, which creates the usual good (open content!) / evil (paid for with the devil's money!) dilemma.
That's enough coffee for me, I suppose...
Scanning with precision is difficult (Score:2, Informative)
The (Jack) Vance Integral Edition [vanceintegral.com] was a volunteer effort to produce a limited edition 42 volume set of the complete works of Jack Vance, restored to as close to the author's original manuscripts as possible.
(The project is complete, and an amazing success.)
The team scanned and edited many of Jack's early works for which there was no good clean manuscript. They developed software tools that would compare scans from different editions to automatically find errors. It turns out that even the best human edi
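The idea of comparing scans from different editions can be sketched in a few lines. To be clear, this is not the Vance Integral Edition's actual software, which isn't described here; it is just an illustration using Python's standard difflib module, with a made-up function name. Wherever two independently scanned editions disagree, at least one of them probably contains a scanning error or an editorial change worth a human look.

```python
import difflib


def edition_disagreements(text_a, text_b):
    """Return (words_in_a, words_in_b) pairs where two scanned editions differ."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means the words exist in only one edition.
            diffs.append((" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return diffs


if __name__ == "__main__":
    first = "the modern ship sailed at dawn"
    second = "the modem ship sailed at dawn"
    for left, right in edition_disagreements(first, second):
        print(f"edition A: {left!r}  edition B: {right!r}")
```

The comparison only points at the disagreement. Working out which edition matches the author's manuscript is still a job for an editor.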
"some more than a century old..." (Score:1, Redundant)
mark
Re:"some more than a century old..." (Score:2)
I do hope they're not duplicating efforts... and I wonder whether they even know about Project Gutenberg. http://www.promo.net/pg/ [promo.net]
I expect their scans to be submitted to it.
$11 - $12 per hour (Score:1)
What would be the equivale
Re:$11 - $12 per hour (Score:2, Funny)
Probably about $35 an hour, they'd only work seven hours, three days a week, and they'd be on strike half the year anyway. And you can't fire any of them.
Urgh... Sounds cruel... (Score:2)
Bible (Score:1)
Re:Fp/Google (Score:1)
Re:Fp/Google (Score:2)
Re:Fp/Google (Score:1)
Re:Fp/Google (Score:1, Funny)
You must be new here ;)
Re:http://www.digg.com (Score:1, Troll)
Too bad the comments on Digg make this place look like a scholar's retreat.