25000 Books Proofread By Project Gutenberg Distributed Proofreaders
New submitter fritsd writes "Project Gutenberg Distributed Proofreaders, a volunteer site which helps provide public domain books to Project Gutenberg, announced that their 100 000+ volunteers have reached the milestone of 25 000 books scanned, OCRed, and then meticulously proofread."
The 25000th title is The Art and Practice of Silver Printing by Capt. Abney and H. P. Robinson.
meticulously proofread (Score:2)
Re: (Score:2)
No, this is more similar to the GalaxyZoo approach, showing a page at a time and letting the proofreader compare OCR and image side by side. See first link.
The more interesting question is, will this serve as a test data set to improve OCR?
Re:meticulously proofread (Score:4, Insightful)
If I'm not mistaken, they mean meticulously proofread by us in reCAPTCHAs.
When I was proofreading on DP, all rounds of proofreading involved examining the scanned images, comparing them to the OCR text, and making corrections. The later rounds of proofreading involved increasing attention to various details of correctness and formatting. All of this was done directly in the DP web interface. I didn't see any mention of the use of captchas in the OCR process.
Re: (Score:3, Informative)
Yep, multiple rounds, and multiple levels of proofers and formatters who have to earn the right to access those higher rounds by completing hundreds of pages and passing a few tests.
Re:meticulously proofread (Score:5, Informative)
I signed up and proofread a few pages when I saw someone mention this site in the comments a few weeks ago. It's pretty interesting stuff and is mostly intuitive, but there are some tricky corner cases, e.g. hyphenated words that span two lines. Back in the day, publishers were pretty inconsistent about which words were hyphenated (e.g. to-day), and Project Gutenberg is (rightly) adamant that the text maintain the original spelling and hyphenation.
The only thing I completely missed was that I didn't put an extra newline at the top of the page when the first line was the start of a new paragraph. Those instances were found and corrected by the second-round proofreader. There is a third round of proofing, two rounds of formatting, two rounds of post processing, and then an optional "Smooth Reading" round that anyone can do. I've checked out a few of the finished products, and they are much, much better than the naked OCR'd texts of old.
Re:meticulously proofread (Score:4, Informative)
I have read quite a few of their books and have found them all to be high quality edits.
I would like to thank everyone who has worked on the project for the excellent job they are doing.
(In contrast, I recently purchased a Kindle copy of Paul Theroux's The Happy Isles of Oceania, which is about 20 years old. They obviously produced the electronic copy by OCR and, from the looks of it, did little or no proofreading. There were obvious typos on every page. It's irritating that a publisher who actually gets paid to do this work can't be bothered to do even cursory proofreading.)
Makes you appreciate the fine work the Gutenberg people are doing.
Re: (Score:2)
Oh surely that's a genius CAPTCHA system!
"Correct this flagged-as-wrong OCR text". OCR-bots would surely get it wrong, and the humans would contribute to the greater good!
Of course this assumes the humans can spell and will do correct corrections. MayB Nt th3n!!11!.
Thanks! (Score:4, Informative)
whose paying these guys? (Score:2)
Re: (Score:1)
proof that socialism is a failure.
Volunteering || Charity != Socialism.
Re:whose paying these guys? (Score:4, Insightful)
proof that socialism is a failure.
Proof that people can in fact be decent, generous, and caring.
Re:whose paying these guys? (Score:4, Interesting)
Proof that people can in fact be decent, generous, and caring.
Or bored. Many years ago I had a temp job staffing the front desk: very little traffic, the occasional call, collecting the mail and various other small duties, but a lot of downtime and no interest in training me for more, since it was a rather short contract. Project Gutenberg seemed like a good way to pass the time, and they were cool with it as long as I tended to my other duties when they needed tending. Seemed like a better use of my time than playing solitaire.
Re: (Score:1)
Proof that you are a moron: you don't know the difference between 'whose' and 'who is'.
Or what socialism is.
Re: (Score:2)
Proof that you are a moron: you don't know the difference between 'whose' and 'who is'...
That's why he's pissed at PD. They didn't like his work product.
Re: (Score:2)
Jules Verne's work is awesome. I'm reading it now and learning French. And for those of you who are also learning, there are some good free audio books out there, e.g. http://www.litteratureaudio.com/
Re: (Score:2)
Jules Verne's work is awesome. I'm reading it now and learning French. And for those of you who are also learning, there are some good free audio books out there, e.g. http://www.litteratureaudio.com/ [litteratureaudio.com]
Also librivox.org has some good French content. "Ezwa" has a great reading voice and does many of the chapters in this book - http://librivox.org/le-tour-du-monde-en-quatre-vingts-jours-by-jules-verne/ [librivox.org]
Hitting the Sonny Bono wall (Score:3)
There is a lot of great public domain material out there
So what happens once Project Gutenberg has finished releasing all notable books in the English language that were first published on or before 1922?
Re: (Score:3)
So what happens once Project Gutenberg has finished releasing all notable books in the English language that were first published on or before 1922?
Since lots of things are in the public domain in other countries that are not in the public domain in the US, maybe there could be a Project Gutenberg.uk?
Re: (Score:3)
It's a fantastic contribution to human intellectual heritage. Once in digital form, it will be easy to make copies and ensure a high degree of redundancy so that this knowledge and culture will not be lost even if civilization suffers a setback.
Michael Hart (Score:3)
Many thanks to Project Gutenberg and their volunteers.
Also many thanks to Michael Hart [wikipedia.org], the founder, heart, and soul of Project Gutenberg. Michael passed away in 2011. Although I never met him face-to-face, we exchanged many emails, and even spoke on the phone a few times. He was a generous and selfless man, and somewhat eccentric (but in a good way). We love you Michael, and we miss you. You made the world a better place.
I'm glad he's doing something with his time, (Score:2, Funny)
I'm glad Mr. Guttenberg [imdb.com] is doing something with his time and money with such a noble project as this. I guess it makes up for the Police Academy movies [imdb.com] he did.
Now if I only read books.