Books

25000 Books Proofread By Project Gutenberg Distributed Proofreaders

Posted by Unknown Lamer
from the get-your-free-knowledge dept.
New submitter fritsd writes "Project Gutenberg Distributed Proofreaders, a volunteer site which helps provide public domain books to Project Gutenberg, announced that their 100 000+ volunteers have reached the milestone of 25 000 books scanned, OCRed, and then meticulously proofread." The 25000th title is The Art and Practice of Silver Printing by Capt. Abney and H. P. Robinson.

  • If I'm not mistaken, they mean meticulously proofread by us in reCAPTCHAs. I think that was the organization that got into that a little.
    • No, this is more similar to the GalaxyZoo approach: it shows one page at a time and lets the proofreader compare the OCR text and the image side by side. See first link.

      The more interesting question is, will this serve as a test data set to improve OCRs?

    • by cruff (171569) on Wednesday April 10, 2013 @12:58PM (#43413469)

      If I'm not mistaken, they mean meticulously proofread by us in reCAPTCHAs.

      When I was proofreading on DP, all rounds of proofreading involved examining the scanned images, comparing them to the OCR text, and making corrections. The later rounds of proofreading involved increasing attention to various details of correctness and formatting. All of this was done directly in the DP web interface. I didn't see any mention of the use of captchas in the OCR process.

      • Re: (Score:3, Informative)

        by Halotron1 (1604209)

        Yep, multiple rounds, and multiple levels of proofers and formatters
        who have to earn the right to access those higher rounds
        by completing hundreds of pages and passing a few tests.

        • by butalearner (1235200) on Wednesday April 10, 2013 @03:18PM (#43414959)

          I signed up and proofread a few pages when I saw someone mention this site in the comments a few weeks ago. It's pretty interesting stuff and is mostly intuitive, but there are some tricky corner cases, e.g. hyphenated words that span two lines. Back in the day, publishers were pretty inconsistent about what words were hyphenated (e.g. to-day), and Project Gutenberg is (rightly) adamant that the text maintains the original spelling and hyphenation.

          The only thing I completely missed was that I didn't put an extra newline at the top of the page when the first line was the start of a new paragraph. Those instances were found and corrected by the second-round proofreader. There is a third round of proofing, two rounds of formatting, two rounds of post processing, and then an optional "Smooth Reading" round that anyone can do. I've checked out a few of the finished products, and they are much, much better than the naked OCR'd texts of old.

          • by mspohr (589790) on Wednesday April 10, 2013 @06:22PM (#43416911)

            I have read quite a few of their books and have found them all to be high quality edits.
            I would like to thank everyone who has worked on the project for the excellent job they are doing.

            (In contrast, I recently purchased a Kindle copy of Paul Theroux's The Happy Isles of Oceania, which is about 20 years old. They obviously produced the electronic copy by OCR and, from the looks of it, did little or no proofreading. There were obvious typos on every page. It's irritating that a publisher who actually gets paid to do this work can't be bothered to do even cursory proofreading.)

            Makes you appreciate the fine work the Gutenberg people are doing.

    • by wbr1 (2538558)
      Who cares? CAPTCHA performs a needed service. I have several domains that would be overrun by spambots if not for it. If it performs a secondary service, great.
    • by fatgraham (307614)

      Oh surely that's a genius CAPTCHA system!
      "Correct this flagged-as-wrong OCR text". OCR-bots would surely get it wrong, and the humans would contribute to the greater good!

      Of course this assumes the humans can spell and will do correct corrections. MayB Nt th3n!!11!.

  • Thanks! (Score:4, Informative)

    by Tim the Gecko (745081) on Wednesday April 10, 2013 @12:47PM (#43413309)
    Many thanks to Project Gutenberg and their volunteers. There is a lot of great public domain material out there, and I've especially enjoyed Dickens, Wilkie Collins and Trollope. Also Jules Verne's work is pretty good for French learners.
    • proof that socialism is a failure.
      • by Anonymous Coward

        proof that socialism is a failure.

        Volunteering || Charity != Socialism.

      • by chipschap (1444407) on Wednesday April 10, 2013 @01:43PM (#43414021)

        proof that socialism is a failure.

        Proof that people can in fact be decent, generous, and caring.

        • by Kjella (173770) on Wednesday April 10, 2013 @02:02PM (#43414267) Homepage

          Proof that people can in fact be decent, generous, and caring.

          Or bored. Many years ago I had a temp job staffing the front desk: really quite little traffic and the occasional call, collecting the mail and various other small duties, but a lot of downtime and no interest in training me for more since it was a rather short contract. Project Gutenberg seemed like a good way to pass the time, and they were cool with it as long as I tended to my other duties when they needed tending. Seemed like a better use of my time than playing solitaire.

      • Re: (Score:2, Troll)

        by GLMDesigns (2044134)
        Socialism is top-down government officials commanding individuals to obey the collective. Many socialists desire that people belong to the state.

        Melissa Harris-Perry ... professor at Tulane, has endorsed the concept of human ownership by the state ... saying in a promo for MSNBC that "we have to break through our kind of private idea that kids belong to their parents or kids belong to their families and recognize that kids belong to whole communities." http://news.investors.com/ibd-editorials/040913-65129 [investors.com]

        • To the person who moderated this post as "troll": any particular reason why you marked it that way? I directly answered the OP's post. Or is it that your moderation is based on how you feel about the opinion stated?
      • by Anonymous Coward

        Proof that you are a moron: you don't know the difference between 'whose' and 'who is'.
        Or what socialism is.

        • Proof that you are a moron: you don't know the difference between 'whose' and 'who is'...

          That's why he's pissed at PD. They didn't like his work product.

    • by Anonymous Coward

      Jules Verne's work is awesome. I'm reading it now and learning French. And for those of you who are also learning, there are some good free audio books out there, e.g. http://www.litteratureaudio.com/

    • There is a lot of great public domain material out there

      So what happens once Project Gutenberg has finished releasing all notable books in the English language that were first published on or before 1922?

      • by slash.dt (701002)

        So what happens once Project Gutenberg has finished releasing all notable books in the English language that were first published on or before 1922?

        Since lots of things are in the public domain in other countries that are not in the PD in the US, maybe there could be a Project Gutenberg.uk ?

    • by Livius (318358)

      It's a fantastic contribution to human intellectual heritage. Once in digital form, it will be easy to make copies and ensure a high degree of redundancy so that this knowledge and culture will not be lost even if civilization suffers a setback.

    • Many thanks to Project Gutenberg and their volunteers.

      Also many thanks to Michael Hart [wikipedia.org], the founder, heart, and soul of Project Gutenberg. Michael passed away in 2011. Although I never met him face-to-face, we exchanged many emails, and even spoke on the phone a few times. He was a generous and selfless man, and somewhat eccentric (but in a good way). We love you Michael, and we miss you. You made the world a better place.

  • by Anonymous Coward

    I'm glad Mr. Guttenberg [imdb.com] is doing something with his time and money with such a noble project as this. I guess it makes up for the Police Academy movies [imdb.com] he did.

    Now if I only read books.
