Project Gutenberg Made Accessible

scishop writes "Mazarin is an open-source interface to Project Gutenberg's library. Mazarin makes Gutenberg's 10,000+ books more accessible by formatting them for HTML display, adding pagination, a generated table of contents and other advanced markup, and by letting users run full-text searches across the entire library."
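Mazarin's own code isn't shown anywhere in this thread. As a rough illustration of two of the features the summary mentions (pagination and a generated table of contents), here is a minimal Python sketch. It assumes paragraphs are separated by blank lines and that chapter headings are standalone lines such as "CHAPTER I"; both are assumptions, since PG plain-text layouts vary.

```python
import html
import re

# Assumption: chapter headings are standalone paragraphs like "CHAPTER I" or "Chapter 12".
HEADING = re.compile(r"^(CHAPTER|Chapter)\s+[IVXLC\d]+\b.*$")

def paginate(text, paras_per_page=3):
    """Split a plain-text book into pages and collect a table of contents.

    Paragraphs are separated by blank lines. Returns (pages, toc), where each
    page is a list of paragraphs and toc is a list of (heading, page_number)
    pairs, with page numbers starting at 1.
    """
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    pages, toc = [], []
    for i in range(0, len(paras), paras_per_page):
        page = paras[i:i + paras_per_page]
        page_no = len(pages) + 1
        for p in page:
            if HEADING.match(p):
                toc.append((p, page_no))
        pages.append(page)
    return pages, toc

def render_toc(toc):
    """Render the table of contents as an HTML list linking to page anchors."""
    items = "".join(
        f'<li><a href="#page{n}">{html.escape(h)}</a></li>' for h, n in toc
    )
    return f"<ul>{items}</ul>"
```

A real converter would page by length rather than by paragraph count and would need many more heading patterns; the fixed count just keeps the sketch short.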
  • Tested (Score:4, Interesting)

    by mpost4 ( 115369 ) * on Monday May 24, 2004 @09:16AM (#9236916) Homepage Journal
    I cannot test the claim for all 10k works, but I tested what I thought would be most likely to be left out, and I found that they were there.

    I tested Martin Luther.

    (If it were not for the printing press, the Reformation would not have been as successful as it was.)
    • Re:Tested (Score:2, Funny)

      by Anonymous Coward
      What I love is the home page that says:

      "All contents copyright 2004 Scott Fortmann-Roe."

      Yeah, sure.
    • OK, the Mazarin site is dead, but if it really contains the whole of Project Gutenberg [gutenberg.net], then they have 11 books by Martin Luther. Just try this search [gutenberg.net].
    • And that is why we are slashdotting it?

      At least that's my experience after "testing" it now.

  • Looks nice and dandy (Score:3, Interesting)

    by tfbastard ( 782237 ) on Monday May 24, 2004 @09:16AM (#9236922)
    But did they have to make the tutorial presentation a full-screen Flash file?
  • PG (Score:5, Informative)

    by ArbiterOne ( 715233 ) on Monday May 24, 2004 @09:17AM (#9236931) Homepage
    Most of PG's better-known books are already formatted in HTML. [promo.net]
    • Re:PG (Score:5, Informative)

      by Charles Franks ( 686911 ) on Monday May 24, 2004 @09:32AM (#9237052)
      The promo.net address is an old one and is no longer maintained; please use gutenberg.net [gutenberg.net] instead.

      Charles Franks
      Founder, Distributed Proofreaders [pgdp.net]

    • Ahh, thanks. That makes putting books onto my Palm via Plucker more attractive. I hate needing all these different readers for different books. The more I can fit into Plucker, the happier I am.

      Thanks.
    • Re:PG (Score:5, Informative)

      by flimnap ( 751001 ) on Monday May 24, 2004 @09:57AM (#9237280) Homepage

      Indeed, there are many, many sites that do all sorts of wonderful [blackmask.com] things [pluckerbooks.com] with Project Gutenberg eBooks. That's the wonderful thing about PG, you can do anything you like with the books.

      While personally I prefer the original and the best [gutenberg.net]... hey, whatever floats your boat!

      It is very much worth noting that Project Gutenberg would have nowhere near as many eBooks as it does without the help of Distributed Proofreaders [pgdp.net]. Sign up there, and proof just a page a day to make your contribution to preserving literary history. You can proofread as little or as much as you like, and do something worthwhile! Distributed Proofreaders [pgdp.net] is a great way to spend some of your time.

  • by alexatrit ( 689331 ) on Monday May 24, 2004 @09:19AM (#9236941) Homepage
    I searched on "oil" and came up with numerous passages from various versions of the Bible, and a few recipes from an Italian cookbook. Attempted to search again, but amazingly the site fails to respond...
  • P2P / Library (Score:5, Interesting)

    by Anonymous Coward on Monday May 24, 2004 @09:19AM (#9236946)
    Interesting idea. I can't get to the website, but a feature I'd want is for the content to be shared P2P, so you don't have to rely on a central server for it.

    A central web page index could just have ed2k links to the files: a ShareReactor for books. When they update a book, they release a new hash link and put the new file onto the network.

    Being P2P, it could also open things up to more than just public-domain books ;).
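The "new hash link per update" idea works because the link is derived from the file's content, so any edit yields a different identifier. A minimal sketch of that idea follows. Note that SHA-256 is swapped in purely for illustration: it is not the ed2k hash (which is built from MD4 digests of fixed-size chunks), and the `name|size|hash` entry format is a hypothetical index layout, not the real ed2k link syntax.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Return a content-derived identifier for a book file.

    SHA-256 is a stand-in here; it is NOT the ed2k hash.
    """
    return hashlib.sha256(data).hexdigest()

def hash_link(name: str, data: bytes) -> str:
    """Build a hypothetical index entry: file name, size in bytes, content hash."""
    return f"{name}|{len(data)}|{content_id(data)}"
```

Publishing a corrected edition then just means adding a new entry; the old entry still points unambiguously at the old bytes.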
  • by twoshortplanks ( 124523 ) on Monday May 24, 2004 @09:21AM (#9236963) Homepage
    Hmm, nicely formatted error messages. Does anyone know what this is? I'm assuming it's a mod_perl handler of some sort.
  • I think the site is about to go down, it's already terribly slow...
  • by Ronald Dumsfeld ( 723277 ) on Monday May 24, 2004 @09:25AM (#9236999)
    10,000+ books. Right, so I've got to read all of them before I can post a comment?

    Oh wait, this is Slashdot.
  • by Anonymous Coward on Monday May 24, 2004 @09:29AM (#9237035)
    This sounds like it just adds complexity and does not make gutenberg's data accessible.

    There were several research projects for which I used pg [slashdot.org] as a corpus. However, pg's a terrible hassle for the first-time researcher, since the format of the introductory text ("we're gutenberg, here's the copyright, blah blah") is inconsistent.

    You have to remove the introductory text to avoid biasing the corpus; however, there are so many pathological special cases (different formats, spelling, languages, wording, punctuation, case) that it takes several hours of Perl coding to strip the header text from 75% of the documents with >99% accuracy. Yuk.

    If gutenberg is serious about making their work more accessible, they should think about the simple concern of ensuring consistency in the header text format.
    • They're meant to be accessible to people who want to read old texts, not specifically to people who want to do some sort of arcane text processing on them.
    • If gutenberg is serious about making their work more accessible, they should think about the simple concern of ensuring consistency in the header text format.

      It's part of the big plan to convert everything over to XML. But this really isn't a big concern of Gutenberg, as most people don't use the entire archive as a corpus, and removing the headers and footers from even a couple dozen texts is no big deal.
    • it requires several hours of Perl coding to successfully strip the header text from 75% of the documents with >99% accuracy
      If you haven't already, have you considered publishing your perl script(s)? Something that took hours oughtn't be reinvented.
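The original Perl script was never posted. Here is a minimal Python sketch of the same approach: drop everything up to a start-marker line and everything from an end-marker line onward. The marker patterns are assumptions based on common PG wording (older texts use the "*END*THE SMALL PRINT" line, newer ones "*** START OF THE PROJECT GUTENBERG..."); they are illustrative, not exhaustive, which is exactly the inconsistency complained about above.

```python
import re

# Assumed marker patterns; extend these for your own corpus.
START_MARKERS = [
    re.compile(r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG", re.I),
    re.compile(r"^\*END\*THE SMALL PRINT", re.I),
]
END_MARKERS = [
    re.compile(r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG", re.I),
    re.compile(r"^End of (the )?Project Gutenberg", re.I),
]

def strip_pg_boilerplate(text):
    """Return the body of a PG text, or None if no start marker is found.

    Everything up to and including the first start-marker line is dropped,
    as is everything from the first end-marker line after it onward.
    Returning None (rather than guessing) lets a caller count failures.
    """
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines)
                  if any(m.search(line) for m in START_MARKERS)), None)
    if start is None:
        return None
    end = next((i for i in range(start + 1, len(lines))
                if any(m.search(lines[i]) for m in END_MARKERS)), len(lines))
    return "\n".join(lines[start + 1:end]).strip()
```

Counting the `None` results over a whole archive gives a quick measure of how many header variants are still unhandled.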
  • Text version (Score:5, Informative)

    by Whitecloud ( 649593 ) on Monday May 24, 2004 @09:30AM (#9237045) Homepage

    Since some people seem to have trouble reaching the index page, here it is:

    Project Gutenberg is the brainchild of Michael Hart [slashdot.org], who in 1971 decided that it would be a really good idea if lots of famous and important texts were freely available to everyone in the world. Since then, he has been joined by hundreds of volunteers who share his vision.
    Now, more than thirty years later, Project Gutenberg has the following figures (as of November 8th 2002): 203 new eBooks released during October 2002 and 1975 new eBooks produced in 2002 (compared with 1240 in 2001), for a total of 6267 Project Gutenberg eBooks. 119 eBooks have been posted so far by Project Gutenberg of Australia [promo.net].

    Click here [promo.net] for the full PG story and here [gutenberg.net] for the latest News [gutenberg.net], and learn about the Stockholm Challenge Award [promo.net] recently won by Project Gutenberg in the category Culture.

    The key link is search page [promo.net].

  • by GGardner ( 97375 ) on Monday May 24, 2004 @09:32AM (#9237062)
    What's the best way to read online texts? There are a bunch of PG texts I might like to read, but reading them as a big text file in a web browser gets tiring after ten minutes or so. I'm not sure why I can read a book for hours but a screen for only minutes, but there you have it. I don't think HTML will help with this problem. Does anyone have recommendations for better ways to read these files?
    • by gunne ( 14408 ) on Monday May 24, 2004 @09:39AM (#9237112) Homepage
      If you have a Palm Pilot, I can recommend Weasel Reader [sourceforge.net].
      I've been using it for a couple of years on my Palm V, and despite its small screen size it works perfectly for reading ebooks.
    • does anyone have recommendations for better ways to read these files?

      On an old Palm Pilot or in the Notes folder on an iPod. I've found that it's the backlight of a computer screen (and of the new Palms) that hurts my eyes when I try to read.


      -Colin [colingregorypalmer.net]
    • What works best for me is any text-editor/word processor. I delete line by line or paragraph by paragraph as I have read them. Don't know why I feel that is comfortable, but it is.

      (Keep a backup of the original in case you want to check again what the name of the butler's niece was.)

    • I don't enjoy curling up on the sofa with my computer to read a novel. It isn't the warmest way to spend a few relaxing hours with a book.

      It is however a great way to research the classics for info and reports.

      I still like to hunt around old bookshops, and often I can find those works for a buck or two.

      • It is however a great way to research the classics for info and reports.

        That's quite a failure on some levels, if that's all we're doing. One of my personal favorite authors in PG, J. S. Fletcher, is never going to be considered part of the "classics". But he's a nice read for mystery lovers who like Victorian London.

        I still like to hunt around old bookshops, and often I can find those works for a buck or two.

        Which books? Some of our books cannot be found that cheap and many of the ones which can, mi
    • The last time I had to read something from PG, I made a local web page with an embedded frame and used a little scripting so I could change the colors, fonts and such occasionally. I'd start with a dark background and light foreground, and when my eyes got tired of that I'd change the contrast. I usually stuck with the same font, just larger. Surprisingly, it made it easier for me to get through the text. Unfortunately, I have no idea what I did with that page - probably trash
    • Support HP and Lexmark: print it. Unfortunately, I have never found an electronic reading medium that compares with a paper book. In my experience, there's no way you can lie down on a couch or bed and have the same experience with a computer as you can with a book. I have used my very lightweight Sony Vaio, but it still generates a lot of heat and has a ridiculous 1024x768 resolution.
    • I prefer to make them into .pdfs and read those in Adobe Acrobat Reader on a pen slate (Fujitsu Stylistic).

      I have one example up in my portfolio, http://members.aol.com/willadams (it's also in the TeX Showcase, http://www.tug.org/texshowcase ): Okakura Kakuzo's _The Book of Tea_. I got the text from PG, set it, made some corrections (I have a (letterpress) printed copy in its slip case at home), sent those to PG (it took two tries, but they finally accepted and applied most of them), and printed and bound a copy
  • by Lord Zerrr ( 237123 ) on Monday May 24, 2004 @09:34AM (#9237069)
    I love sexy robot voice tutorials! mazarin tutorial [palary.org]
  • This is what I get when I visit the Mazarin link:

    error: Can't call method "prepare" on an undefined value at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 20.

    context: ...
    16: my $dbh=DatabaseConnect("translations");
    17:
    18: sub Prepare{
    19: $dbh=DatabaseConnect("translations");
    20: return $dbh->prepare($_[0])
    21: or die "Couldn't prepare statement: " . $dbh->errstr;
    22: }
    23:
    24: sub SetLanguage{
    ...

    code stack: /usr/lib/perl5/5.8.3/BookTools/Translator.pm:20
    /usr/lib/perl5/5.8.3/BookTool

    • The 'DatabaseConnect' function didn't return anything.

      Not a big deal, really, but they probably should have trapped that, as it could happen for any number of reasons (database down, authentication failed, etc).

      I find that I'm getting much slower when I write programs these days -- because I'm checking errors for those things that I would've just blown off, or not have thought about in my earlier days.

      [There are a few different things that could be done to this, but I don't know why they're calling Datab
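The fix described above is to check the handle that `DatabaseConnect` returns before calling `prepare` on it, so the error names the real cause instead of "Can't call method on an undefined value". Here is the same pattern sketched in Python, with the standard `sqlite3` module standing in for whatever database the site actually uses. `database_connect`, `run_query` and `DatabaseUnavailable` are hypothetical names for illustration, not the site's code.

```python
import sqlite3

class DatabaseUnavailable(RuntimeError):
    """Raised when no database handle could be obtained."""

def database_connect(path):
    """Hypothetical stand-in for the site's DatabaseConnect().

    Returns a connection, or None on failure, mimicking a helper that
    returns undef instead of raising.
    """
    try:
        return sqlite3.connect(path)
    except sqlite3.Error:
        return None

def run_query(path, sql):
    """Check the handle before using it, then run the query and close up."""
    conn = database_connect(path)
    if conn is None:
        # Fail with a message that points at the real problem.
        raise DatabaseUnavailable(f"could not connect to database {path!r}")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Raising a specific exception type lets the page handler show a friendly "database down" message rather than a stack dump.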
  • by Leobinus ( 782479 ) on Monday May 24, 2004 @09:43AM (#9237150)

    Bah. Posting HTML is so 1996. You can do so much more with these texts. One example is Open Source Shakespeare [opensource...speare.org], which takes all of Shakespeare's texts, indexes them, presents them in an attractive manner, creates a concordance, provides a full-text search engine, organizes the lines by character, etc.

    All of the texts are open source, and you can download the database and source code from the site, too. Check it out.

    • Actually, this could be useful for some of the dedicated eBook readers. I have the Rocket eBook reader (it's been in a box for the last 2 years, but I still have it), and its software lets you convert HTML to its eBook format. This is much more convenient thanks to the table of contents markup and (typically) non-fixed-width text formatting. Without those two features, reading an eBook on the thing becomes much, much harder. Otherwise, you have to do a text search for chapter titles and
    • And what's not HTML about that site?

      HTML is great stuff. Please don't bash it senselessly.
  • Bam (Score:4, Funny)

    by kunudo ( 773239 ) on Monday May 24, 2004 @09:45AM (#9237167)
    Monday May 24, @03:14PM : Project Gutenberg made accessible
    Monday May 24, @03:15PM : Project Gutenberg made inaccessible
  • Slashdot'd (Score:4, Funny)

    by Bipedismaximus ( 713734 ) on Monday May 24, 2004 @09:51AM (#9237217)
    "Project Gutenberg Made Accessible"

    Oh, the irony that is slashdot.

  • Mazarin increases the accessibility of Gutenberg's 10,000+ books

    In a related story, the Slashdot effect decreases the accessibility of Gutenberg's 10,000+ books.
  • by HiyaPower ( 131263 ) on Monday May 24, 2004 @10:06AM (#9237358)
    Gotta turn a living you know...
  • Too bad the site couldn't hold up; I really wanted to see my contribution
    http://www.gutenberg.net/etext04/awbv110.txt [gutenberg.net]
    there in HTML.

    The first volume was converted to HTML by hand by someone else and to PDF (by machine, I think), whereas my site simply has the e-text:
    http://rjs.org/gutenberg/Stevens_Thomas/ [rjs.org]
    So an automated process would be a boon. What I'd really like to see is an open-source text-to-speech reader program. I wrote a wxPython program to assist conversion from scanned text to PG format: http://r [rjs.org]
  • Gutenberg, Google (Score:2, Interesting)

    by Anonymous Coward
    Wouldn't it be great if Google were involved in Gutenberg in a major way?
  • Gutenberg Disclaimer (Score:5, Interesting)

    by Twinky ( 32219 ) on Monday May 24, 2004 @10:34AM (#9237640)
    What always struck me as odd is the enormous length of the disclaimer that Project Gutenberg attaches to every text. To me it seems the most obvious sign of a ridiculously screwed-up legal system. No book I ever read had a legal statement like this.

    Quote:

    LIMITED WARRANTY; DISCLAIMER OF DAMAGES But for the "Right of Replacement or Refund" described below, [1] the Project (and any other party you may receive this etext from as a PROJECT GUTENBERG-tm etext) disclaims all liability to you for damages, costs and expenses, including legal fees, and [2] YOU HAVE NO REMEDIES FOR NEGLIGENCE OR UNDER STRICT LIABILITY, OR FOR BREACH OF WARRANTY OR CONTRACT, INCLUDING BUT NOT LIMITED TO INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES, EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES. If you discover a Defect in this etext within 90 days of receiving it, you can receive a refund of the money (if any) you paid for it by sending an explanatory note within that time to the person you received it from. If you received it on a physical medium, you must return it with your note, and such person may choose to alternatively give you a replacement copy. If you received it electronically, such person may choose to alternatively give you a second opportunity to receive it electronically. THIS ETEXT IS OTHERWISE PROVIDED TO YOU "AS-IS". NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ARE MADE TO YOU AS TO THE ETEXT OR ANY MEDIUM IT MAY BE ON, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimers of implied warranties or the exclusion or limitation of consequential damages, so the above disclaimers and exclusions may not apply to you, and you may have other legal rights. INDEMNITY You will indemnify and hold the Project, its directors, officers, members and agents harmless from all liability, cost and expense, including legal fees, that arise directly or indirectly from any of the following that you do or cause: [1] distribution of this etext, [2] alteration, modification, or addition to the etext, or [3] any Defect.
  • ...donating to the good cause. If you don't want to donate money, volunteer to proofread. It might also be worth it for writers out there to add a provision to their wills that lets their works pass either directly into the public domain or, as I have discussed with lawyers, simply transfers the copyright of their works to Project Gutenberg. This gives PG more work to publish, and if you have a contract that provides for royalty collection, you can set it up so that those royalties go to Project Gutenberg at the time of your death.

    Now might also be a good time to contribute an hour a week to a literacy project, or to make a donation to one. Adult literacy is a serious issue all over the world, including right here in the States, where there really are bright people who could have better lives if they could read. I can't think of a more on-topic subject than Project Gutenberg for discussing adult literacy and the need both for literacy teaching and for supporting free literature for the masses, such as this project provides.

    Just my $0.02...

    solemndragon
  • by dpbsmith ( 263124 ) on Monday May 24, 2004 @10:51AM (#9237793) Homepage
    At the risk of pointing out the obvious, Michael Hart's decision to make the basic format of PG texts "plain vanilla ASCII" has resulted in texts that are highly accessible by any meaning I can think of for that word. They are also compact, platform-agnostic, and durable. Texts contributed in the 1980s are fully usable today.

    While there have been constant complaints about PG using the "wrong" format, opinions on the "right" format have been the flavor-of-the-month (or at least several flavors per decade). Had PG decided to use a "better" format, all of their volunteer time would probably have been taken up converting (say) WordPerfect to RTF to HTML to SGML to XML, leaving relatively little time to digitize and proofread texts.
    • The problem with "plain vanilla ASCII" is that it expresses less information than the original contains, and that information is a PITA to recover. This is especially true for texts that use non-ASCII characters (or illustrations of any kind).

      I agree that flavor-of-the-month representations are bad, but markup languages have been around for a long time, and it wouldn't have been hard to use something (like a small subset of SGML) to add a bit more formatting info. Then when people want to look at the te

  • by ricky-road-flats ( 770129 ) on Monday May 24, 2004 @11:05AM (#9237903) Homepage
    I only last week downloaded Project Gutenberg as an ISO - it has 9,500 books on it and weighs in at about 3.85 GB. All the books are as plain text within a ZIP file, accessed through a set of basic web pages also on the disc.

    It's great - I now have that on my laptop hard drive, mountable by Alcohol, so I'll never be short of anything to read, especially when the web's not available...

    I can't find the torrent file I got it through, but if it helps the filename is pgdvd.iso and the size is 4,139,646,976 bytes.

    • Torrents, ISOs and what have you are linked through the PG site [gutenberg.net]. You can also order a gratis copy of CD or DVD if you like (please consider making a donation in that case).

      There used to be a special library archive format (Green thingy something), but I don't see it on the site anymore?
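The two sizes given for pgdvd.iso agree if "GB" is read as binary gigabytes (GiB, 2^30 bytes); in decimal units the same file is about 4.14 GB:

```latex
\[
\frac{4{,}139{,}646{,}976\ \text{bytes}}{2^{30}\ \text{bytes/GiB}} \approx 3.855\ \text{GiB},
\qquad
\frac{4{,}139{,}646{,}976\ \text{bytes}}{10^{9}\ \text{bytes/GB}} \approx 4.14\ \text{GB}.
\]
```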
  • by grrussel ( 260 ) on Monday May 24, 2004 @04:47PM (#9241498) Homepage Journal

    I've created an RSS feed from the Project Gutenberg list of etexts. The RSS feed contains titles, authors, descriptions and links to the relevant page or file on http://www.gutenberg.net/

    PGDB.rss [eu.org] PGDB.rss.gz [eu.org]
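For anyone wanting to build a similar feed, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It emits the three channel elements RSS 2.0 requires (title, link, description) plus one item per book. The `(title, author, url)` input tuples are an assumed record format; how PGDB.rss is actually generated isn't described in the post.

```python
import xml.etree.ElementTree as ET

def build_rss(entries, title="Project Gutenberg etexts",
              link="http://www.gutenberg.net/"):
    """Build a minimal RSS 2.0 document from (title, author, url) tuples.

    The entry format is an assumption; the real PGDB.rss may differ.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "New and updated etexts"
    for book_title, author, url in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{book_title} by {author}"
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

ElementTree handles escaping of characters like `&` in titles, which is easy to get wrong when building the XML by string concatenation.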
