Project Gutenberg Made Accessible
scishop writes "Mazarin is an open-source interface to Project Gutenberg's library. Mazarin increases the accessibility of Gutenberg's 10,000+ books as it formats the books for HTML display -- providing paginations in addition to generating table of contents and other advanced markup features -- along with enabling users to carry out full-text searches on the entire library."
Tested (Score:4, Interesting)
I Tested Martin Luther.
(If it were not for the printing press, the Reformation would not have been as successful as it was.)
Re:Tested (Score:2, Funny)
"All contents copyright 2004 Scott Fortmann-Roe."
Yeah, sure.
Re:Tested (Score:2)
"Project Gutenberg Made Accessible" (Score:2)
And that is why we are slashdotting it?
At least that's my experience after "testing" it now.
Re:and then just think (Score:2, Informative)
Re:and then just think (Score:2, Funny)
Re:and then just think (Score:2)
diet = A formal general assembly of the princes or estates of the Holy Roman Empire
worms = the location of the diet
hence the diet of worms.
Re:and then just think (Score:2)
I think after that Diet, he was summoned to Rome, to appear in the presence of the Pope. He did not. And since he found a protector, and supporters, the schism just widened, instead of being killed there as just another heresy.
Of course there are other factors why the Protestant movement was successful and not suppressed. For example, Henry VIII of England adopting a version of it. The alignment of various hostilities in Europe to either pro or anti Reformation. The allure of 'by faith
Re:and then just think (Score:2)
Re:and then just think (Score:2)
Quite wisely. A Pre-reformer who went, Johann Huss, was granted a promise of his safety. The then-current Pope gave him audience and asked him to explain his views. Behind a curtain the Pope's secretary took notes, and instead of allowing him back to Bohemia he was given to the Inquisition to be burnt, with the secretary's notes as proof of heresy.
He never
Re:and then just think (Score:2)
He never adopted Protestantism. He merely separated from Rome. Reform in England was done by the bishops after Henry's death, during the reign of Elizabeth.
Re:and then just think (Score:2, Insightful)
Anyone want to buy an indulgence?
Re:and then just think (Score:2)
see http://www.divinemercysunday.com/plenary_indulgen
Re:and then just think (Score:2)
Re:and then just think (Score:2)
How did someone named Michael Patrick O'Connor end up a Lutheran, when his paternal grandparents are Roman Catholic?
Well, I would have to say that what the [confessional*] Lutheran church teaches fits what I see people to be, namely the total depravity of man. When I look around I don't see that humans are good; when I look inside myself I see evil, though I don't want it to be there. I look and see a just God, and there is no way I could ever appease him by my works.
(*I use confessional here, because there
Re:and then just think (Score:2)
Re:and then just think (Score:2)
Re:Yeah? (Score:3, Informative)
Quite to the contrary, these books were added by Rome at Trento. Until then they were usually copied along with the Bible without being considered part of the Canon, just like the Shepherd of Hermas before the Montanist heresy.
It was only when Luther decided to have them printed apart from the Bible that Rome decided to try to accuse him of taking them out of where they never belonged...
Re:and then just think (Score:5, Interesting)
It was very convenient for the Roman Church to have a practical monopoly on what was widely acknowledged at the time to be the main source of information, the Holy Bible. When the printing press was invented, this diluted that monopoly, since ordinary people could then afford their own copies of the Bible and became independent from the Church for information. Luther was one of the first to realize that, when he urged people to read the Bible. A consequence of that was that people learned to read. Until early in the 20th century, the literacy rate in countries which are mostly Lutheran, e.g. the Scandinavian countries and parts of Germany, was much higher than in southern Europe, where people were mostly Catholic.
A modern analogy:
Catholic Church --> RIAA
Lutheranism --> P2P
Re:and then just think (Score:2, Interesting)
and what is wrong with monopoly? Uniformity breeds community.
Re:and then just think (Score:3, Interesting)
How do you know that? Apart from the religious dogma that postulates the existence of a homunculus called the "soul", we do not know much about how consciousness arises. What we do know is that information doesn't exist in a vacuum. Information needs a physical medium to exist. Check "An Introduction to Information Theory", by John R. Pierce, Dover Publications, ISBN 0-486-24061-4, chapter 10 - "Information Theory and Physics" for a basic explanation why. No
Re:and then just think (Score:3, Funny)
I have not had the time to speak with all information, so this is merely anecdotal evidence of the diversity of opinion among informations.
Re: and then just think (Score:2)
Bruce Sterling (Score:3, Funny)
Re:and then just think (Score:2)
Re:and then just think (Score:2)
Re:and then just think (Score:3, Informative)
Not only that, but Luther translated the Bible into the common tongue. He used to hang out in pubs and the market and make notes of how people really spoke so that his translation would reflect day-to-day usage. The result - which is solidly argued in The Sovereign Individual and elsewhere - is that the common man
Re:and then just think (Score:2)
Re:and then just think (Score:2)
Re:and then just think (Score:2, Insightful)
I take a different view: just imagine all the problems that we'd still be dealing with if the Reformation had never happened!
Re:printing press (Score:2)
Think of it: if Hus had been successful, instead of being a Lutheran I would be called a Hussian (ok, maybe not).
Re:Tested (Score:2)
Looks nice and dandy (Score:3, Interesting)
PG (Score:5, Informative)
Re:PG (Score:5, Informative)
Charles Franks
Founder, Distributed Proofreaders [pgdp.net]
Re:PG (Score:4, Informative)
However, it insists on at least a plain vanilla version of a text, as that format has proven to be the most durable and accessible.
So next time you post a text version to PG, make sure you post HTML and PDF versions alongside.
(Do read the rules for HTML in the PG FAQ first, though.)
Re:How about the TEI XML format? (Score:2)
And you can include them in zip file with the text with appropriate markings in the text.
How about using the Text Encoding Initiative's TEI XML format instead?
Have you ever marked up a book by hand in TEI XML? I can produce an ASCII book suitable for PG in an hour or so, from the output of DP. Every book I've tried and eventually quit trying to produce in XML took hours and hours, and it wasn't in working to make a text
Re:PG (Score:2)
Thanks.
Re:PG (Score:3, Informative)
Charles Franks
Founder, Distributed Proofreaders [pgdp.net]
Re:PG (Score:2)
Re:PG (Score:5, Informative)
Indeed, there are many, many sites that do all sorts of wonderful [blackmask.com] things [pluckerbooks.com] with Project Gutenberg eBooks. That's the wonderful thing about PG, you can do anything you like with the books.
While personally I prefer the original and the best [gutenberg.net]... hey, whatever floats your boat!
It is very much worth noting that Project Gutenberg would have nowhere near as many eBooks as it does without the help of Distributed Proofreaders [pgdp.net]. Sign up there, and proof just a page a day to make your contribution to preserving literary history. You can proofread as little or as much as you like, and do something worthwhile! Distributed Proofreaders [pgdp.net] is a great way to spend some of your time.
Slashdotted? (Score:3, Funny)
Re:Slashdotted? (Score:2, Funny)
you really want to say, this is the first time you follow a link posted on
Re:Slashdotted? (Score:2)
Re:Slashdotted? (Score:2)
Re:Slashdotted? (Score:2, Insightful)
Well, this piddling 'interface' to Project Gutenberg may have died, but the real PG website [gutenberg.net] is still going strong!
While you wait, you could do something worthwhile [distribute...eaders.net]. (That is, instead of reading the 10,000 other "Its /.ed already" posts)
P2P / Library (Score:5, Interesting)
A central webpage index could just have ed2k links to the files: sharereactor for books. When they update the book they release a new hash-link and the file onto the network.
It being P2P, it could open it up to more than just public domain books too.
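To make the "new hash-link on every update" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: real ed2k links use an MD4-based hash, whereas this swaps in SHA-256 from the standard library, and the `book://` link shape and the `hash_link` name are made up for the example.

```python
import hashlib

def hash_link(filename, data):
    """Build a content-addressed link: name, size in bytes, and content hash.

    The "book://" scheme is hypothetical; SHA-256 stands in for the
    MD4-based hash that real ed2k links carry.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"book://{filename}|{len(data)}|{digest}"

# The central index only has to map titles to the current link.
index = {}
index["The Book of Tea"] = hash_link("tea.txt", b"first edition text")
# A corrected edition yields a different hash, hence a new link.
index["The Book of Tea"] = hash_link("tea.txt", b"corrected edition text")
```

Because the link is derived from the content, anyone who downloads the file can recompute the hash and check that they got the edition the index points at.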
Slashdotted - but nice error messages (Score:5, Interesting)
Re:Slashdotted - but nice error messages (Score:2, Funny)
Re:Slashdotted - but nice error messages (Score:5, Informative)
Re:Slashdotted - but nice error messages (Score:2, Interesting)
As others noted, this is definitely Perl HTML::Mason, which is one of the best web scripting environments I've ever worked in. An adequate comparison would be something like this: PHP's down-to-earth approach of mixing code and HTML, JSP's and Website Meta Language's ideas on how to separate them again if they need to be (code componentization and tag libraries), Perl as the scripting language, and Apache mod_perl to give it some speed (also works as CGI).
I'm just wishing to know how to turn the cool-loo
Re:Slashdotted - but nice error messages (Score:2)
Re:Slashdotted - but nice error messages (Score:2)
It's HTML::Mason [masonhq.com]
Slooow (Score:2)
10,000+ books? (Score:5, Funny)
Oh wait, this is Slashdot.
Gutenberg is totally inaccessible (Score:5, Interesting)
There were several research projects for which I used pg [slashdot.org] as a corpus. However, pg's a terrible hassle for the first-time researcher, since the format of the introductory text ("we're gutenberg, here's the copyright, blah blah") is inconsistent.
You have to remove the introductory text to avoid bias in the corpus, however there are so many pathological special cases (different formats, spelling, languages, words used, punctuation, case) that it requires several hours of Perl coding to successfully strip the header text from 75% of the documents with >99% accuracy. Yuk.
If gutenberg is serious about making their work more accessible, they should think about the simple concern of ensuring consistency in the header text format.
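For anyone facing the same chore, a rough sketch of the header-stripping approach follows, in Python rather than Perl. The marker patterns are assumptions covering only a couple of common header and footer forms; as noted above, the real texts have many more special cases, so treat this as a starting point and not a complete solution. The function name `strip_boilerplate` is made up for the example.

```python
import re

# Assumed marker patterns; real Project Gutenberg headers vary far more.
START_MARKERS = [
    re.compile(r"^\*+\s*START OF (THE|THIS) PROJECT GUTENBERG", re.IGNORECASE),
    re.compile(r"^\*END[\s*]*THE SMALL PRINT", re.IGNORECASE),
]
END_MARKERS = [
    re.compile(r"^\*+\s*END OF (THE|THIS) PROJECT GUTENBERG", re.IGNORECASE),
    re.compile(r"^End of (the )?Project Gutenberg", re.IGNORECASE),
]

def strip_boilerplate(text):
    """Return the text between the last start marker and the first end marker.

    If no marker is found, the text is returned unchanged (minus outer
    whitespace), so unrecognised files are never silently emptied.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if any(p.search(line) for p in START_MARKERS):
            start = i + 1          # body begins after the marker line
        elif any(p.search(line) for p in END_MARKERS):
            end = i                # body stops before the footer marker
            break
    return "\n".join(lines[start:end]).strip()
```

Files that come back unchanged are the ones worth inspecting by hand, since they are the ones whose header matched none of the assumed patterns.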
Re:Gutenberg is totally inaccessible (Score:2, Insightful)
Re:Gutenberg is totally inaccessible (Score:2)
Re:Gutenberg is totally inaccessible (Score:2)
It's part of the big plan to convert everything over to XML. But this really isn't a big concern of Gutenberg, as most people don't use the entire archive as a corpus, and removing the headers and footers from even a couple dozen texts is no big deal.
Re:Gutenberg is totally inaccessible (Score:2)
Text version (Score:5, Informative)
since some seem to have trouble on the index page... here it is:
Project Gutenberg is the brainchild of Michael Hart [slashdot.org], who in 1971 decided that it would be a really good idea if lots of famous and important texts were freely available to everyone in the world. Since then, he has been joined by hundreds of volunteers who share his vision.
Now, more than thirty years later, Project Gutenberg has the following figures (as of November 8th 2002): 203 New eBooks released during October 2002, 1975 New eBooks produced in 2002 (they were 1240 in 2001) for a total of 6267 Total Project Gutenberg eBooks. 119 eBooks have been posted so far by Project Gutenberg of Australia [promo.net].
Click here [promo.net] for the full PG story and here [gutenberg.net] for the latest News [gutenberg.net], and learn about the Stockholm Challenge Award [promo.net] recently won by Project Gutenberg in the category Culture.
The key link is search page [promo.net].
Re:Text version (Score:4, Informative)
One of the projects run by the Internet Archive [archive.org] is the Bookmobile, which creates, prints, and gives away (for a nominal production fee) books created from public domain sources. One of their most popular products is an illustrated edition of Alice in Wonderland.
who can read English...
Yes, PG's content is primarily English at the moment, but this is only because most of the volunteers up until now have been English. If you are confident in a language other than English, you can help us get more books in this language -- either by scanning them, or by proofing the books which other people have scanned by joining the Distributed Proofreading Project [pgdp.net] (or the new EU sister-project DP Europe [rastko.net]). At the moment the main site has projects available for proofing in German, Latin, French, Spanish, Swedish, Finnish, Dutch, Hebrew, Danish, Italian, ancient Greek, and Gaelic. The EU site has, in addition, books available in Serbian, Slovenian, Romanian, Welsh, Hawaiian, Russian, Polish, Lithuanian, Ukrainian, modern Greek, and Bulgarian.
if the copyright has expired...
Yes, the vast majority of books in PG are copyright expired. This isn't a big problem, though, as we've only scratched the surface of the number of copyright expired books. Even at the current rate of growth, there's enough to keep us going until the US copyright regime starts letting new books into the public domain in 15 years or so.
Re:Text version (Score:2)
You want that we should be miracle workers? We do what we can.
who can read English
What, at least a half billion people? That's completely ignoring the German, Latin, French, Spanish, Dutch, Danish, Bulgarian, Serbian and Chinese which have gone into PG in some quantities.
if the copyright has expired
Out of the about three thousand years of important writings, we're restricted from 95, or about 3%? Even acknowledging the desire for those
Best way to read online texts? (Score:5, Interesting)
Re:Best way to read online texts? (Score:5, Informative)
I've been using it for a couple of years on my Palm V, and despite its small screen size it works perfectly for reading ebooks.
Re:Best way to read online texts? (Score:2)
Re:Best way to read online texts? (Score:3, Interesting)
On an old Palm Pilot or in the notes folder on an iPod. I found that it's the backlight of a computer screen (and of the new Palms) that hurts my eyes when trying to read.
-Colin [colingregorypalmer.net]
Re:Best way to read online texts? (Score:2, Interesting)
(Keep a backup of the original in case you want to check again what the name of the butler's niece was.)
Re:Best way to read online texts? (Score:2, Insightful)
It is however a great way to research the classics for info and reports.
I still like to hunt around old bookshops, and often I can find those works for a buck or two.
Re:Best way to read online texts? (Score:3, Informative)
That's quite a failure on some levels, if that's all we're doing. One of my personal favorite authors in PG, J. S. Fletcher, is never going to be considered part of the "classics". But he's a nice read for mystery lovers who like Victorian London.
I still like to hunt around old bookshops, and often I can find those works for a buck or two.
Which books? Some of our books can not be found that cheap and many of the ones which can, mi
Re:Best way to read online texts? (Score:2)
Fuck the trees. (Score:2)
Re:Best way to read online texts? (Score:2)
I've one example up in my portfolio, http://members.aol.com/willadams (it's also in the TeX Showcase, http://www.tug.org/texshowcase ), Okakura Kakuzo's _The Book of Tea_ --- got the text from PG, set it, made some corrections (I've a (letterpress) printed copy in its slip case at home), sent those to PG (took two tries, but they finally accepted and applied most of them), and printed and bound a copy
the tutorial talked to me (Score:3, Funny)
So much for that theory: ERROR! (Score:2)
No database handle was returned. (Score:2)
Not a big deal, really, but they probably should have trapped that, as it could happen for any number of reasons (database down, authentication failed, etc).
I find that I'm getting much slower when I write programs these days -- because I'm checking errors for those things that I would've just blown off, or not have thought about in my earlier days.
[there's a few different things that could be done to this -- but I don't know why they're calling Datab
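A minimal sketch of what trapping that failure could look like. The site itself runs Perl HTML::Mason, so this Python version is only an illustration of the pattern, not the site's code; `connect` is a stand-in for whatever call actually opens the database, and the class and function names are hypothetical.

```python
class DatabaseUnavailable(Exception):
    """Raised when no usable database handle can be obtained."""

def get_handle(connect):
    """Call connect() and insist on a real handle.

    `connect` is a hypothetical zero-argument callable. Both failure
    modes (an exception, or a quiet None) end up as one error type.
    """
    try:
        handle = connect()
    except Exception as exc:  # database down, authentication failed, etc.
        raise DatabaseUnavailable(f"connect failed: {exc}") from exc
    if handle is None:
        raise DatabaseUnavailable("no database handle was returned")
    return handle

def render_search_page(connect):
    """Show a friendly message instead of a raw error page."""
    try:
        handle = get_handle(connect)
    except DatabaseUnavailable:
        return "The library is temporarily unavailable. Please try again later."
    return f"Search ready ({handle})"
```

Funnelling every failure into one exception type means the page code needs a single `except` clause, however many reasons the connection can fail for.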
Straight HTML = archaic (Score:5, Interesting)
Bah. Posting HTML is so 1996. You can do so much more with these texts. One example is Open Source Shakespeare [opensource...speare.org], which takes all of Shakespeare's texts, indexes them, presents them in an attractive manner, creates a concordance, provides a full-text search engine, organizes the lines by character, etc.
All of the texts are open source, and you can download the database and source code from the site, too. Check it out.
Re:Straight HTML = archaic (Score:2)
Re:Straight HTML = archaic (Score:2)
HTML is great stuff. Please don't bash it senselessly.
Bam (Score:4, Funny)
Monday May 24, @03:15PM : Project Gutenberg made inaccessible
Slashdot'd (Score:4, Funny)
Oh, the irony that is slashdot.
Project Gutenberg Made Inaccessible (Score:2, Redundant)
In a related story, the Slashdot effect decreases the accessibility of Gutenberg's 10,000+ books.
SCO will file a lawsuit saying they wrote them all (Score:3, Funny)
Gutenberg archive and access (Score:2, Interesting)
http://www.gutenberg.net/etext04/awbv110.txt [gutenberg.net]
there in HTML.
The first volume was converted to HTML by hand by someone else and to pdf, by machine, I think, whereas my site simply has the e-text:
http://rjs.org/gutenberg/Stevens_Thomas/ [rjs.org]
So an automated process would be a boon. What I'd really like to see is an OS text-to-voice reader program. I wrote a wxPython program to assist conversion from scanned text to PG format: http://r [rjs.org]
Gutenberg, Google (Score:2, Interesting)
Gutenberg Disclaimer (Score:5, Interesting)
Quote:
Now might be a good time to consider (Score:5, Insightful)
Now might also be a good time to contribute an hour a week to a literacy project, or to make a donation there. Adult literacy is a serious issue all over the world, and that includes right here in the states, where there really are bright people out there who could have better lives if they could read. I can't think of a more on-topic subject than project gutenberg to discuss adult literacy and the need for both literacy teaching and to support free literature for the masses such as this project provides.
Just my $0.02...
solemndragon
Funny definition of "accessible..." (Score:4, Insightful)
While there have been constant complaints about PG using the "wrong" format, opinions on the "right" format have been the flavor-of-the-month (or at least several flavors per decade). Had PG decided to use a "better" format, all of their volunteer time would probably have been taken up converting (say) WordPerfect to RTF to HTML to SGML to XML, leaving relatively little time to digitize and proofread texts.
Re:Funny definition of "accessible..." (Score:2, Interesting)
I agree that flavor of the month representations are bad, but markup languages have been around for a long time and it wouldn't have been hard to use something (like small subset of SGML) to add a bit more formatting info. Then when people want to look at the te
Re:Funny definition of "accessible..." (Score:2, Insightful)
If you checked the volunteer mailing list of Project Gutenberg, you would see that every now and again somebody waltzes in and says: "Why don't you do such and so? It's easy! You guys must be idiots for not doing it my way."
Neglecting the fact that such people rarely have the decency to find out if this discussion has already been held, and what the arguments were, list members will then ask the question:
"If it's so easy, why don't you show us how it's done?"
That wi
Re:Funny definition of "accessible..." (Score:2)
If you want the fonts, use scans. You can't preserve the fonts in any sane transcription medium, and nobody tries.
TOC
Historically, PG doesn't lose the table of contents. It's usually pretty trivial to reconstruct, even if lost, and most PG to HTML converters do so automatically.
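As a rough illustration of how such a reconstruction might work, here is a small Python sketch. It assumes chapter headings sit on their own line and start with "CHAPTER" or "Chapter" followed by a Roman or Arabic number; that pattern and the `build_toc` name are assumptions for the example, and real converters handle many more heading styles.

```python
import re

# Assumed heading shape: "CHAPTER IV. Title" or "Chapter 12" on its own line.
HEADING = re.compile(r"^\s*((?:CHAPTER|Chapter)\s+(?:[IVXLC]+|\d+)\b.*)$")

def build_toc(text):
    """Return (line_number, heading) pairs for every chapter heading found."""
    toc = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            toc.append((number, match.group(1).strip()))
    return toc
```

An HTML converter could then emit an anchor at each recorded line and a list of links at the top of the page.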
If texts really had been formatted in WordPerfect or RTF, a converter could easily be written (have been for years, actually) to convert the texts to any other format.
And each time you did that, it would get ug
Re:Funny definition of "accessible..." (Score:2)
I'm not sure what the official PG archive line says, but the stated Primary Rule in the Distributed Proofreaders Proofreadi [pgdp.net]
Re:Funny definition of "accessible..." (Score:2)
Italics are preserved _like so_. As a general rule, TEI doesn't preserve these things, either; instead it marks things up as titles and whatnot. Again, most non-facsimile reprints don't preserve font size and formatting of titles.
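Turning the underscore convention into HTML italics is a one-regex job. The sketch below is an illustration only: the function name is made up, and it assumes emphasis never spans a line break and that underscores inside words (such as file_names) are not emphasis.

```python
import html
import re

# _text_ where the underscores are not glued to word characters on the outside.
ITALIC = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")

def underscores_to_html(line):
    """Escape HTML special characters, then wrap _emphasis_ in <i> tags."""
    escaped = html.escape(line, quote=False)
    return ITALIC.sub(r"<i>\1</i>", escaped)
```

Escaping first matters: it keeps any stray `<` or `&` in the plain text from being read as markup once the line is placed in a page.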
[PG's TOC] no longer serves the purpose of directing you to the location of that section or chapter.
Use the search function, Luke. Hit control-F or whatever, enter the title name, and it should take you right to it.
I really do
Already very accessible... (Score:4, Interesting)
It's great - I now have that on my laptop hard drive, mountable by Alcohol, so I'll never be short of anything to read, especially when the web's not available...
I can't find the torrent file I got it through, but if it helps the filename is pgdvd.iso and the size is 4,139,646,976 bytes.
Re:Already very accessible... (Score:2, Informative)
There used to be a special library archive format (Green thingy something), but I don't see it on the site anymore?
Re:Already very accessible... (Score:2)
The big problem is that modern versions of Windows and Linux come with zip decompressers, and they're pretty much universal on any personal computer platform still in use (personal computer not meaning IBM PC-compatible here.)
The same thing you complain about is part of the reason that they don't use bzip. It's more convenient to zip multiple files (like illustrated HTML) than to have to u
The Project Gutenberg Index as RSS (Score:3, Interesting)
I've created an RSS feed from the Project Gutenberg list of etexts. The RSS feed contains titles, authors, descriptions and links to the relevant page or file on http://www.gutenberg.net/
PGDB.rss [eu.org] PGDB.rss.gz [eu.org]
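For the curious, a feed like this can be put together with nothing but the Python standard library. The sketch below is not the code behind PGDB.rss; the entry fields (title, author, description, link) follow the description above, while the dictionary layout and the `build_rss` name are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def build_rss(entries, feed_title, feed_link):
    """Serialise a list of etext entries as a minimal RSS 2.0 document.

    Each entry is assumed to be a dict with the keys
    'title', 'author', 'description' and 'link'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Index of etexts"
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{entry['title']} by {entry['author']}"
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["description"]
    # ElementTree escapes &, < and > in text, so titles need no manual escaping.
    return ET.tostring(rss, encoding="unicode")
```

Writing the returned string to a file and gzipping it would give the two variants linked above.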
Re:Fully slashdotted (Score:2, Funny)
Re:unhuh-explanation (Score:3, Insightful)
Hindsight is 20/20.
Re:thewolfweb.com is faster than slashdot. (Score:2, Funny)