Giving Project Gutenberg Recognition 256

A reader wrote to us about a project that deserves a lot of goodwill: Project Gutenberg, which is having a hard time getting attention. Click below to read a letter from the head of the Project.

In an email from Michael Hart, the head of Project Gutenberg, he says:
"Getting the Etexts to twice as many people is just as important as creating twice as many Etexts. . .but without MAJOR publicity it is not likely to happen. . .we constantly get messages from readers who tell us they have been LOOKING for Etexts for years and just at that present time FINALLY FOUND US. . . . That means we cannot get to a major part of our audience with the kind of publicity we have, we need something more. . . . For example, we were the first in an entirely new column: "People To Watch" in the November 8th edition of TIME magazine, but we have received less than a dozen emails per that article. . .what we really need to do is get on Oprah Winfrey, and hopefully add something to her book club. Those of you on AOL, perhaps you could email the show and request they invite us. . . ! We should undoubtedly also try the other talk shows, and "magazine" shows, etc. All the press we receive is from them contacting us, I have had no luck "generating" publicity. . .which seems to be easy, for those who have the knack. . .it's just not MY knack. . .help!!!"

So, if there's anything that you can do to help - do it!

This discussion has been archived. No new comments can be posted.

  • Proj. Gutenberg has saved me much money on books. When I need one for school, or just to read, I grab it and download it to my Palm. (And when I get bored, I can always switch right over to Tetris or Hardball). Thanks Project Gutenberg!
  • Don't think they'll let you in, they'd probably find some clause about you not being the authors or something... Anyway, can you really afford the bribes? =)

    On the topic, I find the Etexts QUITE useful... suggested bookmark #1.
    --------------------------------------------- -------------------
    Everybody's got something to hide except for me and my monkey...
    www.stampede.org
  • Another project that will save you much dinero is the one @ cmu. It's the yahoo of online books.

    http://www.cs.cmu.edu/books.html

    Btw.. you didn't say first post.. what's wrong with you ;>

    ---

  • by Anonymous Coward on Wednesday November 17, 1999 @04:33PM (#1523471)

    Another worthy Gutenberg-style project which deserves help is the Internet Dictionary Project [june29.com]. They are building the world's first copyright-free English-to-French dictionary by typing in the text from an old out-of-copyright dictionary by Spiers dating from 1853. They're asking for help, which involves typing in the text, using some simple formatting rules, for any of the remaining pages by reading the scanned images of the original book.

    After joining the project [june29.com], you download a scanned page and type it in according to their instructions. [june29.com] An important issue is whether they should be aiming to put the resulting work into the public domain as they state, or under a licence offering something akin to GPL's protections?

  • Try teaming up with amazon or something.
  • I've been using PG for a long time, and it's an EXCELLENT resource. The people running it obviously do need help publicizing, and I think once they get started the project will really take off, seeing as they already have tons of information. As the Subject says, once they hit critical mass in terms of public knowledge they're set. Another hassle they're facing is copyright law. The PG site lists some elementary info on copyright law, and it was changed not too long ago to cover a lot more stuff, so unfortunately we'll be seeing that many fewer etexts in our lifetimes. Ah well, what Project Gutenberg provides is nonetheless fantastic, and hopefully it will soon receive due recognition. Dan
  • Thanks for the link :)

    Btw.. you didn't say first post.. what's wrong with you ;>
    a) I'm not a l-user or an AC
    b) with my luck, in the time it would have taken me to type "fisrt post," it would have been second or third :)
  • by jdube ( 101986 )
    I'm gonna spread the word on IRC and all sites I have (as if anyone visits them [freeshell.org]). I've always wanted to see books online and finding this site was a godsend. Seeing as the only other site that does (did... I think they stopped a month ago) was MCP but those books (teach yourself jack shit in 2 seconds) weren't worth reading anyways.


    If you think you know what the hell is really going on you're probably full of shit.
  • by Jonathan ( 5011 ) on Wednesday November 17, 1999 @04:40PM (#1523477) Homepage
    ...but they are hardly the only people producing free e-texts. Yes, I remember that in the pre-Web era their ftp site was about the only place on the net for e-texts, but as the existence of huge archive sites like The Online Books page [upenn.edu] shows, PG is just one group among many similar groups these days.
  • Team up with Amazon? Why would they promote a free competitor to their own service?

    Jack
  • Project Gutenberg has been around for literally years, and is a resource I always check when I can't find that elusive book.

    In fact, we used to use the text from many Gutenberg documents when I was fiddling around with data compression (specifically compression methods aimed at text and english in particular).

    Also, many of my relatives have asked me "Can you find out about this book on the net, and where I can find it?" and are somewhat suprised when I hand them a disk with the Gutenberg text version of it. First time I did this, they thought it was reviews of the book and details where to find it, instead of the actual text. "Remember how I said that a floppy disk holds about as much text as one small book?". *grin*

    However I think they got one of the most important boosts of advertising they could ever want, an article on good ol' Slashdot. Way to go Hemos! (and CmdrTaco of course) *grin*

    PS: Heya to FunkyBob, the guy who did most of the coding on that compression stuff I was mentioning earlier - and when will it ever work properly damnit! It's only been about 6 years! *grin*

  • Project Gutenberg is a Godsend for those of us who love to read. Last summer I spent nearly all my free time there. They have a great collection of older literature. Soon after I discovered the site, I had read nearly all their Sherlock Holmes collection, as well as many other books/stories I had wanted to read but never gotten around to checking out from the library. The only thing I dislike about the site is that they have virtually no 20th-century literature. (But that's due to restrictive copyright laws, not because of any failing of the site's administrators.)

  • Good idea... only problem... Amazon is in the business of _selling_ books to make money. I just don't see them posting a link... "Buy This Book Now *OR* READ IT FOR FREE!!!!!!" :)
  • How about giving Project Gutenberg a free banner ad here on Slashdot? Now that'd generate a lot of traffic and put them right out in the public view!

    Whaddya say guys?

  • Project Gutenberg was one of the first endeavors that really got me interested in the internet. Although etext always seemed to be in tough competition with online porn. However if they really want the Slashdot readership, perhaps a few comments that they are running on a Beowulf cluster and use only GPL software would really kick the /. effect into high gear.
  • by John Fulmer ( 5840 ) on Wednesday November 17, 1999 @04:50PM (#1523485)
    Actually you are belittling them. The link you gave credits Project Gutenberg for most of its information and only lists two groups, Project Gutenberg and Celebration of Women Writers, as actual organized groups doing online texts.

    And while it does list a number of online books, most are small individual online collections, and many are formatted HTML.

    Project Gutenberg is:

    a) Organized
    b) Just the text, no formatting
    c) Extensive

    They are the premier group doing online texts. You really have to give them that.

    jf
  • by Anonymous Coward
    if only for one thing, and this is nitpicky, but I think it might help a little:

    They need to make their online books a little more web-friendly. I understand PG's reasoning behind keeping everything in pure vanilla ASCII, but quite frankly, in that state, the etexts don't look terribly great, nor are they conveniently navigable in most web browsers. Appearance and ease of use are, imo, important factors if you want to attain a large audience.

    Bono Vox, bono@vox.org
  • by RaveX ( 30152 ) on Wednesday November 17, 1999 @04:53PM (#1523487)
    Personally, I remember my first run-in with PG back in the days of the BBS... it was a Taoist text in one of the download sections that had been created for PG. I also seem to remember a very lofty goal at the time, something like a billion downloads...?

    At any rate, I think a few areas might provide support...

    *Amazon (someone mentioned this) is a _bad_ idea. Profit motive and releasing free documents don't coincide well.

    *The Palm computing platform is the big plus. To be able to read in such a convenient form is wonderful, and PG offers a large library of material for consumption. However, PG needs to _market_ to them, meaning convenient little formats, getting linked to, etc...

    *Align with the OS movement more, there's plenty of talent that would likely work on such a task, but probably isn't even aware of it. Getting mentioned on /. is a huge start.

    *Make better use of technology... I seem to recall very slow rates of progress, which lowers the level of excitement for those involved (it's sad that this is a factor, but very true)- can't many works simply be OCR'ed?

    *The general public (Oprah Winfrey's audience, etc.) is most likely worthless. It seems as though most of the public rarely reads, let alone transcribes... The only thing they might be good for is cash to support the effort.
    Just my US$0.02
  • My laptop's a really old 486 running DOS/Win3.1 :P (Linux won't boot), but wow .. Project Gutenberg saved me a ton of money on not buying books that I wouldn't use ever again .. plus it's less of a load to carry ..

    Maybe they should start posting posters across campuses and such .. eh ?
  • by apsmith ( 17989 ) on Wednesday November 17, 1999 @04:56PM (#1523489) Homepage
    If any of you have played with the E-book readers out there (Rocketbook [rocketbook.com] or Softbook [softbook.com] are the main contenders) you'll notice that 90% or so of the books they offer right now seem to be public domain ones, mostly from the Project Gutenberg collection. And that does make sense - PG is all about etexts, the E-book readers are about reading etexts... Anyway, it seems the two parties ought to get together. But unfortunately, the Ebook vendors seem to be more focused on licensing and copyright issues and making money from selling content, rather than just making and selling their hardware. Can't Dell or somebody like that get into this business and show how it ought to be done?

    Anyway, if we could get a bunch of recent books out there in the public domain (or GPL of course) - either under Project Gutenberg or some other auspices - I think that would demonstrate this is a serious option for the future of reading. The technical market might be ideal - how about merging in some of the Linux Howto's and the Linux documentation project with this kind of effort? Instead of making a buck for yourself and Tim O'Reilly, how about publishing with Project Gutenberg next time? Just as with Linux and the World Wide Web, it could be a way to guarantee readership you would never get by selling the stuff.

    By the way, I prepared 2 books for Project Gutenberg many years ago, and did some work on their Encyclopedia project, but I've not been keeping track for the last few years - it's definitely continued to grow and be successful. Despite Michael Hart's quirkiness, it really has come close to fulfilling the original promise (10,000 free etexts by 2000). A hearty congratulations to Michael and all the volunteers!
  • Project Gutenberg carries only copyright expired texts, so it doesn't really compete directly with Amazon's business. And it may be a good thing for Amazon: it's a nice service for Amazon's customers to be able to read books online.

    Even in the case where Amazon is trying to sell the same books covered by Project Gutenberg (e.g. classics), I think most people still prefer to read the book in printed form, so providing links from the book's page to the Project Gutenberg text provides a useful previewing service and may help to generate more interest (and sales) for the book.
  • What PG is doing is a really great service and needs to get publicity. Sadly, copyright restrictions severely limit the inclusion of most recent books. Wouldn't it be nice if the GPL would catch on with books? Why not some "open-source" sci-fi novels?
  • It's not as farfetched as it seems. Amazon sells hard-copies, not eBooks. A lot of people want a hard copy when they're reading a book. The only place it cuts into their revenues is where someone wants to look up something quick in a book and would be willing to pay for it only if they had to. Ever tried reading a book online? It's just not the same thing. Amazon could get some good press out of this. Market economics will determine whether the publicity is worth the lost revenue to them, unless a high-up individual in the organization is feeling particularly philanthropic and sees the value of this project for all people and looks beyond the dollar bill. Not altogether impossible. It's worth a shot, I'd say.
  • Good comments.

    On the question:
    > can't many works simply be OCR'ed?

    Project Gutenberg has been using OCR for years, including some custom OCR software developed along the way. However, they care about quality too, and OCR text ALWAYS has errors, especially when you're OCR'ing something that's 75 or 100 years old, as required by the copyright laws. The major effort is usually in proofreading. However, in some cases it's just faster to re-type the text - that's what I did for the things I worked on for them. I also learned how to touch type at 90 words/minute :-) which never ceases to impress my co-workers.
  • by Dacta ( 24628 ) on Wednesday November 17, 1999 @05:05PM (#1523494)

    I've been reading PG books since I've been on the net ('94) and I think they have got to be one of the most important resources available.

    People discount PG by saying things like "Oh, you can get free texts anywhere" and "Books are outdated, anyway".

    Well, imagine this happening without PG: Copyright laws are changed so that copyright does not run out after 30 years (or whatever it is) - and this is what the film lobby wants.

    Then, in 10 years or so, a law is made giving ownership of texts that have become public domain back to the descendants of their owners, who then sell them to film companies or amazon.com

    These companies decide that they only want to sell paper-books, and the demand for some titles is so low that you have to get a special publishing run for them.

    Then some books get banned for being sexist/sexy/racist/communist or whatever, and you can no longer get them - period!

    Books - or at least the text of them - are the life blood of civilisation, and PG is something that is making this freely (as in speech) available to all.

    Support it!

    PS: yes, I know the scenario above wasn't real, and I know "the internet changes everything", but in 5 years, when you are reading "Sherlock Holmes" on your Palm XX, you can thank Project Gutenberg for keeping it free.

    --Donate food by clicking: www.thehungersite.com [thehungersite.com]

  • by Anonymous Coward

    I like getting my hands on free 19th Century classics as much as the next guy. However, I find Project Gutenberg of dubious usefulness. And I strongly disagree with the claims of some journalists that this project, if completed, will be a great help for the schools of the Third World.

    I am sure the Internet and its associated technologies can be used to help impoverished kids worldwide, but I don't think they would benefit much from an electronic version of, say, Boswell's Life of Johnson... in English.

    [I know this is slightly off-topick, but I just wanted to prevent someone coming up with the references to rural Kenya that always pop up when discussing Project G.]

  • It is sad to see this here.

    Really? I'm rather thrilled to see this here. Until now, I've never heard of 'em.

    If Project Gutenberg had been started by Linus or RMS, would it send out hysterical letters every month asking for money to keep the project afloat?

    I fail to see how this qualifies as sending out "hysterical letters every month asking for money to keep the project afloat." Rather, it seems to be a request for publicity. It doesn't seem to be all that hysterical at all.

    Wondering what this has to do with Slashdot and Open Source and GPL.

    I don't see that it has much to do with Open Source and the GPL, but it seems to have a bit to do with Slashdot. "News for Nerds. Stuff that matters." Well, I'm a nerd. I like to read. This is news to me, and stuff like this sure matters to me. End of story here, at least.

    Seems PG is a conservative outfit that resists change in technology,

    Unfortunately, I fail to comprehend how translating texts printed on paper to an easily-reproducable format that can be easily obtained via the internet qualifies as resisting change in technology.

    won't cooperate with other free ebook causes,

    Proof? Links, quotes, or something, please. If this is true, I'd be interested in seeing something to substantiate this, as I'm not likely to take the word of an A.C. alone as gospel truth.

    is intent on producing numbers of poorly proofread texts and admittedly of not top quality,

    Try getting 30 of your closest friends and proofread several hundred thousand pages of material and see how well you fare in getting all the errors. Please cite something to prove that they are "intent on producing numbers of poorly proofread texts."

    and doesn't accept criticism from outsiders.

    Again, proof please!

    BTW, the aforementioned letter from Hart is not on the page linked to, and it contains errors of fact about copyright.

    I don't see any claims that the letter is on the page linked to. If it does contain errors of fact about copyright, please cite some.

    Maybe I'm way off here. I've never heard of this project before today, and thus my knowledge is limited to what I've seen here and my brief perusal of their web site, which more or less only consisted of checking to see what they had by F. Scott Fitzgerald (only This Side of Paradise) and Gabriel Garcia Marquez (nothing). If what you say is true, I'm certain that I, as well as other readers of Slashdot, will benefit from having some primary source material to peruse demonstrating your claims. Right now, all we have to go on are your quite unsubstantiated allegations.

  • by Anonymous Coward on Wednesday November 17, 1999 @05:10PM (#1523497)

    You are wrong. PG lists a little over 2,000 books. The On-Line Books Page has links to more than 10,000 in English alone. Therefore you are wrong in saying the OLBP "credits Project Gutenberg for most of its information."

    Instead of calling PG "organized" and "extensive," you might say it is random and shallow. The OLBP and http://www.ipl.org [ipl.org] list books by Dewey categories and subject, while PG does not. PG is shallow because it refuses to put the necessary bibliographic information on texts--one can't even find out which year or which edition most of them are.

    Because PG uses only ASCII files downloaded by FTP or gopher, it has not yet joined the World Wide Web. It is not possible to make a deep link to a paragraph or page inside a PG book, for example if one wishes to reference a quotation. It is intended for offline reading only, and in that respect it offers nothing over a paperback book. Instead, other online book projects deliberately produce works that take advantage of computer power, for example by using more readable fonts.

    As far as PG being the "premier" group doing online texts, that smells of a little old American prejudice. Since they use only ASCII, they can't include accented characters for other languages. They have started to include a few works in European languages, with strange conventions to represent those characters, but it will be interesting to see how PG adapts to Unicode and the extension of the World Wide Web to non-English-speaking nations.

    A more significant beef about PG is that it is centralized and dominated by one person, who does not share the philosophy of Open Source production that most of us do. Instead of forcing individuals to contribute to this project, why not help them set up their own web sites to publish their own works, or other works they have scanned? The WWW has made this type of centralized project unnecessary and even harmful. That is what we ought to be discussing here, not how to send money to this project to bail it out once again.

    I am not afraid to belittle Project Gutenberg. I sign my name, too!

  • by B10MA55 ( 115892 ) on Wednesday November 17, 1999 @05:17PM (#1523499)
    1) Use your domain name: Gutenberg.org
    2) Get the crawlers to go through the texts _at your site_. You can wrap them in pre tags...
    3) You're ripe for a grant for outreach. If you don't have the "official" framework, contact some CS or English depts and see about some joint work here.
    4) oss4lib is a new group that could be seen as having a relation to you...
    5) Perhaps some outreach letters to English depts at various levels, from grade school up.
    6) Bells and whistles: how about some history on Gutenberg, past and present. Entertainment.
    7) Giving talks at various places helps. You might meet some connected people on the way...
    8) In general, of the e-libraries, what tactics are the successful ones using? Seems a good learning place.

    I do like the tasteful layout and quickness of your cover page. I have always been impressed with Gutenberg! Good luck
  • The way that Project Gutenberg gets material is by people contributing the often rather substantial effort of typing in material, as well as working to verify its correctness.

    That really is a quite considerable cost, in much the same way that the production of "free" software requires substantial effort.

    It is somewhat unfortunate that there have been such peculiar positions as:

    A friendly dissuasion from this yielded the first posting of a document in electronic text, and Project Gutenberg was born as Michael stated that he had "earned" the $100,000,000 because a copy of the Declaration of Independence would eventually be an electronic fixture in the computer libraries of 100,000,000 of the computer users of the future.
    It did not add to the project's credibility when they on the one hand indicated that their funding was maxxing out at around $30K per year, whilst claiming that they were producing "billions" of dollars in value. (Note that the PostgreSQL HOWTO [linuxdoc.org] suffers from the same sort of thing...)

    A claim of $30K on the one hand, and $Billions on the other, do not reconcile very well.

    Not unlike the situation with the FSF, they could probably more readily use contributions of time rather than of money, although some of both doubtless prove valuable to some degree...

  • by Anonymous Coward
    I really liked that link.

    Here is another: classics.mit.edu [mit.edu]
  • Anyone who is doing the distributed.net project, you can vote for the charity money that will be won to go to Project Gutenberg.
  • I've followed them and downloaded their etexts for a number of years now, and I must say that Gutenberg is one of the finest, most selfless projects on the internet.

    My favorite thing to do with Gutenberg etexts is to load them up in TextEdit Plus on my Powerbook and "Speak document" while I work. It's very cool ... and the voices in OS 9 are much better than they have been in the past.

    Three cheers to project Gutenberg, and anyone out there who hasn't already checked them out should do so ASAP!!

  • Is there a copyright-free English dictionary that could be distributed with open-source software like word processors for Linux? This seems to be an important feature that current Linux distributions are missing. I have in mind something better than /usr/dict/words.

    Creating a good dictionary from scratch is hard work, but if you can get the structure and the word list e.g. from a copyright-free source then the hardest part is done. Therefore, a good starting point would be to take the structure of the copyright-free Spiers English-French Internet Dictionary [june29.com], i.e. cut out the French translations to leave the English core. Is anyone else interested in this?

  • by wilkinsm ( 13507 ) on Wednesday November 17, 1999 @05:32PM (#1523507)
    Every submitter formats the text differently, and the inline ("bottom of page") footnotes are a real annoyance.

    However, I would like to say that via GB, I've read every Charles Dickens and Sir Arthur Conan Doyle novel they have e-published, to much satisfaction. I started on other authors, but then a friend introduced me to the Dune and Hyperion series. :)

    I think it's safe to say now that webifying the text would be a wonderful idea. If you were to index them in the web search engines, you would then definitely get more hits. I'd love to be able to type in a search engine "to be, or not to be" and get sent to the correct page in the GB e-text.

    Once you do that, launch an ad banner campaign with suggestive quotes. ie. "The staircase was darken with gloom...(click here to read more...)"

    BTW: I read "Sun Tsu" as well. Way cool...
  • Well, English is widely spoken throughout Africa, in the former British Empire, and I think that e.g. Shakespeare is of value anywhere. The site is perhaps too oriented towards Western literature - there's plenty of Islamic/Indian/Chinese literature available out of copyright (as well as folk tales, mythology, etc). Actually, it would be very interesting to have a related project transcribing old textbooks, classics of science and the like. (I did notice that PG has the 1st million digits of 1/pi)
  • A friend of mine introduced me to this during undergrad, and I was instantly a fan. Sure, the texts aren't that pretty to look at as is, but dump them into an Emacs buffer, add a few LaTeX markup tags, and suddenly you've got a decent-looking copy of whatever. (This is especially nice with texts which are relatively short -- I remember in particular having a tex-ified version of the Communist Manifesto. :-))
  • by Alan Shutko ( 5101 ) on Wednesday November 17, 1999 @05:38PM (#1523511) Homepage
    I am not afraid to belittle Project Gutenberg. I sign my name, too!

    That's why you're an anonymous coward, right?

    At 2337 books (last time I ran a count), PG is nearly 1/4 the OLBP. It's been doing this for a long time, and somehow, keeps meeting its goals of exponential production. No, it doesn't list books by Dewey or by category like IPL or OLBP. Also unlike IPL and OLBP, it's actively involved in putting works in online format.

    Indexes are nice, but third parties can (and are) doing indexes. Without people doing scanning and keyboarding, those indexes won't have much to index. PG provides a single point of contact if you want to scan, proofread, or archive etexts. Ever try looking for a book and found the server down? Collections like PG minimize that problem, because a work is no longer in the hands of a single person who may decide they're sick of it taking up their web space.

    As for preferring rich markup to plain text, it's easy for you to _add_ that markup. Usually, at least 70% of the work can be quickly automated. And keeping plain text means it's maximally useful, since you have a simple base to provide whatever markup you want, be that HTML, LaTeX, MS Word, whatever. If they'd been rich markup from the start, do you think that the texts from 1991 would be in HTML? What about when we switch over to some XML-based markup? And what happens after that?

    If it makes you feel better, don't think of PG as finished product. Think of it as raw material, and put together your own site with richly marked up texts and scripts to do web-cites of specific chapters, sections, etc. I think that would be a very valuable thing to have, but it will certainly be easier for having so much of the grunt work done for you.
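
The "raw material" idea above can be illustrated with a short script. This is only a minimal sketch, not anything Project Gutenberg ships: it assumes an etext whose paragraphs are separated by blank lines, and the function name `etext_to_html` and the `p1`, `p2`, ... anchor ids are made up for the example. Each paragraph gets its own anchor, so a quotation could be cited with a link ending in something like `#p2`.

```python
import html
import re


def etext_to_html(text, title="Untitled etext"):
    """Turn a plain-text etext into minimal HTML with one anchor per paragraph."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = []
    for number, paragraph in enumerate(paragraphs, start=1):
        # Re-join hard-wrapped lines, then escape &, < and > for HTML.
        flowed = " ".join(line.strip() for line in paragraph.splitlines())
        body.append(f'<p id="p{number}">{html.escape(flowed)}</p>')
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        "<body>\n" + "\n".join(body) + "\n</body></html>\n"
    )


if __name__ == "__main__":
    sample = "First paragraph, wrapped\nover two lines.\n\nSecond paragraph."
    print(etext_to_html(sample, title="Sample"))
```

Real etexts also carry headers, chapter titles and footnotes that a script like this ignores, which is roughly the part that cannot be "quickly automated".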

  • Here's another: http://www.bibliomania.com/ [bibliomania.com]. This site doesn't have any tech books, but it's a very good resource nonetheless.

    --

  • It surely is like unto free software in that it promotes the free availability of literature.

    It may not use the GPL, per se; it may not attempt to challenge copyright in the way that RMS seeks to challenge the notion of proprietary software.

    But it certainly represents an analogue to free software.

    Books are rather less ephemeral than computer software, as the Gutenberg Project surely shows that there is literature a hundred years old that is still worth reading, whilst much of the computer software of ten years ago isn't worth using. (The "computer literature of UNIX and Lisp" representing occasional literate exceptions...)

    There used to be somewhat wild claims about the value of the Gutenberg Project; as it has grown from nothingness into being a fairly significant library with diverse users, the values have become clearer.

    The project has suffered from some texts being of somewhat questionable quality; their transcriptions of some religious works have been useful in bringing in both a touch of religious fervor, to more actively stimulate verification, as well as in perhaps pulling in some of the "scribely" skills mostly associated with religious traditions.

    I think they had some problems with some of the early OCR technology, finding accuracy to be a bit low. Time, independent OCR attempts on differently published editions of books, and learning curve can provide improving results...

  • ...a text to speech company. It would put a new spin on "AudioBooks."

    BTW: Gutenberg texts suffer from a lot of typos - about half way through the work, the quality really started to suffer badly...

  • I was thinking recently of this very same thing. On the Solaris system I used back in college, there was a utility licensed to the engineering dept called 'noah' which was sort of an online dictionary/encyclopedia. It would give just enough definitions to be useful for a word, as well as little related facts, and pronunciations, etc.

    I was wondering about how feasible it would be to start a GPL version of this. Ie, start with a bunch of words that would be commonly looked up, and we'd come up with definitions for them paraphrased from a variety of sources, so it wouldn't be plagiarizing.

    If the resulting text wasn't stored in plain-text (too large) but compressed, there could also be specialized tools to grep for keywords anywhere in the definitions, etc. Is this a decent enough idea? Having noah was really REALLY handy, just at a prompt type, "noah asperity" for a good definition of 'asperity'. Really useful
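
For what it's worth, grepping a compressed dictionary needs no special tooling. The sketch below is an assumption-laden illustration, not a description of how `noah` worked: it supposes one definition per line in a gzip-compressed text file, and the file name, the helper names and the three sample entries are all invented for the demo.

```python
import gzip
import os
import tempfile


def grep_compressed(path, keyword):
    """Yield lines of a gzip-compressed text file containing keyword (case-insensitive)."""
    needle = keyword.lower()
    # gzip.open in text mode decompresses on the fly; the file is never unpacked to disk.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if needle in line.lower():
                yield line.rstrip("\n")


def build_demo_dictionary(path):
    """Write a tiny placeholder dictionary (hypothetical entries, one per line)."""
    entries = [
        "asperity: roughness of surface or of manner",
        "brevity: shortness of duration or expression",
        "levity: lightness of manner; lack of seriousness",
    ]
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        handle.write("\n".join(entries) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "dictionary.txt.gz")
        build_demo_dictionary(demo)
        for hit in grep_compressed(demo, "asperity"):
            print(hit)
```

Because the search matches anywhere in the line, looking up "manner" would return every entry whose definition mentions it, which is the "grep for keywords anywhere in the definitions" behaviour described above.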

  • I think it would be better now if PG made appropriate use of basic XML. Formatting and indexing can EASILY be taken out. But putting it back in is WAY HARD. One example is the PG King James Bible. Parsing and figuring out the chapters, verses etc is a semi-hard computing problem. But stripping out some XML markers for these things is trivial. Or converting it to some future format is trivial. But if the markers aren't there it's a bit hard.

    Plain text is much better than nothing, but it's not the way to go, especially these days.
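
To illustrate the asymmetry described above, here is a hedged Python sketch. The element and attribute names (book, chapter, verse, n) are invented for the example and are not any format PG uses. With markers present, finding a verse and producing plain text are both a few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions.
SAMPLE = """<book name="Genesis">
  <chapter n="1">
    <verse n="1">In the beginning God created the heaven and the earth.</verse>
    <verse n="3">And God said, Let there be light: and there was light.</verse>
  </chapter>
</book>"""


def find_verse(xml_text, chapter, verse):
    """Look up one verse by chapter and verse number; None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(f"./chapter[@n='{chapter}']/verse[@n='{verse}']")
    return None if node is None else node.text


def strip_markup(xml_text):
    """Drop every tag and return the text, one non-empty chunk per line."""
    root = ET.fromstring(xml_text)
    chunks = (" ".join(piece.split()) for piece in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)
```

Going the other way, from `strip_markup` output back to chapters and verses, has no comparably short implementation, which is the point of the comment.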
  • >>Seems PG is a conservative outfit that resists
    >>change in technology,

    >Unfortunately, I fail to comprehend how
    >translating texts printed on paper to
    >an easily-reproducible format that can be
    >easily obtained via the internet qualifies as
    >resisting change in technology.

    It doesn't. However, PG in general, and Hart in particular (as if you can really separate the two) are stuck in a reasonably old-fashioned mindset when it comes to textual information. Because the project started way back in the Seventies (I believe), the choice to use only plain ASCII might have made sense then. It certainly doesn't do so now.

    PG would benefit greatly from a structured information format, preferably one that could be transformed down to plain ASCII when needed (most formats that would be appropriate already do this). Using something like SGML or XML would give them the benefit of structure in the information, like footnotes, italicized sections, page breaks, etc., in a machine-readable format. Also, they would have the option of using Unicode, which would benefit them greatly, since 7-bit really doesn't cut it for anything but English text.

    I, and I'm sure many others, would be happy to provide an XML system for them free of charge, but as I've understood from interviews, Hart has his mind set on continuing to use ASCII, because he feels it makes it available to everyone. Personally, I think it reduces everyone to the lowest common denominator, and could be solved in a better way. My two centavos.

  • by Mr. Protocol ( 73424 ) on Wednesday November 17, 1999 @05:57PM (#1523522)
    Some folks want Gutenberg to move past ASCII and become more web-friendly, more non-English-language friendly, more Y2K-friendly, whatever. I happen to believe they're on the right track. They are trying to provide a baseline of texts which can be adapted to specific purposes.

    That's how I use 'em. I've downloaded a few such texts and made them into Newton books, which I put on my Web site. (I'm a retro-geek. I prefer Newton to Palm.) I couldn't do that with an HTML page, or at least, not as easily.

    The one thing I found in doing this myself is that some Gutenberg texts, at least, aren't error-free, even if they have been proofread. I've proofed two such books so far and I've had to correct around a dozen errors in each. Now, the books I'm converting are by a British writer named Ernest Bramah who's completely obscure today. I happen to have original editions in hardback, but with a writer as obscure as Bramah, there are damn few of us out here with original editions to check. I could wish the Gutenberg proofing process were a little more thorough. There isn't even a central place to report such errors to: the Gutenberg help line just told me to forward the corrections to the original text provider, which I did.

    On the other hand it does make me feel like I'm actually giving something back.
  • They are the premier group doing online texts. You really have to give them that.

    And the oldest! There's pretty much as old as the net.
    I have nothing but respect for PG and what they represent. I actually feel bad because I haven't visited the site in a while, but I remember them being the reason I started using the net...remember Gopher?

    --GnrcMan--
  • There's pretty much as old as the net

    "There's"? Sorry, that should be "they're". I'll be going out back to beat myself with a clue stick now.

    --GnrcMan--
  • Simply make an IPO on NASDAQ. If possible, associate yourself with Linux as well. Maybe get an endorsement from Bob Young of Red Hat. Wall Street will be beating down your door to give you money, without knowing why or what you do. Use the money to buy banner ads. :-)
  • Hmm. I looked at that site, and it *looks* like they expect authors to use Word to enter documents. It talks about putting words in italics, which doesn't make much sense for pure ASCII editing. It's not particularly clear, though; will they accept a tagged HTML document?

    Also, quick dummy's question: what is the situation with HTML and Unicode? I've always assumed the HTML docs were ASCII, but presumably our international friends have some nicer way to work with HTML and different alphabets.
  • Excellent, I must agree.

    I used it back then when I didn't want to buy a copy of Flatland... I'd already read it before, and it's much easier to search an online version.

    It again proved invaluable when I had to look up lots of random British poetry for a class, and makes searching and citing lines so much easier.

    However, what do you expect for popularity when the web site is somewhat organized and the ftp site (where everything is) is worse--last I checked, organized by the year it was retyped or something... But I haven't looked in a while, and I can usually find a link to it on the web. However, it ain't pretty, even from my '93-'94 web publishing standards. ;)

    I wish there was something similar for movies. Time to start collecting movie scripts! (I found an annotated script, basically, of "Shadow of a Doubt", and it helped a lot with a paper I was writing, but I didn't find a central place for scripts. Although I bet the imdb would take them, or link to a site that had them.)
    ---
    pb Reply rather than vaguely moderate me.
  • by Anonymous Coward

    If it makes you feel better, don't think of PG as finished product. Think of it as raw material, and put together your own site with richly marked up texts and scripts to do web-cites of specific chapters, sections, etc. I think that would be a very valuable thing to have, but it will certainly be easier for having so much of the grunt work done for you.

    Some of us have tried. But what if PG actually loses information in their process? For example, they are not careful to reproduce italics or bold face or accents or vertical spacing. And they don't give the exact edition used, so one can't go back and compare with the original, even for proofreading obvious typos.

    So, unfortunately, in many cases it turns out to be easier to OCR the book again and get a clean copy to work with.

    And when doing that, with HTML, one can even include the illustrations that were left out of the PG edition--even if the text refers to them.

    Plain ASCII text is only maximally useful if it conveys all the information in the original. Since each PG text has its own conventions for markup (unlike HTML) it should not be called plain ASCII text, but some sort of arbitrary structured ASCII. It's not useful at all, in too many cases.

  • The GPL is really a license for open-source program/documentation development, and nothing more. You don't want 100 different revisions of a fiction text out there, each slightly "improved" by a different author. Can you imagine reading a book, perhaps a chapter at a time, and having it constantly change on you? The plot inconsistencies would make The Phantom Menace look like Shakespeare. I love the GPL, but please, let's be serious about where to use it.
  • The Project needs to focus on being easier to use. A sort of "Avant Go"-ish interface where I could select a text online and have it sync to my Pilot without my having to think about anything would be a good start. I mean, I know I can put these texts onto my Palm, but I want it to be really easy.

    If getting more users is really as important to them as getting more texts online (and there really isn't an awe-inspiring amount there yet, so far as I can tell), then they need to be able to pass the mom test (you know, could my mom use it?). I mean, I really *like* having a book on my Pilot at all times -- it saves me in situations where I'm unexpectedly bored. I'd bet I'm not the only one. PG needs to cater to this.

    ----

  • I just tried to register "gutenberg.org" so that I could give them a domain name. gutenberg.com, net and org are all taken. :( Any suggestions for a good domain name I can point at them?

    --GnrcMan--
  • The original Cmdr. Taco posting at the head refers to this letter, which originally was at the link, but PG changed it. Who is at fault, PG or Taco?

    I suppose that Hemos (who posted, not Taco) wouldn't be at fault if PG changed their site. However, it appear(ed/s) that the link simply points to the PG site itself, judging from the context and the construction of the link. If PG had the text of the letter present in their index.html file, and changed it after the story was posted, I missed out seeing it. If that's the case, I'm clearly in the wrong in my previous post.

    Thus they should be aware that current U.S. copyright law (since the Sonny Bono Copyright Term Extension Act of 1998) extended copyright from 50 to 70 years after the author's death, or extended it from 75 years after publication to 95 years, whichever was applicable. Evidently the content of the letter was copied from some 1992 writing, and needs correction.

    Very well. This is the sort of information I was looking for. Given their mission, they should be on top of this sort of thing, even if they don't handle materials published after 1922.

    And, in retrospect, given Marquez's birthdate and the like, I should have readily realized that his stuff wouldn't have been there. Quite a lapse in my thinking there . . .

    I'm glad you've now heard of Project Gutenberg. Perhaps it will make you think of the differences between it and open source. You have heard of that, haven't you?

    Open what?

  • so!!?!?!? What was the freakin answer?

    heheh... I'm no Perl expert and don't care to figure it out myself. Hook a lazy brother up!

  • > I am not afraid to belittle Project Gutenberg. > I sign my name, too!

    A bit odd for an AC post. :)

    And so hostile, rambling and full of what are apparently personal issues that I can't even comment, since it seems so, er, out there. Sorry can't help you.

    PG is a good project. There should be more like it. And I don't understand all the hostility for a project whose goals are to provide and preserve texts in an electronic form.

    jf
  • Beowulf [unc.edu] cluster. Awww yeah!!!
  • by HamNRye ( 20218 ) on Wednesday November 17, 1999 @06:21PM (#1523541) Homepage
    Ok, that's a falsehood, I have used it, once. About 2 years ago I downloaded Notes From The Underground. It lingered on my hard drive with some Mark Twain that I had also downloaded at the time. I don't believe that I ever read them, because it's too darn uncomfortable to read a full novel on a computer.

    Eventually I picked up Notes from the Underground as a Dover Thrift Edition. It cost me all of $1.00. I couldn't print it myself for that much. Also I picked up Faust, The Theory Of The Leisure Class, The Devil's Dictionary, The Queen of Spades, Oedipus Rex... and the list goes on. These were brand new. None of them were more than $2.00. And that was suggested retail. Used books fall into much the same category, as they are usually $2.00 for a paperback.

    In this era we publish more books than ever before but fewer authors than 30 years ago. Why not use E-texts to promote some authors who cannot get published by the big boys like Bantam, Del, Tor, etc... Why not have a more user friendly site? Why not invite reviews? Recommendations? Etc...

    Why not make it so that PG is accessible to the masses. Let people have their stake in PG, make them a part of something. That is what draws people to participate in these projects. Slashdot is not the best news site out there for news, but it is the best community out there for news.

    When I first found PG it seemed like one of those great ideas. I bookmarked it. I stopped back, nothing had changed, A year later I stopped back, still didn't see anything that really caught my eye.

    In short, I appreciate what PG is trying to accomplish, but I cannot find where it has any real relevancy to me. Not when the price of the information on a user-friendly, portable media that never needs winding or batteries is available for so little. To truly draw attention and keep it, you need to fight our pitifully short attention spans, and our desperate need for convenience. Why not encourage people to write for PG, not copy. Why not encourage the stockpiling of information, not fiction. What about an app that facilitates the finding and reading of e-texts, something more than "more"...

    PG has been around long enough to have garnered the recognition it deserves. If it is concerned that it is not busy enough, then it should be wondering why. It has always seemed to me that PG tries to lure its readership with the mantra that "This is for the greater good..." Help us... Instead of playing on our consciences, fulfill a need. As of this writing there are ~50 responses from people who have all heard of PG. Some use it, some don't. But they all know about it.

    PG, give me the slightest reason to come and keep coming, and I will. Until then, I can get Vonnegut for $0.25 at the library and PK Dick for $2.00 at Novel Futures. And god knows that our independent booksellers are struggling too. (Tangent: Don't buy from book behemoths, as smaller booksellers die out our culture moves further into the realm of vanilla pop garbage!)

    ~Jason Maggard
    "Give me convenience or give me death." ~Jello
  • I remember reading a Wired interview with the PG founder back in '96 or '97.

    They were talking about how movies were beginning to come out of their copyright period, and how he wanted to make a public domain MPG of "Gone with the Wind" before he died.

    I'm not quite sure what the copyright status of early (say, pre-WW2) movies is, now. Anyone?

    --Donate food by clicking: www.thehungersite.com [thehungersite.com]

  • by Anonymous Coward

    Although the Spiers French-English Dictionary [june29.com] website often mentions MS-Word, they do state on the how to join the project page [june29.com], that

    Typing can be done in Microsoft Word, WordPad, Word Perfect, or other common processing programs, and should be entered exactly as it looks on the page.
  • I agree here. It wouldn't kill anyone to run everything through a simple Perl script that would tidy the texts up a bit and make them a bit easier on the eyes. It's not as if you have to burn the ASCII copy to make an HTML-formatted version available.

    What I'd really like to see, however, are versions that take advantage of hyperlinking. I once saw an HTML copy of Dante's Inferno which was fully linked up with an in-depth annotation which explained references and other aspects of the work which I would have missed unless I'd taken a class about the book. It was incredibly useful; it still stands out in my mind as the most incredible thing I'd ever seen done to a book. It let me understand Dante in a way I couldn't have otherwise.

    HTML and Palm-formatted versions would be great. Again, it's not like they have to ditch the plaintext version to provide others.

    ----

  • I'm sure most authors nowadays use their computers to type up books anyway, but instead of having the list of not-online books continue to grow, all publishers should have to submit the "Plain Vanilla Text" of each book they print to a government database. Then it can be decided if the author will allow the book to contribute to the Gutenberg(spelling?) Project after a year or so, after the book has taken in most of its sales. I believe this is a great idea, and should be put into effect immediately. Don't allow the burden of these great volunteers to grow exponentially. :(

    "There are no shortcuts to any place worth going."

  • I don't quite understand why this post was moderated down to 0 for being a "troll." Maybe I'm just weird but I saw nothing about this post that is in any way a troll. It might be somewhat redundant but certainly not a troll.
  • :) Yep, that's the one. It's okay, because I'd read the book before, and other people had copies of it too.

    ASCII illustrations? What are you talking about, that's awesome! Put it in HTML with the PRE tag! ;)

    If they put it in standard HTML, they'd have to use table art, or use separate image files. Then they'd have to decide on (copyrighted) GIF's, (unsupported) PNG's, or (wasteful for line art) JPG's as an image format. Then they'd have multiple files for individual books... And then they'd say "Why didn't we use .pdf's?" And we'd say "Isn't that proprietary-speak for ps.gz files?" Aaaaggghhhh!

    So I can see why they used text, even if there are three different conflicting conventions for ending a line... The nice thing about standards is that there are so many to choose from!
    ---
    pb Reply rather than vaguely moderate me.
  • Actually Dover buys the rights to out-of-print books. The copyrights on all of Dover's books are still in effect.
  • Back in '97, Wired did a feature [wired.com] on PG. The original Gutenberg ftp site was hosted on a UIUC [uiuc.edu] machine. I have some friends who were there at the time, and have regaled me with stories of what a pain in the ass the guy was. The FTP site that is alluded to in this article by one Mark Zinzow [uiuc.edu] was on a machine, mrcnext (which no longer exists but still has a DNS entry) adminned by a friend of mine at one point. Anyway, the point is, this article [wired.com] has a lot of interesting things to say about the Project and especially Michael Hart. Check it out.

    --
  • I think, actually, that project gutenberg ought to store their files in a simple semi-formatted way.
    Like, lines beginning with \ are escapes with codes for 'title' 'author' 'chapter' 'paragraph' and 'footnote' Like,
    \title The Slashdot Effect
    \author Rob Malda
    \chapter Chapter One: What is The Web
    etc.
    (apologies if there's a real book by that title)
    Most of it'd just be plain text. With -just- enough formatting that a perl-script (or future-language script) can transform it into the pretty-format of the day with a bit of analysis,
    but not so much as to make it unreadable.

    Just a thought.

    --Parity
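
The scheme above is simple enough to sketch. This is a hedged Python illustration rather than Perl: the directive names come from the comment itself, while the mapping to HTML tags and the handling of unknown escapes are assumptions made for the example.

```python
import html

# Directive names are from the comment; the HTML tags chosen are assumptions.
TAGS = {"title": "h1", "author": "h2", "chapter": "h3", "footnote": "small"}

SAMPLE = "\n".join([
    "\\title The Slashdot Effect",
    "\\author Rob Malda",
    "\\chapter Chapter One: What is The Web",
    "Most of it is",
    "just plain text.",
])


def to_html(source):
    """Turn backslash-escaped plain text into a flat run of HTML elements."""
    out = []
    paragraph = []

    def flush():
        # Emit the paragraph collected so far, if any.
        if paragraph:
            out.append("<p>" + html.escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            flush()  # a blank line ends the current paragraph
        elif line.startswith("\\"):
            flush()
            keyword, _, rest = line[1:].partition(" ")
            if keyword in TAGS:
                tag = TAGS[keyword]
                out.append(f"<{tag}>{html.escape(rest)}</{tag}>")
            elif keyword == "paragraph":
                if rest:
                    paragraph.append(rest)
            else:
                paragraph.append(line)  # unknown escape: keep it visible as text
        else:
            paragraph.append(line)
    flush()
    return "\n".join(out)
```

Swapping the `TAGS` table for another output format is the whole conversion job, which is the appeal of keeping the source markup this thin.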
  • by Anonymous Coward

    Project Gutenberg is a great thing. I have no affiliation with it other than as a reader. I think it's great that slashdot took the time to post this story and I hope that PG is successful in getting more publicity.

    I'd like to address some of the rather negative posts that people have made regarding PG because frankly they're short-sighted:

    PG's etexts should be made more web-friendly:

    Most of the people who make this comment also suggest a lot of nonsense about adding HTML formatting to texts. This would be a huge mistake. The beauty of having the texts in as simple a format as possible is that it is always possible to add the formatting later. I'm sure that the creative readers at slashdot could come up with about fifty ways that the formatting could be added later, dynamically if need be, as appropriate to the display device. It is also particularly annoying to see the aforementioned comment on slashdot for the same reason. If you want to see the texts presented in HTML grab the texts (I have seen many folks on here bragging about their great bandwidth), write up a perl script to format them and start serving them up from your site, BANG! You've just created a great new companion site to PG.

    This is not helping folks (somewhere else):

    The gist of this comment seemed to be that PG was not useful, because the texts were mainly western literature and in English? This is so bogus it's hard to address. The bottom line is that you have to start somewhere. The first document ever done by PG was the United States Declaration of Independence. Yes, this shows a bias, but, you have to start somewhere. I have never seen anything from the project saying that any works were to be excluded and in the meantime yes, the works that are there are useful to people all over the world. If PG got some more publicity, maybe more people from around the world might hear about it and could contribute.

    Anyway, anybody interested in great literature should take a look at PG. They have a lot of great stuff to read available for free.

  • My first ever post to SlashDot! What a moment!

    I'm impressed by the texts available - from Charles Dickens to Geoffrey Chaucer, and Mark Twain, and even The Hackers' Dictionary of Computer Jargon [promo.net].

    My question is, what portable devices are available these days for reading texts such as these downloaded from the Internet? I would love to be able to use one of these on the train and tram, on the way to and from work - better than a broadsheet newspaper. I had a look at the Rocketbook and Softbook mentioned by a previous poster, but those devices seem to be very restrictive in terms of availability of books. I guess WinCE machines could be an (expensive) alternative. What about Palms? I don't actually own one myself, so I don't know about how hard they are on the eyes for extended periods.

    "Who makes Steve Guttenberg a star? We do, we do."
  • projectgutenburg.org
  • Maybe not the GPL per se, but something along those lines could easily be implemented for a novel. A main author could write an intro chapter, post it to the web, set deadlines for each chapter submission, and then piece together a finished project. Of course the publishing industry being what it is, this is unlikely to happen anytime soon. Nice dream for literature types like me though.
  • As distasteful as the thought may sound to some of us, it may be time to solicit the help of the government. Specifically, I believe it was California's governor, Gray Davis, who was recently talking about building a virtual library of all the texts in the California State university system. In this case, collaboration may make sense.
  • Article I, Section 8 of the US Constitution enumerates the relevant federal power as "Congress shall have the power... To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;" The key phrase here is "for limited Times". Retroactively extending the duration of copyrights or bestowing perpetual copyrights is plainly unconstitutional.

    Needless to say, unconstitutionality has never prevented legislators from passing unconstitutional acts, and the late Sonny Bono had his day [ober.com] in Congress a year ago and changed the rules [unc.edu], and we'll all suffer for it. There are some legal battles being fought on this issue, but I can't seem to drag up the references.
  • Take a look at WordNet [princeton.edu]. You can use their online version [princeton.edu] or download [princeton.edu] it.

    It has also been formatted for the DICT [dict.org] protocol. I wrote a web interface [antiflux.org] that accesses WordNet and a number of other dictionaries. (dict.org has one too, but I like mine more... and also I noticed theirs after I was finished.)

  • I noticed that that one was free, but that seems a bit long. I really think they deserve a very good domain name (as a side note, etext.* is taken as well!)

    --GnrcMan--
  • by Anonymous Coward

    .... a lot of nonsense about adding HTML formatting to texts. This would be a huge mistake.

    Why? How is a book supposed to provide proper illustrations if it is restricted to ASCII characters?

    If you want to see the texts presented in HTML grab the texts ... write up a perl script to format them and start serving them up from your site.

    But the point was that PG loses information in their obsolete process. It is not possible to restore the lost information unless one can refer to the original text. But PG won't tell you which text was used. So it turns out to be easier just to OCR the text and illustrations anew.

    The gist of this comment seemed to be that PG was not useful, because the texts were mainly western literature and in English? This is so bogus it's hard to address. The bottom line is that you have to start somewhere.

    Yes, PG was a good start in 1971. But it hasn't kept up. Its backwardness in restricting everything to ASCII will prevent it from accommodating non-ASCII texts. Already there are other archives publishing works that are not in ASCII. Concentrating our attention--and channelling our funds--to PG wrongly demeans those other projects, that obviously cannot restrict themselves to ASCII.

    PG can do whatever it wants. But what it is doing should be open for discussion, not controlled by one person's prejudices.

  • What about Palms? I don't actually own one myself, so I don't know about how hard they are on the eyes for extended periods.

    Palms are nice, and are cheap when compared to other handheld devices, and are amazingly useful, but they are limited to about 13 lines of text per screen, with an average of about 30 characters per line (that's my guess anyway . . .), so I don't think they are best suited for something like this.

    The development guides that are available from 3com even note that they are meant to be used as an auxiliary device to a PC for the most part. That being said, in a pinch, one would certainly work, though.
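
For a feel of what those limits mean for a book, here is a small Python sketch that re-wraps plain text into screens. The defaults of 30 characters and 13 lines are the poster's own rough guess above, not a Palm specification, and `paginate` is a name made up for this example.

```python
import textwrap


def paginate(text, width=30, lines_per_screen=13):
    """Re-wrap plain text and split it into screens of at most `lines_per_screen` lines."""
    lines = []
    for para in text.split("\n\n"):
        # Collapse the original hard line breaks, then wrap to the narrow screen.
        lines.extend(textwrap.wrap(" ".join(para.split()), width=width))
        lines.append("")  # blank line between paragraphs
    if lines and lines[-1] == "":
        lines.pop()  # no trailing blank line after the last paragraph
    return [lines[i:i + lines_per_screen] for i in range(0, len(lines), lines_per_screen)]
```

Calling `len(paginate(book_text))` on a downloaded etext gives the number of screens a reader would have to page through, which is one way to judge whether the device is practical for a given book.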

  • There are some legal battles being fought on this [copyright extension] issue, but I can't seem to drag up the references.

    See the Berkman Center site [harvard.edu] for the case to overturn the Bono Act. All the briefs are there. Also, the briefs are being written in a novel openlaw process--everybody, including the other side, gets to contribute!

    Project Gutenberg is not part of the suit. PG is restricting itself to pre-1923 works. Also, Open Source people have had odd reactions--they seem to believe that strong copyright laws are good, as long as there is a free license.

    The spirit that originally motivated Project Gutenbergers ought to move on to a larger movement that unites with Open Source advocates in many fields other than free books. The public domain needs to be as important in our thinking as the environment has become since the 1970s. Open Source and free online book people need to unite with other advocates of a better intellectual property principle for our laws and public policy. This would include the human genome, software patents, patents on the food (agricultural products) necessary for life on this earth, all digital media, vaccines and medicines, and many other areas where large multi-national corporations now based in the U.S. are attempting to assert exclusive intellectual property rights.

    The lawsuit against the copyright term extension is only a first step, but it could present the Supreme Court of the U.S. with some ideas that could form a better intellectual property theory to move on to the 21st century. Otherwise, if we Open Source people are continually turned down, we face being isolated and marginalized, over against a rampant free-market capitalist monopoly of our ideas.

  • Someone is writing open source sci fi novels:
    Mike Combs, mikecombs@aol.com
    who has his sci fi stories at:
    http://members.aol.com/howiecombs/hard_s-f.htm
    of which I really liked the novel 'A Bridge to Space':
    http://members.aol.com/howiecombs/bridge.htm

    Here are links to free online book sites:
    http://www.stanford.edu/~sothy/books.html
    http://samizdat.mines.edu/
    http://www.icemall.com/free/free_books.html
    http://www.ipl.org/
    http://www.itlibrary.com/
    http://www.cs.cmu.edu/books.html
    including Project Gutenberg:
    http://promo.net/pg/list.html
  • I agree that Project Gutenberg is a great thing, and I also think it's wonderful that it's getting some additional publicity.

    It definitely does need to have markup codes, though. I'd personally prefer XML, because it would allow the documents to include book-relevant tags like <chapter number> and such, which would make the e-texts a great deal more machine-readable, and accessible for everyone. (in addition, it would make it a lot easier to re-publish the texts.)

    I design and typeset books for a living, so I know what I'm talking about when I say that it's a lot easier to remove or process existing codes than it is to insert them. A machine can easily reformat a marked-up document; if plain text is wanted, a one-line perl script can be written to remove everything within angle brackets. The reverse is not the case. Computers are not currently smart enough to know where to add tags, so right now, every single tag in a document has to be inserted by a person.

    All too often, that person is me. ;-)
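
The "remove everything within angle brackets" step mentioned above, sketched in Python instead of Perl. It assumes simple markup with no comments, CDATA sections, or ">" characters inside attribute values; `strip_tags` is an illustrative name.

```python
import html
import re

# Matches anything from "<" up to the next ">". This is an assumption-laden
# shortcut: comments, CDATA, or a ">" inside an attribute would confuse it.
TAG = re.compile(r"<[^>]*>")


def strip_tags(marked_up):
    """Delete every tag, then turn entities such as &amp; back into characters."""
    return html.unescape(TAG.sub("", marked_up))
```

For example, `strip_tags('<chapter number="1">Call me <i>Ishmael</i>.</chapter>')` leaves just the sentence. The inverse function, which would decide where the chapter and italics tags belong, is the part that still needs a person.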

  • I don't really understand licenses that well -- this is just my uneducated opinion.

    I don't think the GPL would work well with something other than software. Once I tried to think about how people could copyright music under less restrictive licenses. You'd want to copyright a song (not necessarily a given recording of the song) so that coffee shop/bar bands could legally sing it, but you have to do something to preserve the integrity of the art. I don't think the GPL really does that, because with the GPL people can modify your work and distribute those modifications. This has practical value in the software community, but in the music community people want their work to remain unique and intact. I assume that authors would feel the same way.

    What you'd need, in my view, is a copyright license that allows people to distribute an etext freely and ensures that no one down the line can take that freedom away. However, people should be forbidden from altering the etext, and the author should always receive credit for the work. That way, you can give your stuff to Project Gutenberg without fear of compromising its integrity.

    This is just an idea, and I know that it isn't a perfect solution yet. But I think that a license based on these ideas could be worked up and actually used by authors, musicians, and artists to promote the exchange of ideas and information. That's really the spirit of the GPL anyway, right?

    Take care,

    Steve
  • *Amazon (someone mentioned this) is a _bad_ idea. Profit motive and releasing free documents don't coincide well.

    I suggested Amazon earlier, but I guess I should have argued my point rather than just suggesting it. Why don't profit motive and releasing free documents coincide well?

    Profit motive and free documents can coincide perfectly, and work to each other's mutual benefit. The free software world shows that the profit motive, demonstrated by companies like Red Hat, may in fact be the *best* way of supporting the development of free stuff (software, documents, and who knows what else). What Project Gutenberg needs is publicity, and who can do publicity better than companies like Amazon with plenty of money and marketing skills?

    But why would Amazon want to help PG? For the same reason why enlightened bookstores make it easy for customers to browse through books -- letting customers browse increases sales; putting links to PG texts brings this browsing experience online (imo, there isn't much worry that the customer will just read the whole book online rather than buy it -- reading a whole book online is just too unpleasant).

    Furthermore, the PG deals with copyright expired books, so the market is different in most cases; linking to PG is just another value added service that online booksellers like Amazon can provide for their customers.
  • Being a professional philologist, I must criticize the code quality of Gutenberg e-texts. Gutenberg texts rarely acknowledge the edition they rely upon and lack any structural markup (indicating the pagination, italics, spelling variants etc. of the original text). From the viewpoint of scholars and 'professional' readers, they are practically unusable because of that. Imagine Linux and GNU were not cleanly coded re-implementations of a sophisticated operating system (Unix), but a DOS clone hacked in BASIC, and you get the picture.

    The question here isn't whether to use ASCII, HTML or LaTeX, because there already is a highly developed, sophisticated markup language for electronic text editions, TEI-SGML [uic.edu], specifically designed to preserve all structural information of the original text. Some e-text projects such as the Victorian Women Writers Project [indiana.edu] code in TEI-SGML. This is not only good for scholars/literature hacks, but also allows lossless reformatting of the source code into HTML, ASCII, PDF, RTF, etc.

    The Gutenberg Project certainly was a good idea and a great achievement when it was founded, but it might have to rethink its coding policy. Other e-text projects are already doing better here.
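
    To make the "one source, many output formats" point concrete, here is a minimal sketch. It assumes a made-up TEI-style fragment written as XML rather than SGML (so the Python standard library can parse it), and the element names `p`, `hi` and `pb` are used only as illustration, not as a faithful subset of TEI. The `render` function is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment (XML, not SGML, so ElementTree can parse it).
# <hi rend="italic"> marks italics; <pb n="..."/> marks a page break.
SOURCE = (
    '<p>It was the <hi rend="italic">best</hi> of times,'
    '<pb n="2"/> it was the worst of times.</p>'
)


def render(elem, fmt):
    """Render an element and its children as 'html' or 'ascii'."""
    # Only HTML output needs character escaping.
    esc = html.escape if fmt == "html" else (lambda s: s)
    inner = esc(elem.text or "")
    for child in elem:
        inner += render(child, fmt) + esc(child.tail or "")

    if elem.tag == "hi" and elem.get("rend") == "italic":
        return f"<i>{inner}</i>" if fmt == "html" else f"_{inner}_"
    if elem.tag == "pb":
        label = f"[p. {elem.get('n', '?')}]"
        return f'<span class="pb">{label}</span>' if fmt == "html" else f" {label}"
    if elem.tag == "p":
        return f"<p>{inner}</p>" if fmt == "html" else inner + "\n"
    return inner  # unknown elements: keep the text, drop the tag


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(render(root, "html"))
    print(render(root, "ascii"), end="")
```

    The italics and the page number survive in both outputs, which is exactly the information a flat ASCII master copy throws away.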

  • The Palm computing platform is the big plus

    I read PG texts on my Palm IIIe all the time, but I have to take the time to download the text, convert it to Palm format, then install. If PG were to take their top X number of downloads and make them conspicuously available to Palm users, it might go a long way to increasing visibility.

    The problem is, you may be preaching to the converted. Palm users are tech savvy; tech savvy people are already aware of PG.

    Still, it would help me. Start with Shakespeare.
  • I absolutely agree. PG make the argument that ASCII text is the only format universally readable to (almost) all computers.

    However, if they were to mark up their texts in the XML-derived-markup-language of their choice, then their work would be so much more of a service to humanity.

    From Elliotte Rusty Harold's "XML: Extensible Markup Language" (a bit old now), discussing Jon Bosak's marked up versions of the complete plays of Shakespeare:

    "What does this system offer over a book or even a plain text file? To a human reader, the answer is not much. To a computer doing textual analysis, however, it offers the opportunity to easily distinguish between the different elements into which the plays have been divided. For instance, this system makes it simple to extract all lines spoken by Romeo in Romeo & Juliet."


    Then there's stuff like text to speech -- markup would help the reader with intonation, etc.

    ... and so on.
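
    The Romeo example from the quote can be sketched in a few lines. This assumes the plays are marked up with `SPEECH`, `SPEAKER` and `LINE` elements, which is how I remember Bosak's files being structured; the tiny `SAMPLE` scene and the `lines_spoken_by` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Small hand-made sample, assuming SPEECH / SPEAKER / LINE element names.
SAMPLE = """\
<SCENE>
  <SPEECH><SPEAKER>ROMEO</SPEAKER>
    <LINE>But, soft! what light through yonder window breaks?</LINE>
    <LINE>It is the east, and Juliet is the sun.</LINE>
  </SPEECH>
  <SPEECH><SPEAKER>JULIET</SPEAKER>
    <LINE>O Romeo, Romeo! wherefore art thou Romeo?</LINE>
  </SPEECH>
  <SPEECH><SPEAKER>ROMEO</SPEAKER>
    <LINE>Shall I hear more, or shall I speak at this?</LINE>
  </SPEECH>
</SCENE>
"""


def lines_spoken_by(xml_text, speaker):
    """Return every LINE inside a SPEECH that lists the given SPEAKER."""
    root = ET.fromstring(xml_text)
    found = []
    for speech in root.iter("SPEECH"):
        speakers = [s.text for s in speech.findall("SPEAKER")]
        if speaker in speakers:
            # itertext() also picks up text nested inside a LINE.
            found.extend("".join(line.itertext()) for line in speech.findall("LINE"))
    return found


if __name__ == "__main__":
    for line in lines_spoken_by(SAMPLE, "ROMEO"):
        print(line)
```

    Doing the same thing on a plain ASCII file means guessing at each submitter's formatting conventions, which is the whole argument for markup.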


    --
  • ""The Wizard of Oz" was originally written to promote the coinage of silver in the late 19th century."

    And L. Frank Baum was also a racist (as many were of his time). He actively promoted and supported the idea of extinguishing all Native Americans to "put them out of their misery" so to speak.

    History differs a lot according to who tells it ;)
  • I agree, but I wouldn't denigrate the need for pure ASCII representations either. XML is more useful than plain ASCII, but ASCII is going to be useful to everyone who can pipe the output to a printer, no matter what software they have.

    I think they should accept XML submissions, process the XML to produce pure ASCII, and then make both available. That way they have the power of XML while still having archived the least common denominator.

  • I think that XML would be a better choice. It would be a bad idea to stake our intellectual heritage on a platform that is subject to a fight between vendors. However, XML could be rendered a number of different ways, including browser specific HTML or plain old ASCII, and will eventually be rendered directly by browsers.


    It would also facilitate content based indexing. After all, it's the content that counts.

  • I think people misunderstand when they complain the PG texts are too hard to read because they're not formatted or they don't like reading on their computer.


    In fact, you can use the Gutenberg text to create your own edition, with your own illustrations, introduction and footnotes, and publish it. Perhaps for distribution to your students if you are a teacher, or perhaps to the world at large.


    The Gutenberg copyright restrictions are a lot like the BSD license. They aim to get the work used as widely as possible. If you modify the work, you just strip out the Gutenberg notices and it leaves you with the unencumbered text to do what you will.


    This is an incredible idea, and one that deserves support. Maybe they should be nominated for a MacArthur grant?

  • Most people seem to agree that the PG provides a useful service by putting paper texts into electronic format. Almost everyone thinks that it would be better in some other format (ASCII, XML, SGML etc.)


    Since the content is freely reusable, strip off the Gutenberg notices and make your own archive of e-texts in your favorite format! They've done the hardest part of the work; HTML-izing the work with CSS should be a breeze.
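
    As a rough idea of how little work the HTML-izing step is, here is a minimal sketch. It assumes paragraphs in the plain-text file are separated by blank lines, and it does not try to strip the notices, since that depends on how a given file's header is laid out. The `text_to_html` function and the `CSS` rule are made up for illustration.

```python
import html
import re

# One illustrative style rule; swap in whatever stylesheet you like.
CSS = "body { max-width: 40em; margin: auto; font-family: serif; }"


def text_to_html(text, title):
    """Wrap blank-line-separated paragraphs of plain text in a minimal HTML page."""
    # Split on one or more blank (or whitespace-only) lines.
    blocks = re.split(r"\n\s*\n", text)
    # Re-join hard-wrapped lines so each paragraph becomes a single string.
    paragraphs = [" ".join(block.split()) for block in blocks]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs if p)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title>"
        f"<style>{CSS}</style></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    sample = "First paragraph, hard-wrapped\nacross two lines & more.\n\n\nSecond paragraph."
    print(text_to_html(sample, "Sample e-text"))
```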

  • Who says it has to be this way [mostly meant for stuff like Shakespeare or other important literary works]?

    Project Gutenberg.

    Who says books have to be this way to be online?

    Project Gutenberg.
    http://www.promo.net/pg/history.html#beginningphil [promo.net]

    Where does it say that this is PG's purpose,

    Here: http://www.promo.net/pg/history.html#theselection [promo.net]

    and who wrote that?

    Project Gutenberg.

    Wow, you've impressed me with your fine reading skills. Next time, try reading the source material before engaging in combative behavior.

    You are doing a hell of a lot of whining about something that is completely free and explicitly tells people to add markup if they so choose. If you can't understand them, it's probably because they are thinking in the long term, and you aren't. Quote from the above referenced page:

    Alice in Wonderland, the Bible, Shakespeare, the Koran and many others will be with us as long as civilization. . .an operating system, a program, a markup system. . .will not.

    Quit whining, AC.
  • Every submitter formats the text differently, and the inline ("bottom of page") footnotes are a real annoyance.

    Definitely. I'd like to see a more "interactive" way of doing footnotes. Check out how they're done in LyX. That's really nice. Maybe LyX versions of the books should be done...


    BTW: I read "Sun Tzu" as well. Way cool...

    A suggestion: Read Machiavelli. There are two books, often put together, called "The Prince" and "The Discourses." I started reading the PG version and ended up buying the dead-tree version because I wanted to read it on the bus and didn't want to waste my laptop's battery life.

    Basically, if you enjoyed playing Civilization, you'll find a fair bit of familiar stuff in there. Now I have CTP and my playing style has changed dramatically.

    Now, if I could only fit the texts on my TI-86....
  • One problem might be that Micro$oft has claimed in the Caldera suit that it lost the source code to DOS.

    Hmm, seven major versions, countless minor versions, over the span of many years...

    Ooops, all lost. Even Win95/98... Joe Bob brought them home for an elementary school project and his dog ate them all...

    The court didn't actually *believe* that did it? For a lie, that's pretty damn boldfaced!
  • English is a widely spoken and important international language. It would help any developing country to have people who read and speak English, because it can improve their access to information, capital and trade. It benefits the country to develop a class of literate people who at least understand western culture, even if they don't have to agree with it.

    To learn English, you need books. Maybe the kids don't have a computer, but it's a fair bet that somebody in the country has a computer and access to a printing press. That person could print Gutenberg texts, with local language introductions and footnotes on difficult words or phrases. They don't have to wait for some western publisher to decide there's a market in third world editions of English language classics.

    Does it solve all the problems of the third world? Of course not. But freedom of information benefits everyone, even when they don't have immediate access to it.
  • Actually, they are very nearly as old as the net. PG was started in 1971. The original 4 computers of ARPANET were hooked up in 1969.

    --GnrcMan--
  • After reading this link, I'm definitely going to write Mr. Hart a generous check. Not just because the project is worthy (which it is), but because eccentricity on this grand scale deserves support in its own right.

    If I had a dime for every time I compromised on something I believed in because everyone around me seemed to be sure that doing it a different way would be better, I'd send it to him.
  • I don't know why they don't use the name [networksolutions.com] (names [networksolutions.com]) they own.

    Anyway, the reason I'm posting is that I've registered the name anthology.org (not yet active) to provide a directory to etexts from various collections/projects. Sorta like a card catalog (or maybe the interlibrary loan database? whatever) -- probably a yahoo-style navigation. I'm sorta surprised that no-one else has done this (with high visibility, anyway) -- does anyone else think this would be useful?
    -
    <SIG>
    "I am not trying to prove that I am right... I am only trying to find out whether." -Bertolt Brecht

  • Much of the work of the Supreme Court is in ferreting out the original intent and applicability of the Constitution and federal laws. In this case, the original intent is clear, as documented by Tim Phillips [asu.edu]. (Well worth reading, as is the entire ~dkarjala site). Thomas Jefferson wanted a term that was equal to the mean remaining life expectancy for an adult. This was 19 years at the time.


    In other writing, he is very explicit about the naturalness of the public domain.

    It would be curious...if an idea, the fugitive fermentation of an individual brain, could, of natural right, be claimed in exclusive and stable property. If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.

    --
  • Since I first got on the Web in '94, Gutenberg has been one of the most important sites I have found there. What it has done is of fundamental importance. Note that some of the texts are considered World Literature. Besides, Project Gutenberg lets us reach literature that one can hardly find today.

    However, I am very critical of Project Gutenberg on another point. It is good to be conservative, especially considering the nature of this project. However, it is too conservative.

    Project Gutenberg has always suffered from a very primitive search interface, or from keeping an outdated interface for too long. Searching by author or title is not always enough. There are many other ways to classify and search, and one of the most important is searching for specific content, much like Altavista or Excite do. If Project Gutenberg wants to make its texts truly available, it needs to work on this.
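
    A content search of that kind does not need much machinery to get started. Below is a minimal sketch of an inverted index with AND queries; it is nowhere near what a real search engine does (no ranking, no stemming), and the `build_index` and `search` functions and the two tiny sample documents are invented for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z']+")


def build_index(texts):
    """Map each lowercase word to the set of titles whose text contains it."""
    index = defaultdict(set)
    for title, body in texts.items():
        for word in WORD.findall(body.lower()):
            index[word].add(title)
    return index


def search(index, query):
    """Return the titles that contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    # Invented placeholder documents, not real e-texts.
    library = {
        "doc-a": "The white rabbit ran down the hole.",
        "doc-b": "The prince must study war.",
    }
    idx = build_index(library)
    print(search(idx, "white rabbit"))
```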

    The other point is the cumbersome nature of texts. I agree that it was rather dangerous to choose a text format that could deliver some incompatibility in the future. But that was good in 1994. Today HTML is standard, SGML is standard, XML is new but it is also a standard, TEX may not be so popular but it is also a standard, PDF may carry a commercial tone but anyway it is a standard. And there are tons of tools for converting and reconverting from one standard to the other. So it is time to rethink the standards.

    Another point is organisation. Project Gutenberg was and is badly organised. This may look like a sin to some, but I really think that a little bit of marketing would help the project a lot. And maybe a little commercial flavour would help even more, much like what RedHat is to Linux. Gutenberg needs a face. It needs a design. It needs to deliver something to people. Frankly, no one is born with the name Oesopus burned in big letters in the brain.

    I am not suggesting that Gutenberg should become another Amazon. But I believe that by making literature a free tool, by delivering an infrastructure of a very GPL-like nature, and by building a commercial basis for more complex tasks and material support, Gutenberg may become another lighthouse of the Web.

    Sincerely, it would be sad to see Project Gutenberg close its doors. Yes, we hackers may help by making tools and giving Project Gutenberg some design and technical support. Humanists may help with translation, classification and analysis. We may try to push a marketing campaign with our own resources. But this will not save the project if there is no organisation, if there is no mechanism to show people that the world does not end at The Matrix and Coca-Cola, and if we don't take care to feed the project with the material resources it may need for its future.

