
Universal Ebook Format Debated

Amy Hsieh writes "A well-known ebook industry expert, Jon Noring, recently wrote an interesting article for eBookWeb, formally calling upon the ebook industry to adopt a single universal ebook distribution format. Right now there's a plethora of essentially incompatible ebook formats, and this format 'babel' is hampering the growth of the ebook industry. In the article, Mr. Noring proposes a promising open-standards candidate which appears to meet a list of basic requirements: The Open eBook Forum's OEBPS Specification. Andy Oram, a Linux programming editor for O'Reilly, wrote an interesting reply to the article that should also be read." On the other hand, Noring's proposal has also met with some skepticism elsewhere.
  • My ebook format (Score:4, Informative)

    by RenQuanta ( 3274 ) on Thursday June 05, 2003 @07:44AM (#6122353) Homepage
    Is Project Gutenberg [promo.net] and a Palm Pilot.
  • FictionBook XML (Score:4, Informative)

    by ironhide ( 803 ) on Thursday June 05, 2003 @07:50AM (#6122375) Journal
    There also is this e-book xml format:
    http://haali.cs.msu.ru/pocketpc/FictionBook_description.html

    I use his excellent HaaliReader as a text reader on my pocketpc (fullscreen, landscape mode). There are also html2xml and word2xml tools on his site.

  • by Anonymous Coward on Thursday June 05, 2003 @07:51AM (#6122377)
    The Project Gutenberg Etexts should be so easily used that no one should ever have to care about how to use, read, quote, and search them ...

    This has created a need to present these Project Gutenberg Etexts in "Plain Vanilla ASCII" as we have come to call it over the years.

    The reason for this is simple. . .it is the only text mode that is easy on both the eyes and the computer.

    However, this encourages others to improve our etexts in a variety of ways and to distribute them in a variety of the available media, as follows:

    Once an etext is created in Plain Vanilla ASCII, it is the foundation for as many editions as anyone could hope to do in the future. Anyone desiring an etext edition matching, or not matching, a particular paper edition can readily do the changes they like without having to prepare that whole book again. They can use the Project Gutenberg Etext as a foundation, and then build in any direction they like.

    Thus any complaints about how we do italics, bold, and underscoring, or about whether we should use this or that markup formula, are sent back with encouragement to do it any way the person wants, with the basic work already done, with our compliments.

    The same goes for media. We have had a long-standing work ethic of providing our etexts in any medium people wanted: Amiga, Apple, Atari. . .to IBM, to Mac, to TRS-80. . .

    However, now that our etexts are carried on so many BBSes, networks, and other locations, it is easier for you to download a file in a manner that puts it in your format than for us to make and mail a disk, so we don't really do that much any more.

    The major point of all this is that years from now Project Gutenberg Etexts are still going to be viable, but program after program, and operating system after operating system, are going to go the way of the dinosaur, as will all the pieces of hardware running them. Of course, this is valid for all Plain Vanilla ASCII etexts. . .not just those your access has allowed you to get from Project Gutenberg.

    The point is that a decade from now we probably won't have the same operating systems or the same programs, and therefore all the various kinds of etexts that are not Plain Vanilla ASCII will be obsolete. We need to have etexts in files a Plain Vanilla search/reader program can deal with. This is not to say there should never be any markup. . .just that those forms of markup should be easily convertible into regular Plain Vanilla ASCII files, so their utility does not expire when the programs to use them are no longer with us. Remember all the trouble with CONVERT programs to get files changed from old word processor formats into Plain Vanilla ASCII?

    Do you want to go through all that again with every book a whole world ever puts into etext?

    The value of Plain Vanilla ASCII is obvious. . .and so is much of the value of the various markup systems we have in the world. But until some real standards arrive, we would be limiting our options a great deal if we did not keep copies of all etexts in Plain Vanilla ASCII as well.

    We don't have anything against markup. Not vice versa.

    Alice in Wonderland, the Bible, Shakespeare, the Koran and many others will be with us as long as civilization. . .an operating system, a program, a markup system. . .will not.

    This includes the many requests we have for compression in particular formats. There are only two formats we know of that are suitable for transfer to a wide general audience: Plain Vanilla ASCII (.txt files) and ZIPped versions of them (.zip files). Requests for other compression formats must be ignored, as they are appropriate only for small portions of our target audience. However (programmers take note: we will need help), we are planning to put some compression links on our files so they can be transmitted in any of an assortment of compression formats on the fly. That is, we should be able to generate any kind of file asked for, while keeping only one copy of each etext on our servers. . .much as the .Z compression format does today.
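    The "one stored copy, compressed on demand" idea above can be sketched in a few lines. This is a hypothetical illustration, not actual Project Gutenberg code; the function name, filename, and sample text are made up for the example.

```python
import io
import zipfile

def zip_on_the_fly(name, text):
    """Return a ZIP archive (as bytes) containing a single etext.

    Only the Plain Vanilla ASCII copy is stored on the server; the
    compressed form is generated per request and never kept on disk.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, text)
    return buf.getvalue()

# One plain-text master copy...
etext = "ALICE'S ADVENTURES IN WONDERLAND\n\nCHAPTER I. Down the Rabbit-Hole\n"
# ...compressed on demand for whoever asks for a .zip.
archive = zip_on_the_fly("alice.txt", etext)

# Round trip: the archive decompresses back to the original etext.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    assert zf.read("alice.txt").decode() == etext
```

    The same one-master-many-formats approach extends to other compressors (gzip, bzip2) via the standard-library gzip and bz2 modules, without ever storing more than the single ASCII copy.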
  • Re:Babel? (Score:5, Informative)

    by aldousd666 ( 640240 ) on Thursday June 05, 2003 @07:51AM (#6122379) Journal
    I thought we already had a standard: HTML

    Shows what I know.

    A couple of side notes: And how can you not know what Babel is? Babel, as in the Tower of Babel: a story from the Bible where King Nebekenezur (there is no correct spelling for that in English, just commonly accepted ones) wanted to build a tower to God, so God, being jealous, put a spell on everyone, and they all ended up speaking different languages. It's how Christians believe there came to be multiple languages.

    Now the website Babelfish gets its name from 'The Hitchhiker's Guide to the Galaxy' by Douglas Adams, where the characters stick a Babel fish in their ear to act as a universal translator.

  • Re:Babel? (Score:2, Informative)

    by jpmahala ( 181937 ) on Thursday June 05, 2003 @08:13AM (#6122450)
    BZZZT. Wrong.

    Nebuchadnezzar lived during the time of Daniel. The events of the Tower of Babel are chronicled in the book of Genesis.

  • You obviously didn't read the article.

    PDF (while a great standard) doesn't do reflow very well. So on a handheld - page size becomes a total pain in the arse.
  • by SkewlD00d ( 314017 ) on Thursday June 05, 2003 @08:29AM (#6122549)
    Everyone in academia uses LaTeX and PostScript, since PDF is silly and HTML doesn't have layout features.
  • by Anonymous Coward on Thursday June 05, 2003 @08:30AM (#6122555)
    Then the writer didn't do his research - Adobe has an ebook compatible PDF format
  • by Anonymous Coward on Thursday June 05, 2003 @08:33AM (#6122582)
    I don't think he was debating that - he was debating standards. The iTMS doesn't use a standard encoding format - there are also COMMON standards out there that would have been as easy to use (in my opinion), and I suppose the Ogg Vorbis format would have made a lot of /.'ers happy too! But in the end, I suppose they didn't go with Ogg because they knew Linux hackers would QUICKLY and easily break the encoding/encryption scheme.
  • pdf{tex,latex} (Score:3, Informative)

    by Grendel Drago ( 41496 ) on Thursday June 05, 2003 @10:39AM (#6123687) Homepage
    While everyone may use LaTeX, PDF has become more and more popular for web distribution of papers. PS works fine when you're just sending it to the printer, but because Adobe didn't include PS support in Acrobat, Windows users don't bother.

    But TeX/LaTeX has the advantage of being pretty much immutable, second only to plain TXT on that count. The standard hasn't changed since, what, 1982? Hopefully we'll be able to process the same documents with the same tools fifty years from now.

    I think the important distinction between, say, Word format and TeX is that TeX is a piece of systems programming: it performs a well-defined task in a well-defined manner, much like lex or yacc do. An attempt to add 'features' is nonsensical. (Though functionality can be extended through the use of, say, pdftex.)

    --grendel drago
  • On Beyond ASCII (Score:4, Informative)

    by Creosote ( 33182 ) * on Thursday June 05, 2003 @10:43AM (#6123728) Homepage
    I understand the support in a lot of the comments here for the plain-vanilla ASCII Project Gutenberg approach to ebooks. Paradoxically, however, a simple ASCII conversion from print to digital form provides less assurance of future survivability and usability of your book than rendering it with the structured XML markup specified by the Open eBook standard (where well-formed XHTML is the least common denominator).

    Why? Well, an ASCII text version of a printed book is really more like an analog facsimile than is a version in XML that has been tagged for structural features. Leaving aside issues of non-English characters, illustrations, and unusual typography, ASCII does a relatively poor job of capturing all of the structural conventions that exist in printed books. Books have copyright pages, tables of contents, chapter titles, subtitles, bylines, epigraphs, block quotations, footnotes, running headers and footers, citation lists, etc. ASCII can provide rough format equivalents of some of these, very poor equivalents of others. With an appropriate XML tagset, however, it's a relatively simple matter to tag most of the structural features of a book and then use stylesheets for presentational rendering. That's the whole assumption of the Open eBook specification.

    Suppose you're in a world where all printed copies of Huckleberry Finn have been lost. You have two CD-ROMs that somehow you've managed to decode so that you can read the files and interpret their character sets. One of them contains the Project Gutenberg [ibiblio.org] etext of the novel, an ASCII transcription. The other contains an XML encoding tagged according to a DTD from the Text Encoding Initiative [tei-c.org], the current best standard for encoding literary (and many other) texts. It has all of the textual content of the PG version, as well as some that's missing (like the table of contents and the copyright page from the transcribed edition, which the PG version unaccountably omits). XML tags mark all the line and page breaks of the original. In addition, there are tags to mark quoted speech, unusual typography, words in foreign languages, and other significant features of the original. The CD-ROM contains the DTD used along with documentation on the tagset.

    In this imaginary scenario, even if all of the XML documentation were missing it would be pretty straightforward for 31st-century programmers to strip out the tags and recreate the ASCII transcription. But with the documentation, it's possible to reconstruct something much closer to the original than the plain-vanilla PG version allows. And suppose your 31st-century archaeologist found a trove of TEI-tagged books on CD: with all of the structural tagging and metadata about authorship, publication dates, etc., a 31st-century librarian will be able to plug all of the books into a cataloging system that allows sophisticated searching. If instead you had a trove of plain-ASCII books, the best you could do with the collection would be simple full-text searches.

    Leaving aside the sci-fi scenario, the reality is that our documents, over the next few decades, will move from format to format and be used for purposes that we can only guess at right now. Of course plain ASCII, or even proprietary formats, will be better than no documents at all. But the work involved in converting them will be a lot higher than if they are tagged in a well-documented, structured markup language.

    Incidentally, there's already at least one project underway [hwg.org] to take Project Gutenberg texts and add minimal XHTML or XML markup to capture structure and make them more readable via stylesheets. The Open eBook specification is just a more sophisticated way of doing the same thing.
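    The parent comment's claim that even an undocumented trove of tagged texts stays recoverable is easy to demonstrate: stripping the tags from structured markup yields the plain transcription. The fragment below is a TEI-flavoured sketch written for this example; it is not taken from any real TEI edition of Huckleberry Finn, and the tag names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical TEI-style fragment: structure is tagged,
# but all the character data of the transcription is still inline.
fragment = """<div type="chapter" n="1">
  <head>Chapter I.</head>
  <p>You don't know about me without you have read a book
  by the name of <title>The Adventures of Tom Sawyer</title>;
  but that ain't no matter.</p>
</div>"""

root = ET.fromstring(fragment)

# itertext() walks the tree and yields only character data, i.e. the
# text with every tag removed; normalizing whitespace then recovers
# a Plain Vanilla transcription.
plain = " ".join("".join(root.itertext()).split())
print(plain)
```

    Going the other direction (ASCII to tagged XML) is the hard, human-judgment part; going from tagged XML down to ASCII is mechanical, which is exactly the asymmetry the comment is pointing at.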

  • by gblues ( 90260 ) on Thursday June 05, 2003 @10:57AM (#6123843)
    PDF (while a great standard) doesn't do reflow very well. So on a handheld - page size becomes a total pain in the arse.


    Like hell it doesn't. Like many things to do with PDF, it all depends on what you use to create your PDF. You'll find that a PDF created from a page layout program (PageMaker, InDesign, FrameMaker) through Acrobat Distiller reflows a lot better than a PDF made from MS Works using some archaic version of Acrobat.

    Nathan
  • Dynamic fonts (Score:3, Informative)

    by yerricde ( 125198 ) on Thursday June 05, 2003 @11:43AM (#6124323) Homepage Journal

    With CSS there's not a lot HTML can't do with layouts.

    No free, mature implementation of HTML and CSS can render a font not installed on the user's machine from outline data stored in the document. Mozilla has a bug on this open in bugzilla.mozilla.org (bug 52746), but it doesn't look like it's going anywhere. And no, "just replace with Helvetica, which is installed everywhere" is not an option because Helvetica for every non-Latin writing system is not installed on every reading device.

  • Re:HTML? (Score:3, Informative)

    by jc42 ( 318812 ) on Thursday June 05, 2003 @04:13PM (#6126717) Homepage Journal
    Plain ASCII text isn't too bad, if you only write in English and don't mind how your text looks on tiny PDA screens.

    But sensibly-done "plain" HTML is generally better. For an example, look at Baen.com [baen.com]. In the upper right is a "free" link that points to a bunch of sci-fi works that are online. You can get them in several formats. The HTML is a good choice in most cases, because it's not overly fancy, but produces good rendering in just about any HTML-capable window on any size screen.

    So, even if you have a big screen, you can load the text into a narrow window along one side of your screen, and read it while you're waiting for a compile or a test run.

    Of course, there's the inevitable problem of junk HTML produced by such things as Microsoft's various editors and word processors. But this isn't HTML's fault; it's the fault of the idiots who foisted such software on unsuspecting customers. And even then, most HTML renderers will display it sensibly, so the only real problem it causes is the long download time for all the spurious junk that clutters up the text.

    (Baen.com also had a thoughtful and entertaining essay on why they give out a lot of their books for free. It's an interesting summary of the impact of the Internet from an author's and a publisher's perspective.)
