Effective XML
Effective XML | |
author | Elliotte Rusty Harold |
pages | 304 |
publisher | Addison-Wesley |
rating | 8 |
reviewer | James Edward Gray II |
ISBN | 0321150406 |
summary | A guide to the correct use of XML. |
Before I tell you what's inside though, let me tell you what you won't find in these pages. Primarily you need to know that this book does not teach XML. I know a lot of books say that, yet still include an introduction or appendix that covers the basics, but this isn't one of them. You're expected to know XML from page one. Even syntax is only covered from a proper usage angle. Personally, I appreciated this. It always bothers me when an obvious non-beginner's book starts off by wasting a chapter on things I should already know. You just need to be aware when you buy that you won't learn XML here. Knowledge of namespaces, DTDs, the W3C's Schema Language, XSLT, and more isn't strictly required to get something out of this book, but it certainly would help you get a lot more out of it.
What you will get here is coverage of fifty miscellaneous topics spread across four sections on "Syntax", "Structure", "Semantics", and "Implementation". In "Syntax", ten topics delve into the details of things like DTDs, entity references, and the XML declaration itself. It may sound silly to dig deep into a single line of XML that simply declares the format, but I doubt you will think so after reading that topic. There's a lot going on in that line and you want to be in control of those decisions instead of just copying and pasting. Entity references are an even smaller chunk of XML output, but they too get illuminated by rare insight into how and when they should be used, and for what. Did you know that it is possible to write a namespace-savvy DTD? I do now, and I learned that in this section as well.
The second section of the book covers "Structure", and to me it was the best part. This collection of seventeen topics is loaded with good advice about how to build an XML document that will be ideal for anyone who needs to work with it. Here you see how metadata should be stored in XML, get tips on embedding binary content, learn which schema language is better for which tasks, and finally understand rare XML constructs like processing instructions and exactly what they are for. Additionally, there's a lot of general advice on the right way to mark up content that's really worth its weight in gold. Just one example of what I learned here is that I underappreciate mixed content for great constructs like <name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.
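To make the mixed-content point concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The variable names are mine, not the book's. It shows that the same markup serves both structured access (ask for the family name) and plain reading (the space and comma between the elements survive parsing).

```python
import xml.etree.ElementTree as ET

# The mixed-content example quoted in the review.
doc = "<name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>"
name = ET.fromstring(doc)

# Structured access: pull out individual parts by element name.
given = name.findtext("given")
family = name.findtext("family")

# Document access: itertext() yields element text and the text that sits
# between elements, so joining it gives back the readable string.
display = "".join(name.itertext())

print(given, family)
print(display)
```

The design choice being illustrated is that nothing has to be duplicated: the display form and the parts live in one string.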
Section three, "Semantics", deals primarily with parsers and their APIs. Again, you won't learn any APIs here. What's covered is their strengths and weaknesses and why you should choose a given API for a given task. SAX and DOM are the main focus of these ten topics, but there are other details sprinkled in, like XPath.
The fourth and final section is all about "Implementation". The thirteen topics here address client-side XML styling, server-side transformations, signatures, encryption, compression, and more. My favorite topic here was the terrific coverage of Unicode and how it affects XML. All developers should know at least as much about Unicode as what's printed here, and this is a fine source to learn it from.
One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML. He will tell you where the design process was less than perfect, which tools have little practical value, and some of the problems with where XML technologies are headed. This isn't complaining though. All of this is targeted at how it affects XML developers today. You learn what you can safely skip and what should be outright avoided. The author even tells you what XML is bad at and gives you advice about when you shouldn't use it. That's the mark of a man who knows his subject, if you ask me.
All told, I think the author failed to completely convince me his way is perfect on only 2 topics. That means I learned 48 expert XML tricks. Surely that's worth the cost of the book in time and money. This isn't the first XML book you need, but I think it is the second XML book everyone should read.
Binding (Score:3, Funny)
Re:Binding (Score:2)
Re:Binding (Score:3, Funny)
Re:Binding (Score:5, Funny)
Surely you mean physics book a lagrange
Hey, come on... (Score:5, Funny)
Get with the program.
Re:Hey, come on... (Score:2)
Not the same as "program to an interface, not an implementation"... (The three amigos book)
hmmm (Score:2, Interesting)
Any ideas what those 2 are?
Re:hmmm (Score:2)
Any ideas what those 2 are?
1. XML is a good idea.
2. XML is an efficient format for wire protocols, internal program messages, and databases.
Actually I'm just kidding; there are definitely places where it has a purpose. Although I will probably never get why a closing tag requires a repeat of the file opening tag name...
Re:hmmm (Score:2, Informative)
Not sure if you were serious here or not, but this is necessary to disambiguate the following improperly formed XML:
<start> Now is the time for all good men to come to the aid of their <noun>country</noun></phrase>
which is either missing a "phrase" start tag or mixed up the start & end tags... in a long XML document, the parser can give you a better hint where to look for the e
Re:hmmm (Score:5, Insightful)
SGML (XML's precursor) did have minimized end-tags like </>. Experience proved this caused more pain than it alleviated. Hence the lack of minimized end-tags in XML.
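The value of named end tags is easy to check with any conforming parser. The following is a rough sketch, assuming Python's standard `xml.etree.ElementTree` (which sits on top of expat); it feeds the ill-formed example from earlier in this thread to the parser and reports where the mismatch was detected.

```python
import xml.etree.ElementTree as ET

# The ill-formed example from the thread: <start> is closed by </phrase>.
bad = ("<start> Now is the time for all good men to come to the aid of their "
       "<noun>country</noun></phrase>")

try:
    ET.fromstring(bad)
    error = None
except ET.ParseError as exc:
    error = exc

# ParseError carries a (line, column) position for the offending tag.
line, column = error.position
print(error)
print("line", line, "column", column)
```

Because the end tag names what it closes, the parser can stop at the first tag that does not match instead of failing somewhere far away at the end of the document.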
I believe you meant... (Score:3, Informative)
The Problem With XML (Score:5, Interesting)
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:2)
Well among other problems, you typically have to parse & load the whole document in order to extract even a single piece of information. A DOM parser operates thus. A SAX parser may let you read from the top to where your data is.
Contrast this with a binary format where you could navigate directly to your data and read a value. Imagine if you had to parse your whole database file in order to run a select statement.
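The DOM versus streaming distinction can be sketched in a few lines. This is an illustration only, assuming Python's standard `xml.etree.ElementTree.iterparse`; the `<items>` document and the `find_item` helper are hypothetical. The loop stops consuming parse events as soon as the wanted record turns up, though everything before it still has to be read, which is the commenter's point.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: 1000 <item> records.
xml_bytes = ("<items>" + "".join(
    f"<item id='{i}'>value {i}</item>" for i in range(1000)) + "</items>").encode()

def find_item(stream, wanted_id):
    """Return (text, records_seen) for the first <item> with the given id."""
    seen = 0
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            seen += 1
            if elem.get("id") == wanted_id:
                return elem.text, seen   # stop consuming events here
            elem.clear()                 # drop the content of records we skip
    return None, seen

text, seen = find_item(io.BytesIO(xml_bytes), "3")
print(text, seen)
```

There is no index to jump to: finding record 3 means walking past records 0 through 2 first, whereas a format with fixed offsets or an index could seek directly.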
Re:The Problem With XML (Score:3, Interesting)
But the question was whether it is a universal data description language. Sending binary will kill your data the first time you try to communicate with a Macintosh or Unix system (big endian, little endian).
The common lowest denominator is just text, so to describe any structure we have trees in XML.
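A small sketch of the byte-order problem, using Python's standard `struct` module; the value 1025 is arbitrary. Note that a binary format can pin down its byte order explicitly, as the format strings below do, so the failure shown is what happens when the two ends have not agreed on one.

```python
import struct

value = 1025  # 0x00000401

little = struct.pack("<I", value)  # little-endian 32-bit unsigned integer
big = struct.pack(">I", value)     # big-endian 32-bit unsigned integer

# Same number, different bytes on the wire.
print(little.hex(), big.hex())

# Reading little-endian bytes as if they were big-endian gives a wrong value.
(misread,) = struct.unpack(">I", little)
print(misread)

# A decimal text encoding has no byte order to disagree about.
as_text = str(value).encode("ascii")
roundtrip = int(as_text.decode("ascii"))
print(roundtrip)
```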
Probably the confusion is the influence of Object Oriented design with Entity Relationship schemas in databases. The way that one-many relationships are described in both areas makes sparks
Re:The Problem With XML (Score:3, Informative)
Take XPath as an example. How do you extract the fragment pointed to by the expression
foo/bar/fie[@naja='hehe']
? You read the document, counting opening and closing tags, until you read in a foo-tag at top level; then you continue, counting as before, until you reach, before a foo-ending tag at top level, a bar-tag at second level, and then until you reach a fie-tag with the attribute naj
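The counting procedure described here can be sketched with a streaming parser. This is a minimal illustration, assuming Python's standard `xml.sax` and `xml.etree.ElementTree`; the sample document and the `PathMatcher` class are hypothetical, and only the element and attribute names come from the comment. The handler keeps a stack of open elements, which is the "counting" step, and the last lines run the same query against an in-memory tree for comparison.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical document for illustration.
doc = b"""<foo>
  <bar><fie naja="nope">skip</fie><fie naja="hehe">hit</fie></bar>
  <other><fie naja="hehe">wrong parent</fie></other>
</foo>"""

class PathMatcher(xml.sax.ContentHandler):
    """Streams through the document, tracking the stack of open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []         # names of the currently open elements
        self.capturing = False  # True while inside a matching <fie>
        self.matches = []       # text of each matching <fie>

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == ["foo", "bar", "fie"] and attrs.get("naja") == "hehe":
            self.capturing = True
            self.matches.append("")

    def characters(self, content):
        if self.capturing:
            self.matches[-1] += content

    def endElement(self, name):
        if self.capturing and self.stack == ["foo", "bar", "fie"]:
            self.capturing = False
        self.stack.pop()

handler = PathMatcher()
xml.sax.parseString(doc, handler)
streamed = handler.matches

# Same query on an in-memory tree; the root <foo> is the context node,
# so the path starts at its children.
in_memory = [e.text for e in ET.fromstring(doc).findall("bar/fie[@naja='hehe']")]
print(streamed, in_memory)
```

Either way the whole prefix of the document is read; the streaming version just avoids building the tree.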
Re:The Problem With XML (Score:2)
Try writing an XML parser in binary (no compiler) and see how difficult it is.
Seriously, your arguments are not sound in this context.
Re:The Problem With XML (Score:2, Informative)
Re:The Problem With XML (Score:3, Informative)
http://www.joelonsoftware.com/articles/fog0000000
Re:The Problem With XML (Score:2)
Many RDBMSs use variable length records.
I think you're talking about fixed-length record databases, like might be found in some embedded databases (like Berkeley DB, which also offers variable-length records).
An RDBMS is very different from XML data.
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:2, Insightful)
The validation etc is more difficult, but then it's not a matter of parsing the XML in the first place.
It depends on what you mean, but in general XML is easily parsed by machines... and easily represented in internal data structures, which are as efficient as you make them.
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:5, Insightful)
People are trying to use XML for something other than what it was intended for, then complaining about the sub-standard results. Surprise? XML is a common format that makes it possible to move data between different, I'll use the word, "domains" (as in division, not URL), and it should be used for "just" that.
In other words, XML should be a "transport" mechanism. It's so I'm not writing a new parser by hand every time some wanker like you sends me a file in yet another made-up-on-the-spot type. Your example is relatively clean, but in the real world, as the data gets harder to describe, humans start to make more ignorant made-up-on-the-spot rules like "Well ok if there's a sub record the line will start with a -, well ok it could be a + too, if the subrecord can only contain numbers... no you know what let's make it -n if the sub records can contain numbers only..". No matter how ingenious your "format" is, the problem isn't your format, it's that your format isn't my other customers' format.
XML should be used in scenarios where the time saved by using the readily available XML parsing and validating tools, instead of re-inventing the wheel, outweighs the milliseconds lost parsing a longer document "once".
Don't use XML as your main, permanent datastore for a gigantic database and then complain. It's not for that. It's for when I need a copy of your data and I don't want to pay for a copy of "JackoffDb version 5" that you run, or hire a team of programmers to write a translator just to read your files. Gimme XML; I can take that and understand its contents and schema with ease, then I'll import it into my own system here.
Re:The Problem With XML (Score:3, Interesting)
To be specific having spent the last 3 years working on XML I can suggest that there are numerous problems with XML.
XML Tagging is tedious and stupidly top heavy in overhead. Contrary to being human friendly it isn't. XML Tagging should be shortened to a simple set of defined tag names and then type definitions. After that each name would be addressed by an index. Typing of data should be contained in a process to extract that is associated with either the tagging index or an over the top wrapper which
Re:The Problem With XML (Score:5, Insightful)
I would not be so quick to dismiss XML because of traditional arguments. Having worked with several different ways of storing and transmitting structured information, I can say without question XML comes out easiest in the end.
If you're only transmitting 10 characters, then yes XML is not for you. However, if you're describing dynamically changing, complex data, even in large amounts, XML is very handy.
There are turnkey parsers for XML that are well tested and which allow the client to see an abstracted view of the data as an object, at any level of detail desired.
Platform independence is built in.
It's easy to syntactically validate XML, as it's done automatically. It's also easy to isolate logical validation into discrete units since XML couples easily to object oriented designs.
Very large XML messages can be processed quickly using a pull parser. Pull parsing is faster than SAX and has the intuitive benefit of being client driven, not event driven.
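For readers who have not met the pull style, here is a rough sketch using `xml.etree.ElementTree.XMLPullParser` from Python's standard library. The `<orders>` message and its chunking are made up for illustration. The caller decides when to hand over data and when to ask for events, rather than being called back; the sketch makes no claim about speed relative to SAX.

```python
import xml.etree.ElementTree as ET

# Hypothetical message, split into arbitrary chunks as it might arrive
# from a socket.
chunks = ["<orders><order id='1'>pen</or",
          "der><order id='2'>ink</order>",
          "</orders>"]

parser = ET.XMLPullParser(events=("end",))
orders = []
for chunk in chunks:
    parser.feed(chunk)                        # hand over whatever has arrived
    for event, elem in parser.read_events():  # then pull the completed events
        if elem.tag == "order":
            orders.append((elem.get("id"), elem.text))
            elem.clear()                      # release data already handled
parser.close()
print(orders)
```

The client-driven loop is the "intuitive benefit" mentioned above: control flow stays in ordinary code instead of being spread across handler callbacks.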
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:2)
However your pre-cooked parser comes with a severe limitation... complete lack of flexibility which implies verbosity. Grammar files are immensely more flexible and just as precise as XML schemas. But it's true that for many people they seem to look harder to develop.
Re:The Problem With XML (Score:3, Interesting)
In my experience the main reason our clients want their data in XML is that most of them are afraid of single-vendor lock-in to proprietary formats, especially to smaller vendors they perceive could more easily go under - in other words, they want data longevity and a format they can easily process their data if they need to. And this trumps the inefficiency. Especially as people mostly transfer such documents across high-speed LANs and store them on modern 120+ GB hard disks and open them on machines with
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:2)
It doesn't have to be SOAP. But if you are going to publish a web service you need at least to describe what the data looks like. And the client is not going to install your software to establish communication, he/she will only make a call to your server like a http request and you will respond with a stream of data.
That data would be VERY useful if it is described in a standard format like XML.
Re:The Problem With XML (Score:5, Funny)
A standard worse than XML.
A standard I am loath to use
Though offered parsers to abuse;
The designers couldn't pass a class,
CS201 can kiss their ass;
A structure no one can traverse
pre and post order routes are cursed;
What are its types you cannot tell;
Though it promised self referential.
Standards are assigned by committee,
But any fool can make a tree.
I agree (Score:2)
Re:The Problem With XML (Score:2)
You don't think it's a very good universal data description language? Well, let me confirm your suspicions: XML absolutely and totally sucks as a universal data description language.
The thing is, I don't understand for the life of me why people got the idea that XML should be used for data description to begin with -- it wasn't designed as such. XML was designed to be a document markup language, and that's precisely what it is. It's a g
Re:The Problem With XML (Score:2)
Instead of agreeing with the authors/creators of all these systems in defining how a number or a date or a floati
Re:The Problem With XML (Score:2)
And just what of this does XML help to alleviate? How is it easier to agree on a common XML schema than to agree on a binary protocol or any other ASCII-based protocol? It's exactly the same thing.
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:4, Insightful)
Re:The Problem With XML (Score:2)
Compare vs. YAML or any other similar solutions, and it's obvious.
The principle of XML is nice - but for text documents only, as a superset of HTML. For the attribute files, properties
Re:Not only understandable and parseable.. (Score:2)
Being text, it is also not tied to a specific vendor or platform.
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:2)
Re:The Problem With XML (Score:2)
In particular note the WAP Binary XML [w3.org] format (WBXML) that is used to transfer XML to and from mobile devices.
Eric
Re:Mod parent up (Score:2)
Re:The Problem With XML (Score:2)
Join the Dark Side (Score:2, Funny)
[Obligatory Star Wars joke]
Re:Join the Dark Side (Score:5, Funny)
HTML: No, XML....I am your father!
XML: That's impossible!
HTML: Grep your code...you know it to be true.
XML: NOOOOOOOOOOOOOOOOOO!
Re:Join the Dark Side (Score:2)
The reason why I ask, is because I thought "XML" is older than HTML, and is a simpler version of SGML (?), which HTML is also derived from
Re:Join the Dark Side (Score:2, Informative)
damn (Score:5, Funny)
Re:damn (Score:5, Funny)
Re:damn (Score:2, Insightful)
n00b - help! (Score:5, Interesting)
Re:n00b - help! (Score:5, Insightful)
Think of it more like CSV than mySQL. It's just a format for representing structured data. It also happens to be that it's quite easily read by humans.
Yes, you can do incredibly advanced things with XML, but there is nothing you can do in XML that you couldn't do in your own proprietary data storage format.
The reason people use XML instead of writing their own data storage format is simple: there are a lot of tools for parsing it, which you'd have to write yourself if you had your own format.
As for the javascript and XML example, it's impressive, but it's far more javascript than XML.
Re:n00b - help! (Score:5, Insightful)
RelaxNG, for instance, lets you verify that your XML file is built correctly for your app: you write a RelaxNG spec for your XML file format, and then it verifies that all the mandatory fields are there, in whatever order is necessary, with the correct datatypes, etc, etc. RelaxNG processors are part of most major XML libraries now, so if you're writing Perl you can just tell your Perl library to validate your file and it's done. If you're editing in Emacs (with nxml-mode), you can point Emacs at your RelaxNG file, and have tab completion, error highlighting, etc, etc-- all customized for your file format.
XSLT lets you take an XML file and perform transformations on it into another (possibly XML) file format. Need to convert XML into SQL INSERTS? Piece of cake. I use it to extract particular parts of an XML file and convert them into a significantly differently-ordered Lisp structure.
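XSLT is not part of Python's standard library, so the following sketch swaps in plain `xml.etree.ElementTree` code to show the same kind of XML-to-SQL transformation; it is not XSLT. The `<people>` document, the `people` table, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element names and the "people" table are made up.
doc = """<people>
  <person><name>Ann O'Neil</name><age>41</age></person>
  <person><name>Bob</name><age>37</age></person>
</people>"""

def sql_quote(text):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"

def to_inserts(xml_text):
    """Turn each <person> element into one INSERT statement."""
    statements = []
    for person in ET.fromstring(xml_text).iter("person"):
        name = person.findtext("name")
        age = int(person.findtext("age"))
        statements.append(
            f"INSERT INTO people (name, age) VALUES ({sql_quote(name)}, {age});")
    return statements

inserts = to_inserts(doc)
print("\n".join(inserts))
```

When the statements are executed rather than written to a file, parameterized queries are the safer choice than building SQL text by hand.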
Most modern web browsers are becoming CSS engines rather than HTML engines. So you can stick a CSS stylesheet reference at the top of your XML file, and have the CSS generate something that looks like what you want the user to see. The data file looks good to the app, and looks good to the user. You can also (with some browsers) use more powerful transformations using something like DSSSL or XSLT.
DOM for a standard data manipulation API, so each program you write doesn't have a different data access language. XPath as a language to perform more complex queries. XML Namespaces to let users or apps tag their data with extensions. XInclude for data sharing. All of these are things you get for free with XML.
All of these are general technologies, not specific apps. So they should be usable in most major libraries in most languages. (If you're using Perl, I'd recommend XML::LibXML.)
Don't think of XML as just a file format, because that part sucks. Think of it as a buffet table of technologies. When you write a program, 10% is to do the program's processing; the other 90% is to handle I/O, data management, and other housekeeping. Using XML lets you get a lot of that for free.
PS: I'm not an XML fanatic. A year ago, I was told to use XML for one particular project and was disgusted at the idea. I still think that XML gets a lot wrong, but I've come to recognize what benefits XML provides.
Re:n00b - help! (Score:2)
Re:n00b - help! (Score:3, Funny)
I really like XSLT for code generators, with the meta-data in XML. I do, however, miss the sheer perversity of using Access VBA to generate Java.
Re:n00b - help! (Score:2)
If the only tool you have is a machine gun, everything starts to look like enemy soldiers.
Re:n00b - help! (Score:2)
Re:Dear Lord make the madness stop! (Score:2)
Also, most of my stuff nowadays involves transforming with XSLT, so I need to create a DOM object anyway.
BTW, the example I did was for PHP5.
Re:Dear Lord make the madness stop! (Score:2)
For almost anything non-trivial I'd use DOM over echo. The only reason not to is laziness or 1337ness.
Re:n00b - help! (Score:2)
Re:n00b - help! (Score:2)
This is the one thing I've always hated about XML. It's an incomplete solution; you end up needing six different programming languages/abstractions layers/formatting layers/document validators/rendering layers... it seems like all we're doing is adding more and more overhead to something that used to be as simple as ""
Dear XML-Junkies, (Score:5, Funny)
<salutation>Dear XML-Junkies</salutation>
<body>
I type all my business letters in <link href="http://www.google.com/?q=XML>XML</link>. Sometimes it can be a bit <link href="http://dictionary.reference.com/search?q=ve
</body>
<signature>
<nam
</signature>
</letter>
Re:Dear XML-Junkies, (Score:3, Funny)
nsgmls:letter.xml:4:78:W: character "<" is the first character of a delimiter but occurred as data
nsgmls:letter.xml:4:78: open elements: letter body
nsgmls:letter.xml:4:114:W: character "<" is the first character of a delimiter but occurred as data
nsgmls:letter.xml:4:114: open elements: letter body
nsgmls:letter.xml:4:132:E: net-enabling start-tag not immediately followed by null end-tag
nsgmls:letter.xml:4:132: open elements: le
XML Seems Cool (Score:3, Insightful)
Do I have the wrong impression?
Re:XML Seems Cool (Score:2, Insightful)
Re:XML Seems Cool (Score:2)
Everyone here at
As for performance, for 99% of your applications, it just doesn't matter. Software analysis and development time is much more expensive than clock cycles.
Would I use XML for a database? Probably not without a lot of convincing. Do I use it for da
Re:XML Seems Cool (Score:2, Informative)
Really, schemas are just convenient tools for a few special purposes. Not everyone needs them, and no one needs them all the time. Schemaless XML is a lot more interesting and practical.
Re:XML Seems Cool (Score:3, Insightful)
This is sort of like saying that programming in C is a bad idea, because what happens if you mistype a function name, and your program refuses to run? That's what debuggers are for. Likewise, the XML world is full of open-source or low-cost schema-aware editors and validators. Minimally you should use an editor that knows which elemen
Yes, it's a great book ... so far (Score:2, Informative)
FYI (Score:5, Informative)
PS, I don't work for Bookpool, I hate it when
Try the other "Effective" books, too (Score:5, Informative)
If you like this book, don't forget to check out Scott Meyers' Effective C++ or Joshua Bloch's Effective Java. Both are great. I devoured Meyers' book when it first came out, and I was happy to see Bloch's book was similarly useful. There is also an Effective Perl book out, but I don't know how good it is -- it follows the same general format, but hasn't been updated since 1997. (Neither has the C++ book, but C++ hasn't changed that much since then.)
Eric
See your HTTP headers here [ericgiguere.com]
Just because you CAN... (Score:5, Insightful)
Sometimes, though, your data can be simple enough that XML is overkill. Software developers need to make themselves aware of situations when they might be better served by a simple "flat file" of delimited data. In situations like this, using XML can amount to what I like to call "gratuitous complexity."
Always use the right tool for the job.
Re:Just because you CAN... (Score:2)
True enough, however, simple data all too often becomes complex data. That's why it's a good thing to be "extensible".
Re:Just because you CAN... (Score:2, Informative)
These days data has to be pretty damn simple to justify using a flat file rather than XML. I wrote more about this in my previous book, Processing XML with Java [cafeconleche.org] than in this one, though. Chapters 1-4 discuss this in some detail.
Real-world data often gets messy in ways that don't lend themselves to flat files. For instance, two of the thorniest problems:
Re:Just because you CAN... (Score:2)
Both of these are completely solved by XML with no extra effort on your part, and these are hardly the only issues.
CDATA is delimited at the end by ]]>. There is no way to escape this delimiter. If you need to enclose one XML fragment in another using CDATA, you had best base64 encode it, because there simply isn't any way to nest them.
This is not what I call "well thought out"
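It is true that `]]>` cannot be escaped inside a CDATA section and that sections do not nest. A common workaround, other than base64, is to split every `]]>` across two adjacent CDATA sections. The sketch below assumes Python's standard `xml.etree.ElementTree`; the `cdata` helper and the `<wrapper>` element are hypothetical names.

```python
import xml.etree.ElementTree as ET

def cdata(text):
    """Wrap text in CDATA, splitting each ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# An XML fragment that itself contains a CDATA section.
inner = "<note><![CDATA[a < b]]></note>"
outer = "<wrapper>" + cdata(inner) + "</wrapper>"

# The parser joins the adjacent sections back into the original text.
recovered = ET.fromstring(outer).text
print(outer)
print(recovered == inner)
```

The split works because the first section ends right after `]]` and the next one begins with `>`, so the forbidden three-character sequence never appears inside a single section.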
overstock (Score:2, Informative)
A perfect eXaMpLe of a good use for XML (Score:5, Funny)
MOD PARENT UP (Score:2)
Thanks, I needed that today.
-Z
Disgruntled with XML.... (Score:2)
Yet, every time I see XML (mis)applied in those cases I keep asking the fundamental question. What does it allow me to do that a decent Lexer and Parser does not? You could be sending grammar files just as easily and without the ridiculous verbosity of XML. Most parsers can work with either text or binary and BNF has been a golden standard for decades. XML
Re:Disgruntled with XML.... (Score:2, Informative)
I suspect what it offers is that you don't have to define and write your own BNF grammar, and then implement it in lex and yacc or similar tools.
Grammar design is non-trivial, especially if you need to consider issues like internationalization. Picking XML as the underlying format means you don't have to do this work yourself. Why reinvent the wheel?
Sometimes you do need something different, but a lot of alternative formats don't really have a good reason to ex
Re:Disgruntled with XML.... (Score:2)
As far as XML development being "easier"... I find that questionable (but it may be my personal view). If the problem domain is trivial then it might be the case that your XML schema happens to be simpler than your BNF grammar. In most non-trivial cases I find it's about even. As far as verbosity goes, 99% of the time your custom grammar will be a lot more space effic
Re:Disgruntled with XML.... (Score:2)
What are the big gotchas of those XML binding APIs as you see them?
Re:Disgruntled with XML.... (Score:2)
Let me start this off by saying that I'm no fanboy of XML. If anything, I'm the local "get XML away from me" person.
But config files are one place that I actually like XML. And I like it because these files are (1) typically fairly small and (2) I don't want to have to write and debug another lexer/parser. I have a utility (castor) that handles the parser for me based on the objects I han
Re:Disgruntled with XML.... (Score:2)
Antlr (actually that's my personal favourite) is harder to learn than JDOM b
Delicious irony (Score:4, Funny)
Re:Delicious irony (Score:4, Funny)
so do we love or hate Mozilla and FireFox today? (Score:5, Insightful)
Good Old Rusty (Score:2)
What's so bad about XML? (Score:4, Insightful)
When I receive their data, I can check that it matches the specification, because my machine can read it. If there is something wrong with their data, I can point out where it's broken, because it's human-readable.
Writing specifications is easy. Writing generators and parsers is easy. The tools are ubiquitous. Generation and parsing are usually fast 'enough'. The standards are freely available. Complex data structures may be described. Data may be transformed using a common language based on XML itself.
Yes, I'd like it to be easier to write XML parsing tools. Yes, I'd like it to be easier to write tools which handle XML more efficiently. No, the two points above don't make XML the devil's data encapsulation.
Rik
Re:What's so bad about XML? (Score:2, Insightful)
People just fail to realise what XML is (or isn't). Basically XML is just a way for you to define your own (markup) language for any purpose.
That's it. It is not a database replacement. It won't walk on water or feed the hungry or kill all the communists/terrorists.
But if you want to persist textual data with structure, in a form that will most probably be readable in 20 years time, XML is for you.
By reading the introduction... (Score:2)
Vocabulary increases our understanding of the entities that we want to work with, so we don't spend our time arguing about what we are trying to say...
For this I remember Ludwig Wittgenstein and his methodology of achieving the Truth by establishing the meaning of words and their relationship with thoughts and their link to reality. I
XML as a fall-back standard (Score:2, Interesting)
Importance vastly overstated (Score:3, Informative)
Re:Bah (Score:4, Funny)
XML is not the end of our problems, it is the beginning of our problems. - ditto
Shortly after the release of XML, some folks, including some very important folks in W3C and its members, who had been big supporters of XML, actually got around to reading the spec, and discovered to their horror that they had an XML which included entities, DTDs, PIs, and assorted other baggage. - Tim Bray
When XMI came out, I had just been studying up on UML, and I thought "Cool! I'll print out the DTD so that I can look it over on the subway ride home!" When I saw how big the XMI DTD was, I decided not to print it out--I prefer not to spend that much time in the subway. - Robert DuCharme
XML was monocase until quite late in its design, when we ran across this ugliness. I had a Java-language processor called Lark - the world's first - and when XML went case-sensitive, I got a factor of three performance improvement, it was all being spent in toLowerCase().- Tim Bray
XML-based technologies seem particularly susceptible to the "if we standardize it, everyone will use it" fallacy. - Simon St. Laurent
Re:Really? (Score:3, Insightful)
You have obviously never looked into soap [w3.org], which seems to be able to addr
Re:Really? (Score:2)
I'm talking more about migrating running processes from one machine to another, by persisting the objects, and recreating them. More along the lines of having MOSIX built into the application.
A decent SOAP implementation can provide you with the XML representation of an object's state, which can be recreated and manipulated on almost any platform. What else would you like to transfer? You cannot store lang
Re:Really? (Score:2, Interesting)
Most importantly, while I tend to be writing about just one topic at
Re:Really? (Score:2)
Is this not the situation which we'll have