GFD writes:
"The EETimes has a story about a relavtively old protocol for structured information call ASN.1 could be used to compress a 200 byte XML document to 2 bytes and few bits. I wonder if the same could be done with XHTML or even regular HTML."
ASN.1 (Score:1)
This damn thing is part of the OSI thing (remember this crap that worked on paper but was hell to implement)...
It's probably the telco people trying to inflict this stuff upon us.
HTML and XML are there for a reason: current information technologies are fast enough that there is no need to "compress" things, and documents stay human-readable. Getting back into ASN.1 is going back into the past, into binary-file hell.
200 bytes - 2 bytes and some bits? (Score:1)
So, to argue whether this is an effective protocol/technique to use: I bet there are lots of other ways to send 20 bits of information. I really would like to see an XML document carrying only 20 bits of information; quite empty, right?
It is not always important to look at the compression rates, unless you clearly have a bandwidth problem.
Now, the strength of XML... that's another story entirely.
Hoax. (Score:2)
"could be used to compress a 200 byte XML document to 2 bytes and few bits."
This is a hoax. Someone played a trick like this on Byte Magazine (before Byte quit publishing). It is amazing that the editors didn't immediately recognize the impossibility of such extreme compression claims.
I searched the comments for the word "hoax", but no one commenting here has used the word. Anyhow, it can't happen.
Re:Hoax. (Score:1)
Re:Hoax. (Score:2)
I couldn't find anything that really explained how ASN.1 works, and the specs appear to require payment, but from the apparently more knowledgeable posts on /. it appears that it substitutes binary numbers for tags and other repeated parts of messages. The substitution table is fixed in advance, and it is assumed that both sender and receiver already have it. So it is only effective if the format is pretty much pre-defined and highly repetitive. Satellite telemetry is a good example; e.g., it might turn "Temperature of engine 2 nozzle, zone 4 = 65" into 2.4.6.5. Or ASN.1 could do a pretty good job of compressing stock market prices by replacing those long corporation names with a short code -- but the exchanges long ago assigned short text codes...
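A rough sketch of that fixed-table idea in Python (the field name and code assignments here are invented for illustration):

# Both ends ship with the same pre-agreed table, so only short
# codes cross the wire.
TABLE = {0x01: "Temperature of engine 2 nozzle, zone 4"}

def encode(field_code, value):
    return bytes([field_code, value])      # 2 bytes on the wire

def decode(msg):
    return "%s = %d" % (TABLE[msg[0]], msg[1])

print(decode(encode(0x01, 65)))   # Temperature of engine 2 nozzle, zone 4 = 65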
LZW (*zip) compression also uses a substitution table, but in LZW most substitutions are not predefined. The software adds to the table as needed while processing a particular file, and puts each new substitution in the compressed file. So it's flexible: if you are compressing XML files and someone uses a new tag, word, or phrase repeatedly, LZW will just assign a new code to that string, send the full string once (per file), and every subsequent use only requires the code.
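For the curious, the core of an LZW compressor is only about a dozen lines; a minimal Python sketch (toy data, and it skips bit-packing the output codes):

def lzw_compress(data):
    table = {chr(i): i for i in range(256)}   # start with all single bytes
    w, out = "", []
    for c in data:
        if w + c in table:
            w += c                            # keep extending the match
        else:
            out.append(table[w])              # emit code for longest match
            table[w + c] = len(table)         # learn the new string
            w = c
    if w:
        out.append(table[w])
    return out

xml = "<item><name>foo</name></item><item><name>bar</name></item>"
print(len(xml), "chars ->", len(lzw_compress(xml)), "codes")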
In summary, 200 bytes to 2 bytes is B.S. or a contrived case -- about all you can do in 2 bytes is identify one string previously agreed upon, and if you ever might have to send a free-form message (even an update to the table of pre-defined strings) you're going to need at least one byte just to ID the message type. But if you have a large set of large files that are quite repetitive in both content and format, it might be possible to pre-define a substitution table for the whole set and get 100 to 1 lossless compression. But that's going to work with XML on the web only if you browse just one site whose contents meet the repetitiveness criteria...
By the way, I have seen 98% (50-1) compression using PKZIP. This was on AutoCAD DXF files, which is a remarkably bloated ASCII format representing CAD drawings. And it takes several megabytes before the compression becomes that good. You might get over 90% compression on XML if the files are big enough, but you really shouldn't put that much on one web page.
Postum primus? (Score:3, Funny)
Lossy-soft! (Score:4, Funny)
An excerpt from LampreySoft's page:
LossySoft! [smart.net]
Re:Postum primus? (Score:2)
Errr... just realised that most /. posts can also be transferred at higher speeds.
PS: did that information appear in early April? I missed it.
Re:Postum primus? (Score:2)
Re:Postum primus? (Score:1)
I appreciate your effort, I really do, but any attempt at humor on Slashdot based on misspelling is doomed from the start, as the replies readily indicate.
The audience just isn't ready for this sort of thing. Sorta like Dennis Miller on Monday Night Football.
Re:Postum primus? (Score:2)
Re:Postum primus? (Score:2, Insightful)
Re:Postum primus? (Score:2)
Re:Postum primus? (Score:2, Informative)
LZW is lossless, and GIF isn't lossy in the normal cumulative sense, but since most images are naturally produced using more than 2^8 distinct colors, the first quantization does lose a great deal of information. (Apparently some people claim the GIF spec allows multiple palettes and thus more colors, but since this is in dispute I wouldn't count on it working.)
Truth doesn't vary with the speaker. Identity is only useful for bigots.
Re:Postum primus? (Score:2, Funny)
They don't build 'em like they used to. (Score:3, Interesting)
Re:They don't build 'em like they used to. (Score:3, Interesting)
I'm not sure what protocol you're referring to when you say Exchange. Are you talking about, perchance, Microsoft Exchange Server? The one that uses X.400 for site-to-site communication? The X.400 that uses ASN.1 encoding?
mod_gzip ? (Score:4, Informative)
Re:mod_gzip ? (Score:3, Informative)
It doesn't work with SSL easily. See this [over.net] thread if curious. I ran into this when I wanted to force Open Webmail [ncku.edu.tw] to use https only and found the pages were not getting compressed.
And take note of possible problems [over.net] with caching proxies serving pages to browsers that can't handle it.
It has a few other quirks, but overall I for one am quite satisfied with it.
Curious about the savings it brings? Use this [krakow.pl].
Machines are always broken till the repairman comes.
Re:mod_gzip ? (Score:2)
First of all, this seems a bit off topic. Second, you can read about HTTP compression on the W3C website [w3.org]. It's definitely not a HUGE impact (and it has some bugs with certain browsers, based on my own tests). Finally, AFAIK, ALL major web servers have this built in, as it is part of the HTTP 1.1 spec. Nothing to see here, move on please.
Re:mod_gzip ? (Score:1)
Re:mod_gzip ? (Score:1)
Bandwidth or CPU? (Score:1)
It's a case of what you want to optimise for.
Do you want to save CPU? (An issue on heavily loaded sites with oodles of cheap bandwidth.) Continue as you are without mod_gzip.
Do you want to save bandwidth? (An issue with expensive bandwidth.) Then sure, use mod_gzip and convert some of that CPU into bandwidth savings.
This is only thinking about the server end of things. On the other end of the connection is a user who also has limited bandwidth and CPU available.
So it varies. Athlon 800 serving huge text files on a 56K modem? mod_gzip. P90 dishing out 1x1 GIFs? Leave it as is.
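If you're wondering what the bandwidth side of that trade-off looks like, here's a quick Python check on a made-up tag-heavy page (mod_gzip uses the same DEFLATE algorithm as zlib):

import zlib

html = "<tr><td class='price'>19.99</td><td class='qty'>3</td></tr>\n" * 500
gz = zlib.compress(html.encode(), 6)       # level 6 is a typical default
print(len(html), "->", len(gz), "bytes")   # well over 90% saved on repetitive markup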
One example of this CPU vs. bandwidth trade-off I came across was when I was scp'ing a file across a Fast Ethernet (100 Mbit/s) network. On one end was a K6/200, and the transfer was taking ages! Then I realised I had told SSH to compress data. It was eating CPU like crazy! So I stopped the transfer and left off the compression flag. It went about three times faster.
Hello, haven't we read Comer's book? (Score:4, Interesting)
This is the same philosophy as IP, ATM, or any other modern network technology: simple, but fast.
Re:Hello, haven't we read Comer's book? (Score:3, Informative)
ASN.1 is well known outside of the IETF fundamentalist crowd. With its PER (Packed Encoding Rules), it is very efficient with bandwidth and not all that CPU intensive either. Nor is it difficult, if used correctly (and anything can be tough if used wrong). It's a simple tag-length-value notation which can recurse. The only reason the Internet doesn't use it more is the usual NIH.
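For those who haven't seen it, tag-length-value really is that simple; a minimal Python sketch (short-form lengths only, tag numbers invented):

def tlv(tag, value):
    # constructed types recurse: a list of (tag, value) pairs nests TLVs
    body = b"".join(tlv(t, v) for t, v in value) if isinstance(value, list) else value
    return bytes([tag, len(body)]) + body      # assumes len(body) < 128

record = tlv(0x30, [(0x0C, b"Alice"), (0x02, b"\x2a")])
print(record.hex())                            # 300a0c05416c69636502012a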
Re:Hello, haven't we read Comer's book? (Score:2)
Always nice to start with an ad hominem jibe. I'll try one myself: "ASN.1 is supported mainly by the failed has-beens who designed OSI".
With its PER (packed encoding rules), it is very efficient of bandwidth and not all that CPU intensive either.
Utterly misleading. The ASN.1 encoding rules are relatively simple; the data model is the big smelly dung heap to be avoided. Although the encoding rules are 'simple', the Deranged Encoding Rules (DER) used in X.509 require multiple recursive passes through the data structure to encode it.
The only reason the Internet doesn't use it more is the usual NIH.
On the contrary, several IETF protocols have used ASN.1 and the experience has been pretty miserable. The biggest problem is that ISO keeps tweaking the spec in ways that break existing implementations. ASN.1 is simply too much of a pain in the ass for the limited advantage it provides.
The group's attempt to claim ASN.1 as the savior of HTTP is ignorant and stupid. There have been many proposals to compress HTTP headers, and ASN.1 is actually one of the worst performers on both overhead and performance. The reason none of the proposals have gone anywhere is that there is no point in a backwards-incompatible change that saves 100 bytes or so on the headers if you don't do something about compressing the body. The biggest mistake we made in HTTP was not putting a simple Huffman-coding compression algorithm for ASCII text into the servers and browsers. Actually, the reason we didn't get around to it was that nobody wanted to mess around with the patent minefield.
Still, it is always easier to explain that the reason the world is not using your idea is that they are stupid and ignorant, and not that your idea is stupid and ignorant. In the case of ASN.1 the idea is a good one, but the execution is third or fourth rate at best.
Re:Hello, haven't we read Comer's book? (Score:2)
If you didn't live through those horrible days when the trendy crowd was all for OSI and claiming that OSI was the One True Way and would and should eliminate the scourge of the Internet and TCP/IP from the face of the earth, then you really don't get the evil of ASN.1 and its ilk...
Re:Hello, haven't we read Comer's book? (Score:2)
But I agree that a generalization of fiber capacity to bandwidth must be done with extreme caution.
bandwidth is cheap (Score:2, Insightful)
bandwidth is cheap? On what planet? (Score:2)
You're kidding, right? Most CS people I know cringe at the fact that XML can more than double the size of a document with largely redundant tags. The only things to be thankful for are that the documents typically compress very well, due to the large number of redundant tags, and that HTTP 1.1 supports compression, especially now that XML over HTTP (i.e. web services) is being beaten to death by a lot of people in the software industry. Numerous [xml.com] articles [irt.org] about [att.com] XML compression [xml.com] also tend to disagree with you that it is not an issue.
PS: If bandwidth is so cheap how come DSL companies are going out of business and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.
Re:bandwidth is cheap? On what planet? (Score:2)
Re:bandwidth is cheap? On what planet? (Score:2)
DSL companies are going out of business because... bandwidth is so cheap. And it's their own fault.
and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.
Why? Are you saying AOL=dialup, and Time-Warner=cable? There's a LOT more to both of those companies than either of those two things...
Re:bandwidth is cheap (Score:4, Informative)
Translated:
You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal [ethereal.com]! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 [ethereal.com] because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone [unigone.com].
Use your CPU, not your eyeballs!
Re:bandwidth is cheap (Score:2)
Re: Leave compression to the hardware (Score:2)
Willy
Re:bandwidth is cheap (Score:2)
I'm typing this over a 56k connection. If I want faster in this area, I can either pay for a leased line, an ISDN line, or a satellite connection. If these options are cheap, could you buy me one please?
ASN.1 "compression" vs XML (Score:3, Insightful)
ASN.1 uses integers as its symbols. Remember the protocol used for SNMP? Did you really like it? It's not too human-readable or writable.
Also, the idea of promoting it through a consortium is rather old-fashioned.
Bruce
Re:ASN.1 "compression" vs XML (Score:2)
I always thought there was a reason the X windowing system seemed a bit old-fashioned...
Re:ASN.1 "compression" vs XML (Score:2)
Re:ASN.1 "compression" vs XML (Score:2)
Regarding ASN.1, Yes, there are tools to make this easier. I do still find it more difficult to code and test. And in general my development time is more expensive than bandwidth. That probably applies to most people.
Thanks
Bruce
Re:ASN.1 "compression" vs XML (Score:2)
Re:ASN.1 "compression" vs XML (Score:2)
Multimedia? (Score:3, Interesting)
Yes, it would be nice to make the internet move faster with current technology, and I would support this for people on very slow connections. It might also be a boon for servers that get hit hard and often (though I doubt it would stop the Slashdot effect).
Of course, I hope I'm wrong. More effective bandwidth is a Good Thing.
Re:Multimedia? (Score:2, Funny)
You've obviously never saved a 5k Word doc in HTML. *sigh*.
ASN.1 not suitable (Score:5, Informative)
Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?
It's also worth noting that there is a lot of documentation surrounding XML. With ASN.1 you have to download the spec from the ITU, which is an INCREDIBLY annoying organization: their specs are barely readable, and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; the ITU just pretends to be. The ITU does whatever it can to restrict the distribution of its specifications.
Re:ASN.1 not suitable (Score:5, Informative)
This is pretty much right. I do a lot of work on X.500 / LDAP / security, and ASN.1 is used throughout all of this. It does a pretty good job, but as the poster points out, the ITU is a completely brain-damaged relic of the sort of big-company old boys' club that used to make standards. It's very difficult to get info out of them. (Once you get it, though, it's usually pretty thorough!)
As for the 'compression': well, yes, it sorta would be shorter under many circumstances. ASN.1 uses pre-defined 'global' schemas that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for.) For example, I've seen people try to encode X.509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).
But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN.1 is nice for computers. Both require an XML parser / ASN.1 compiler, though. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a Java serialised object). But I can't see ASN.1 being much chop for compressing text documents; there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?)
And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAP's problems are more that they screwed up trying to do a cut-down version of X.500 than that they use ASN.1 :-)!
Re:ASN.1 not suitable (Score:3, Interesting)
Heh. How is this different from XML?
I'm always amused by people who assume XML will be the magic lingua franca of the Internet, and that everyone will be able to parse every last bit of meaning out of your document just because it's encased in <handwaving><readable by="human"><tags /></readable></handwaving>, without ever agreeing on any of those nasty "standards" things. Guess what, people: until we have a solution to the strong AI problem, human-readable don't mean squat.
Re:ASN.1 not suitable, but XML is still good (Score:2)
Apparently you've never had to write a parser for EDI [everything2.com], or any other binary data interchange format.
I'm not going to claim that XML is a magic bullet for data interchange -- but I will attest that human-readable data formats are superior to binary formats when it comes to data interchange. I have lost track of the number of custom parsers I've had to write over the last 15+ years in order to convert data from one system to another, simply because the systems in question didn't have a shared data format. The big wins for XML are that (1) you can visually inspect your before-and-after results, (2) you don't have to write the parser, even if you have to write code to call it, (3) there are actually two sensible APIs to match two very different ways to look at the data, each of which is parser independent, and best of all (4) if you don't have documentation for the schema (or it's misimplemented), you still have a prayer of interpreting the data correctly.
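Win (2) in two lines, for anyone who hasn't tried it -- a Python sketch with a made-up document, where the stock parser does all the work:

import xml.etree.ElementTree as ET

doc = "<order><item sku='A-100' qty='2'/><item sku='B-7' qty='1'/></order>"
for item in ET.fromstring(doc).iter("item"):   # no hand-written parser anywhere
    print(item.get("sku"), item.get("qty"))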
Anyone who's ever had to write an EDI application will *instantly* understand the appeal of XML.
Re:ASN.1 not suitable, but XML is still good (Score:2)
Because both formats are supposed to be good for data interchange, and only one of them really is -- XML. With EDI, the standard had to be so all-encompassing that one group of programmers would read the spec one way, and one another way, and so you could spend months trying to correctly interpret data that was "standard".
Those who do not understand ASN.1 .... (Score:3, Informative)
I have had to deal with dozens of binary protocols that do the same thing as ASN.1, and do it worse.
As to comparisons, XML and ASN.1 are designed for different jobs. Designing a Web page in ASN.1 would be ridiculous. Sending (say) telemetry data encoded in XML is equally ridiculous. I can believe that *data* transmissions could be 100 times larger in XML than in ASN.1: you have the header, the DTD, some namespace declarations, and a bunch of nested tags, just to express a couple of numbers.
Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.
A particularly cute example is SOAP (Microsoft's firewall-bypass protocol). It's going to be fun to watch people try to squeeze some performance out of a SOAP-based system that tries to do something interactive.
As to the ISO, yeah, they're seriously obnoxious. They tend to go off into their own little world, redefine standard terminology so they're incomprehensible to outsiders, and come up with stuff that can't be implemented. (Nobody uses ASN.1 -- it's unimplementable. When people talk about using ASN.1 for something real, they're talking about a subset. A subset, of course, cannot claim conformance to the standard.) The crowning insult, of course, is that they fund the organization by selling the standards. Hey, it's a standard -- you *have* to buy it!
"It's all in knowing what wrench to use to pound in the screw."
That's funny. (Score:1)
I don't see what's so bad about judiciously applied XML. If you'd like to piddlefart around with obscure offsets and byte counts in binary transfers, knock yourself out. XML doesn't bloat transmissions up that much (argue about node overhead, then remember filler columns) and every machine in existence speaks text.
Of course it's not all things for all people, but in the right place at the right time, it's just fine.
Re:Those who do not understand ASN.1 .... (Score:2)
Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.
XML is no magic bullet; however, that doesn't change the fact that it is incredibly useful in many different circumstances. XML, realistically used, can make some projects simpler, and data transfers much more comprehensible.
A particularly cute example is SOAP (Microsoft's firewall-bypass protocol) It's going to be fun to watch people try to squeeze some performance out of a SOAP based system that tries to do something interactive.
SOAP, XML-RPC and similar protocols are designed for generic, highly interoperable, communications, not performance. Anybody who expects blinding performance out of an XML encoded procedure call shouldn't be programming. You want performance, use a custom protocol, or at least CORBA. SOAP is for when you can sacrifice performance to gain interoperability.
I'd even go a step farther: anything that can be done using an XML-based data format can be done smaller and faster by some other design. However, as machines get larger, faster and cheaper, getting that last bit of performance becomes less and less important for most computing tasks. XML is great for tasks that don't need every last ounce of speed. Save the custom-tuned binary formats and protocols for the few apps that really need them.
Binary Bits (Score:2)
First, you shouldn't assume that available bandwidth will steadily increase. It will take some major breakthroughs -- not just technical, but political and economic -- before there's a megabit internet connection every place where it might be useful. And wide-area wireless networking is in an even worse state. Not to mention that radio spectrum is a finite resource.
Your point about tags is well taken. But you can compress the content too. Using 8 bits for every character is very inefficient, especially considering that there are only 128 characters to represent. With the right scheme, you could certainly get the average character width down to somewhere between 4 and 5 bits.
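You can check the 4-5 bit figure yourself; a Python sketch computing the zero-order entropy of a sample, which is roughly what a simple per-character Huffman code achieves:

import math
from collections import Counter

def bits_per_char(text):
    n = len(text)
    # -sum p*log2(p): the average bits an ideal per-character code needs
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

sample = "the quick brown fox jumps over the lazy dog " * 20
print(round(bits_per_char(sample), 2), "bits/char vs 8 in plain ASCII")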
ASN.1 was designed to be efficient (Score:4, Informative)
Missing the point? (Score:2, Insightful)
Bandwidth is cheap now, but it may not be forever. Yes, we'll most likely continue to see order of magnitude increases for years and decades to come, but it'll slow down sometime.
Also, consider wireless devices. Their bandwidth isn't there right now, and maybe with 3G we'll see a nice increase, but I can see that as a practical application for this type of compression.
Let's also not forget that even though it's compressed, you can always uncompress it into regular old XML to actually read it and understand it, for you human folks that actually need like LETTERS and stuff! That's it. I'm just going to start writing everything in integers soon. Time to change my .sig!
HTML could be compressed (Score:2, Flamebait)
Suppose you wanted to make each character a different color. For each character you'd need markup approximately equal to:
<font color="#000000">a</font>
This entire sequence could be compressed into 4 bytes or less, but you would require an HTML compiler instead of coding it by hand (unless you're one of those crazy people who prefer coding opcodes straight over using C).
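A sketch of what that "compiled" form could look like in Python -- three RGB bytes plus the character itself, versus the full markup (the tag syntax above is the assumption):

import struct

def pack_colored_char(r, g, b, ch):
    return struct.pack("BBBB", r, g, b, ord(ch))     # exactly 4 bytes

print(len(pack_colored_char(0xFF, 0, 0, "a")))       # 4
print(len('<font color="#ff0000">a</font>'))         # 30 bytes of markup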
The issue with html, and the reason why we don't worry about the inefficiency much is the fact that you could have a rather extensive html file with one link to a single picture, and that picture would easily take up the space of the entire html file.
-Restil
Re:HTML could be compressed (Score:1)
If you are using a modem with "V.54"- or "V.nn"-style compression...
If anywhere in the network two Cisco or two Nortel routers are talking to each other, or if your backbone provider is reasonably competent and wants to make money...
Then your web traffic is already being compressed.
One of the great things about HTML and XML is that they compress really easily using comparatively simple compression algorithms.
So any effort you put into "compressing" XML traffic is wasted, as your network hardware would probably have done it anyway.
Bandwidth Versus Computational Effort (Score:2, Insightful)
With the current over-supply of domestic bandwidth and the move to database-driven, customised web sites, is it worth spending CPU cycles compressing small data files on-the-fly?
Most popular websites don't suffer from poor connectivity -- they suffer from too little back-end grunt.
Re:Bandwidth Versus Computational Effort (Score:2)
ASN.1 resources on the web. (Score:3, Informative)
Those of you who want to find out more about ASN.1 can pick up free e-books on ASN.1 here [oss.com]. There's some blatant propaganda in them for OSS Nokalva's ASN.1 compiler, but of course there's also snacc [gnu.org], a GPL'd open source ASN.1 compiler. Snacc however only generates code for encoding to BER, so you might also want to check out a hacked version [qut.edu.au] of snacc from Queensland University of Technology.
ASN.1 is a base technology for a lot of standards out there like X.509, PKCS and LDAP, the OSI application layer protocols etc.
Reverse Engineer hax0r3d! (Score:4, Funny)
Totally misses the point (Score:5, Insightful)
ASN.1 achieves good compression because the designer must specify every single field and parameter, for all time. The ASN.1 compiler, among other things, then figures out that the "Letterhead, A4, landscape" mode flag should be encoded as something like 4.16.3.23.1.5, which is actually a sequence of bits that can fit into 2 bytes, because the ASN.1 grammar knows exactly how few bits are sufficient for every possible case.
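A toy version of that bit-budgeting in Python (the field names and value counts are invented; real PER is more involved):

from math import ceil, log2

# each field's width is fixed at compile time by how many values the schema allows
fields = [("paper", 16, 3), ("orientation", 2, 1), ("tray", 8, 5), ("duplex", 2, 0)]

bits = value = 0
for _name, n_choices, choice in fields:
    width = max(1, ceil(log2(n_choices)))   # just enough bits for this field
    value = (value << width) | choice
    bits += width

print(bits, "bits ->", (bits + 7) // 8, "bytes")     # 9 bits -> 2 bytes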
In contrast, XML starts with *X* because it's designed to be extensible. The DTDs are not cast in stone; in fact, a well-behaved application should read the DTD for each session, extracting only the items of interest. It's not an error if one site decides to extend their DTD locally, provided they don't remove anything.
But if you use ASN.1 compression, you either need to cast those XML DTDs into stone (defeating the main reason for XML in the first place), or compile the DTD into an ASN.1 compiler on the fly (an expensive operation, at least at the moment).
This idea is actually pretty clever if you control both sides of the connection and can ensure that the ASN.1 always matches the DTD, but as a general solution it's the wrong idea at the wrong time.
Re:Totally misses the point (Score:2)
Re:Totally misses the point (Score:2, Insightful)
Say what? Heh heh...
Let's say you have an XHTML document (one DTD) that contains MathML (another DTD) and some SVG for good measure (a third DTD). This would not be handled in your static DTD compile unless you made specific provisions for all of them in a single document. But what if the next document only uses one of them? Or two? Or includes some other one later? Are you going to compile every permutation of DTDs that could ever occur?
This is where the strength of XML is not necessarily compatible with the strengths of ASN.1.
Missing the point as to why XML is good (Score:4, Insightful)
XML, by virtue of being text-based, may be easily inspected and understood. Sure, it's a little bulky, but if you're transmitting something like an XML-encoded vCard versus an ASN.1 encoding of the same info, the bulk is negligible.
Yes, for mp3-sized data streams, or real-time systems, there would be a difference. But many interesting applications don't require that much bandwidth.
ASN.1 achieves its compactness by sacrificing transparency. Sure, it's probably straightforward enough if you have the document which says how the tags are encoded, but good documentation of anything is as rare as hen's teeth, and not all software companies are willing to play nice with the developer community at large and share their standards documents. And some of them get downright nassssssty if you reverse engineer...
Transparency is one of the reasons for the rapid growth of the Web: both HTML and HTTP were easy enough to understand that it took very little tech savvy to throw up a website or code an HTTPD or a CGI program.
Transparency and extensibility also make XML an excellent archival format; so if your protocol messages contain data you want to keep around for a while, you can snip out portions of the stream and save them, knowing that 10 or 15 years from now, even if all the relevant apps (and their documentation) disappear, you'll still be able to grok the data.
ASN.1 -- excellent choice (Score:4, Informative)
Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 has been an interoperability standard for at least 10 years prior to the introduction of XML.
ASN.1 is a standard interoperability protocol (ISO IS 8824 and 8825) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. ASN.1 is a collection of data values with some meaning associated with them. It doesn't specify how the values are to be encoded. The semantics of those values are left to the application to resolve (i.e. XML). ASN.1 defines only the transfer syntax between systems.
ASN.1 codes are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms of bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. ID octets are used for specifying primitive or constructor data types. Length octets define the size of the actual content. A boolean can thus be represented by a single bit, and digits 0-9 could be BCD encoded. Each encoding structure carries its interpretation with it.
An XML document could thus be encoded by converting the tags into a lookup table and a single octet code. If the tags are too many, or too long (e.g. FIRST-NAME), then there are significant savings in replacing the whole tag with an ASN.1 encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned to a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: one for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).
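That two-byte claim is easy to demonstrate; a Python sketch of the tag-table scheme (the tag assignments are invented):

TAGS = {"FIRST-NAME": 0x01, "LAST-NAME": 0x02}   # agreed on in advance

def encode_element(tag_name, text):
    body = text.encode()
    return bytes([TAGS[tag_name], len(body)]) + body   # [ID][length][contents]

enc = encode_element("FIRST-NAME", "Ada")
print(len("<FIRST-NAME>Ada</FIRST-NAME>"), "->", len(enc), "bytes")   # 28 -> 5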
I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.
http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/asn1gloss.htm has an excellent overview if you're interested.
Cheers!
Re:ASN.1 -- excellent choice (Score:1)
That's fine, but leaves the X out of XML: eXtensibility. A lot of existing XML schemas have slots of the form <xs:any namespace="##other"/> which allows any foreign tag, known or unknown, defined or not, to be incorporated at that point. As far as I know, ASN.1 can't cope with that without both explicit tagging and a fully-expanded OID for the incorporated entity (since it's not enumerable), which creates metadata bloat all over again.
Another XML design goal is that a document be parsable (at least as far as an abstract syntax tree) without foreknowledge of the type structure. A couple of mechanisms from SGML that were forbidden in XML but don't defeat this goal are empty end-tags and unquoted (single-token) attribute values. Empty end-tags would knock a large chunk out of the size of a complex XML document by allowing a simple </> to close whatever element was last opened. Unquoted attribute values can save 2 characters per attribute and also feel more natural when the values aren't stringlike in nature; quoting small integers just grates on me, anyway.
Another approach is defining a general binary shorthand coding for XML; a place I worked at had one in use for wire transmission of XML between hosts running their code base.
Actually... (Score:2)
If HTML is written properly, it is XML. Browsers nowadays let you cheat, and mix tags, and ignore quotes, but if the HTML is written to spec, then it is technically XML.
Captain_Frisk
ASN.1 isn't efficient--for a binary protocol (Score:2, Informative)
This is funny ... (Score:2, Informative)
I remember when I first came across ASN.1 years ago. Everybody hated it because the parser was sssooo big and complex. "Why not just use a simple ASCII file?" was a common refrain. Sure, ASN.1 was capable of representing just about any data structure in a reasonably compact form, but most information did not need complex data structures to represent it, so why would anybody use ASN.1?
Well, a decade or two later we get the ASCII version of ASN.1 -- XML. And guess what? It's arguably harder to write a generic parser for XML than it is for ASN.1. (I still have not found a good open source validating parser for XML.) But guess what -- everybody is wildly enthusiastic this time round. My, how times change!
Actually, ASN.1 and XML are in some ways very similar. They try to solve the same problem: how to represent complex data structures in a generic way. And they do it in a similar way. Because ASN.1 is binary and uses numbers instead of text tags, it does use a lot less space to represent the same thing, although the 2 versus 200 bytes claim is at best misleading. Most of the 200 bytes would probably be XML header (DTDs and stuff) which you would not put in the ASN.1 encoding.
And yes, XML is too fat for some applications. For example, if you are pumping out a 60k-row SQL table to your 1000 clients every day, you probably would not choose XML. That is why this idea has merit: it could give you the benefits of XML without the fat. To work, someone would have to come up with a standard way of translating a DTD to an ASN.1 encoding. I know it's a good idea because I came up with it myself a while back :).
XML is BAD BAD BAD :) (Score:2)
Why human-readable formats are critical (Score:4, Insightful)
Contrast that with what I'm dealing with right now: I'm using JDBC to access an MS SQL Server. MS bought their SQL Server from Sybase many years ago, and inherited the binary TDS data stream protocol. As efficient as this might be, when you run into problems, you're in trouble. The TDS format is undocumented, so you can't easily determine what the problem might be, whereas a text format would be easy to debug. Anytime you have a binary protocol, you become totally reliant on the tools that are available to interpret that protocol. With text protocols, you're much less restricted.
Another example of this is standard Unix-based email systems vs. Microsoft Exchange. Exchange uses a proprietary database for its message base, which makes it effectively inaccessible to anything but specialized tools and a poorly-designed API. If your email is stored in some kind of text format, OTOH, there are a wealth of tools that can deal with it, right down to simple old grep.
The bottom line is that the human-readability (and writability!!) of HTML was one of the major factors in the success of the web. It's no coincidence that everything on the web, and many other successful protocols, such as SMTP, are text-based. To paraphrase your subject line, binary protocols are BAD BAD BAD.
Calling human-readable formats "irrational" is a bit like Spock on Star Trek calling things "illogical" - what that usually really meant was that the actual logic of the situation wasn't understood. What's irrational is encoding important information, which needs to be examined by humans for all sorts of reasons that go beyond what you happen to have imagined, into a format which humans can't easily read.
Human-readable formats and protocols will remain important until humans have been completely "taken out of the loop" of programming computers (which means not in the foreseeable future).
Humans and Tools (Score:2)
Of course, you never have to deal with that because the SSL stream is already decoded for you. That might not help with a new format, but maybe someone could come up with a special language that's really good for rearranging data and making it presentable. We could call it "Practical Language for Extracting and Reporting." Yeah, PLER. That has kind of a nice ring to it. There are quite a few jobs that need this kind of data munging but are too small for Java and would take too long to write in C++, so I'll bet there'd be a lot of interest in this hypothetical PLER language.
Re:Why human-readable formats are critical (Score:2)
I have programmed something "on the web", but before it became such a fad I used to like assembly language programming... Decoding a simple binary format is trivial, and if the usual format for web pages were binary, browsers would still allow you to use a "view source" command (to decode the binary format, probably giving a much more readable presentation of the structure of the document than the HTML code you can see nowadays).
Re:Why human-readable formats are critical (Score:2)
When was the last time you saw a web page designer or web application programmer dealing with any of this stuff?
Sounds like they're spewing buzzwords... (Score:2)
No! Not ASN.1! Make it stop! Make it stop! (Score:2)
Since XML was designed for humans to be able to look at to a certain extent, why not just have a standard compression method that's included with all XML parsers? Whenever you transmit or save the XML file, it would be saved in the compressed format.
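That's a few lines today with any stock zlib/gzip binding; e.g. in Python (the file name is made up):

import gzip

doc = "<?xml version='1.0'?><note><to>Bob</to><body>hello</body></note>"

with gzip.open("note.xml.gz", "wt", encoding="utf-8") as f:
    f.write(doc)                   # stored compressed on disk

with gzip.open("note.xml.gz", "rt", encoding="utf-8") as f:
    assert f.read() == doc         # reads back as ordinary XML text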
GPL'ed ASN.1 encoder/decoder (Score:2, Informative)
Oh yeah? (Score:2)
Oh yeah?? I wrote a protocol that can take a 6 MB MP3 file and compress it to under 10 bytes!
(Some sound quality degradation may occur, use at own risk)
The ASN.1 faithful just don't get it (Score:5, Insightful)
There are a vast number of differences between ASN.1 and XML. To think that ASN.1 is in any way related to XML demonstrates that they just don't "get it".
1. Why not XDR or just raw binary?
Why not just specify your own binary format for your application? The thing that the ASN.1 bigots don't understand is that in most real-world applications, the ASN.1 formatting provides only overhead and no real-world value. This happens in XML, too, but the value proposition for XML is much clearer. A good example is the H.323 series PER encoding, which is just plain wrong: a well-documented custom encoding would have been tons better.
2. DTD or no DTD
The ASN.1 language is essentially a DTD; it gets encoded in things like BER. The trick is that I can parse "well-formed" XML content without knowing the DTD. This is impossible with current ASN.1 encodings. The idea of DTD-free "well-formed" input and DTD-based "valid" input is at the core of XML. Yes, ASN.1 and XML both format data, but proposing ASN.1 as a valid substitute means you just don't grok what XML is all about.
3. Interoperability
The Internet grew up in an environment where parsers should be liberal in what they accept. This was important for early interoperability, but is now a detriment. For example, it is impossible to write an interoperable HTML parser. XML took the radical zen approach of mandating that any parser that accepts malformed input is BAD. As a result, anybody writing a parser knows the input will be well-formed. There is one-and-only-one way to represent input (barring whitespace), so writing parsers is easy. ASN.1 has taken the opposite approach: there are a zillion ways to represent input.
As a result, non-interoperable ASN.1 implementations abound. For example, most SNMP implementations are incompatible. They work only "most" of the time. Go to a standard SNMP MIB repository and you'll find that the same MIB must be published multiple times to handle different ASN.1 compilers.
The long and the short of it is that ASN.1 implementations today are extremely incompatible with each other, whereas XML libraries have proven to be extremely interoperable. Right now, XML has proven to be the MOST interoperable way to format data, and ASN.1 the LEAST.
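To make the "zillion ways" point concrete: BER permits both of the following encodings of the same OCTET STRING "hi", because lengths may be short-form or (needlessly) long-form. A strict DER decoder accepts only the first; a lax BER decoder takes both:

short_form = bytes([0x04, 0x02]) + b"hi"         # 04 02 68 69
long_form  = bytes([0x04, 0x81, 0x02]) + b"hi"   # 04 81 02 68 69
print(short_form.hex(), "vs", long_form.hex())   # both decode to "hi" under BER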
4. Bugs
Most XML parsers have proven to be robust, most ASN.1 parsers have proven to be buggy. You can DoS a lot of devices today by carefully crafting malformed SNMP BER packets.
5. Security
You can leverage ASN.1's multiple encodings to hack. For example, my SideStep program shows how to play with SNMP and evade network intrusion detection systems: http://robertgraham.com/tmp/sidestep.html [robertgraham.com] At the same time, ASN.1 parsers are riddled with buffer-overflows.
Anyway, sorry for ranting. I think XML advocates are a little overzealous (watch your possessions carefully or some XMLite will come along and encode them), but ASN.1 is just plain wrong. The rumor is that somebody threw it together as a sample to point out problems, but it was accidentally standardized. It is riddled with problems; it should be abandoned. An encoding system is rarely needed, but if you need one, pick XDR for gosh sakes.
Re:The ASN.1 faithful just don't get it (Score:2, Insightful)
Re:The ASN.1 faithful just don't get it (Score:1)
The same struggle in the VoIP world (Score:2)
H.323 interoperability is tough. Some problems are due to differences in how one entity encodes a piece of data and another decodes it. Many H.323 implementations, um, do not fail gracefully under such circumstances.
SIP call signalling looks like HTTP. There have been complaints that it's too verbose, and needs to be replaced with something binary. One proposal [ietf.org] suggests using a binary encoding. It uses LZW [google.com] compression and shared "codebooks" (schemas?)
That's just for call signalling. Both these VoIP protocols (and others) use RTP [columbia.edu] (the Real-time Transport Protocol) for voice, video, etc.; that's encoded and compressed pretty darned seriously.
(I'm not speaking for my employer, I'm just speaking my mind.)
Re:The same struggle in the VoIP world (Score:2, Funny)
Simply cut out the un-needed words.
[dials]
Broken down. Main street. Need spare tyre.
[hangs up]
See, it'll halve your phone bills!
BFD (Score:2)
Amen (Score:2)
XML is a file format.
XML shows that you *can* use a single file format for everything. That doesn't mean it's a good idea, except in a couple of particular places.
The reason it's caught on is that the average programmer is getting stupider. It's genuinely difficult for these people to write a simple parser, so they use XML for everything. Never mind that it's harder to read/write for humans than some custom HCI format, or insanely more verbose and slower to scan than some custom binary format. They preach interoperability, when this is irrelevant, to cover for laziness and incompetence.
If I hear one more fuckwit say, "hey, let's create an XML-based programming language", I'll scream.
Re:Using XML is _ASKING_ for bloat (Score:1)
XML is a very wasteful and generic file format.
So what if it's wasteful? Bytes are cheap. The entropy content of XML isn't inefficient (the same could be said of ASN.1), so low-level compression algorithms compress them equally well. The message "Your Amazon order has billed your credit card $23 and sent you a copy of 'Fly Fishing'" compresses down to much the same size in either encoding.
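A rough check of that claim in Python -- a verbose XML encoding versus a terse one of the same (made-up) records, before and after DEFLATE:

import zlib

xml = "".join("<order><id>%d</id><amount>23</amount></order>" % i for i in range(1000))
terse = "".join("%d|23;" % i for i in range(1000))

for label, msg in (("xml", xml), ("terse", terse)):
    print(label, len(msg), "->", len(zlib.compress(msg.encode(), 9)))
    # the raw sizes differ by ~6x; the compressed sizes are far closer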
If your network transport layers don't do compression, blame the network not the content.
Secondly, when did "generic" become a criticism ?
Thirdly, XML isn't just a serialization format. Admittedly it is treated that way now, and was even more so in the early days, and the "XML For Morons" books get it entirely wrong, but the XML Infoset [w3.org] WG is trying to steer it back. Think data model, not just bytes on the wire -- that's the real reason why ASN.1 is an inappropriate comparison.
ASN.1 is like EDI and Read Codes. It's an application-level solution to byte squashing. The things are nightmares to work with, and simply not needed any more.
Re:Check this out! (Score:1)
OT (slightly) : SNMP (Score:1)
As far as computers go, I was under the impression you could manage computers running Windows (and maybe even Linux and Unix) using SNMP, so maybe someone can provide more detail.
Re:ASN.1 is evil (Score:1)
Re:ASN.1 is evil (Score:3, Informative)
It's tha devil's spawn, I tell ya. It's extremely complex and hard to debug.
Having worked with ASN.1 and CMIP, I can certainly state that most examples of ASN.1 data types I've seen (M.3100 and that lot) are far too complex (too many CHOICE and ANY values). But I still think ASN.1 and BER/PER are a decent way to efficiently encode data in a platform-independent manner. ASN.1 data types can be really simple or really complex, so blame the designers defining complex types in ASN.1, not the notation itself.
The whole reason the net has taken off so quickly is the simple, open and clear protocols used. You need to debug your email server? Just telnet in and talk to it! With ASN.1 you need a compiler to make each damn data packet.
I think it is only fair to state that a lack of good (I mean open and free, of course) ASN.1 decoders/encoders contributes to the lack of widespread adoption of technologies like ASN.1. Not that tools like SNACC are all that bad, but were good tools around in the early days of ASN.1? Certainly CMIP never had good free toolkits.
The standards bodies play a role here. Making sure you advocate for your standard early on and doing your best to promote good open reference implementations goes a long way towards helping a standard gain widespread adoption.
I think SNMP is a good example of how ASN.1 can be used effectively. Just because ASN.1 allows for complex types doesn't mean people have to build complex types into their standards/protocols.
I'm growing tired of the "I've got the world on a String" school of data typing ;->
Sometimes efficient, compact encoding/decoding is just what the solution calls for, whether it is ASN.1 BER/PER or the OMG IDL using CDR.
Re:200 bytes to 2 +/- (Score:1)
ASN.1, as I understand it, is structured as follows:
[data_type][data_length][data......]
so, to convert
<xml_tag>data string</xml_tag>
(30 bytes)
to an ASN.1 format would result in:
[4][11][data string]
(13 bytes)
BUT the sender and receiver need to have already agreed that a data_type value of "4" indicates a datatype of "xml_tag", and that the length code that follows is 8 bits -- thus removing the self-describing value of the XML file.
If you want to compare apples to apples, you need to add in the size of the tables that will map the "data_type" values to their corresponding xml tag types...
How is this a huge improvement over comma-delimited text, since the sender and receiver have to know the layout before the data can be sent???
Ken
Re:You sure you read the same story (Score:1)
Re:What was it used for? (Score:2, Informative)
X.509 digital certs, among other things (Score:2)
Re:What was it used for? (Score:2, Informative)
Re:100:1 text compression ? (Score:3, Informative)
200 BYTE (!) XML documents are pretty rare. They probably standardized a few headers, and instead of sending them they just send a code.
Don't believe for a second we're talking about a compression scheme here. The usual slashdot lack of information applies.
Re:not quite (Score:2, Funny)
Re:What? No way. (Score:3, Funny)
Original XML (130 bytes):
Binary encoded (1 byte):
10110010
That's a 130:1 ratio.
Re:Try UDP with bigger packets (Score:2, Informative)