Old Protocol Could Save Massive Bandwidth 287

Posted by Hemos
from the reduce-reuse-recycle dept.
GFD writes: "The EETimes has a story about how a relatively old protocol for structured information called ASN.1 could be used to compress a 200 byte XML document to 2 bytes and a few bits. I wonder if the same could be done with XHTML or even regular HTML."
This discussion has been archived. No new comments can be posted.

  • by sxpert (139117)
    The hell with this.
    This damn thing is part of the OSI thing (remember this crap that worked on paper but was hell to implement)...
    It's probably the telco people trying to inflict this stuff upon us.
    HTML and XML are there for a reason. Current technology is fast enough that there is no need to "compress" things, and the documents stay human-readable. Going back to ASN.1 is going back to the past, to binary-file hell.
  • Forget the 200 bytes for a moment. Then think about how much information could be kept in 2 bytes and some bits. Let's say 20 bits. Well, if you know about information theory, you would clearly see that this amount of information is small, no matter what the original document contained.

    So as for arguing that this is an effective protocol/technique to use, I bet there are lots of other ways to send 20 bits of information. I really would like to see an XML document with only 20 bits of information; quite empty, right?

    It is not always important to look at the compression rates, unless you clearly have a bandwidth problem.

    Now the strength of XML... that's an entirely other story.
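
The counting argument in the comment above can be made concrete. This is only a sketch, taking "2 bytes and some bits" to mean 20 bits, which is the poster's own assumption:

```python
# Counting argument: n bits can distinguish at most 2**n different messages.
BITS = 20  # "2 bytes and some bits", as assumed in the comment above

distinct_messages = 2 ** BITS

# A 200-byte document, treated as arbitrary bytes, has 256**200 possible values.
possible_documents = 256 ** 200

print(f"{BITS} bits can label {distinct_messages:,} different messages")
print(f"arbitrary 200-byte documents: a number with {len(str(possible_documents))} digits")
```

So a 200-byte document can only shrink to 20 bits if both ends already agree on a list of about a million possible messages, which is what a shared schema provides.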

  • "could be used to compress a 200 byte XML document to 2 bytes and few bits."

    This is a hoax. Someone played a trick like this on Byte Magazine (before Byte quit publishing). It is amazing that the editors didn't immediately recognize the impossibility of extreme claims of compression.

    I searched the comments for the word "hoax", but no one commenting here has used the word. Anyhow, it can't happen.
    • I believe you're right. Compressing 200 bytes into 16-24 bits is superior to the Hammingway code. I've never heard of better compression than Hammingway, and that is basically used for strings; to compress XML, you need to map the hierarchical structure of XML documents into the compressed format. I couldn't find a compression spec, so until I have seen one, I tend to be sceptical.
  • by hivolt (468311) on Tuesday August 07, 2001 @06:22PM (#2167439)
    Sounds like a lossy compression program I heard about early April....it could compress to 0 bytes, if I remember correctly.
    • Lossy-soft! (Score:4, Funny)

      by D. Mann (86819) on Tuesday August 07, 2001 @11:48PM (#2149820) Homepage
      Why, that sounds like LossySoft! Compress gigabytes of files to bits!

      An excerpt from LampreySoft's page:
      After a typical LossySoft HSV compression cycle you achieve a 16:1 compression ratio, or 9 gigabytes = approx 600 megabytes. You've compressed your data on your very expensive hard drive into a size that will fit on an average 2 gigabyte hard drive with PLENTY of room to spare.

      Here's where the REAL excitement comes in - let's run the compression cycle TEN TIMES!

      Cycle Size in bytes

      9,663,676,416 (9 gigs, it takes a huge hard drive to hold)
      603,979,776 (approx 600 megs, fits on an Iomega Jaz disk, a Syquest SyJet disk, or a CD-R)
      37,748,736 (approx 35 megs, fits on an Iomega Zip disk, a Syquest Ezflyer disk, or a LS-120 disk)
      2,359,296 (approx 2 megs, transfers fairly quickly on a 28.8K or faster modem)
      147,456 (approx 150K, fits on all current removable media)
      9,216 (9K - wow!)
      576 (just over HALF a K!)
      36 (that's BYTES, folks!)
      2.25 (incredible, isn't it?)
      0.140625 (AMAZING!)
      Current technology can't split bytes very well, so the minimum you can compress any disk to is 1 bit.

      (Note: future LampreySoft products will use advanced features of quantum mathematics to reduce the lowest unit of information measure to sub-bit levels)


      LossySoft! [smart.net]
    • If it can compress to 0 bits, not only can we save a lot of bandwidth transferring those 0 bytes, but we can also transfer them a lot faster. Light-speed is only a limit if the transferred "thing" conveys information, so we don't have such a limit.

      Errr... just realised that most /. posts can be also transferred at higher speeds.

      PS: did that information appear in early April? I missed it.

    • Your Latin is incorrect. "Primus" should agree with "postum." It should be "postum primum."
  • by pjbass (144318) on Tuesday August 07, 2001 @06:24PM (#2167458) Homepage
    When you look at it, it's pretty cool to see that protocols that go back many years (Ethernet for example) just keep coming back with positive results, and scale way beyond what they were ever intended for in their respective RFC. What happened to most current protocols developed recently? Exchange is one that comes to mind...
    • What happened to most current protocols developed recently? Exchange is one that comes to mind...

      I'm not sure what protocol you're referring to when you say Exchange. Are you talking about, perchance, Microsoft Exchange Server? The one that uses X.400 for site-to-site communication? The X.400 that uses ASN.1 encoding?
  • mod_gzip ? (Score:4, Informative)

    by AdamInParadise (257888) on Tuesday August 07, 2001 @06:26PM (#2167480) Homepage
    Ever heard of mod_gzip? It compresses anything that goes through your Apache webserver and it is supported by most browsers. With everything running over HTTP these days, this is the way to go...
    • Re:mod_gzip ? (Score:3, Informative)

      by Nemesis][ (21247)
      Yes, it's a very welcome and needed addition to a "bloated" protocol. But just be aware of some possible drawbacks when using it.

      It doesn't work with SSL easily. See this [over.net] thread if curious. I ran into this when I wanted to force Open Webmail [ncku.edu.tw] to use https only and found the pages were not getting compressed.

      And take note of possible problems [over.net] with caching proxies serving pages to browsers that can't handle it.

      It has a few other quirks, but overall I for one am quite satisfied with it.
      Curious about the savings it brings? Use this [krakow.pl].

      Machines are always broken till the repairman comes.
    • Ever heard of mod_gzip? It compresses anything that goes through your Apache webserver and it is supported by most browsers. With everything running over HTTP these days, this is the way to go...

      First of all, this seems a bit off topic. Second, you can read about HTTP compression on the W3C website [w3.org]. It's definitely not a HUGE impact (and has some bugs with certain browsers, based on my own tests). Finally, AFAIK, ALL major web servers have this built in, as it is part of the HTTP 1.1 spec. Nothing to see here, move on please :).
      • First of all, how the hell is mentioning mod_gzip in a bandwidth-saving discussion off-topic? Second, it depends on what kind of files you are serving. Text files will get better compression than tarballs. Thirdly, I would like to know more about the "some bugs with certain browsers based on my own tests" that you report. I would also like to know what idiot modded the previous post up.
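
For a rough feel of what gzip-style compression does to repetitive markup, here is a small sketch using Python's standard `gzip` module. It is not mod_gzip itself, and the XML payload is made up for illustration:

```python
import gzip

# Hypothetical, repetitive XML payload (not taken from the article).
record = "<item><name>widget</name><price>9.99</price></item>"
xml_doc = ("<catalog>" + record * 50 + "</catalog>").encode("utf-8")

compressed = gzip.compress(xml_doc)

print(len(xml_doc), "bytes raw")
print(len(compressed), "bytes gzipped")

# Round trip: the receiver recovers the exact original bytes.
assert gzip.decompress(compressed) == xml_doc
```

Real pages are less repetitive than this sample, so the savings will be smaller; as noted above, already-compressed files such as tarballs gain very little.
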
  • by Karpe (1147) on Tuesday August 07, 2001 @06:27PM (#2167487) Homepage
    I believe it was Internetworking with TCP/IP, or perhaps Tanenbaum's Computer Networks, and the "conclusion" of the chapter on SNMP (which uses ASN.1) was that today it is much more important to make protocols that are simple to handle than ones that conserve bandwidth at the price of performance, since the "Moore's law for bandwidth" is stronger than the "Moore's law for CPU power". You could use (and already do use) compressed communication links, anyway.

    This is the same philosophy of IP, ATM, or any modern network technology. Simple, but fast.
    • I've done real quantitative studies on the topic, and quite frankly you got it wrong. Moore's Law (for CPU power) is far stronger than "Moore's Law for bandwidth". Bandwidth growth has been on the order of 30-40%/year, while CPU power has grown faster than that for at least two decades.

      ASN.1 is well known outside of the IETF fundamentalist crowd. With its PER (packed encoding rules), it is very efficient of bandwidth and not all that CPU intensive either. Nor is it difficult, if used correctly (and anything can be tough if used wrong). It's a simple tag-length-value notation which can recurse. The only reason the Internet doesn't use it more is the usual NIH.
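
The recursive tag-length-value idea can be sketched in a few lines of Python. This is a minimal illustration of BER/DER-style encoding for just two types (INTEGER and SEQUENCE); the helper names are invented for this example and it is nowhere near a full ASN.1 implementation. Note that tag-length-value describes BER and its relatives; PER gets its compactness by leaving most tags and lengths out.

```python
def encode_length(n: int) -> bytes:
    """BER/DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value as tag, length, value."""
    return bytes([tag]) + encode_length(len(value)) + value


def der_integer(n: int) -> bytes:
    """Universal tag 0x02; minimal two's-complement content octets."""
    magnitude = n if n >= 0 else ~n
    size = magnitude.bit_length() // 8 + 1  # leaves room for the sign bit
    return tlv(0x02, n.to_bytes(size, "big", signed=True))


def der_sequence(*items: bytes) -> bytes:
    """Universal constructed tag 0x30; the value is itself a run of TLVs."""
    return tlv(0x30, b"".join(items))


# A sequence holding an integer and a nested sequence: the recursion in action.
encoded = der_sequence(der_integer(5), der_sequence(der_integer(300)))
print(encoded.hex())
```

Each nested structure is just another TLV inside the value of its parent, which is why a decoder can walk the structure without a schema but cannot tell what the fields mean.
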
      • ASN.1 is well known outside of the IETF fundamentalist crowd.

        Always nice to start with a nice Ad Hominem jibe. I'll try one myself "ASN.1 is supported mainly by the failed has-beens who designed OSI".

        With its PER (packed encoding rules), it is very efficient of bandwidth and not all that CPU intensive either.

        Utterly misleading. ASN.1 encoding rules are relatively simple, the data model is the big smelly dung heap to be avoided. Although the encoding rules are 'simple' the Derranged Encoding Rules (DER) used in X.509 require multiple recursive passes through the data structure to encode it.

        The only reason the Internet doesn't use it more is the usual NIH.

        On the contrary, several IETF protocols have used ASN.1 and the experience has been pretty miserable. The biggest problem being that ISO keeps tweaking the spec in ways that break existing implementations. ASN.1 is simply too much of a pain in the ass for the limited advantage it provides.

        The group's attempt to claim ASN.1 as the savior of HTTP is ignorant and stupid. There have been many proposals to compress HTTP headers and ASN.1 is actually one of the worst performers on both overhead and performance. The reason none of the proposals have gone anywhere is that there is no point in a backwards-incompatible change that saves 100 bytes or so on the headers if you don't do something about compressing the body. The biggest mistake we made in HTTP was not putting a simple Huffman coding compression algorithm for ASCII text into the server and browsers. Actually the reason we didn't get around to it was that nobody wanted to mess around with the patent minefield.

        Still it is always easier to explain that the reason the world is not using your idea is because they are stupid and ignorant and not because your idea is stupid and ignorant. In the case of ASN.1 the idea is a good one but the execution is third or fourth rate at best.
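
To show what "a simple Huffman coding compression algorithm for ASCII text" amounts to, here is a minimal sketch that builds Huffman code lengths for a sample string. The function name and sample text are assumptions for illustration; this is not any scheme that was actually proposed for HTTP.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length, in bits, for each symbol in text."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single symbol still needs one bit
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie_breaker, {symbol: depth_so_far})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # the two lightest subtrees...
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # ...move one level deeper
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


sample = "the quick brown fox jumps over the lazy dog " * 20  # arbitrary sample text
lengths = huffman_code_lengths(sample)
counts = Counter(sample)
total_bits = sum(counts[s] * lengths[s] for s in counts)
print(f"{total_bits / len(sample):.2f} bits per character vs 8 for plain ASCII")
```

Frequent characters get short codes and rare ones get long codes, so the average drops below 8 bits without any knowledge of the document's structure.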

      • This is ridiculous: The "IETF crowd" has been proven right time and time again. And they are well aware of the horrors that await in ASN.1 and other relics of OSI stupidity. Read Marshall Rose's books for more insight on this: "The Simple Book" is a good treatment of ASN.1 and SNMP, "The Internet Message" rails on (quite correctly) about the indescribable stupidity of X.400 mail, another OSI idiocy.

        If you didn't live through those horrible days when the trendy crowd was all for OSI and claiming that OSI was the One True Way and would and should eliminate the scourge of the Internet and TCP/IP from the face of the earth, then you really don't get the evil of ASN.1 and its ilk...
      • Check the graphs in this page [freerepublic.com]. Although this is not a complete reference, the same data, suggesting the bandwidth of optical fibers to be growing faster than doubling every 18 months, can be found in many other articles.

        But I agree that a generalization of fiber capacity to bandwidth must be done with extreme caution.
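
The two growth figures quoted in this sub-thread ("30-40%/year" and "doubling every 18 months") use different units. A small sketch for converting between them; the function names are made up for this example:

```python
import math


def annual_growth(doubling_months: float) -> float:
    """Annual growth rate equivalent to doubling every `doubling_months` months."""
    return 2 ** (12 / doubling_months) - 1


def doubling_time_months(annual_rate: float) -> float:
    """Months needed to double at a given annual growth rate."""
    return 12 * math.log(2) / math.log(1 + annual_rate)


print(f"doubling every 18 months = {annual_growth(18):.1%} per year")
for rate in (0.30, 0.40):
    print(f"{rate:.0%} per year doubles in about {doubling_time_months(rate):.0f} months")
```

Doubling every 18 months works out to roughly 59% a year, so the two posters are disagreeing about which curve is steeper rather than about the arithmetic.
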
  • bandwidth is cheap (Score:2, Insightful)

    by Proud Geek (260376)
    So who cares about compression. Personally, I'd much prefer the open and obvious standards of XML to some obfuscated form. Data is confusing enough already; at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.
    • So who cares about compression. Personally, I'd much prefer the open and obvious standards of XML to some obfuscated form. Data is confusing enough already; at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

      You're kidding right? Most CS people I know cringe at the fact that XML can more than double the size of a document with largely redundant tags. The only thing to be thankful for is that the documents typically compress very well due to the large number of redundant tags and that HTTP 1.1 supports compression, especially now that XML over HTTP (i.e. web services) is being beaten to death by a lot of people in the software industry. Numerous [xml.com] articles [irt.org] about [att.com] XML compression [xml.com] also tend to disagree with you that it is not an issue.

      PS: If bandwidth is so cheap how come DSL companies are going out of business and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.
      • DSL companies are going out of business because of poor planning, poor support, poor service [and the list goes on...]. Many dial-up ISPs died too or got bought out, so your logic doesn't apply.
      • PS: If bandwidth is so cheap how come DSL companies are going out of business

        DSL companies are going out of business because... bandwidth is so cheap. And it's their own fault.

        and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.

        Why? Are you saying AOL=dialup, and Time-Warner=cable? There's a LOT more to both of those companies than either of those two things...
    • by Jeffrey Baker (6191) on Tuesday August 07, 2001 @06:56PM (#2167641)
      at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

      Translated:

      My debugging tools are inadequate, and my brain is inadequate for improving them.

      You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal [ethereal.com]! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 [ethereal.com] because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone [unigone.com].

      Use your CPU, not your eyeballs!

    • I agree, leave XML uncompressed. Let modems compress the data - it might not be as efficient but it keeps things simple.

      Willy

    • "bandwidth is cheap"

      I'm typing this over a 56k connection. If I want faster in this area, I can either pay for a leased line, an ISDN line, or a satellite connection. If these options are cheap, could you buy me one please?

  • by Bruce Perens (3872) <bruce@perens.com> on Tuesday August 07, 2001 @06:29PM (#2167497) Homepage Journal
    What we're really saying here is that XML is a very verbose protocol, and that ASN.1 isn't. But verbosity, or lack thereof, is hardly unique. Also, there is no compression claim here - only the difference in verbosity.

    ASN.1 uses integers as its symbols. Remember the protocol used for SNMP? Did you really like it? It's not too human-readable or writable.

    Also, the idea of promoting it through a consortium is rather old-fashioned.

    Bruce

  • Multimedia? (Score:3, Interesting)

    by starseeker (141897) on Tuesday August 07, 2001 @06:32PM (#2167517) Homepage
    Isn't most of the bandwidth on the internet consumed by multimedia - images, music files, and the odd video? I have seldom encountered an html file larger than a meg, and even those are in my experience very rare.

    Yes, it would be nice to make the internet move faster with current technology, and I would support this for people on very slow connections. It might also be a boon for servers that get hit hard and often (though I doubt it would stop the Slashdot effect ;-) For the majority of single use internet concerns, however, I just don't see this doing a whole lot.

    Of course, I hope I'm wrong. More effective bandwidth is a Good Thing.
    • by Sir Robin (9082)
      I have seldom encountered an html file larger than a meg, and even those are in my experience very rare.

      You've obviously never saved a 5k Word doc in HTML. *sigh*.
  • ASN.1 not suitable (Score:5, Informative)

    by cartman (18204) on Tuesday August 07, 2001 @06:33PM (#2167523)
    ASN.1 is the basis of a great many protocols, LDAP among them. What is not mentioned in the article is that ASN.1 is a binary protocol and is therefore not human-readable. It may save space for bandwidth-constrained applications. However, bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.

    Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?

    It's also worth noting that there is lots of documentation surrounding XML. With ASN.1 you have to download the spec from ITU which is an INCREDIBLY annoying organization and their specs are barely readable and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; ITU just pretends to be. ITU does whatever it can to restrict the distribution of their specifications.
    • by pegacat (89763) on Tuesday August 07, 2001 @07:08PM (#2167720) Homepage

      This is pretty much right. I do a lot of work on X500 / ldap / security, and ASN1 is used throughout all this. It does a pretty good job, but as the poster points out, the ITU is a completely brain damaged relic of the sort of big company old boys club that used to make standards. It's very difficult to get info out of them. (Once you get it though, it's usually pretty thorough!)

      As for the 'compression', well, yes, it sorta would be shorter under many circumstances. ASN1 uses pre-defined 'global' schema that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for). For example, I've seen people try to encode X509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).

      But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN1 is nice for computers. Both require an XML parser / ASN.1 compiler though. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a java serialised object). But I can't see ASN.1 being much chop to compress text documents, there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?)

      And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAP's problems are more that they screwed up trying to do a cut down version of X500, than that they use ASN.1 :-)!
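
The cost of escaping binary data for a text format, mentioned a few paragraphs up, is easy to measure. A small sketch with Python's standard `base64` module, using random bytes as a stand-in for something like a certificate body:

```python
import base64
import os

raw = os.urandom(300)            # stand-in for binary data such as a certificate body
escaped = base64.b64encode(raw)  # what a text format like XML has to carry instead

print(len(raw), "raw bytes ->", len(escaped), "base64 characters")

# The escaping is lossless; it only costs space.
assert base64.b64decode(escaped) == raw
```

Base64 maps every 3 bytes to 4 characters, so binary payloads grow by about a third before any tags are added around them.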

      • by vsync64 (155958)
        Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for).

        Heh. How is this different from XML?

        I'm always amused by people that assume XML will be the magic lingua franca of the Internet and everyone will be able to parse every last bit of meaning out of your document just because it's encased in <handwaving><readable by="human"><tags /></readable></handwaving> without ever agreeing on any of those nasty "standards" things. Guess what, people: until we have a solution to the strong AI problem, human readable don't mean squat.

        • I'm always amused by people that assume XML will be the magic lingua franca of the Internet and everyone will be able to parse every last bit of meaning out of your document just because [it's human-readable] without ever agreeing on any of those nasty "standards" things.

          Apparently you've never had to write a parser for EDI [everything2.com], or any other binary data interchange format.

          I'm not going to claim that XML is a magic bullet for data interchange -- but I will attest that human-readable data formats are superior to binary formats when it comes to data interchange. I have lost track of the number of custom parsers I've had to write over the last 15+ years in order to convert data from one system to another, simply because the systems in question didn't have a shared data format. The big wins for XML are that (1) you can visually inspect your before-and-after results, (2) you don't have to write the parser, even if you have to write code to call it, (3) there are actually two sensible APIs to match two very different ways to look at the data, each of which is parser independent, and best of all (4) if you don't have documentation for the schema (or it's misimplemented), you still have a prayer of interpreting the data correctly.

          Anyone who's ever had to write an EDI application will *instantly* understand the appeal of XML.

    • are condemned to repeat it. Badly.

      I have had to deal with dozens of binary protocols that do the same thing as ASN.1, and do it worse.

      As to comparisons, XML and ASN.1 are designed for different jobs. Designing a Web page in ASN.1 would be ridiculous. Sending (say) telemetry data encoded in XML is equally ridiculous. I can believe that *data* transmissions could be 100 times larger in XML than in ASN.1. You have the header, DTD, some namespace declarations, and a bunch of nested tags, just to express a couple of numbers.
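
The "couple of numbers" case is easy to try out. The sketch below uses Python's `struct` module as a stand-in for a schema-driven binary encoding (it is not ASN.1), and the telemetry fields and XML element names are invented for illustration:

```python
import struct

temperature_c, rpm = 21.5, 3000  # hypothetical telemetry sample

# Fixed binary layout agreed in advance: big-endian 32-bit float + unsigned 32-bit int.
binary_msg = struct.pack(">fI", temperature_c, rpm)

# The same two numbers in a minimal, made-up XML wrapper (no DTD or namespaces).
xml_msg = (
    '<?xml version="1.0"?>'
    f"<telemetry><temperature>{temperature_c}</temperature>"
    f"<rpm>{rpm}</rpm></telemetry>"
).encode("ascii")

print(len(binary_msg), "bytes binary,", len(xml_msg), "bytes XML")
```

Even without a DTD or namespace declarations the XML form is many times larger here; adding them widens the gap further.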

      Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.

      A particularly cute example is SOAP (Microsoft's firewall-bypass protocol) It's going to be fun to watch people try to squeeze some performance out of a SOAP based system that tries to do something interactive.

      As to the ISO, yeah, they're seriously obnoxious. They tend to go off into their own little world, redefine standard terminology so they're incomprehensible to outsiders, and come up with stuff that can't be implemented. (Nobody uses ASN.1 -- it's unimplementable. When people talk about using ASN.1 for something real, they're talking about a subset. A subset, of course, cannot claim conformance to the standard.) The crowning insult, of course, is that they fund the organization by selling the standards. Hey, it's a standard -- you *have* to buy it!

      "It's all in knowing what wrench to use to pound in the screw."

      • I could've sworn I saw something on the W3C about SOAP [w3.org]?
        I don't see what's so bad about judiciously applied XML. If you'd like to piddlefart around with obscure offsets and byte counts in binary transfers, knock yourself out. XML doesn't bloat transmissions up that much (argue about node overhead, then remember filler columns) and every machine in existence speaks text.
        Of course it's not all things for all people, but in the right place at the right time, it's just fine.
      • StormyMonday writes:

        Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.

        XML is no magic bullet; however, that doesn't change the fact that it is incredibly useful in many different circumstances. XML, realistically used, can make some projects simpler, and data transfers much more comprehensible.

        A particularly cute example is SOAP (Microsoft's firewall-bypass protocol) It's going to be fun to watch people try to squeeze some performance out of a SOAP based system that tries to do something interactive.

        SOAP, XML-RPC and similar protocols are designed for generic, highly interoperable, communications, not performance. Anybody who expects blinding performance out of an XML encoded procedure call shouldn't be programming. You want performance, use a custom protocol, or at least CORBA. SOAP is for when you can sacrifice performance to gain interoperability.

        I'd even go a step farther: anything that can be done using an XML-based data format can be done smaller and faster by some other design. However, as machines get larger, faster and cheaper, getting that last bit of performance becomes less and less important for most computing tasks. XML is great for tasks that don't need every last ounce of speed. Save the custom-tuned binary formats and protocols for the few apps that really need them.
    • ...bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.

      Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?

      I share your dislike of unnecessary bit squishing. But I have to pick some nits.

      First, you shouldn't assume that available bandwidth will steadily increase. It will take some major breakthroughs -- not just technical, but political and economic -- before there's a megabit internet connection every place where it might be useful. And wide-area wireless networking is in an even worse state. Not to mention that radio spectrum is a finite resource.

      Your point about tags is well-taken. But you can compress the content too. Using 8 bits for every character is very inefficient, especially considering that there are only 128 characters to represent. With the right scheme, you could certainly get the average character width to somewhere between 4 and 5 bits.
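
One way to sanity-check the "between 4 and 5 bits" estimate is to measure the zero-order Shannon entropy of some text, which is the lower bound for any code that treats characters independently. A rough sketch; the sample sentence is arbitrary and far too short to be a representative corpus:

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Zero-order Shannon entropy: -sum(p * log2(p)) over character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


sample = (
    "bandwidth has a tendency to increase over time but radio spectrum "
    "is a finite resource and not every place has a fast connection"
)  # arbitrary sample, not a representative corpus
print(f"{entropy_bits_per_char(sample):.2f} bits per character")
```

Schemes that also model context (which letters tend to follow which) can go lower than this figure, at the cost of more work on both ends.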

  • by Anonymous Coward on Tuesday August 07, 2001 @06:33PM (#2167526)
    If I remember the history right, ASN.1 was designed during the era of X.25 and charging for every 64 byte packet. I used to use ASN.1 for remote communications in a commercial product, but later changed it to a hybrid of CORBA and XML, mostly due to more modern technologies, and since the actual bandwidth did not cost that much anymore, it did not make sense to keep an old protocol alive. ASN.1 has its drawbacks too--8 different ways to encode a floating point number. It was a political reason, because everyone involved wanted their own floating point format included, and as a net result, everyone has to be able to decode 8 different formats. An encoding designed by a committee (a stone-age telecom committee as a matter of fact).
  • Missing the point? (Score:2, Insightful)

    by MikeyNg (88437)

    Bandwidth is cheap now, but it may not be forever. Yes, we'll most likely continue to see order of magnitude increases for years and decades to come, but it'll slow down sometime.

    Also, consider wireless devices. Their bandwidth isn't there right now, and maybe with 3G we'll see a nice increase, but I can see that as a practical application for this type of compression.

    Let's also not forget that even though it's compressed, you can always uncompress it into regular old XML to actually read it and understand it, for you human folks that actually need like LETTERS and stuff! That's it. I'm just going to start writing everything in integers soon. Time to change my .sig!

  • What you would lose is the readability. Any symbol in an html file could be reduced to a byte or less depending on the total number of symbols used. Consider an 80-character line of text with each character a different color. For each character you'd need data approximately equal to:

    a

    This entire sequence could be compressed into 4 bytes or less, but you would require an html compiler instead of coding it by hand (unless you're one of those crazy people that prefer coding opcodes straight over using C).

    The issue with html, and the reason why we don't worry about the inefficiency much is the fact that you could have a rather extensive html file with one link to a single picture, and that picture would easily take up the space of the entire html file.

    -Restil
    • If you are using a modem with a "V54" or "Vnn".

      If anywhere in the network two CISCO or two NORTEL routers are talking to each other, if your backbone provider is reasonably competent and wants to make money.

      Then your web traffic is already being compressed.

      One of the great things about HTML and XML is that it compresses really easily using comparatively simple compression algorithms.

      So any effort you put in "compressing" XML traffic is wasted as your network hardware would probably have done it anyway.

  • When the web was lots of static pages and images, and bandwidth was scarce, compression made sense.

    With the current over-supply of domestic bandwidth and the move to database-driven, customised web sites, is it worth spending CPU cycles compressing small data files on-the-fly?

    Most popular websites don't suffer from poor connectivity -- they suffer from too little back-end grunt.

    • Imagine starting your own website. When you are paying for bandwidth on a site that has a >100KB front page (like slashdot on my configuration) then it is definitely worth it. Not everyone is on broadband and many people won't be for a long long time. Saving bandwidth is always good, whatever the situation. And besides, many many page serves can be had (10,000 a day) off a very inexpensive computer (K6-2 400 MHz) even on a complex website (scoop driven).
  • by gd23ka (324741) on Tuesday August 07, 2001 @06:43PM (#2167577) Homepage
    Actually ASN.1 is a formal way of specifying how to encode data into binary representations like BER, CER, DER and PER which do save bandwidth compared to XML.

    Those of you who want to find out more about ASN.1 can pick up free e-books on ASN.1 here [oss.com]. There's some blatant propaganda in them for OSS Nokalva's ASN.1 compiler, but of course there's also snacc [gnu.org], a GPL'd open source ASN.1 compiler. Snacc however only generates code for encoding to BER, so you might also want to check out a hacked version [qut.edu.au] of snacc from Queensland University of Technology.

    ASN.1 is a base technology for a lot of standards out there like X.509, PKCS and LDAP, the OSI application layer protocols etc.
  • by TroyFoley (238708) on Tuesday August 07, 2001 @06:46PM (#2167596) Homepage Journal
    I figured it out. They do it by removing the data pertaining to popup/popunder banners! 100 to 1 ratio seems about right.
  • by coyote-san (38515) on Tuesday August 07, 2001 @06:46PM (#2167599)
    This idea totally misses the point.

    ASN.1 achieves good compression because the designer must specify every single parameter for all time. The ASN.1 compiler, among other things, then figures out that that "Letterhead, A4, landscape" mode flag should be encoded as something like 4.16.3.23.1.5, which is actually a sequence of bits that can fit into 2 bytes because the ASN.1 grammar knows exactly how few bits are sufficient for every possible case.

    In contrast, XML starts with *X* because it's designed to be extensible. The DTDs are not cast in stone, and in fact a well-behaved application should read the DTD for each session, and extract only the items of interest. It's not an error if one site decides to extend their DTD locally, provided they don't remove anything.

    But if you use ASN.1 compression, you either need to cast those XML DTDs into stone (defeating the main reason for XML in the first place), or compile the DTD into an ASN.1 compiler on the fly (an expensive operation, at least at the moment).

    This idea is actually pretty clever if you control both sides of the connection and can ensure that the ASN.1 always matches the DTD, but as a general solution it's the wrong idea at the wrong time.
    • The ASN.1 compiler only has to run when the DTD changes, if the compiler can output a program that converts XML to ASN.1.
      • What if the XML document is representative of a dynamic aggregate of multiple schemas?

        Say what? Heh heh...

        Let's say you have an XHTML document (one DTD) that contains MathML (another DTD) and some SVG for good measure (third DTD). This would not be handled in your static DTD compile unless you made specific provisions for all of them in a single document. But what if the next document only has one of them used? Or two? Or includes some other one later? Are you going to compile every permutation of DTD that could ever occur?

        This is where the strength of XML is not necessarily compatible with the strengths of ASN.1.
  • by Eryq (313869) on Tuesday August 07, 2001 @06:50PM (#2167615) Homepage

    XML, by virtue of being text-based, may be easily inspected and understood. Sure, it's a little bulky, but if you're transmitting something like an XML-encoded vCard versus an ASN.1 encoding of the same info, the bulk is negligible.

    Yes, for mp3-sized data streams, or real-time systems, there would be a difference. But many interesting applications don't require that much bandwidth.

    ASN.1 achieves its compactness by sacrificing transparency. Sure, it's probably straightforward enough if you have the document which says how the tags are encoded, but good documentation of anything is rare as hen's teeth, and not all software companies are willing to play nice with the developer community at large and share their standards documents. And some of them get downright nassssssty if you reverse engineer...

    Transparency is one of the reasons for the rapid growth of the Web: both HTML and HTTP were easy enough to understand that it took very little tech savvy to throw up a website or code an HTTPD or a CGI program.

    Transparency and extensibility also make XML an excellent archival format; so if your protocol messages contain data you want to keep around for a while, you can snip out portions of the stream and save them, knowing that 10 or 15 years from now, even if all the relevant apps (and their documentation) disappear, you'll still be able to grok the data.

  • by ciurana (2603) on Tuesday August 07, 2001 @07:02PM (#2167672) Homepage Journal

    Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 has been an interoperability standard for at least 10 years prior to the introduction of XML.

    ASN.1 is a standard interoperability protocol (ISO IS 8824 and 8825) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. ASN.1 is a collection of data values with some meaning associated with them. It doesn't specify how the values are to be encoded. The semantics of those values are left to the application to resolve (i.e. XML). ASN.1 defines only the transfer syntax between systems.

    ASN.1 codes are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms of bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. Id octets are used for specifying primitive or constructed data types. Length octets define the size of the actual content. A boolean is thus represented by a single bit, and digits 0-9 could be BCD encoded. Each encoding structure carries with it its interpretation.

    An XML document could thus be encoded by converting the tags into a lookup table and a single octet code. If the tags are too many, or too long (e.g. FIRST-NAME) then there are significant savings by replacing the whole tag with an ASN.1 encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned to a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: One for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).
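    The lookup-table idea above can be sketched in a few lines. This is a simplified illustration, not real BER tag numbering: TAG_IDS, its contents, and the two helper functions are hypothetical, and the table is assumed to be shared by both ends out of band.

```python
# Hypothetical tag table agreed on by both ends; the ID values are arbitrary.
TAG_IDS = {"FIRST-NAME": 0x01, "LAST-NAME": 0x02, "EMAIL": 0x03}
TAG_NAMES = {ident: name for name, ident in TAG_IDS.items()}


def encode_tag(name):
    """Encode a tag as two octets: an identifier, then a zero length."""
    return bytes([TAG_IDS[name], 0x00])


def decode_tag(data):
    """Recover the tag name from the two-octet form."""
    ident, length = data[0], data[1]
    if length != 0:
        raise ValueError("this sketch only handles empty contents")
    return TAG_NAMES[ident]


text_form = "<FIRST-NAME>"
binary_form = encode_tag("FIRST-NAME")
print(len(text_form), "bytes as text,", len(binary_form), "bytes encoded")
```

    The saving comes entirely from the shared table: a tag that is not in TAG_IDS cannot be sent at all without first extending the table on both sides.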

    I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.

    http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/asn1gloss.htm has an excellent overview if you're interested.

    Cheers!

    E
    • An XML document could thus be encoded by converting the tags into a lookup table and a single octet code. If the tags are too many, or too long (e.g. FIRST-NAME) then there are significant savings by replacing the whole tag with an ASN.1 encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned to a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: One for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).

      That's fine, but leaves the X out of XML: eXtensibility. A lot of existing XML schemas have slots of the form <xs:any namespace="##other"/> which allow any foreign tag, known or unknown, defined or not, to be incorporated at that point. As far as I know, ASN.1 can't cope with that without both explicit tagging and a fully-expanded OID for the incorporated entity (since it's not enumerable), which creates metadata bloat all over again.

      Another XML design goal is that a document be parsable (at least as far as an abstract syntax tree) without foreknowledge of the type structure. A couple of mechanisms from SGML that were forbidden in XML but don't defeat this goal are empty end-tags and unquoted (single-token) attribute values. Empty end-tags would knock a large chunk out of the size of a complex XML document by allowing a simple </> to close whatever element was last opened. Unquoted attribute values can save 2 characters per attribute and also feel more natural when the values aren't stringlike in nature; quoting small integers just grates on me, anyway.

      Another approach is defining a general binary shorthand coding for XML; a place I worked at had one in use for wire transmission of XML between hosts running their code base.

  • I wonder if the same could be done with XHTML or even regular HTML.

    If HTML is written properly, it is XML. Browsers nowadays let you cheat, and mix tags, and ignore quotes, but if the HTML is written to spec, then it is technically XML.

    Captain_Frisk
  • by Anonymous Coward
    ASN.1 and a way of encoding ASN.1 (BER is commonly used) produce output that's binary. Encoded like this, it represents everything using type, length, and data. So to represent, say, the integer 255 you'd encode it like this, using BER: [type byte: ?] [length byte: 1] [value byte: 255]. So that's three bytes to encode a single-byte integer. Great.

    Basically the advantages of ASN.1 are that it's a well defined way to express data types, and it has encodings that are platform neutral. Compared to other fixed-field binary protocols it's fat and not particularly robust (got a length value wrong anywhere? You can't make any sense of the rest of the data). It's a binary protocol, which means you can't just look at the data and understand it, which I see as a huge disadvantage--in my mind the reason the net is big now is because the protocols are straightforward and easy to understand at a glance.

    I work with ASN.1 every day in the guise of SNMP and I've learned to become annoyed with it. Ever see ASN.1 in the form of a MIB? Bleeh. XML is popular because it's flexible and extendable. You don't really have a prayer of understanding encoded ASN.1 data without the full ASN.1 definition for the data, whereas XML is inherently human readable. Maybe there's more to this and it's a good fit, but I am not a big fan of ASN.1.

    - Bill
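    For anyone who wants to see the type/length/value layout from the comment above in bytes, here is a minimal sketch of a BER INTEGER encoder. The function name is made up, and it only handles the short length form. One detail worth noting: BER integers are two's complement, so 255 needs a leading zero octet and comes out as four bytes in total, while 127 fits in the three bytes described above.

```python
def ber_encode_integer(n):
    """Encode a Python int as a BER INTEGER (universal tag 0x02).

    Sketch only: supports the short length form (contents under 128 octets).
    """
    # Contents are two's complement, using the fewest octets that keep the sign.
    magnitude = n if n >= 0 else ~n
    length = magnitude.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    contents = n.to_bytes(length, "big", signed=True)
    return bytes([0x02, length]) + contents


for number in (127, 255, -1):
    print(number, ber_encode_integer(number).hex(" "))
```

    Either way the overhead argument stands: two of the bytes are spent on type and length before any of the value is sent.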
  • This is funny ... (Score:2, Informative)

    by ras (84108)

    I remember when I first came across ASN.1 years ago. Everybody hated it because the parser was sssooo big and complex. Why not just use a simple ASCII file was a common refrain. Sure ASN.1 was capable of representing just about any data structure in a reasonably compact form, but most information did not need complex data structures to represent it so why does anybody use ASN.1?

    Well a decade or two later we get the ASCII version of ASN.1 - XML. And guess what? It's arguably harder to write a generic parser for XML than it is for ASN.1. (I still have not found a good open source validating parser for XML.) But guess what - everybody is wildly enthusiastic this time round. My how times change!

    Actually ASN.1 and XML in some ways are very similar. They try to solve the same problem - how to represent complex data structures in a generic way. And they do it in a similar way. Because ASN.1 is binary and uses numbers instead of text tags it does use a lot less space to represent the same thing, although the 2 versus 200 bytes claim is at best misleading. Most of the 200 bytes would probably be XML header (DTDs and stuff) which you would not put in the ASN.1 encoding.

    And yes, XML is too fat for some applications. For example, if you are pumping out a 60k row SQL table to your 1000 clients every day you probably would not choose XML. That is why this idea has merit. It could give you the benefits of XML without the fat. To work, someone would have to come up with a standard way of translating a DTD to ASN.1 encoding. I know it's a good idea because I came up with it myself a while back :).

    • I've never quite understood why some people found the idea to have machines communicate with a data format designed to be readable by humans so intelligent. Because of this oh-so-intelligent idea, we have:
      • lots of broken pages with wrong HTML syntax
      • lots of broken browsers with different ideas about how to interpret HTML
      • a huge amount of bandwidth wasted with unnecessary whitespace and superfluous characters
      A standard format for web content without a human-readable form (i.e. a compact binary encoding) would have many advantages:
      • syntax could be strict, so no ambiguity would be supported (i.e. there would never have been a reason to support things like both size="123" and size=123)
      • the documents/content would be checked prior to publication on the WWW (because a "compilation" step would be needed in case the content was typed in by a human being, and high level libraries / widgets would probably be used in generators for dynamic content)
      • no waste of precious bandwidth!
      • anyone who wanted proprietary extensions for their encoding would have to give you their parser and generator/compiler
      OK, so XML is more strict and extensible than HTML, but it's still based on the irrational notion of encoding things in a human-readable form - trading bandwidth for readability - when in most cases no human will ever look at them.
      • by alienmole (15522) on Tuesday August 07, 2001 @10:24PM (#2168613)
        I think you simply haven't realized quite how useful it is, in real life, for information to be human-readable. When it isn't, it becomes harder to deal with. If you've programmed anything on the web, you're certainly familiar with using "View Source" to see the final source of a page. If you use XML, you've also examined XML data that's been generated by, say, a database server.

        Contrast that with what I'm dealing with right now: I'm using JDBC to access an MS SQL Server. MS bought their SQL Server from Sybase many years ago, and inherited the binary TDS data stream protocol. As efficient as this might be, when you run into problems, you're in trouble. The TDS format is undocumented, so you can't easily determine what the problem might be, whereas a text format would be easy to debug. Anytime you have a binary protocol, you become totally reliant on the tools that are available to interpret that protocol. With text protocols, you're much less restricted.

        Another example of this is standard Unix-based email systems vs. Microsoft Exchange. Exchange uses a proprietary database for its message base, which makes it effectively inaccessible to anything but specialized tools and a poorly-designed API. If your email is stored in some kind of text format, OTOH, there are a wealth of tools that can deal with it, right down to simple old grep.

        The bottom line is that the human-readability (and writability!!) of HTML was one of the major factors in the success of the web. It's no coincidence that everything on the web, and many other successful protocols, such as SMTP, are text-based. To paraphrase your subject line, binary protocols are BAD BAD BAD.

        Calling human-readable formats "irrational" is a bit like Spock on Star Trek calling things "illogical" - what that usually really meant was that the actual logic of the situation wasn't understood. What's irrational is encoding important information, which needs to be examined by humans for all sorts of reasons that go beyond what you happen to have imagined, into a format which humans can't easily read.

        Human-readable formats and protocols will remain important until humans have been completely "taken out of the loop" of programming computers (which means not in the foreseeable future).

          • I think you simply haven't realized quite how useful it is, in real life, for information to be human-readable.
          This is particularly true if the humans work for an intelligence agency, law enforcement, or even a corporation that has decided it has a burning need to know what your information is. Encryption is BAD BAD BAD! You think ASN.1 is a bitch to debug? Try figuring out what's wrong with HTML that has even wimpy 40-bit DES slapped on it.

          Of course, you never have to deal with that because the SSL stream is already decoded for you. That might not help with a new format, but maybe someone could come up with a special language that's really good for rearranging data and making it presentable. We could call it "Practical Language for Extracting and Reporting." Yeah, PLER. That has kind of a nice ring to it. There are quite a few jobs that need this kind of data munging, but are too small for Java and would take too long to write in C++, so I'd bet there'd be a lot of interest in this hypothetical PLER language.

        • I think you simply haven't realized quite how useful it is, in real life, for information to be human-readable. When it isn't, it becomes harder to deal with. If you've programmed anything on the web, you're certainly familiar with using "View Source" to see the final source of a page

          I have programmed something "on the web", but before it became such a fad, I used to like assembly language programming... Decoding a simple binary format is trivial, and if the usual format for web pages was binary, browsers would still allow you to use a "view source" command (to decode the binary format, probably giving a much more readable presentation of the structure of the document than the HTML code you can see nowadays).

  • A 200 byte message reduced to 2 bytes? I don't know ASN.1 but I would have to assume tags are counted, and added to an indexed table. Using variable-length encoding you can squeeze some extra compression out of your algorithm but 100:1 compression? So basically you have a 180-byte XML tag with a single value reduced to a single symbol with an index of 1. Meaning that the "benchmark" is a sham. Add to that the fact that the symbol table obviously wasn't counted in their "compression" technique. I would assume you don't LZ-compress the symbol table (creating a symbol table for a symbol table) so basically what you have is after compression the code goes from 200 bytes to 200 bytes + 2 bytes and a few bits. What a joke. The worst part of all is that I'm sure it achieves fairly good compression on a 100k XHTML document but they have to throw bogus numbers at us thinking we'll go all doe-eyed. Very insulting.
  • After writing an SNMP management console with an ASN.1 parser, I have nightmares about the protocol. Sure, it's very efficient yet flexible, but it makes all sorts of neural connections happen in your brain that are better left open. :P

    Since XML was designed for humans to be able to look at to a certain extent, why not just have a standard compression method that's included with all XML parsers? Whenever you transmit or save the XML file, it should be saved in the compressed format.
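    As a rough sketch of that suggestion, the snippet below runs an XML document through Python's zlib module (DEFLATE). The helper names and the sample document are invented for illustration, and zlib is just one possible choice; nothing in the XML specification mandates it.

```python
import zlib


def compress_xml(text):
    """Compress an XML string with DEFLATE at the highest compression level."""
    return zlib.compress(text.encode("utf-8"), 9)


def decompress_xml(blob):
    """Restore the original XML string, byte for byte."""
    return zlib.decompress(blob).decode("utf-8")


# Hypothetical sample document with the repetitive tags typical of XML.
sample = "<people>" + "".join(
    f"<person><first-name>User{i}</first-name></person>" for i in range(50)
) + "</people>"

packed = compress_xml(sample)
print(len(sample.encode("utf-8")), "bytes raw,", len(packed), "bytes compressed")
```

    Unlike a schema-driven encoding, this needs no shared definition between the two ends, and the text comes back unchanged after decompression.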
  • by foo (143650)
    http://www.fokus.gmd.de/ovma/freeware/snacc/
  • ASN.1 could be used to compress a 200 byte XML document to 2 bytes and few bits

    Oh yeah?? I wrote a protocol that can take a 6 MB MP3 file and compress it to under 10 bytes!

    (Some sound quality degradation may occur, use at own risk)

  • by RobertGraham (28990) on Tuesday August 07, 2001 @09:29PM (#2168402) Homepage
    Preface: I've written parsers for ASN.1 (esp. SNMP MIBs, but also generic), BER/DER (same thing), PER, HTML, XML, and while we are at it, XDR and CORBA IDL. I've written a BER decoder that can decode SNMP at gigabit/second speeds.

    There are a vast number of differences between ASN.1 and XML. Anyone who thinks that ASN.1 is in any way related to XML just doesn't "get it".

    1. Why not XDR or just raw binary?
    Why not just specify your own binary format for your application? The thing that the ASN.1 bigots don't understand is that in most real-world applications, the ASN.1 formatting provides only overhead but no real-world value. This happens in XML, too, but the value proposition for XML is much clearer. A good example is the H.323 series PER encoding which is just plain wrong: well-documented custom encoding would have been tons better.

    2. DTD or no DTD
    The ASN.1 language is essentially a DTD; it gets encoded in things like BER. The trick is that I can parse "well-formed" XML content without knowing the DTD. This is impossible with current ASN.1 encoding. The idea of DTD-free "well-formed" input and DTD-based "valid" input is at the core of XML. Yes, ASN.1 and XML both format data, but proposing ASN.1 as being a valid substitute means you just don't grok what XML is all about.
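    The DTD-free parsing point can be shown with Python's standard xml.etree.ElementTree module. This is only a sketch: the outline helper and the sample document are invented here, and no DTD or schema is supplied anywhere.

```python
import xml.etree.ElementTree as ET


def outline(text):
    """Return (depth, tag) pairs for every element in document order."""
    result = []

    def walk(element, depth):
        result.append((depth, element.tag))
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(text), 0)
    return result


# A document whose vocabulary the parser has never been told about.
document = "<order><item><sku>42</sku></item><note/></order>"
for depth, tag in outline(document):
    print("  " * depth + tag)
```

    The parser recovers the full element tree from the text alone. A decoder handed packed bits from a schema-driven encoding has no equivalent starting point without the schema.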

    3. Interoperability
    The Internet grew up with the principle that parsers should be liberal in what they receive. This was important in early interoperability, but now is a detriment. For example, it is impossible to write an interoperable HTML parser. XML took the radical zen approach of mandating that any parser that accepts malformed input is BAD. As a result, anybody writing a parser knows the input will be well-formed. There is one-and-only-one way to represent input (barring whitespace), so writing parsers is easy. ASN.1 has taken the opposite approach, there are a zillion ways to represent input.
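    The "reject malformed input" rule is easy to observe with a conforming parser. The sketch below uses Python's standard xml.etree.ElementTree; the is_well_formed helper is a made-up name for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    '<a size="123"><b/></a>',  # quoted attribute, properly nested
    "<a size=123></a>",        # unquoted attribute value, as HTML tolerates
    "<a><b></a></b>",          # overlapping tags
]
for sample in samples:
    print(is_well_formed(sample), sample)
```

    Only the first sample is accepted. A browser's HTML parser would typically try to make sense of all three, which is the kind of leniency this comment blames for the interoperability mess.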

    As a result, non-interoperable ASN.1 implementations abound. For example, most SNMP implementations are incompatible. They work only "most" of the time. Go to a standard SNMP MIB repository and you'll find that the same MIB must be published multiple times to handle different ASN.1 compilers.

    The long and the short of it is that ASN.1 implementations today are extremely incompatible with each other, whereas XML libraries have proven to be extremely interoperable. Right now, XML has proven the MOST interoperable way to format data, and ASN.1 has proven to be the LEAST.

    4. Bugs
    Most XML parsers have proven to be robust, most ASN.1 parsers have proven to be buggy. You can DoS a lot of devices today by carefully crafting malformed SNMP BER packets.

    5. Security
    You can leverage ASN.1's multiple encodings to hack. For example, my SideStep program shows how to play with SNMP and evade network intrusion detection systems: http://robertgraham.com/tmp/sidestep.html [robertgraham.com] At the same time, ASN.1 parsers are riddled with buffer-overflows.

    Anyway, sorry for ranting. I think XML advocates are a little overzealous (watch carefully your possessions or some XMLite will come along and encode it), but ASN.1 is just plain wrong. The rumor is that somebody threw it together as a sample to point out problems, but it was accidentally standardized. It is riddled with problems, it should be abandoned. An encoding system is rarely needed, but if you need one, pick XDR for gosh sakes.

    • Well said! A Silly Notation.1 is a hideous encoding scheme. The BER is simply ambiguous -- you don't need to send malformed packets to devices, rather simply send valid BER packets that just aren't right, but still follow the rules, and watch carnage ensue.
  • Among the voice-over-IP (VoIP) protocols out in the world are H.323 (an ITU-T spec that makes heavy use of ASN.1) and SIP [columbia.edu] (RFC 2543 et al.)

    H.323 interoperability is tough. Some problems are due to differences in how one entity encodes a piece of data and another decodes it. Many H.323 implementations, um, do not fail gracefully under such circumstances.

    SIP call signalling looks like HTTP. There have been complaints that it's too verbose, and needs to be replaced with something binary. One proposal [ietf.org] suggests using a binary encoding. It uses LZW [google.com] compression and shared "codebooks" (schemas?)

    That's just for call signalling. Both these VoIP protocols (and others) use RTP [columbia.edu] ("Real-time Transport Protocol") for voice, video, etc.; that's encoded and compressed pretty darned seriously.

    (I'm not speaking for my employer, I'm just speaking my mind.)
    • There is a revolutionary new form of voice compression that works, not only over VoIP but also on your analog telephone lines.

      Simply cut out the un-needed words.

      [dials]
      Broken down. Main street. Need spare tyre.
      [hangs up]

      See, it'll halve your phone bills!
  • Big fuckin' deal. I compressed an entire Microsoft Operating System into a single byte once. HALT 0
