
Old Protocol Could Save Massive Bandwidth

GFD writes: "The EETimes has a story about how a relatively old protocol for structured information, called ASN.1, could be used to compress a 200-byte XML document down to 2 bytes and a few bits. I wonder if the same could be done with XHTML or even regular HTML."

  • by mcspock ( 252093 ) on Tuesday August 07, 2001 @07:23PM (#2167445)
    somehow i find it hard to believe that a method for compressing text at a 100:1 ratio has been buried away forever. standard compression programs get about 10:1 on text, you'd think that a better model would be incorporated if one existed.
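
As a rough sanity check on the 10:1 figure above, here is a minimal sketch, using an invented, repetitive XML payload, that compresses a small document with zlib and prints the ratio actually achieved:

```python
import zlib

# Invented, repetitive XML payload of the kind the article describes.
xml_doc = (
    '<?xml version="1.0"?><purchaseOrder>'
    + "".join(
        f"<lineItem><sku>{i:05d}</sku><quantity>1</quantity></lineItem>"
        for i in range(50)
    )
    + "</purchaseOrder>"
).encode("utf-8")

compressed = zlib.compress(xml_doc, 9)
print(f"original:   {len(xml_doc)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(xml_doc) / len(compressed):.1f}:1")
```
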
  • bandwidth is cheap (Score:2, Insightful)

    by Proud Geek ( 260376 ) on Tuesday August 07, 2001 @07:28PM (#2167491) Homepage Journal
    So who cares about compression? Personally, I'd much prefer the open and obvious standards of XML to some obfuscated form. Data is confusing enough already; at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.
  • by Bruce Perens ( 3872 ) <bruce@perens.com> on Tuesday August 07, 2001 @07:29PM (#2167497) Homepage Journal
    What we're really saying here is that XML is a very verbose protocol, and that ASN.1 isn't. But verbosity, or lack thereof, is hardly unique. Also, there is no compression claim here - only the difference in verbosity.

    ASN.1 uses integers as its symbols. Remember the protocol used for SNMP? Did you really like it? It's not too human-readable or writable.

    Also, the idea of promoting it through a consortium is rather old-fashioned.

    Bruce

  • Missing the point? (Score:2, Insightful)

    by MikeyNg ( 88437 ) <mikeyng AT gmail DOT com> on Tuesday August 07, 2001 @07:37PM (#2167547) Homepage

    Bandwidth is cheap now, but it may not be forever. Yes, we'll most likely continue to see order of magnitude increases for years and decades to come, but it'll slow down sometime.

    Also, consider wireless devices. Their bandwidth isn't there right now, and maybe with 3G we'll see a nice increase, but I can see that as a practical application for this type of compression.

    Let's also not forget that even though it's compressed, you can always uncompress it into regular old XML to actually read it and understand it, for you human folks that actually need like LETTERS and stuff! That's it. I'm just going to start writing everything in integers soon. Time to change my .sig!

  • by DougM ( 175616 ) on Tuesday August 07, 2001 @07:42PM (#2167573)
    When the web was lots of static pages and images, and bandwidth was scarce, compression made sense.

    With the current over-supply of domestic bandwidth and the move to database-driven, customised web sites, is it worth spending CPU cycles compressing small data files on-the-fly?

    Most popular websites don't suffer from poor connectivity -- they suffer from too little back-end grunt.

  • Oh crap. (Score:1, Insightful)

    by G-funk ( 22712 ) <josh@gfunk007.com> on Tuesday August 07, 2001 @07:44PM (#2167584) Homepage Journal
    This is just crap. Let's say it's two bytes and four bits (20 bits in all). That means it can only describe 2^20 different files. With 200 bytes to play with, you can have around 80^200 different xml files (80 was pulled from my ass: uppercase + lowercase + digits + symbols).

    Let's put it this way: 2.5 bytes out of 200 is an 80:1 ratio, but only 2^20 of those 80^200 possible 200-byte files can be compressed down to 2.5 bytes, and that's assuming a perfect encoding.

    I'm sure that with the right sample file LZH will compress it down to just a few bytes too.
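
For what it's worth, the counting argument above can be checked directly; here is a quick sketch (the 95-character alphabet is just an assumption about printable ASCII):

```python
# Pigeonhole check: a 20-bit output can name at most 2**20 distinct documents,
# while the space of possible 200-byte text files is astronomically larger.
distinct_outputs = 2 ** 20      # "2 bytes and a few bits"
possible_inputs = 95 ** 200     # assuming 95 printable ASCII characters

print(f"distinct 20-bit outputs:  {distinct_outputs:,}")
print(f"possible 200-byte files:  about 10^{len(str(possible_inputs)) - 1}")
print(f"compressible fraction:    at most 10^-{len(str(possible_inputs // distinct_outputs)) - 1}")
```
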
  • by coyote-san ( 38515 ) on Tuesday August 07, 2001 @07:46PM (#2167599)
    This idea totally misses the point.

    ASN.1 achieves good compression because the designer must specify every single field and parameter, for all time. The ASN.1 compiler, among other things, then figures out that the "Letterhead, A4, landscape" mode flag should be encoded as something like 4.16.3.23.1.5, which is actually a sequence of bits that can fit into 2 bytes, because the ASN.1 grammar knows exactly how few bits are sufficient for every possible case.

    In contrast, XML starts with *X* because it's designed to be extensible. The DTDs are not cast in stone, and in fact a well-behaved application should read the DTD for each session and extract only the items of interest. It's not an error if one site decides to extend their DTD locally, provided they don't remove anything.

    But if you use ASN.1 compression, you either need to cast those XML DTDs into stone (defeating the main reason for XML in the first place), or compile the DTD into an ASN.1 compiler on the fly (an expensive operation, at least at the moment).

    This idea is actually pretty clever if you control both sides of the connection and can ensure that the ASN.1 always matches the DTD, but as a general solution it's the wrong idea at the wrong time.
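
To make the point above concrete, here is a hedged sketch (the field names and bit widths are invented, not taken from any real ASN.1 module) of how a schema that both ends agree on in advance lets a choice like "Letterhead, A4, landscape" travel as a handful of bits, while the equivalent XML spells everything out:

```python
# Both sides agree on the schema ahead of time, so only the choice indices
# travel on the wire; the names themselves never do.
STATIONERY = ["plain", "letterhead"]      # 1 bit
PAPER_SIZE = ["A4", "letter", "legal"]    # 2 bits
ORIENTATION = ["portrait", "landscape"]   # 1 bit

def encode(stationery, paper, orientation):
    """Pack the three choices into a 4-bit integer."""
    return (
        (STATIONERY.index(stationery) << 3)
        | (PAPER_SIZE.index(paper) << 1)
        | ORIENTATION.index(orientation)
    )

def decode(value):
    """Recover the choices; only possible because we share the schema."""
    return (
        STATIONERY[(value >> 3) & 0b1],
        PAPER_SIZE[(value >> 1) & 0b11],
        ORIENTATION[value & 0b1],
    )

packed = encode("letterhead", "A4", "landscape")
print(packed, decode(packed))
# The equivalent self-describing XML, something like
# <page stationery="letterhead" size="A4" orientation="landscape"/>,
# carries the same information in dozens of bytes instead of half a byte.
```
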
  • by Eryq ( 313869 ) on Tuesday August 07, 2001 @07:50PM (#2167615) Homepage

    XML, by virtue of being text-based, may be easily inspected and understood. Sure, it's a little bulky, but if you're transmitting something like an XML-encoded vCard versus an ASN.1 encoding of the same info, the bulk is negligible.

    Yes, for mp3-sized data streams, or real-time systems, there would be a difference. But many interesting applications don't require that much bandwidth.

    ASN.1 achieves its compactness by sacrificing transparency. Sure, it's probably straightforward enough if you have the document which says how the tags are encoded, but good documentation of anything is as rare as hen's teeth, and not all software companies are willing to play nice with the developer community at large and share their standards documents. And some of them get downright nassssssty if you reverse engineer...

    Transparency is one of the reasons for the rapid growth of the Web: both HTML and HTTP were easy enough to understand that it took very little tech savvy to throw up a website or code an HTTPD or a CGI program.

    Transparency and extensibility also make XML an excellent archival format; if your protocol messages contain data you want to keep around for a while, you can snip out portions of the stream and save them, knowing that 10 or 15 years from now, even if all the relevant apps (and their documentation) disappear, you'll still be able to grok the data.

  • by Anonymous Coward on Tuesday August 07, 2001 @07:59PM (#2167654)
    Of course we're talking about a compression scheme. It's just one for structured data and not for plain text files. Looking at some example XML files, they can clearly be compressed by some large amount - 2 orders of magnitude doesn't seem unreasonable when you have 40-character tag names.
  • Re:Postum primus? (Score:2, Insightful)

    by Phork ( 74706 ) on Tuesday August 07, 2001 @08:41PM (#2167903) Homepage
    well, i hate to break it to you, but you use lossy compression all the time. gif, jpeg, and mp3 are all lossy compression, as are most other image and audio compression schemes.
  • by ttfkam ( 37064 ) on Tuesday August 07, 2001 @09:37PM (#2168150) Homepage Journal
    What if the XML document is representative of a dynamic aggregate of multiple schemas?

    Say what? Heh heh...

    Let's say you have an XHTML document (one DTD) that contains MathML (another DTD) and some SVG for good measure (third DTD). This would not be handled in your static DTD compile unless you made specific provisions for all of them in a single document. But what if the next document only has one of them used? Or two? Or includes some other one later? Are you going to compile every permutation of DTD that could ever occur?

    This is where the strength of XML is not necessarily compatible with the strengths of ASN.1.
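
As a rough illustration of that point, a single document can mix vocabularies by namespace, and a generic XML parser copes without a precompiled schema for the particular combination; the compound document below is invented for the example:

```python
import xml.etree.ElementTree as ET

# Invented compound document: XHTML wrapping MathML and SVG fragments.
doc = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi><mo>+</mo><mn>1</mn>
    </math>
    <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
      <circle cx="5" cy="5" r="4"/>
    </svg>
  </body>
</html>
"""

root = ET.fromstring(doc)
# ElementTree carries each element's namespace in its tag, so the three
# vocabularies stay distinguishable without any schema compilation step.
for elem in root.iter():
    print(elem.tag)
```
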
  • by RobertGraham ( 28990 ) on Tuesday August 07, 2001 @10:29PM (#2168402) Homepage
    Preface: I've written parsers for ASN.1 (esp. SNMP MIBs, but also generic), BER/DER (same thing), PER, HTML, XML, and while we are at it, XDR and CORBA IDL. I've written a BER decoder that can decode SNMP at gigabit/second speeds.

    There are a vast number of differences between ASN.1 and XML. To think that ASN.1 is in any way related to XML demonstrates that they just don't "get it".

    1. Why not XDR or just raw binary?
    Why not just specify your own binary format for your application? The thing that the ASN.1 bigots don't understand is that in most real-world applications, the ASN.1 formatting provides only overhead and no real-world value. This happens with XML, too, but the value proposition for XML is much clearer. A good example is the H.323 series' PER encoding, which is just plain wrong: a well-documented custom encoding would have been tons better.

    2. DTD or no DTD
    The ASN.1 language is essentially a DTD; it gets encoded in things like BER. The trick is that I can parse "well-formed" XML content without knowing the DTD. This is impossible with current ASN.1 encodings. The idea of DTD-free "well-formed" input and DTD-based "valid" input is at the core of XML. Yes, ASN.1 and XML both format data, but proposing ASN.1 as a valid substitute means you just don't grok what XML is all about.

    3. Interoperability
    The Internet grew up in an environment where parsers were expected to be liberal in what they accept. This was important for early interoperability, but it is now a detriment. For example, it is impossible to write an interoperable HTML parser. XML took the radical zen approach of mandating that any parser that accepts malformed input is BAD. As a result, anybody writing a parser knows the input will be well-formed. There is one-and-only-one way to represent input (barring whitespace), so writing parsers is easy. ASN.1 has taken the opposite approach: there are a zillion ways to represent input.

    As a result, non-interoperable ASN.1 implementations abound. For example, most SNMP implementations are incompatible. They work only "most" of the time. Go to a standard SNMP MIB repository and you'll find that the same MIB must be published multiple times to handle different ASN.1 compilers.

    The long and the short of it is that ASN.1 implementations today are extremely incompatible with each other, whereas XML libraries have proven to be extremely interoperable. Right now, XML has proven to be the MOST interoperable way to format data, and ASN.1 has proven to be the LEAST.

    4. Bugs
    Most XML parsers have proven to be robust, most ASN.1 parsers have proven to be buggy. You can DoS a lot of devices today by carefully crafting malformed SNMP BER packets.

    5. Security
    You can leverage ASN.1's multiple encodings to hack. For example, my SideStep program shows how to play with SNMP and evade network intrusion detection systems: http://robertgraham.com/tmp/sidestep.html [robertgraham.com] At the same time, ASN.1 parsers are riddled with buffer-overflows.

    Anyway, sorry for ranting. I think XML advocates are a little overzealous (watch your possessions carefully or some XMLite will come along and encode them), but ASN.1 is just plain wrong. The rumor is that somebody threw it together as a sample to point out problems, but it was accidentally standardized. It is riddled with problems and should be abandoned. An encoding system is rarely needed, but if you need one, pick XDR for gosh sakes.
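
A small sketch of point 2 above (the document and element names are made up): well-formed XML can be parsed and partially understood with no DTD or schema in hand, which has no analogue in BER, where you need the ASN.1 module before the bytes mean anything:

```python
import xml.etree.ElementTree as ET

# A made-up message from a vocabulary this program has never seen before.
unknown_doc = """\
<order id="42">
  <customer>ACME</customer>
  <ext:auditTrail xmlns:ext="urn:example:ext">opaque blob</ext:auditTrail>
</order>
"""

root = ET.fromstring(unknown_doc)
# No DTD needed: pull out the one element we care about and simply
# enumerate (and ignore) anything we don't understand.
print(root.findtext("customer"))
for child in root:
    print("saw element:", child.tag)
```
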

  • by haapi ( 16700 ) on Tuesday August 07, 2001 @10:40PM (#2168442)
    Well said! A Silly Notation.1 is a hideous encoding scheme. The BER is simply ambiguous: you don't even need to send malformed packets to devices; just send valid BER packets that aren't quite right, but still follow the rules, and watch the carnage ensue.
  • by alienmole ( 15522 ) on Tuesday August 07, 2001 @11:24PM (#2168613)
    I think you simply haven't realized quite how useful it is, in real life, for information to be human-readable. When it isn't, it becomes harder to deal with. If you've programmed anything on the web, you're certainly familiar with using "View Source" to see the final source of a page. If you use XML, you've also examined XML data that's been generated by, say, a database server.

    Contrast that with what I'm dealing with right now: I'm using JDBC to access an MS SQL Server. MS bought their SQL Server from Sybase many years ago, and inherited the binary TDS data stream protocol. As efficient as this might be, when you run into problems, you're in trouble. The TDS format is undocumented, so you can't easily determine what the problem might be, whereas a text format would be easy to debug. Anytime you have a binary protocol, you become totally reliant on the tools that are available to interpret that protocol. With text protocols, you're much less restricted.

    Another example of this is standard Unix-based email systems vs. Microsoft Exchange. Exchange uses a proprietary database for its message base, which makes it effectively inaccessible to anything but specialized tools and a poorly-designed API. If your email is stored in some kind of text format, OTOH, there are a wealth of tools that can deal with it, right down to simple old grep.

    The bottom line is that the human-readability (and writability!!) of HTML was one of the major factors in the success of the web. It's no coincidence that everything on the web, and many other successful protocols, such as SMTP, are text-based. To paraphrase your subject line, binary protocols are BAD BAD BAD.

    Calling human-readable formats "irrational" is a bit like Spock on Star Trek calling things "illogical" - what that usually really meant was that the actual logic of the situation wasn't understood. What's irrational is encoding important information, which needs to be examined by humans for all sorts of reasons that go beyond what you happen to have imagined, into a format which humans can't easily read.

    Human-readable formats and protocols will remain important until humans have been completely "taken out of the loop" of programming computers (which means not in the foreseeable future).
