GFD writes:
"The EETimes has a story about a relavtively old protocol for structured information call ASN.1 could be used to compress a 200 byte XML document to 2 bytes and few bits. I wonder if the same could be done with XHTML or even regular HTML."
ASN.1 (Score:1)
This damn thing is part of the OSI thing (remember this crap that worked on paper but was hell to implement)...
It's probably the telco people trying to inflict this stuff upon us.
HTML and XML are there for a reason: current information technologies are fast enough that there is no need to "compress" things, and documents stay human-readable. Getting back into ASN.1 is going back into the past, into binary-file hell.
200 bytes - 2 bytes and some bits? (Score:1)
So, to argue whether this is an effective protocol/technique to use: I bet there are lots of other ways to send 20 bits of information. I really would like to see an XML document carrying only 20 bits of information; quite empty, right?
It is not always important to look at the compression rates, unless you clearly have a bandwidth problem.
Now, the strength of XML... that's another story entirely.
Hoax. (Score:2)
"could be used to compress a 200 byte XML document to 2 bytes and few bits."
This is a hoax. Someone played a trick like this on Byte Magazine (before Byte quit publishing). It is amazing that the editors didn't immediately recognize the impossibility of such extreme compression claims.
I searched the comments for the word "hoax", but no one commenting here has used the word. Anyhow, it can't happen.
Re:Hoax. (Score:1)
Re:Hoax. (Score:2)
I couldn't find anything that really explained how ASN.1 works, and the specs appear to require payment, but from the apparently more knowledgeable posts on /. it appears that it substitutes binary numbers for tags and other repeated parts of messages. The substitution table is fixed in advance, and it is assumed that both sender and receiver already have it. So it is only effective if the format is pretty much pre-defined and highly repetitive. Satellite telemetry is a good example; e.g., it might turn "Temperature of engine 2 nozzle, zone 4 = 65" into 2.4.6.5. Or ASN.1 could do a pretty good job of compressing stock market prices by replacing those long corporation names with a short code -- but the exchanges long ago assigned short text codes...
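A rough sketch of that fixed-table idea in Python (the field name and code assignments here are invented for illustration):

# Both ends ship with the same pre-agreed table, so only short
# codes cross the wire.
TABLE = {0x01: "Temperature of engine 2 nozzle, zone 4"}

def encode(field_code, value):
    return bytes([field_code, value])      # 2 bytes on the wire

def decode(msg):
    return "%s = %d" % (TABLE[msg[0]], msg[1])

print(decode(encode(0x01, 65)))   # Temperature of engine 2 nozzle, zone 4 = 65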
LZW (*zip) compression also uses a substitution table, but in LZW most substitutions are not predefined. The software adds to the table as needed while processing a particular file, and puts each new substitution in the compressed file. So it's flexible: if you are compressing XML files and someone uses a new tag, word, or phrase repeatedly, LZW will just assign a new code to that string, send the full string once (per file), and every subsequent use only requires the code.
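For the curious, the core of an LZW compressor is only about a dozen lines; a minimal Python sketch (toy data, and it skips bit-packing the output codes):

def lzw_compress(data):
    table = {chr(i): i for i in range(256)}   # start with all single bytes
    w, out = "", []
    for c in data:
        if w + c in table:
            w += c                            # keep extending the match
        else:
            out.append(table[w])              # emit code for longest match
            table[w + c] = len(table)         # learn the new string
            w = c
    if w:
        out.append(table[w])
    return out

xml = "<item><name>foo</name></item><item><name>bar</name></item>"
print(len(xml), "chars ->", len(lzw_compress(xml)), "codes")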
In summary, 200 bytes to 2 bytes is B.S. or a contrived case -- about all you can do in 2 bytes is identify one string previously agreed upon, and if you ever might have to send a free-form message (even an update to the table of pre-defined strings) you're going to need at least one byte just to ID the message type. But if you have a large set of large files that are quite repetitive in both content and format, it might be possible to pre-define a substitution table for the whole set and get 100 to 1 lossless compression. But that's going to work with XML on the web only if you browse just one site whose contents meet the repetitiveness criteria...
By the way, I have seen 98% (50-1) compression using PKZIP. This was on AutoCAD DXF files, which is a remarkably bloated ASCII format representing CAD drawings. And it takes several megabytes before the compression becomes that good. You might get over 90% compression on XML if the files are big enough, but you really shouldn't put that much on one web page.
Postum primus? (Score:3, Funny)
Lossy-soft! (Score:4, Funny)
An excerpt from LampreySoft's page:
LossySoft! [smart.net]
Re:Postum primus? (Score:2)
Errr... just realised that most /. posts can also be transferred at higher speeds.
PS: did that information appear in early April? I missed it.
Re:Postum primus? (Score:2)
Re:Postum primus? (Score:1)
I appreciate your effort, I really do, but any attempt at humor on Slashdot based on misspelling is doomed from the start, as the replies readily indicate.
The audience just isn't ready for this sort of thing. Sorta like Dennis Miller on Monday Night Football.
Re:Postum primus? (Score:2)
Re:Postum primus? (Score:2, Insightful)
Re:Postum primus? (Score:2)
Re:Postum primus? (Score:2, Informative)
LZW is lossless, and GIF isn't lossy in the normal cumulative sense, but since most images are naturally produced using more than 2^8 distinct colors, the first quantization does lose a great deal of information. (Apparently some people claim the GIF spec allows multiple palettes and thus more colors, but since this is in dispute I wouldn't count on it working.)
Truth doesn't vary with the speaker. Identity is only useful for bigots.
Re:Postum primus? (Score:2, Funny)
They don't build 'em like they used to. (Score:3, Interesting)
Re:They don't build 'em like they used to. (Score:3, Interesting)
I'm not sure what protocol you're referring to when you say Exchange. Are you talking about, perchance, Microsoft Exchange Server? The one that uses X.400 for site-to-site communication? The X.400 that uses ASN.1 encoding?
mod_gzip ? (Score:4, Informative)
Re:mod_gzip ? (Score:3, Informative)
It doesn't work with SSL easily. See this [over.net] thread if curious. I ran into this when I wanted to force Open Webmail [ncku.edu.tw] to use https only and found the pages were not getting compressed.
And take note of possible problems [over.net] with caching proxies serving pages to browsers that can't handle it.
It has a few other quirks, but overall I for one am quite satisfied with it.
Curious about the savings it brings? Use this [krakow.pl].
Machines are always broken till the repairman comes.
Re:mod_gzip ? (Score:2)
First of all, this seems a bit off topic. Second, you can read about HTTP compression on the W3C website [w3.org]. It's definitely not a HUGE impact (and it has some bugs with certain browsers, based on my own tests). Finally, AFAIK, ALL major web servers have this built in, as it is part of the HTTP 1.1 spec. Nothing to see here, move on please.
Re:mod_gzip ? (Score:1)
Re:mod_gzip ? (Score:1)
Bandwidth or CPU? (Score:1)
It's a case of what you want to optimise for.
Do you want to save CPU? (An issue on heavily loaded sites with oodles of cheap bandwidth.) Continue as you are without mod_gzip.
Do you want to save bandwidth? (An issue with expensive bandwidth.) Then sure, use mod_gzip and convert some of that CPU into bandwidth savings.
This is only thinking about the server end of things. On the other end of the connection is a user who also has limited bandwidth and CPU available.
So it varies. Athlon 800 serving huge text files on a 56K modem? mod_gzip. P90 dishing out 1x1 GIFs? Leave it as is.
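If you're wondering what the bandwidth side of that trade-off looks like, here's a quick Python check on a made-up tag-heavy page (mod_gzip uses the same DEFLATE algorithm as zlib):

import zlib

html = "<tr><td class='price'>19.99</td><td class='qty'>3</td></tr>\n" * 500
gz = zlib.compress(html.encode(), 6)       # level 6 is a typical default
print(len(html), "->", len(gz), "bytes")   # well over 90% saved on repetitive markup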
One example of this CPU vs. bandwidth trade-off I came across was when I was scp'ing a file across a Fast Ethernet (100 Mbit/s) network. On one end was a K6/200, and the transfer was taking ages! Then I realised I had told SSH to compress data. It was eating CPU like crazy! So I stopped the transfer and left off the compression flag. It went about three times faster.
Hello, haven't we read Comer's book? (Score:4, Interesting)
This is the same philosophy as IP, ATM, or any other modern network technology: simple, but fast.
Re:Hello, haven't we read Comer's book? (Score:3, Informative)
ASN.1 is well known outside of the IETF fundamentalist crowd. With its PER (Packed Encoding Rules), it is very efficient with bandwidth and not all that CPU intensive either. Nor is it difficult, if used correctly (and anything can be tough if used wrong). It's a simple tag-length-value notation which can recurse. The only reason the Internet doesn't use it more is the usual NIH.
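For those who haven't seen it, tag-length-value really is that simple; a minimal Python sketch (short-form lengths only, tag numbers invented):

def tlv(tag, value):
    # constructed types recurse: a list of (tag, value) pairs nests TLVs
    body = b"".join(tlv(t, v) for t, v in value) if isinstance(value, list) else value
    return bytes([tag, len(body)]) + body      # assumes len(body) < 128

record = tlv(0x30, [(0x0C, b"Alice"), (0x02, b"\x2a")])
print(record.hex())                            # 300a0c05416c69636502012a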
Re:Hello, haven't we read Comer's book? (Score:2)
Always nice to start with an ad hominem jibe. I'll try one myself: "ASN.1 is supported mainly by the failed has-beens who designed OSI".
With its PER (packed encoding rules), it is very efficient of bandwidth and not all that CPU intensive either.
Utterly misleading. The ASN.1 encoding rules are relatively simple; the data model is the big smelly dung heap to be avoided. Although the encoding rules are 'simple', the Deranged Encoding Rules (DER) used in X.509 require multiple recursive passes through the data structure to encode it.
The only reason the Internet doesn't use it more is the usual NIH.
On the contrary, several IETF protocols have used ASN.1 and the experience has been pretty miserable. The biggest problem is that ISO keeps tweaking the spec in ways that break existing implementations. ASN.1 is simply too much of a pain in the ass for the limited advantage it provides.
The group's attempt to claim ASN.1 as the savior of HTTP is ignorant and stupid. There have been many proposals to compress HTTP headers, and ASN.1 is actually one of the worst performers on both overhead and performance. The reason none of the proposals have gone anywhere is that there is no point in a backwards-incompatible change that saves 100 bytes or so on the headers if you don't do something about compressing the body. The biggest mistake we made in HTTP was not putting a simple Huffman-coding compression algorithm for ASCII text into the servers and browsers. Actually, the reason we didn't get around to it was that nobody wanted to mess around with the patent minefield.
Still, it is always easier to explain that the reason the world is not using your idea is that they are stupid and ignorant, and not that your idea is stupid and ignorant. In the case of ASN.1 the idea is a good one, but the execution is third or fourth rate at best.
Re:Hello, haven't we read Comer's book? (Score:2)
If you didn't live through those horrible days when the trendy crowd was all for OSI and claiming that OSI was the One True Way and would and should eliminate the scourge of the Internet and TCP/IP from the face of the earth, then you really don't get the evil of ASN.1 and its ilk...
Re:Hello, haven't we read Comer's book? (Score:2)
But I agree that a generalization of fiber capacity to bandwidth must be done with extreme caution.
bandwidth is cheap (Score:2, Insightful)
bandwidth is cheap? On what planet? (Score:2)
You're kidding, right? Most CS people I know cringe at the fact that XML can more than double the size of a document with largely redundant tags. The only things to be thankful for are that the documents typically compress very well, due to the large number of redundant tags, and that HTTP 1.1 supports compression, especially now that XML over HTTP (i.e. web services) is being beaten to death by a lot of people in the software industry. Numerous [xml.com] articles [irt.org] about [att.com] XML compression [xml.com] also tend to disagree with you that it is not an issue.
PS: If bandwidth is so cheap how come DSL companies are going out of business and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.
Re:bandwidth is cheap? On what planet? (Score:2)
Re:bandwidth is cheap? On what planet? (Score:2)
DSL companies are going out of business because... bandwidth is so cheap. And it's their own fault.
and AOL owns Time Warner? This would tend to imply that low bandwidth connections are still the order of the day.
Why? Are you saying AOL=dialup, and Time-Warner=cable? There's a LOT more to both of those companies than either of those two things...
Re:bandwidth is cheap (Score:4, Informative)
Translated:
You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal [ethereal.com]! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 [ethereal.com] because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone [unigone.com].
Use your CPU, not your eyeballs!
Re:bandwidth is cheap (Score:2)
Re: Leave compression to the hardware (Score:2)
Willy
Re:bandwidth is cheap (Score:2)
I'm typing this over a 56k connection. If I want faster in this area, I can either pay for a leased line, an ISDN line, or a satellite connection. If these options are cheap, could you buy me one please?
ASN.1 "compression" vs XML (Score:3, Insightful)
ASN.1 uses integers as its symbols. Remember the protocol used for SNMP? Did you really like it? It's not too human-readable or writable.
Also, the idea of promoting it through a consortium is rather old-fashioned.
Bruce
Re:ASN.1 "compression" vs XML (Score:2)
I always thought there was a reason the X windowing system seemed a bit old-fashioned...
Re:ASN.1 "compression" vs XML (Score:2)
Re:ASN.1 "compression" vs XML (Score:2)
Regarding ASN.1, Yes, there are tools to make this easier. I do still find it more difficult to code and test. And in general my development time is more expensive than bandwidth. That probably applies to most people.
Thanks
Bruce
Re:ASN.1 "compression" vs XML (Score:2)
Re:ASN.1 "compression" vs XML (Score:2)
Multimedia? (Score:3, Interesting)
Yes, it would be nice to make the internet move faster with current technology, and I would support this for people on very slow connections. It might also be a boon for servers that get hit hard and often (though I doubt it would stop the Slashdot effect).
Of course, I hope I'm wrong. More effective bandwidth is a Good Thing.
Re:Multimedia? (Score:2, Funny)
You've obviously never saved a 5k Word doc in HTML. *sigh*.
ASN.1 not suitable (Score:5, Informative)
Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?
It's also worth noting that there is a lot of documentation surrounding XML. With ASN.1 you have to download the spec from the ITU, which is an INCREDIBLY annoying organization: their specs are barely readable, and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; the ITU just pretends to be. The ITU does whatever it can to restrict the distribution of its specifications.
Re:ASN.1 not suitable (Score:5, Informative)
This is pretty much right. I do a lot of work on X.500 / LDAP / security, and ASN.1 is used throughout all of this. It does a pretty good job, but as the poster points out, the ITU is a completely brain-damaged relic of the sort of big-company old boys' club that used to make standards. It's very difficult to get info out of them. (Once you get it, though, it's usually pretty thorough!)
As for the 'compression': well, yes, it sorta would be shorter under many circumstances. ASN.1 uses pre-defined 'global' schemas that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for.) For example, I've seen people try to encode X.509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).
But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN.1 is nice for computers. Both require an XML parser / ASN.1 compiler, though. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a Java serialised object). But I can't see ASN.1 being much chop for compressing text documents; there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?)
And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAP's problems are more that they screwed up trying to do a cut-down version of X.500 than that they use ASN.1 :-)!
Re:ASN.1 not suitable (Score:3, Interesting)
Heh. How is this different from XML?
I'm always amused by people who assume XML will be the magic lingua franca of the Internet, and that everyone will be able to parse every last bit of meaning out of your document just because it's encased in <handwaving><readable by="human"><tags /></readable></handwaving>, without ever agreeing on any of those nasty "standards" things. Guess what, people: until we have a solution to the strong AI problem, human-readable don't mean squat.
Re:ASN.1 not suitable, but XML is still good (Score:2)
Apparently you've never had to write a parser for EDI [everything2.com], or any other binary data interchange format.
I'm not going to claim that XML is a magic bullet for data interchange -- but I will attest that human-readable data formats are superior to binary formats when it comes to data interchange. I have lost track of the number of custom parsers I've had to write over the last 15+ years in order to convert data from one system to another, simply because the systems in question didn't have a shared data format. The big wins for XML are that (1) you can visually inspect your before-and-after results, (2) you don't have to write the parser, even if you have to write code to call it, (3) there are actually two sensible APIs to match two very different ways to look at the data, each of which is parser independent, and best of all (4) if you don't have documentation for the schema (or it's misimplemented), you still have a prayer of interpreting the data correctly.
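Win (2) in two lines, for anyone who hasn't tried it -- a Python sketch with a made-up document, where the stock parser does all the work:

import xml.etree.ElementTree as ET

doc = "<order><item sku='A-100' qty='2'/><item sku='B-7' qty='1'/></order>"
for item in ET.fromstring(doc).iter("item"):   # no hand-written parser anywhere
    print(item.get("sku"), item.get("qty"))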
Anyone who's ever had to write an EDI application will *instantly* understand the appeal of XML.
Re:ASN.1 not suitable, but XML is still good (Score:2)
Because both formats are supposed to be good for data interchange, and only one of them really is -- XML. With EDI, the standard had to be so all-encompassing that one group of programmers would read the spec one way, and one another way, and so you could spend months trying to correctly interpret data that was "standard".
Those who do not understand ASN.1 .... (Score:3, Informative)
I have had to deal with dozens of binary protocols that do the same thing as ASN.1, and do it worse.
As to comparisons, XML and ASN.1 are designed for different jobs. Designing a Web page in ASN.1 would be ridiculous. Sending (say) telemetry data encoded in XML is equally ridiculous. I can believe that *data* transmissions could be 100 times larger in XML than in ASN.1: you have the header, the DTD, some namespace declarations, and a bunch of nested tags, just to express a couple of numbers.
Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.
A particularly cute example is SOAP (Microsoft's firewall-bypass protocol). It's going to be fun to watch people try to squeeze some performance out of a SOAP-based system that tries to do something interactive.
As to the ISO, yeah, they're seriously obnoxious. They tend to go off into their own little world, redefine standard terminology so they're incomprehensible to outsiders, and come up with stuff that can't be implemented. (Nobody uses ASN.1 -- it's unimplementable. When people talk about using ASN.1 for something real, they're talking about a subset. A subset, of course, cannot claim conformance to the standard.) The crowning insult, of course, is that they fund the organization by selling the standards. Hey, it's a standard -- you *have* to buy it!
"It's all in knowing what wrench to use to pound in the screw."
That's funny. (Score:1)
I don't see what's so bad about judiciously applied XML. If you'd like to piddlefart around with obscure offsets and byte counts in binary transfers, knock yourself out. XML doesn't bloat transmissions up that much (argue about node overhead, then remember filler columns) and every machine in existence speaks text.
Of course it's not all things for all people, but in the right place at the right time, it's just fine.
Re:Those who do not understand ASN.1 .... (Score:2)
Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.
XML is no magic bullet; however, that doesn't change the fact that it is incredibly useful in many different circumstances. XML, realistically used, can make some projects simpler, and data transfers much more comprehensible.
A particularly cute example is SOAP (Microsoft's firewall-bypass protocol) It's going to be fun to watch people try to squeeze some performance out of a SOAP based system that tries to do something interactive.
SOAP, XML-RPC and similar protocols are designed for generic, highly interoperable, communications, not performance. Anybody who expects blinding performance out of an XML encoded procedure call shouldn't be programming. You want performance, use a custom protocol, or at least CORBA. SOAP is for when you can sacrifice performance to gain interoperability.
I'd even go a step farther: anything that can be done using an XML-based data format can be done smaller and faster by some other design. However, as machines get larger, faster and cheaper, getting that last bit of performance becomes less and less important for most computing tasks. XML is great for tasks that don't need every last ounce of speed. Save the custom-tuned binary formats and protocols for the few apps that really need them.
Binary Bits (Score:2)
First, you shouldn't assume that available bandwidth will steadily increase. It will take some major breakthroughs -- not just technical, but political and economic -- before there's a megabit internet connection every place where it might be useful. And wide-area wireless networking is in an even worse state. Not to mention that radio spectrum is a finite resource.
Your point about tags is well taken. But you can compress the content too. Using 8 bits for every character is very inefficient, especially considering that there are only 128 characters to represent. With the right scheme, you could certainly get the average character width down to somewhere between 4 and 5 bits.
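You can check the 4-5 bit figure yourself; a Python sketch computing the zero-order entropy of a sample, which is roughly what a simple per-character Huffman code achieves:

import math
from collections import Counter

def bits_per_char(text):
    n = len(text)
    # -sum p*log2(p): the average bits an ideal per-character code needs
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

sample = "the quick brown fox jumps over the lazy dog " * 20
print(round(bits_per_char(sample), 2), "bits/char vs 8 in plain ASCII")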
ASN.1 was designed to be efficient (Score:4, Informative)
Missing the point? (Score:2, Insightful)
Bandwidth is cheap now, but it may not be forever. Yes, we'll most likely continue to see order of magnitude increases for years and decades to come, but it'll slow down sometime.
Also, consider wireless devices. Their bandwidth isn't there right now, and maybe with 3G we'll see a nice increase, but I can see that as a practical application for this type of compression.
Let's also not forget that even though it's compressed, you can always uncompress it into regular old XML to actually read it and understand it, for you human folks that actually need like LETTERS and stuff! That's it. I'm just going to start writing everything in integers soon. Time to change my .sig!
HTML could be compressed (Score:2, Flamebait)
Suppose you wanted to make each character a different color. For each character you'd need markup approximately equal to:
<font color="#000000">a</font>
This entire sequence could be compressed into 4 bytes or less, but you would require an HTML compiler instead of coding it by hand (unless you're one of those crazy people who prefer coding opcodes straight over using C).
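A sketch of what that "compiled" form could look like in Python -- three RGB bytes plus the character itself, versus the full markup (the tag syntax above is the assumption):

import struct

def pack_colored_char(r, g, b, ch):
    return struct.pack("BBBB", r, g, b, ord(ch))     # exactly 4 bytes

print(len(pack_colored_char(0xFF, 0, 0, "a")))       # 4
print(len('<font color="#ff0000">a</font>'))         # 30 bytes of markup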
The issue with html, and the reason why we don't worry about the inefficiency much is the fact that you could have a rather extensive html file with one link to a single picture, and that picture would easily take up the space of the entire html file.
-Restil
Re:HTML could be compressed (Score:1)
If you are using a modem with "V.54"- or "V.nn"-style compression...
If anywhere in the network two Cisco or two Nortel routers are talking to each other, or if your backbone provider is reasonably competent and wants to make money...
Then your web traffic is already being compressed.
One of the great things about HTML and XML is that they compress really easily using comparatively simple compression algorithms.
So any effort you put into "compressing" XML traffic is wasted, as your network hardware would probably have done it anyway.
Bandwidth Versus Computational Effort (Score:2, Insightful)
With the current over-supply of domestic bandwidth and the move to database-driven, customised web sites, is it worth spending CPU cycles compressing small data files on-the-fly?
Most popular websites don't suffer from poor connectivity -- they suffer from too little back-end grunt.
Re:Bandwidth Versus Computational Effort (Score:2)
ASN.1 resources on the web. (Score:3, Informative)
Those of you who want to find out more about ASN.1 can pick up free e-books on ASN.1 here [oss.com]. There's some blatant propaganda in them for OSS Nokalva's ASN.1 compiler, but of course there's also snacc [gnu.org], a GPL'd open source ASN.1 compiler. Snacc however only generates code for encoding to BER, so you might also want to check out a hacked version [qut.edu.au] of snacc from Queensland University of Technology.
ASN.1 is a base technology for a lot of standards out there like X.509, PKCS and LDAP, the OSI application layer protocols etc.
Reverse Engineer hax0r3d! (Score:4, Funny)
Totally misses the point (Score:5, Insightful)
ASN.1 achieves good compression because the designer must specify every single field and parameter, for all time. The ASN.1 compiler, among other things, then figures out that the "Letterhead, A4, landscape" mode flag should be encoded as something like 4.16.3.23.1.5, which is actually a sequence of bits that can fit into 2 bytes, because the ASN.1 grammar knows exactly how few bits are sufficient for every possible case.
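A toy version of that bit-budgeting in Python (the field names and value counts are invented; real PER is more involved):

from math import ceil, log2

# each field's width is fixed at compile time by how many values the schema allows
fields = [("paper", 16, 3), ("orientation", 2, 1), ("tray", 8, 5), ("duplex", 2, 0)]

bits = value = 0
for _name, n_choices, choice in fields:
    width = max(1, ceil(log2(n_choices)))   # just enough bits for this field
    value = (value << width) | choice
    bits += width

print(bits, "bits ->", (bits + 7) // 8, "bytes")     # 9 bits -> 2 bytes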
In contrast, XML starts with *X* because it's designed to be extensible. The DTDs are not cast in stone; in fact, a well-behaved application should read the DTD for each session, extracting only the items of interest. It's not an error if one site decides to extend their DTD locally, provided they don't remove anything.
But if you use ASN.1 compression, you either need to cast those XML DTDs into stone (defeating the main reason for XML in the first place), or compile the DTD into an ASN.1 compiler on the fly (an expensive operation, at least at the moment).
This idea is actually pretty clever if you control both sides of the connection and can ensure that the ASN.1 always matches the DTD, but as a general solution it's the wrong idea at the wrong time.
Re:Totally misses the point (Score:2)
Re:Totally misses the point (Score:2, Insightful)
Say what? Heh heh...
Let's say you have an XHTML document (one DTD) that contains MathML (another DTD) and some SVG for good measure (a third DTD). This would not be handled in your static DTD compile unless you made specific provisions for all of them in a single document. But what if the next document only uses one of them? Or two? Or includes some other one later? Are you going to compile every permutation of DTDs that could ever occur?
This is where the strength of XML is not necessarily compatible with the strengths of ASN.1.
Missing the point as to why XML is good (Score:4, Insightful)
XML, by virtue of being text-based, may be easily inspected and understood. Sure, it's a little bulky, but if you're transmitting something like an XML-encoded vCard versus an ASN.1 encoding of the same info, the bulk is negligible.
Yes, for mp3-sized data streams, or real-time systems, there would be a difference. But many interesting applications don't require that much bandwidth.
ASN.1 achieves its compactness by sacrificing transparency. Sure, it's probably straightforward enough if you have the document which says how the tags are encoded, but good documentation of anything is as rare as hen's teeth, and not all software companies are willing to play nice with the developer community at large and share their standards documents. And some of them get downright nassssssty if you reverse engineer...
Transparency is one of the reasons for the rapid growth of the Web: both HTML and HTTP were easy enough to understand that it took very little tech savvy to throw up a website or code an HTTPD or a CGI program.
Transparency and extensibility also make XML an excellent archival format; so if your protocol messages contain data you want to keep around for a while, you can snip out portions of the stream and save them, knowing that 10 or 15 years from now, even if all the relevant apps (and their documentation) disappear, you'll still be able to grok the data.
ASN.1 -- excellent choice (Score:4, Informative)
Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 has been an interoperability standard for at least 10 years prior to the introduction of XML.
ASN.1 is a standard interoperability protocol (ISO IS 8824 and 8825) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. ASN.1 is a collection of data values with some meaning associated with them. It doesn't specify how the values are to be encoded. The semantics of those values are left to the application to resolve (i.e. XML). ASN.1 defines only the transfer syntax between systems.
ASN.1 codes are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms of bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. ID octets are used for specifying primitive or constructor data types. Length octets define the size of the actual content. A boolean can thus be represented by a single bit, and digits 0-9 could be BCD encoded. Each encoding structure carries its interpretation with it.
An XML document could thus be encoded by converting the tags into a lookup table and a single octet code. If the tags are too many, or too long (e.g. FIRST-NAME), then there are significant savings in replacing the whole tag with an ASN.1 encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned to a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: one for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).
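That two-byte claim is easy to demonstrate; a Python sketch of the tag-table scheme (the tag assignments are invented):

TAGS = {"FIRST-NAME": 0x01, "LAST-NAME": 0x02}   # agreed on in advance

def encode_element(tag_name, text):
    body = text.encode()
    return bytes([TAGS[tag_name], len(body)]) + body   # [ID][length][contents]

enc = encode_element("FIRST-NAME", "Ada")
print(len("<FIRST-NAME>Ada</FIRST-NAME>"), "->", len(enc), "bytes")   # 28 -> 5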
I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.
http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/asn1gloss.htm has an excellent overview if you're interested.
Cheers!
Re:ASN.1 -- excellent choice (Score:1)
That's fine, but leaves the X out of XML: eXtensibility. A lot of existing XML schemas have slots of the form <xs:any namespace="##other"/> which allows any foreign tag, known or unknown, defined or not, to be incorporated at that point. As far as I know, ASN.1 can't cope with that without both explicit tagging and a fully-expanded OID for the incorporated entity (since it's not enumerable), which creates metadata bloat all over again.
Another XML design goal is that a document be parsable (at least as far as an abstract syntax tree) without foreknowledge of the type structure. A couple of mechanisms from SGML that were forbidden in XML but don't defeat this goal are empty end-tags and unquoted (single-token) attribute values. Empty end-tags would knock a large chunk out of the size of a complex XML document by allowing a simple </> to close whatever element was last opened. Unquoted attribute values can save 2 characters per attribute and also feel more natural when the values aren't stringlike in nature; quoting small integers just grates on me, anyway.
Another approach is defining a general binary shorthand coding for XML; a place I worked at had one in use for wire transmission of XML between hosts running their code base.
Actually... (Score:2)
If HTML is written properly, it is XML. Browsers nowadays let you cheat, and mix tags, and ignore quotes, but if the HTML is written to spec, then it is technically XML.
Captain_Frisk
ASN.1 isn't efficient--for a binary protocol (Score:2, Informative)
This is funny ... (Score:2, Informative)
I remember when I first came across ASN.1 years ago. Everybody hated it because the parser was sssooo big and complex. "Why not just use a simple ASCII file?" was a common refrain. Sure, ASN.1 was capable of representing just about any data structure in a reasonably compact form, but most information did not need complex data structures to represent it, so why would anybody use ASN.1?
Well, a decade or two later we get the ASCII version of ASN.1 -- XML. And guess what? It's arguably harder to write a generic parser for XML than it is for ASN.1. (I still have not found a good open source validating parser for XML.) But guess what -- everybody is wildly enthusiastic this time round. My, how times change!
Actually, ASN.1 and XML are in some ways very similar. They try to solve the same problem: how to represent complex data structures in a generic way. And they do it in a similar way. Because ASN.1 is binary and uses numbers instead of text tags, it does use a lot less space to represent the same thing, although the 2 versus 200 bytes claim is at best misleading. Most of the 200 bytes would probably be XML header (DTDs and stuff) which you would not put in the ASN.1 encoding.
And yes, XML is too fat for some applications. For example, if you are pumping out a 60k-row SQL table to your 1000 clients every day, you probably would not choose XML. That is why this idea has merit: it could give you the benefits of XML without the fat. To work, someone would have to come up with a standard way of translating a DTD to an ASN.1 encoding. I know it's a good idea because I came up with it myself a while back :).
XML is BAD BAD BAD :) (Score:2)
Why human-readable formats are critical (Score:4, Insightful)
Contrast that with what I'm dealing with right now: I'm using JDBC to access an MS SQL Server. MS bought their SQL Server from Sybase many years ago, and inherited the binary TDS data stream protocol. As efficient as this might be, when you run into problems, you're in trouble. The TDS format is undocumented, so you can't easily determine what the problem might be, whereas a text format would be easy to debug. Anytime you have a binary protocol, you become totally reliant on the tools that are available to interpret that protocol. With text protocols, you're much less restricted.
Another example of this is standard Unix-based email systems vs. Microsoft Exchange. Exchange uses a proprietary database for its message base, which makes it effectively inaccessible to anything but specialized tools and a poorly-designed API. If your email is stored in some kind of text format, OTOH, there are a wealth of tools that can deal with it, right down to simple old grep.
The bottom line is that the human-readability (and writability!!) of HTML was one of the major factors in the success of the web. It's no coincidence that everything on the web, and many other successful protocols, such as SMTP, are text-based. To paraphrase your subject line, binary protocols are BAD BAD BAD.
Calling human-readable formats "irrational" is a bit like Spock on Star Trek calling things "illogical" - what that usually really meant was that the actual logic of the situation wasn't understood. What's irrational is encoding important information, which needs to be examined by humans for all sorts of reasons that go beyond what you happen to have imagined, into a format which humans can't easily read.
Human-readable formats and protocols will remain important until humans have been completely "taken out of the loop" of programming computers (which means not in the foreseeable future).
Humans and Tools (Score:2)
Of course, you never have to deal with that because the SSL stream is already decoded for you. That might not help with a new format, but maybe someone could come up with a special language that's really good for rearranging data and making it presentable. We could call it "Practical Language for Extracting and Reporting." Yeah, PLER. That has kind of a nice ring to it. There are quite a few jobs that need this kind of data munging but are too small for Java and would take too long to write in C++, so I'll bet there'd be a lot of interest in this hypothetical PLER language.
Re:Why human-readable formats are critical (Score:2)
I have programmed something "on the web", but before it became such a fad I used to like assembly language programming... Decoding a simple binary format is trivial, and if the usual format for web pages were binary, browsers would still allow you to use a "view source" command (to decode the binary format, probably giving a much more readable presentation of the structure of the document than the HTML code you can see nowadays).
Re:Why human-readable formats are critical (Score:2)
When was the last time you saw a web page designer or web application programmer dealing with any of this stuff?
Sounds like they're spewing buzzwords... (Score:2)
No! Not ASN.1! Make it stop! Make it stop! (Score:2)
Since XML was designed for humans to be able to look at to a certain extent, why not just have a standard compression method that's included with all XML parsers? Whenever you transmit or save the XML file, it would be saved in the compressed format.
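That's a few lines today with any stock zlib/gzip binding; e.g. in Python (the file name is made up):

import gzip

doc = "<?xml version='1.0'?><note><to>Bob</to><body>hello</body></note>"

with gzip.open("note.xml.gz", "wt", encoding="utf-8") as f:
    f.write(doc)                   # stored compressed on disk

with gzip.open("note.xml.gz", "rt", encoding="utf-8") as f:
    assert f.read() == doc         # reads back as ordinary XML text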
GPL'ed ASN.1 encoder/decoder (Score:2, Informative)
Oh yeah? (Score:2)
Oh yeah?? I wrote a protocol that can take a 6 MB MP3 file and compress it to under 10 bytes!
(Some sound quality degradation may occur, use at own risk)
The ASN.1 faithful just don't get it (Score:5, Insightful)
There are a vast number of differences between ASN.1 and XML. To think that ASN.1 is in any way related to XML demonstrates that they just don't "get it".
1. Why not XDR or just raw binary?
Why not just specify your own binary format for your application? The thing that the ASN.1 bigots don't understand is that in most real-world applications, the ASN.1 formatting provides only overhead and no real-world value. This happens in XML, too, but the value proposition for XML is much clearer. A good example is the H.323 series PER encoding, which is just plain wrong: a well-documented custom encoding would have been tons better.
2. DTD or no DTD
The ASN.1 language is essentially a DTD; it gets encoded in things like BER. The trick is that I can parse "well-formed" XML content without knowing the DTD. This is impossible with current ASN.1 encodings. The idea of DTD-free "well-formed" input and DTD-based "valid" input is at the core of XML. Yes, ASN.1 and XML both format data, but proposing ASN.1 as a valid substitute means you just don't grok what XML is all about.
3. Interoperability
The Internet grew up in an environment where parsers should be liberal in what they accept. This was important for early interoperability, but is now a detriment. For example, it is impossible to write an interoperable HTML parser. XML took the radical zen approach of mandating that any parser that accepts malformed input is BAD. As a result, anybody writing a parser knows the input will be well-formed. There is one-and-only-one way to represent input (barring whitespace), so writing parsers is easy. ASN.1 has taken the opposite approach: there are a zillion ways to represent input.
As a result, non-interoperable ASN.1 implementations abound. For example, most SNMP implementations are incompatible. They work only "most" of the time. Go to a standard SNMP MIB repository and you'll find that the same MIB must be published multiple times to handle different ASN.1 compilers.
The long and the short of it is that ASN.1 implementations today are extremely incompatible with each other, whereas XML libraries have proven to be extremely interoperable. Right now, XML has proven to be the MOST interoperable way to format data, and ASN.1 the LEAST.
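To make the "zillion ways" point concrete: BER permits both of the following encodings of the same OCTET STRING "hi", because lengths may be short-form or (needlessly) long-form. A strict DER decoder accepts only the first; a lax BER decoder takes both:

short_form = bytes([0x04, 0x02]) + b"hi"         # 04 02 68 69
long_form  = bytes([0x04, 0x81, 0x02]) + b"hi"   # 04 81 02 68 69
print(short_form.hex(), "vs", long_form.hex())   # both decode to "hi" under BER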
4. Bugs
Most XML parsers have proven to be robust, most ASN.1 parsers have proven to be buggy. You can DoS a lot of devices today by carefully crafting malformed SNMP BER packets.
5. Security
You can leverage ASN.1's multiple encodings to hack. For example, my SideStep program shows how to play with SNMP and evade network intrusion detection systems: http://robertgraham.com/tmp/sidestep.html [robertgraham.com] At the same time, ASN.1 parsers are riddled with buffer-overflows.
Anyway, sorry for ranting. I think XML advocates are a little overzealous (watch your possessions carefully or some XMLite will come along and encode them), but ASN.1 is just plain wrong. The rumor is that somebody threw it together as a sample to point out problems, but it was accidentally standardized. It is riddled with problems; it should be abandoned. An encoding system is rarely needed, but if you need one, pick XDR for gosh sakes.
Re:The ASN.1 faithful just don't get it (Score:2, Insightful)
Re:The ASN.1 faithful just don't get it (Score:1)
The same struggle in the VoIP world (Score:2)
H.323 interoperability is tough. Some problems are due to differences in how one entity encodes a piece of data and another decodes it. Many H.323 implementations, um, do not fail gracefully under such circumstances.
SIP call signalling looks like HTTP. There have been complaints that it's too verbose, and needs to be replaced with something binary. One proposal [ietf.org] suggests using a binary encoding. It uses LZW [google.com] compression and shared "codebooks" (schemas?)
That's just for call signalling. Both these VoIP protocols (and others) use RTP [columbia.edu] (the Real-time Transport Protocol) for voice, video, etc.; that's encoded and compressed pretty darned seriously.
(I'm not speaking for my employer, I'm just speaking my mind.)
Re:The same struggle in the VoIP world (Score:2, Funny)
Simply cut out the un-needed words.
[dials]
Broken down. Main street. Need spare tyre.
[hangs up]
See, it'll halve your phone bills!
BFD (Score:2)
Amen (Score:2)
XML is a file format.
XML shows that you *can* use a single file format for everything. That doesn't mean it's a good idea, except in a couple of particular places.
The reason it's caught on is that the average programmer is getting stupider. It's genuinely difficult for these people to write a simple parser, so they use XML for everything. Never mind that it's harder to read/write for humans than some custom HCI format, or insanely more verbose and slower to scan than some custom binary format. They preach interoperability, when this is irrelevant, to cover for laziness and incompetence.
If I hear one more fuckwit say, "hey, let's create an XML-based programming language", I'll scream.
Re:Using XML is _ASKING_ for bloat (Score:1)
XML is a very wasteful and generic file format.
So what if it's wasteful? Bytes are cheap. The entropy content of XML isn't inefficient (the same could be said of ASN.1), so low-level compression algorithms compress them equally well. The message "Your Amazon order has billed your credit card $23 and sent you a copy of 'Fly Fishing'" compresses down to much the same size in either encoding.
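A rough check of that claim in Python -- a verbose XML encoding versus a terse one of the same (made-up) records, before and after DEFLATE:

import zlib

xml = "".join("<order><id>%d</id><amount>23</amount></order>" % i for i in range(1000))
terse = "".join("%d|23;" % i for i in range(1000))

for label, msg in (("xml", xml), ("terse", terse)):
    print(label, len(msg), "->", len(zlib.compress(msg.encode(), 9)))
    # the raw sizes differ by ~6x; the compressed sizes are far closer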
If your network transport layers don't do compression, blame the network not the content.
Secondly, when did "generic" become a criticism ?
Thirdly, XML isn't just a serialization format. Admittedly it is treated that way now, and was even more so in the early days, and the "XML For Morons" books get it entirely wrong, but the XML Infoset [w3.org] WG is trying to steer it back. Think data model, not just bytes on the wire -- that's the real reason why ASN.1 is an inappropriate comparison.
ASN.1 is like EDI and Read Codes. It's an application-level solution to byte squashing. The things are nightmares to work with, and simply not needed any more.
Re:Check this out! (Score:1)
OT (slightly) : SNMP (Score:1)
As far as computers go, I was under the impression you could manage computers running Windows (and maybe even Linux and Unix) using SNMP, so maybe someone can provide more detail.
Re:ASN.1 is evil (Score:1)
Re:ASN.1 is evil (Score:3, Informative)
It's tha devil's spawn, I tell ya. It's extremely complex and hard to debug.
Having worked with ASN.1 and CMIP, I can certainly state that most examples of ASN.1 data types I've seen (M.3100 and that lot) are far too complex (too many CHOICE and ANY values). But I still think ASN.1 and BER/PER are a decent way to efficiently encode data in a platform-independent manner. ASN.1 data types can be really simple or really complex, so blame the designers defining complex types in ASN.1, not the notation itself.
The whole reason the net has taken off so quickly is the simple, open and clear protocols used. You need to debug your email server? Just telnet in and talk to it! With ASN.1 you need a compiler to make each damn data packet.
I think it is only fair to state that a lack of good (I mean open and free, of course) ASN.1 decoders/encoders contributes to the lack of widespread adoption of technologies like ASN.1. Not that tools like SNACC are all that bad, but were good tools around in the early days of ASN.1? Certainly CMIP never had good free toolkits.
The standards bodies play a role here. Making sure you advocate for your standard early on and doing your best to promote good open reference implementations goes a long way towards helping a standard gain widespread adoption.
I think SNMP is a good example of how ASN.1 can be used effectively. Just because ASN.1 allows for complex types doesn't mean people have to build complex types into their standards/protocols.
I'm growing tired of the "I've got the world on a String" school of data typing ;->
Sometimes efficient, compact encoding/decoding is just what the solution calls for, whether it is ASN.1 BER/PER or the OMG IDL using CDR.
Re:200 bytes to 2 +/- (Score:1)
ASN.1, as I understand it, is structured as follows:
[data_type][data_length][data......]
so, to convert
<xml_tag>data string</xml_tag>
(30 bytes)
to an ASN.1 format would result in:
[4][11][data string]
(13 bytes)
BUT the sender and receiver need to have already agreed that a data_type value of "4" indicates a datatype of "xml_tag", and that the length code that follows is 8 bits -- thus removing the self-describing value of the XML file.
If you want to compare apples to apples, you need to add in the size of the tables that will map the "data_type" values to their corresponding xml tag types...
How is this a huge improvement over comma-delimited text, since the sender and receiver have to know the layout before the data can be sent???
Ken
Re:You sure you read the same story (Score:1)
Re:What was it used for? (Score:2, Informative)
X.509 digital certs, among other things (Score:2)
Re:What was it used for? (Score:2, Informative)
Re:100:1 text compression ? (Score:3, Informative)
200 BYTE (!) XML documents are pretty rare. They probably standardized a few headers, and instead of sending them they just send a code.
Don't believe for a second we're talking about a compression scheme here. The usual slashdot lack of information applies.
Re:not quite (Score:2, Funny)
Re:What? No way. (Score:3, Funny)
Original XML (130 bytes):
Binary encoded (1 byte):
10110010
That's a 130:1 ratio.
Re:Try UDP with bigger packets (Score:2, Informative)