Old Protocol Could Save Massive Bandwidth

GFD writes: "The EETimes has a story about how a relatively old protocol for structured information called ASN.1 could be used to compress a 200-byte XML document to 2 bytes and a few bits. I wonder if the same could be done with XHTML or even regular HTML."

  • Re:Postum primus? (Score:2, Informative)

    by Anonymous Coward on Wednesday August 08, 2001 @03:17AM (#2122016)

    LZW is lossless, and GIF isn't lossy in the normal cumulative sense, but since most images are naturally produced using more than 2^8 distinct colors, the first quantization does lose a great deal of information. (Apparently some people claim the GIF spec allows multiple palettes and thus more colors, but since this is in dispute I wouldn't count on it working.)

    I don't read AC posts...want to be heard? Grow a set and log in.

    Truth doesn't vary with the speaker. Identity is only useful for bigots.

  • by andri ( 23774 ) on Tuesday August 07, 2001 @07:22PM (#2167433)
    It is still used to encode SNMP packets, for example.
  • mod_gzip ? (Score:4, Informative)

    by AdamInParadise ( 257888 ) on Tuesday August 07, 2001 @07:26PM (#2167480) Homepage
    Ever heard of mod_gzip? It compresses anything that goes through your Apache web server, and it is supported by most browsers. With everything running over HTTP these days, this is the way to go...
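A minimal sketch of what gzip does to a repetitive XML payload. It uses Python's standard `gzip` module rather than mod_gzip itself (no Apache configuration is shown), and the document is invented for illustration; real savings depend on how repetitive the markup is.

```python
import gzip

def gzip_ratio(text: str) -> float:
    """Compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)

# An invented XML document in which the same tags repeat on every record.
record = "<person><first-name>Ann</first-name><last-name>Lee</last-name></person>"
doc = '<?xml version="1.0"?><people>' + record * 200 + "</people>"

print(f"{len(doc)} bytes, gzip ratio {gzip_ratio(doc):.3f}")
```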
  • by cREW oNE ( 445594 ) on Tuesday August 07, 2001 @07:26PM (#2167481)
    First....

    200-BYTE (!) XML documents are pretty rare. They probably standardized a few headers, and instead of sending them they just send a short code.

    Don't believe for a second we're talking about a compression scheme here. The usual slashdot lack of information applies.
  • ASN.1 not suitable (Score:5, Informative)

    by cartman ( 18204 ) on Tuesday August 07, 2001 @07:33PM (#2167523)
    ASN.1 is the basis of a great many protocols, LDAP among them. What is not mentioned in the article is that ASN.1 is a binary protocol and is therefore not human-readable. It may save space for bandwidth-constrained applications. However, bandwidth has a tendency to increase over time. When all wireless handhelds have a megabit of bandwidth, we would sorely regret being tied to ASN.1, as LDAP regrets it now.

    Not to mention, ASN.1 does not generally reduce the document size by more than 40% compared to XML. Think about it: how much space is really taken by tags?

    It's also worth noting that there is lots of documentation surrounding XML. With ASN.1 you have to download the spec from the ITU, which is an INCREDIBLY annoying organization: their specs are barely readable, and they charge money to look at them, despite the fact that they are supposedly an open organization. The IETF and the W3C are actually open organizations; the ITU just pretends to be. The ITU does whatever it can to restrict the distribution of its specifications.
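One rough way to answer "how much space is really taken by tags?" for a given document is to count the characters inside `<...>` markup. This sketch assumes simple, well-formed XML: the regular expression counts attribute values as markup and does not special-case comments or CDATA, so treat the result as an estimate.

```python
import re

# Anything from "<" to the next ">" is treated as markup (a rough approximation).
MARKUP = re.compile(r"<[^>]*>")

def markup_fraction(xml: str) -> float:
    """Return the fraction of characters that sit inside <...> markup."""
    if not xml:
        return 0.0
    markup_chars = sum(len(m.group(0)) for m in MARKUP.finditer(xml))
    return markup_chars / len(xml)

# Short data fields: the tags dominate (25 of 28 characters).
print(markup_fraction("<first-name>Ann</first-name>"))
# Prose: the text dominates (7 of 1007 characters).
print(markup_fraction("<p>" + "x" * 1000 + "</p>"))
```

The two extremes suggest why estimates vary so much: a data record made of short fields behaves very differently from a text document.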
  • by Anonymous Coward on Tuesday August 07, 2001 @07:33PM (#2167526)
    If I remember the history right, ASN.1 was designed during the era of X.25, when you were charged for every 64-byte packet. I used to use ASN.1 for remote communications in a commercial product, but later changed it to a hybrid of CORBA and XML, mostly because those were more modern technologies; since the actual bandwidth did not cost that much anymore, it did not make sense to keep an old protocol alive. ASN.1 has its drawbacks too: 8 different ways to encode a floating-point number. The reason was political: everyone involved wanted their own floating-point format included, and as a net result, everyone has to be able to decode 8 different formats. An encoding designed by a committee (a stone-age telecom committee, as a matter of fact).
  • by gd23ka ( 324741 ) on Tuesday August 07, 2001 @07:43PM (#2167577) Homepage
    Actually ASN.1 is a formal way of specifying how to encode data into binary representations like BER, CER, DER and PER which do save bandwidth compared to XML.

    Those of you who want to find out more about ASN.1 can pick up free e-books on ASN.1 here [oss.com]. There's some blatant propaganda in them for OSS Nokalva's ASN.1 compiler, but of course there's also snacc [gnu.org], a GPL'd open source ASN.1 compiler. Snacc, however, only generates code for encoding to BER, so you might also want to check out a hacked version [qut.edu.au] of snacc from Queensland University of Technology.

    ASN.1 is a base technology for a lot of standards out there like X.509, PKCS and LDAP, the OSI application layer protocols etc.
  • Re:ASN.1 is evil (Score:3, Informative)

    by eigenhead ( 245821 ) on Tuesday August 07, 2001 @07:51PM (#2167619)

    It's tha devil's spawn, I tell ya. It's extremely complex and hard to debug.

    Having worked with ASN.1 and CMIP I can certainly state that most examples for ASN.1 data types I've seen (M3100 and that lot) are far too complex (too many CHOICE, ANY values). But I still think ASN.1 and BER/PER are a decent way to efficiently encode data in a platform-independent manner. ASN.1 data types can be really simple or really complex, so blame the designers defining complex types in ASN.1 not the notation itself.

    The whole reason the net has taken off so quickly is the simple, open and clear protocols used. You need to debug your email server? Just telnet in and talk to it! With ASN.1 you need a compiler to make each damn data packet.

    I think it is only fair to state that a lack of good (I mean open and free, of course) ASN.1 decoders/encoders contributes to the lack of widespread adoption of technologies like ASN.1. Not that tools like SNACC are all that bad, but were good tools around in the early days of ASN.1? Certainly CMIP never had good free toolkits.

    The standards bodies play a role here. Making sure you advocate for your standard early on and doing your best to promote good open reference implementations goes a long way towards helping a standard gain widespread adoption.

    I think SNMP is a good example of how ASN.1 can be used effectively. Just because ASN.1 allows for complex types doesn't mean people have to build complex types into their standards/protocols.

    I'm growing tired of the "I've got the world on a String" school of data typing ;->

    Sometimes efficient, compact encoding/decoding is just what the solution calls for, whether it is ASN.1 BER/PER or the OMG IDL using CDR.

  • by Jeffrey Baker ( 6191 ) on Tuesday August 07, 2001 @07:56PM (#2167641)
    at least XML gives a clear description that I can use with a packet sniffer when trying to debug something.

    Translated:

    My debugging tools are inadequate, and my brain is inadequate for improving them.

    You have a powerful, general-purpose computer at your disposal. Why should you care if the protocol can be inspected with the naked eye? Do you use an oscilloscope to pretty-print IP packets? No, you use ethereal [ethereal.com]! If XML is encoded using ASN.1, then the tools will be modified to decode ASN.1 before showing it to the human. Ethereal already knows about ASN.1 [ethereal.com] because it uses it to display LDAP traffic. If you don't like ethereal, try Unigone [unigone.com].

    Use your CPU, not your eyeballs!

  • by Anonymous Coward on Tuesday August 07, 2001 @08:00PM (#2167658)
    It is not compression, but data representation.

    In BER encoding, a small integer (one that fits in a single signed byte) takes three bytes: one each for the tag, the length, and the value. In XML it can take an almost unlimited number of bytes, depending on the number of tags and how they are nested.

  • by ciurana ( 2603 ) on Tuesday August 07, 2001 @08:02PM (#2167672) Homepage Journal

    Some people in this forum think that ASN.1 is a replacement for XML; others think of it as a "lossy" compression algorithm. ASN.1 is neither. Read the article and learn a bit about ASN.1 before forming an opinion. Most important, ASN.1 has been an interoperability standard for at least 10 years prior to the introduction of XML.

    ASN.1 is a standard interoperability protocol (ISO IS 8824 and 8825) that defines a transfer syntax irrespective of the local system's syntax. In the scenario described in the article, the local syntax is XML and the transfer syntax is ASN.1. ASN.1 is a collection of data values with some meaning associated with them. It doesn't specify how the values are to be encoded. The semantics of those values are left to the application to resolve (i.e. XML). ASN.1 defines only the transfer syntax between systems.

    ASN.1 codes are defined in terms of one or more octets (bytes) joined together in something called an encoding structure. This encoding structure may have values associated with it in terms of bits rather than bytes. An encoding structure has three parts: Identifier, Length, and Contents octets. Identifier octets specify primitive or constructed data types. Length octets define the size of the actual content. Under the packed encoding rules, a boolean can thus be represented by a single bit, and digits 0-9 could be BCD encoded. Each encoding structure carries its interpretation with it.

    An XML document could thus be encoded by converting the tags into a lookup table of single-octet codes. If there are many tags, or they are long (e.g. FIRST-NAME), then there are significant savings in replacing each whole tag with an ASN.1-encoded datum. If we assume there are up to 255 different potential tags in the XML document definition, then each could be assigned a single byte. Thus, encoding the tag <FIRST-NAME> would only take two bytes: one for the ID, one for the length octet, and zero for the contents (the tag ID could carry its own meaning).

    I used to work with OSI networks at IBM. All the traffic was ASN.1-encoded. I personally think this is an excellent idea because ASN.1 parsers are simple and straightforward to implement, fast, their output is architecture independent, and the technology is very stable. Most important, this is a PRESENTATION LAYER protocol, not an APPLICATION LAYER protocol. The semantics of the encoding are left to the XML program. Carefully encoded ASN.1 will preserve the exact format of the original XML document while allowing its fast transmission between two systems.

    http://www.bgbm.fu-berlin.de/TDWG/acc/Documents/asn1gloss.htm has an excellent overview if you're interested.

    Cheers!

    E
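The lookup-table idea in the post above can be sketched in a few lines. This is a toy, not a standard ASN.1 encoding: the table `TAG_IDS` and the helper `encode_element` are hypothetical, the one-byte IDs ignore the class and primitive/constructed bits that real BER identifier octets carry, and only short lengths (under 128 bytes) are handled. The saving relies on both sides already sharing the table, which is the role a schema plays.

```python
# Hypothetical table: each element name in the document definition gets a one-byte ID.
TAG_IDS = {"person": 0x01, "first-name": 0x02, "last-name": 0x03}

def encode_element(tag: str, contents: bytes = b"") -> bytes:
    """Emit identifier octet, length octet, then the contents octets."""
    if len(contents) >= 0x80:
        raise ValueError("this sketch only handles short (single-octet) lengths")
    return bytes([TAG_IDS[tag], len(contents)]) + contents

# An empty <first-name/> element: one ID byte plus one length byte.
print(encode_element("first-name").hex())  # 0200

xml = "<person><first-name>Ann</first-name><last-name>Lee</last-name></person>"
packed = encode_element(
    "person",
    encode_element("first-name", b"Ann") + encode_element("last-name", b"Lee"),
)
print(len(xml.encode()), "bytes as XML vs", len(packed), "bytes packed")  # 71 vs 12
```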
  • by pegacat ( 89763 ) on Tuesday August 07, 2001 @08:08PM (#2167720) Homepage

    This is pretty much right. I do a lot of work on X500 / ldap / security, and ASN1 is used throughout all this. It does a pretty good job, but as the poster points out, the ITU is a completely brain damaged relic of the sort of big company old boys club that used to make standards. It's very difficult to get info out of them. (Once you get it though, it's usually pretty thorough!)

    As for the 'compression', well, yes, it sorta would be shorter under many circumstances. ASN1 uses pre-defined 'global' schema that everyone is presumed to have. Once (!) you've got that schema, subsequent messages can be very terse. (Without the schema you can still figure out the structure of the data, but you don't know what it's for.) For example, I've seen people try to encode X509 certificates (which are ASN.1) in XML, and they blow out to many times the size. Since each 'tag equivalent' in ASN.1 is a numeric OID (object identifier), the tags are usually far shorter than their XML equivalents. And ASN.1 is binary, whereas XML has to escape binary sequences (base64?).

    But yeah, ASN.1 is a pain to read. XML is nice for humans, ASN1 is nice for computers. Both require tooling, though: an XML parser or an ASN.1 compiler. ASN.1 can be very neat from an OO point of view, 'cause your ASN.1 compiler can create objects from the raw ASN.1 (a bit like a Java serialised object). But I can't see ASN.1 being much chop for compressing text documents; there are much better ways of doing that around already (and I thought a lot of that stuff was automatically handled by the transport layer these days?)

    And just for the record... the XML people grabbed a bunch of good ideas from ASN.1, which is good, and LDAP's problems are more that they screwed up trying to do a cut-down version of X500 than that they use ASN.1 :-)!
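To make the "numeric OID" point concrete, here is a sketch of the BER/DER encoding of an OBJECT IDENTIFIER (tag 0x06): the first two arcs are combined as 40 times the first plus the second, and each arc is then written in base 128 with the high bit set on every byte except its last. The function name is ours, and it assumes the encoded body is shorter than 128 bytes so a single length octet suffices.

```python
def encode_oid(dotted: str) -> bytes:
    """DER-encode an OBJECT IDENTIFIER given in dotted-decimal form."""
    arcs = [int(part) for part in dotted.split(".")]
    # The first two arcs share one subidentifier: 40 * first + second.
    subids = [40 * arcs[0] + arcs[1]] + arcs[2:]
    body = bytearray()
    for sid in subids:
        # Collect base-128 digits least significant first, then emit them reversed;
        # every emitted byte except the final one has bit 8 set.
        digits = [sid & 0x7F]
        sid >>= 7
        while sid:
            digits.append(0x80 | (sid & 0x7F))
            sid >>= 7
        body.extend(reversed(digits))
    if len(body) >= 0x80:
        raise ValueError("this sketch only emits a single-octet length")
    return bytes([0x06, len(body)]) + bytes(body)

# id-at-commonName, used for the CN attribute in X.509 names.
print(encode_oid("2.5.4.3").hex())         # 0603550403
# The RSA Data Security arc under which the PKCS identifiers live.
print(encode_oid("1.2.840.113549").hex())  # 06062a864886f70d
```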

  • by StormyMonday ( 163372 ) on Tuesday August 07, 2001 @08:47PM (#2167943) Homepage
    are condemned to repeat it. Badly.

    I have had to deal with dozens of binary protocols that do the same thing as ASN.1, and do it worse.

    As to comparisons, XML and ASN.1 are designed for different jobs. Designing a Web page in ASN.1 would be ridiculous. Sending (say) telemetry data encoded in XML is equally ridiculous. I can believe that *data* transmissions could be 100 times larger in XML than in ASN.1. You have the header, DTD, some namespace declarations, and a bunch of nested tags, just to express a couple of numbers.
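A quick illustration of the telemetry point, using a hypothetical reading made of a sensor ID and a floating-point value. The binary side is a fixed layout packed with Python's `struct` module (not an ASN.1 encoding), and the XML side is an invented document with a header and a namespace declaration; the ratio you get depends entirely on those choices, and here the XML comes out more than ten times larger.

```python
import struct

def binary_reading(sensor_id: int, value: float) -> bytes:
    """Fixed layout: unsigned 16-bit sensor ID + IEEE 754 double, big-endian, no padding."""
    return struct.pack(">Hd", sensor_id, value)

def xml_reading(sensor_id: int, value: float) -> bytes:
    """The same two numbers wrapped in a small, hypothetical XML document."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<reading xmlns="urn:example:telemetry">'
        f"<sensor>{sensor_id}</sensor><value>{value!r}</value>"
        "</reading>"
    ).encode("utf-8")

print(len(binary_reading(7, 21.5)), "bytes binary")
print(len(xml_reading(7, 21.5)), "bytes XML")
```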

    Problem is, XML is one of the latest forms of fairy dust that Management has latched onto. "Sprinkle this on your project and it will fly!" So programs have XML grafted onto them anywhere it might fit.

    A particularly cute example is SOAP (Microsoft's firewall-bypass protocol). It's going to be fun to watch people try to squeeze some performance out of a SOAP-based system that tries to do something interactive.

    As to the ISO, yeah, they're seriously obnoxious. They tend to go off into their own little world, redefine standard terminology so they're incomprehensible to outsiders, and come up with stuff that can't be implemented. (Nobody uses ASN.1 -- it's unimplementable. When people talk about using ASN.1 for something real, they're talking about a subset. A subset, of course, cannot claim conformance to the standard.) The crowning insult, of course, is that they fund the organization by selling the standards. Hey, it's a standard -- you *have* to buy it!

    "It's all in knowing what wrench to use to pound in the screw."

  • Re:Actually... (Score:1, Informative)

    by Anonymous Coward on Tuesday August 07, 2001 @08:59PM (#2167991)
    Hmm... slight nuance to be added :)

    If HTML is written properly, it is easily converted to XHTML (and thus XML) by changing a few tags and adding the XML-formalities.
    For example: changing all single tags (<br>) into XML-single tags (<br />), or changing name-only attributes (<td nowrap>) into full attributes (<td nowrap="nowrap">). Check the XHTML 1.0 specs on w3.org for the full story ;) (can't access it at the moment for reasons unknown)
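The two rewrites described above can be sketched with regular expressions, under strong assumptions: lowercase, already well-formed HTML, no `>` inside attribute values, attribute values that don't themselves contain the listed words, and only the handful of empty elements and minimized attributes listed (both lists are partial and chosen for illustration). A real conversion is better done with an HTML parser or a tool such as HTML Tidy.

```python
import re

# Partial lists, for illustration only.
EMPTY_ELEMENTS = ("br", "hr", "img", "input", "meta", "link")
MINIMIZED_ATTRS = ("nowrap", "checked", "selected", "disabled", "noshade")

EMPTY_RE = re.compile(r"<(" + "|".join(EMPTY_ELEMENTS) + r")\b([^>]*?)\s*/?>")

def to_xhtml(html: str) -> str:
    """Apply two HTML-to-XHTML rewrites to lowercase, well-formed input."""
    for attr in MINIMIZED_ATTRS:
        # <td nowrap>  ->  <td nowrap="nowrap">  (attributes that already have a value are skipped)
        html = re.sub(r"(<[^>]*\s)" + attr + r"(?=[\s/>])(?!\s*=)",
                      r"\g<1>" + attr + '="' + attr + '"', html)
    # <br>  ->  <br />
    return EMPTY_RE.sub(r"<\1\2 />", html)

print(to_xhtml('<td nowrap>a<br>b</td>'))  # <td nowrap="nowrap">a<br />b</td>
```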
  • by Anonymous Coward on Tuesday August 07, 2001 @09:05PM (#2168019)
    ASN.1 plus a way of encoding it (BER is commonly used) produces output that's binary. Encoded like this, it represents everything using type, length, and data. So to represent, say, the integer 255 you'd encode it like this, using BER: [type byte: 0x02] [length byte: 2] [value bytes: 0x00 0xFF]. (INTEGER is two's complement, so 255 needs a leading zero byte.) That's four bytes to encode a value that fits in one unsigned byte. Great.

    Basically the advantages of ASN.1 are that it's a well-defined way to express data types, and it has encodings that are platform neutral. Compared to other fixed-field binary protocols it's fat and not particularly robust (got a length value wrong anywhere? You can't make any sense of the rest of the data). It's a binary protocol, which means you can't just look at the data and understand it, which I see as a huge disadvantage; in my mind the reason the net is big now is that the protocols are straightforward and easy to understand at a glance.

    I work with ASN.1 every day in the guise of SNMP, and I've learned to become annoyed with it. Ever see ASN.1 in the form of a MIB? Bleeh. XML is popular because it's flexible and extensible. You don't really have a prayer of understanding encoded ASN.1 data without the full ASN.1 definition for the data, whereas XML is inherently human readable. Maybe there's more to this and it's a good fit, but I am not a big fan of ASN.1.

    - Bill
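Here is a sketch of the BER/DER INTEGER encoding discussed above, which makes the byte counts easy to check. The contents are the minimal two's-complement, big-endian representation, which is why 255 needs a leading zero byte and comes out as 02 02 00 ff. The sketch assumes the contents fit under a single-octet (short-form) length, which covers any integer you are likely to meet.

```python
def encode_integer(n: int) -> bytes:
    """BER/DER INTEGER: tag 0x02, a short-form length, minimal two's-complement contents."""
    # Bits needed for the magnitude; ~n handles negatives (e.g. -129 -> 128).
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    # One extra bit is needed for the sign: bits // 8 + 1 == ceil((bits + 1) / 8).
    size = bits // 8 + 1
    body = n.to_bytes(size, "big", signed=True)
    if len(body) >= 0x80:
        raise ValueError("this sketch only emits short-form lengths")
    return bytes([0x02, len(body)]) + body

for value in (5, 127, 128, 255, -1):
    print(value, encode_integer(value).hex())
```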
  • This is funny ... (Score:2, Informative)

    by ras ( 84108 ) <russell+slashdot ... rt DOT id DOT au> on Tuesday August 07, 2001 @09:24PM (#2168101) Homepage

    I remember when I first came across ASN.1 years ago. Everybody hated it because the parser was sssooo big and complex. Why not just use a simple ASCII file was a common refrain. Sure ASN.1 was capable of representing just about any data structure in a reasonably compact form, but most information did not need complex data structures to represent it so why does anybody use ASN.1?

    Well, a decade or two later we get the ASCII version of ASN.1: XML. And guess what? It's arguably harder to write a generic parser for XML than it is for ASN.1. (I still have not found a good open source validating parser for XML.) But guess what: everybody is wildly enthusiastic this time round. My, how times change!

    Actually ASN.1 and XML in some ways are very similar. They try to solve the same problem: how to represent complex data structures in a generic way. And they do it in a similar way. Because ASN.1 is binary and uses numbers instead of text tags, it does use a lot less space to represent the same thing, although the 2 versus 200 bytes claim is at best misleading. Most of the 200 bytes would probably be XML header (DTDs and stuff), which you would not put in the ASN.1 encoding.

    And yes, XML is too fat for some applications. For example, if you are pumping out a 60k-row SQL table to your 1000 clients every day, you probably would not choose XML. That is why this idea has merit. It could give you the benefits of XML without the fat. To work, someone would have to come up with a standard way of translating a DTD to an ASN.1 encoding. I know it's a good idea because I came up with it myself a while back :).

  • Re:mod_gzip ? (Score:3, Informative)

    by Nemesis][ ( 21247 ) on Tuesday August 07, 2001 @09:29PM (#2168119)
    Yes, it's a very welcome and needed addition to a "bloated" protocol. But just be aware of some possible drawbacks when using it.

    It doesn't work with SSL easily. See this [over.net] thread if curious. I ran into this when I wanted to force Open Webmail [ncku.edu.tw] to use https only and found the pages were not getting compressed.

    And take note of possible problems [over.net] with caching proxies serving pages to browsers that can't handle it.

    It has a few other quirks, but overall I for one am quite satisfied with it.
    Curious about the savings it brings? Use this [krakow.pl].

    Machines are always broken till the repairman comes.
  • Re:This is funny ... (Score:1, Informative)

    by Anonymous Coward on Tuesday August 07, 2001 @09:35PM (#2168139)
    "I still have not found a good open source validating parser for XML"

    Then you're not looking very hard. Try Xerces.
  • by Steven Reddie ( 237450 ) on Tuesday August 07, 2001 @09:39PM (#2168164)
    And one that we all use most days: SSL. ASN.1 is a syntax for specifying data structures. It has nothing to do with the actual encoding of the "bits on the wire". In fact, that is part of the reason for using ASN.1 for specifying data structures; you don't need to care about the encoding. It is ASN.1's related encoding rules, such as BER (Basic Encoding Rules), DER (Distinguished Encoding Rules), and PER (Packed Encoding Rules), that specify how the data structures are encoded. I only work with BER/DER. It would be impossible to say much about anything in 2 bytes using those encoding rules, since the first byte tells you what type of data is about to follow, and the next byte(s) tell you the length of the data. So you've used up at least 2 bytes before having said anything useful.
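The identifier-plus-length overhead can be checked with a small sketch of BER's definite-length octets: lengths below 128 take one octet (short form); longer ones take an octet of 0x80 plus the count of following length octets, then those octets (long form). The helper names are ours, and `header_size` assumes a tag number below 31, so the identifier fits in one octet.

```python
def encode_length(n: int) -> bytes:
    """BER/DER definite-length octets, in the minimal form DER requires."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 0x80:
        return bytes([n])  # short form: one octet, high bit clear
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # long form: 0x80 | count, then count octets

def header_size(content_len: int) -> int:
    """One identifier octet (low tag numbers only) plus the length octets."""
    return 1 + len(encode_length(content_len))

for n in (0, 127, 128, 300, 70000):
    print(n, encode_length(n).hex(), header_size(n))
```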
  • by isdnip ( 49656 ) on Tuesday August 07, 2001 @10:02PM (#2168282)
    I've done real quantitative studies on the topic, and quite frankly you got it wrong. Moore's Law (for CPU power) is far stronger than "Moore's Law for bandwidth". Bandwidth growth has been on the order of 30-40%/year, while CPU power has grown faster than that for at least two decades.

    ASN.1 is well known outside of the IETF fundamentalist crowd. With its PER (packed encoding rules), it is very efficient of bandwidth and not all that CPU intensive either. Nor is it difficult, if used correctly (and anything can be tough if used wrong). It's a simple tag-length-value notation which can recurse. The only reason the Internet doesn't use it more is the usual NIH.
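The recursive tag-length-value structure is easy to see in a decoder. This sketch walks BER data and descends into anything with the constructed bit (0x20) set, such as a SEQUENCE (0x30). It assumes single-octet identifiers and definite lengths, and it raises an error when a length overruns its container, the same failure mode mentioned earlier in the thread: one bad length and the rest is unreadable.

```python
def decode_tlv(data: bytes, offset: int = 0):
    """Decode one BER element at `offset`; return ((tag, value), next_offset).

    `value` is raw bytes for primitive elements and a list of child
    (tag, value) pairs for constructed ones.
    """
    if offset + 2 > len(data):
        raise ValueError("truncated header")
    tag = data[offset]
    if (tag & 0x1F) == 0x1F:
        raise ValueError("multi-octet tag numbers are not handled in this sketch")
    first = data[offset + 1]
    offset += 2
    if first < 0x80:  # short form
        length = first
    else:  # long form
        count = first & 0x7F
        if count == 0:
            raise ValueError("indefinite lengths are not handled in this sketch")
        if offset + count > len(data):
            raise ValueError("truncated length octets")
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    end = offset + length
    if end > len(data):
        raise ValueError("length runs past the end of the data")
    if tag & 0x20:  # constructed: the contents are themselves TLVs
        children = []
        while offset < end:
            # Slicing to `end` stops a child from overrunning its parent.
            child, offset = decode_tlv(data[:end], offset)
            children.append(child)
        return (tag, children), end
    return (tag, data[offset:end]), end

# SEQUENCE { INTEGER 5, BOOLEAN TRUE }
node, _ = decode_tlv(bytes.fromhex("30060201050101ff"))
print(node)  # (48, [(2, b'\x05'), (1, b'\xff')])
```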
  • by foo ( 143650 ) on Tuesday August 07, 2001 @10:08PM (#2168307)
    http://www.fokus.gmd.de/ovma/freeware/snacc/
  • by Anonymous Coward on Tuesday August 07, 2001 @11:14PM (#2168580)
    This is not true. The program creating the message and the program reading the message do not need exactly the same DTD. Anything extra in the message will be ignored when reading it, and fields can be made optional.
  • by reflective recursion ( 462464 ) on Tuesday August 07, 2001 @11:21PM (#2168598)
    Seriously, how much bandwidth do we lose to simple ACKs, NACKs, and packet headers? How often do networks really drop packets that we couldn't use UDP for web applications?
    UDP drops packets often enough, that is for sure. The purpose of TCP is to be a _stable_ transport. UDP simply throws messages towards their destination and hopes they hit their target. Say an HTML document is sent via UDP. Say you get packet 1, miss the 2nd, and get the 3rd instead. How does your browser know packet 3 is _not_ packet 2? This also says nothing about the order in which packets arrive (with UDP, packet 3 could arrive before 2 or 1). So then you begin to hack on a protocol that detects the correct order. Then you hack on another protocol that makes sure packets even arrive. Then you will have TCP all over again. :-)
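The first step of that reinvention can be sketched: give each datagram a sequence number, then reorder on arrival and list the gaps. The `(seq, payload)` format and the known total count are assumptions for illustration; asking the sender to retransmit the missing pieces, which this sketch does not do, is where you start rebuilding TCP.

```python
from typing import Dict, Iterable, List, Optional, Tuple

def reassemble(
    datagrams: Iterable[Tuple[int, bytes]], total: int
) -> Tuple[Optional[bytes], List[int]]:
    """Reorder (sequence number, payload) pairs and report which ones never arrived.

    Returns (message, []) when every piece is present, otherwise (None, missing).
    """
    received: Dict[int, bytes] = {}
    for seq, payload in datagrams:
        received.setdefault(seq, payload)  # keep the first copy of a duplicate
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        return None, missing
    return b"".join(received[seq] for seq in range(total)), []

# Packet 1 was lost and packet 2 overtook packet 0.
print(reassemble([(2, b"!"), (0, b"Hello, ")], total=3))  # (None, [1])
```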
    As for HTML and XML, we could cut ascii data by 20% if we just got rid of useless carriage returns, non-paragraph whitespace, tag quotation marks, HTML comments... just compare the source HTML for Yahoo with CNN.com... BIG difference.
    Ahh. We finally see that just learning HTML (or, in general, web-oriented languages such as VBScript and JavaScript) does not make a good programmer. If you have never seen a VB program's source code... well, don't. I don't mean to bash VB (or web) programmers, though. The problem with HTML/XML is that it is not compiled (to something machine-independent, like Java bytecode). I believe this has more to do with the web outgrowing its purpose. It was never designed for graphics, let alone plug-ins, JavaScript/Java, frames (should I really continue? :P ).
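The whitespace-and-comments part of the quoted claim is easy to try on your own pages. The sketch below removes HTML comments and collapses runs of whitespace to a single space, which browsers render as one space anyway in ordinary text. It is naive on purpose: it will damage `<pre>`, `<textarea>`, and inline scripts, it leaves attribute quotes alone, and the savings depend entirely on the page.

```python
import re

def strip_html_fat(html: str) -> str:
    """Remove HTML comments and collapse whitespace runs (naive; breaks <pre>)."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    html = re.sub(r"\s+", " ", html)
    return html.strip()

page = "<html>\n  <!-- navigation -->\n  <p>Hello,\n     world</p>\n</html>\n"
slim = strip_html_fat(page)
print(slim)  # <html> <p>Hello, world</p> </html>
print(f"{len(page)} -> {len(slim)} characters")
```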

