Open Source

Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com) 135

Dropbox announced on Thursday that it is releasing its image compression algorithm, dubbed Lepton, under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and for the latter, it can work while streaming. Lepton offers a 22% savings for existing JPEG images, and preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to use the technology to recode its older images. You can find more technical details here.
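A minimal round-trip sketch in Python of the lossless claim, assuming a hypothetical command-line binary named "lepton" that takes an input path and an output path (the real tool's invocation may differ; see the GitHub README):

# Round-trip check: compress a JPEG, decompress it, and verify the result is
# bit-for-bit identical. The "lepton" CLI usage here is an assumption.
import filecmp
import subprocess

def roundtrip(original: str, lep: str, restored: str) -> bool:
    subprocess.run(["lepton", original, lep], check=True)    # e.g. photo.jpg -> photo.lep
    subprocess.run(["lepton", lep, restored], check=True)    # e.g. photo.lep -> restored.jpg
    return filecmp.cmp(original, restored, shallow=False)

if __name__ == "__main__":
    print("lossless:", roundtrip("photo.jpg", "photo.lep", "restored.jpg"))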
This discussion has been archived. No new comments can be posted.


  • It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bit at 15MB/s

    Even if I cross-compile for my 2MHz TRS-80? Amazing!

    • by darkain ( 749283 ) on Thursday July 14, 2016 @11:49AM (#52510821) Homepage

      From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"

      • What resolution, bit depth, and JPEG encoding/compression methods did those images have? How compressible were those images with something generic like LZMA2?

        • by Bengie ( 1121981 )
          Good questions, but in lieu of facts: compressibility usually goes up as bit depth and resolution go up, and it seems cell phones are now capturing greater-than-4K resolutions.
  • Where am I? (Score:4, Funny)

    by freeze128 ( 544774 ) on Thursday July 14, 2016 @11:46AM (#52510799)
    This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".
    • Beat me to the punch: "Middle out"? But the thing is, the tech isn't complete BS on that show; the terms they use are real and the application is actually possible, likely not to the extent of Pied Piper in the show, though.
      • by AmiMoJo ( 196126 )

        I think whoever wrote that was confused by the screen cap of Silicon Valley (TV show on HBO) in the article, which is of the fictional "Pied Piper" company and not of Dropbox.

        It's been known for many years that you can compress JPEGs losslessly by about 20%, because JPEG only uses run-length encoding rather than, say, Huffman encoding after the DCT stage. In fact I seem to recall an app called StuffIt that could do this in the late 90s. Their improvements seem to be some kind of prediction to make the coding

        • by Fwipp ( 1473271 )

          From TFA:

          For those familiar with Season 1 of Silicon Valley, this is essentially a “middle-out” algorithm.

          • by imgod2u ( 812837 )

            Except it isn't. "Middle-out" was a fictional name until the show invented it. Nobody researching compression had a "middle-out" algorithm.

            Also, the Pied Piper algorithm offered lossless compression of just about anything at a ridiculously high rate (something like 10x what HEVC is capable of, with no loss). They also had a distributed storage platform that used the drive space on everyone's phones to store files.

          • by AmiMoJo ( 196126 )

            That's what I mean: the writer seems to think that was some kind of documentary and that "middle-out" is a real thing. It's just a meaningless phrase they came up with for the show.

        • JPEG does do Huffman coding, or less commonly arithmetic coding.
        • by miknix ( 1047580 )

          Why even use JPEG? JPEG2000 has been out there for a while; professional photographers and digital cinema use it for a reason.

          • How about BPG [github.io]? Looks better than JPEG2000 to me.

          • It is because the idiots on the JPEG 2000 committee did everything to keep people, especially web-browser development teams, away from that excellent format.

            Now, with 4K monitors and ultra-high-resolution phones around, watching web developers struggle with 5-6 different files of the same photo, I really feel pity. That was a solved problem, both the multiple bandwidths/resolutions and the compression rate.

            There is a reason we deal with JPEG files today, ask JP2 committee. Even MS stayed away from it fearing the patents.

            • by miknix ( 1047580 )

              Exactly! The patents don't just cover JPEG2000 itself; they affect everything from multi-resolution analysis to the DWT algorithms. I speculate that's the reason MPEG stayed away from it, even though the DWT is clearly superior to the DCT.
              Today it is not just UHD resolutions that would benefit from JPEG2000. JPEG 2000 also supports arbitrary-precision coding, which is good for the HDR that is starting to pop up in consumer gear and cinema.
              Anyway, this is a lost cause because everybody is moving to

          • by AmiMoJo ( 196126 )

            JPEG2000 never really took off outside certain niches because the processing overhead was too high. WebM is a better general-purpose option for the web, and JPEG is so universal that nothing else has made any inroads.

      • likely not to the extent of Pied Piper in the show, though

        Not "likely". Absolutely. In the show, they have a compression algorithm that compresses _ANY_ data by some ridiculously high percentage.

        Real-world example: put data through compression... then put the resulting compressed data through compression again... and so on and so on... to get impossibly good compression...
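        A quick sketch of why that keeps failing, with Python's zlib standing in for any general-purpose lossless compressor: once data has been compressed, feeding it back in stops shrinking it almost immediately.

        # Sketch: feed a compressor its own output and watch the size stop shrinking.
        # zlib stands in for any general-purpose lossless compressor.
        import os
        import zlib

        data = os.urandom(1_000_000)              # already near-incompressible bytes
        for step in range(1, 6):
            compressed = zlib.compress(data, 9)
            print(f"pass {step}: {len(data)} -> {len(compressed)} bytes")
            data = compressed                     # recompress the compressed output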

    • by nyet ( 19118 )

      This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".

      derp

    • Re: (Score:2, Funny)

      by Anonymous Coward

      I'm confused... is this a box, or is this a platform?

    • by DMJC ( 682799 )
      Actually, this is the team that wrote the Pied Piper algorithm, which was featured on Slashdot a few months ago. A good friend of mine is the person who actually created the algorithm. He was the lead developer on Vegastrike. Really great guy. It's great to see him achieving success in his career.
  • Does anyone know how this compares to other compression algorithms? Also, I wonder if they are releasing this because they have a Lepton 2 or whatever now?

    • This is specifically for compressing JPEG (lossy) with an extra layer of lossless compression to bring file sizes down further. It would only be useful if you have a large collection of JPEG images to archive and not enough disk space. In my own quickie test:

      Source image was 2560x1440 TGA at 32MB

      PNG (lossless, level 9) took that down to 6,912KB
      WebP (lossless) took it down to 5,868KB
      JPEG (lossy, quality 100) took it down to 3,402KB
      JPEG (lossy, quality 95) took it down to 1,995KB

      They are claiming a 22% furth
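      A rough back-of-the-envelope sketch of what a 22% further lossless reduction would mean for the JPEG sizes above (actual savings vary per image):

      # Rough arithmetic only: apply the claimed 22% reduction to the JPEG sizes
      # from the quick test above; real savings differ from image to image.
      sizes_kb = {"JPEG quality 100": 3402, "JPEG quality 95": 1995}
      for name, kb in sizes_kb.items():
          print(f"{name}: {kb} KB -> ~{kb * (1 - 0.22):.0f} KB")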

    • Re:comparison (Score:4, Informative)

      by miknix ( 1047580 ) on Thursday July 14, 2016 @12:49PM (#52511307) Homepage

      They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason the DWT was used instead of the DCT in JPEG2000: it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).

      • JPEG2000 suffered from the same problem JPEG initially did - it was slow. I remember downloading the first sample JPEG images in the early 1990s. An 800x600 image took about 20 seconds to decode on my PC back then. JPEG2000 had a similar problem, though it was asymmetric. Over 1 min to encode a 3504x2336 image from my DSLR, about 5-15 seconds to decode.

        JPEG didn't have any competitors, and the growth of the Internet and Web made smaller picture files very important in the coming years. Couple th
  • by nine-times ( 778537 ) <nine.times@gmail.com> on Thursday July 14, 2016 @11:49AM (#52510823) Homepage
    Is this actually a "middle-out" compression, or is that just a joke? Do we know what the Weissman score is?
    • Is this actually a "middle-out" compression, or is that just a joke? Do we know what the Weissman score is?

      Meh. I'm just going to wait for Pied Piper to hit open beta. Their Weissman scores are unbelievable.

    • by xuvetyn ( 89257 )
      ah. beat me to it. =)
  • by tylersoze ( 789256 ) on Thursday July 14, 2016 @12:08PM (#52510981)

    It can both compress *and* decompress.

    • by Anonymous Coward

      Just the other day, I developed a slightly lossy compression algorithm with an infinite Weissman score and 100% compression.

      Still working on the decompression step.

      • Your awesome compression algorithm works so well that I can paste all of the decompressor's code into this post. Maybe it's helpful. The code (without the " of course): ""

        All you need to do is to decompress it once manually.

    • According to the summary it can decode back to the original bit. Slashdot 2016.
    • It can both compress *and* decompress.

      That is actually very important. I know from first-hand experience that compression can be much faster if later decompression is not a requirement.

  • I'm all for companies open-sourcing cool algorithms. But not a great choice on the name. There are already several products out there called 'Lepton'. There's a software CMS [lepton-cms.org], and also FLIR's thermal sensors [flir.com] are branded 'Lepton'. (Worth noting - Lepton IS an actual word [dictionary.com], so it probably won't qualify for trademark protection. But an Apple Music vs. Apple Computer-like scenario is not impossible to conceive.)

  • Huffman alternative (Score:2, Informative)

    by hsa ( 598343 )
    I worked as a part-time assistant in the Data Structures and Algorithms course 10 years ago at Helsinki University of Technology.

    JPEG is a lossy compression algorithm. It does not preserve the image. It creates these blocks of image data and then compresses them using Huffman encoding. The same encoding is used in zip files.

    Dropbox's algorithm uses these same blocks the JPEG algorithm produces (meaning that the information is still lost in compression), but uses a clever way to compress them and ditches Huffman enco
    • Encoding the image with the coefficients is not the lossy part. The lossy part is when you ditch the coefficients which contribute little to the image, and when you downsample the chroma.
      • by Anonymous Coward

        "Encoding the image into coefficients" is imprecise, but if you want to split JPEG encoding into two steps, the lossless Huffman coding and everything before it, then that earlier part is the lossy step in which the majority of the compression is achieved. The technical term for the conversion from the (in principle arbitrary-precision) floating-point coefficients to the more compact integer coefficients is "quantization". The quality factor controls the amount of information which is lost by setting the granu

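        A minimal sketch of that quantization step, with made-up coefficient values and a single quantization-table entry, just to show where the loss happens:

        # Illustration of JPEG-style quantization (the lossy step): dividing
        # coefficients by a quantization step and rounding discards information,
        # so dequantization cannot recover the originals. Values are made up.
        coeffs = [-415.4, 30.2, -61.2, 27.2, 56.1, -20.1, -2.4, 0.5]
        step = 16                                       # one quantization-table entry

        quantized = [round(c / step) for c in coeffs]   # what gets entropy-coded
        restored = [q * step for q in quantized]        # what the decoder reconstructs

        for original, back in zip(coeffs, restored):
            print(f"{original:8.1f} -> {back:5d}   (error {back - original:+.1f})")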
      • Just about every step in JPEG quantization is lossy, even if you're using a floating-point DCT.

        If you're subsampling your chroma, you need to be shot.

        • by Anonymous Coward

          The primary use case for JPEG compression is storing digital photos. With few exceptions, the chroma information in digital photos is interpolated from Bayer pattern sensors, so the chroma information is naturally lower resolution than the luma information. The interpolated information is often reduced in the camera hardware, before the data is even written to main memory, where it is stored in a subsampled YUV format. You would first have to interpolate it in order to expensively store it at "full resoluti

    • You can compress the rendered area of your posts by avoiding the unnecessary use of pre-formatted text. :)
    • Pfft.. too little, too late. JPEG is "good enough" and I don't want a huge clusterfuck of incompatibility problems with my libraries.

      In terms of widespread adoption, I think you're right: Joe's Image Viewer is unlikely to ever come with Lepton support. But I wouldn't dismiss this so quickly, as large sites might force the issue into the browser space.

      Take Facebook as an example: think of the trillions of photos they store (they claim 2 billion are uploaded each day). Facebook archives older, infrequently accessed photos to Blu-ray and has an army of jukeboxes ready to swap in discs when someone actually tries to load that family reunion

  • We've been here before. JPEG2000, webp, BPG, JPEG XR. There are many formats that are superior to JPEG. And look - none of them caught on!

    Why? Because JPEG, though far from what the best modern algorithms could offer, is still 'good enough' for most purposes. It's also supported by every web browser, photo viewer, image editor, mobile phone, camera, digital picture frame, slideshow maker and every other thing that might need to process an image. A new format, no matter how superior, cannot offer the same ubiquit

    • Re:Again. (Score:4, Insightful)

      by Anonymous Coward on Thursday July 14, 2016 @12:50PM (#52511325)

      This isn't a "better than JPEG" format. It's a "store existing JPEG files your users upload & use more efficiently" format. Flickr, for instance, could theoretically save 22% of its disk space using this.

    • Re:Again. (Score:5, Insightful)

      by Ramze ( 640788 ) on Thursday July 14, 2016 @12:59PM (#52511387)

      It's not a file format, it's a compression algorithm that happens at the data storage level. This is similar to compressing a hard drive -- the files are individually compressed, but the file formats are the same, and the OS handles the compression/decompression seamlessly so that the applications don't even know they're accessing compressed versions of the file formats they normally use.

      You can keep all your JPEGs, and with the open-source license, compress the contents of a drive or partition with this algorithm and save maybe 20% or so of the space the JPEG files took up. Not worth it for most people but photographers and image sites might save a lot of money using this.
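      A small sketch of that transparent-storage idea, with Python's zlib standing in for a JPEG-aware recompressor like Lepton; the application only ever sees the original bytes:

      # Callers always put and get ordinary JPEG bytes; the store keeps a
      # compressed form internally. zlib is only a stand-in for Lepton here.
      import zlib

      class TransparentStore:
          def __init__(self):
              self._blobs = {}                           # name -> compressed bytes

          def put(self, name: str, jpeg_bytes: bytes) -> None:
              self._blobs[name] = zlib.compress(jpeg_bytes, 9)

          def get(self, name: str) -> bytes:
              return zlib.decompress(self._blobs[name])  # caller sees original bytes

      store = TransparentStore()
      original = bytes.fromhex("ffd8ffe0") + bytes(1000) + bytes.fromhex("ffd9")  # fake JPEG
      store.put("photo.jpg", original)
      assert store.get("photo.jpg") == original          # bit-for-bit identical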

      • by Kjella ( 173770 )

        Not worth it for most people but photographers and image sites might save a lot of money using this.

        I would think most serious photographers keep the RAW files which are much bigger and will dominate their storage. And even MP monsters only produce ~20MB jpegs so ~200,000 photos on a $99 4TB drive. Pretty sure you won't bother with this unless you're Dropbox, Facebook or some other big image site with many, many millions of photos.

      • And it's not really new; there is a similar program (actually a mix of algorithm and heuristics) called packJPG [encode.ru] that achieves similar results.
    • The point is that this can be implemented server side to save storage of existing JPEGs. This is a better way to store existing images, not a better way to compress images.

    • by swb ( 14022 )

      I wonder if there's a way to come up with a format that existing decoders would process as a JPEG containing only a preview-quality image, while the rest of the file holds a higher-quality version of the image in a more advanced format for a format-aware decoder. And do it all in a total file size better than a high-quality JPEG.

      You'd get backwards compatibility (albeit with degraded quality) but higher quality than existing JPEG.

      Although with storage continually getting better and cheaper, you have to work mir

  • So a program named literally "Lepton under an Apache" that happens to also, confusingly, be an open source license (*and* a program)?

    Okaaaaaaay.... ...Took me like a minute to figure out it was saying

    "...dubbed 'Lepton,' under an Apache open-source license..."

  • Dropbox's software is called "lepton". There is an image-processing library called Leptonica [leptonica.org] — could someone comment on the relationship, if any?

  • So this means instead of getting 5 GB free storage, I should get 22% more if I'm storing JPEGs, so I get 6.1 GB free storage now? ;)

  • Much of their compression comes from the fact that they don't use full 32-bit floats or integers to store the discrete cosine transform coefficients, but variable-bit-length numbers, which can be squished more tightly. I didn't read the paper deeply enough to study how efficient this bit hacking is in machine operations. There might be a few clever tricks there. Bit hacking was more common in the early days of computers, when core memory was very expensive. I recall Woz had some clever way of compressing color and shape graphics i
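    A generic illustration of the variable-bit-length idea (this is not Lepton's actual coding scheme, just a sketch of why small numbers packed at their natural width beat fixed 32-bit slots):

    # Not Lepton's real bit layout: just shows that small quantized coefficients
    # need far fewer bits than a fixed 32-bit slot.
    def bits_needed(value: int) -> int:
        """Bits for a simple sign-and-magnitude encoding of one coefficient."""
        return 1 + max(1, abs(value).bit_length())

    coeffs = [0, 3, -2, 15, 1, 0, -1, 120]        # made-up quantized DCT coefficients
    fixed_bits = 32 * len(coeffs)
    variable_bits = sum(bits_needed(c) for c in coeffs)
    print(f"fixed 32-bit: {fixed_bits} bits, variable-length: {variable_bits} bits")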
