Dropbox Open Sources New Lossless Middle-Out Image Compression Algorithm (dropbox.com) 135
Dropbox announced on Thursday that it is releasing its image compression algorithm dubbed Lepton under an Apache open-source license on GitHub. Lepton, the company writes, can both compress and decompress files, and for the latter, it can work while streaming. Lepton offers a 22% size reduction for existing JPEG images, and preserves the original file bit-for-bit. It compresses JPEG files at a rate of 5MB/s and decodes them back to the original bits at 15MB/s. The company says it has used Lepton to encode 16 billion images saved to Dropbox, and continues to utilize the technology to recode its older images. You can find more technical details here.
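A round trip is easy to sanity-check locally. Here's a minimal Python sketch, assuming a lepton binary built from the GitHub repo that takes an input and an output path on the command line (argument order assumed from the project README, and the file names are placeholders):

    import filecmp
    import subprocess

    # Compress the JPEG to a .lep file, then decode it back.
    subprocess.run(["lepton", "photo.jpg", "photo.lep"], check=True)
    subprocess.run(["lepton", "photo.lep", "photo_roundtrip.jpg"], check=True)

    # The decoded file should match the original bit for bit.
    print(filecmp.cmp("photo.jpg", "photo_roundtrip.jpg", shallow=False))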
Regardless of CPU clock speed? (Score:2, Troll)
Even if I cross-compile for my 2MHz TRS-80? Amazing!
Re:Regardless of CPU clock speed? (Score:5, Informative)
From TFA: "Lepton decode rate when decoding 10,000 images on an Intel Xeon E5 2650 v2 at 2.6GHz"
Re: (Score:3)
What resolution, bit depth, and JPEG encoding/compression methods did those images have? How compressible were those images with something generic like LZMA2?
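The second question is easy to measure on your own photos. A quick sketch with Python's built-in lzma module (its .xz output uses LZMA2 under the hood); "photo.jpg" is a placeholder:

    import lzma

    raw = open("photo.jpg", "rb").read()
    packed = lzma.compress(raw, preset=9)

    # Typical JPEGs shrink only a few percent this way, because the
    # Huffman-coded payload already looks nearly random to a generic coder.
    print(len(raw), len(packed), f"{1 - len(packed) / len(raw):.1%} saved")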
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
"Random things that people upload to dropbox" I assume. This would be why they quoted a number based on a large number of images, a small number of images would be more susceptible to bias.
Re:Regardless of CPU clock speed? (Score:5, Informative)
PR? The code is on github, and imho a very nice accessible explanation of their algorithm is in the linked article. They developed some neat software to save money by essentially modernizing JPEG to compress beyond the 8x8 blocks it was designed to use and, having done that, are now letting other people use it too. What is with your crabby, paranoid attitude? Instead of being an asshole, you could just, you know, build the code yourself and experiment with it, rather than sneering at a gift horse. This is exactly the use case for open source software.
Although I would prefer if they explained the sampling methodology for their images, they do present a few simple scatterplots of (de-)compression performance as a function of original JPEG file size. It's not as in-depth as xiph.org foundation's stuff, but it's a hell of a lot more than a PR piece.
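For a flavor of what "compressing beyond the 8x8 blocks" means, here is a toy sketch of cross-block prediction: predict each block's DC coefficient from already-decoded neighbours and entropy-code only the small residual. This is a deliberately simplified stand-in, not Dropbox's actual predictor, and the DC values are made up:

    import numpy as np

    # Hypothetical DC coefficients for a 3x3 grid of 8x8 blocks.
    dc = np.array([[520, 518, 515],
                   [517, 516, 514],
                   [512, 511, 509]])

    residuals = np.zeros_like(dc)
    for y in range(dc.shape[0]):
        for x in range(dc.shape[1]):
            neighbours = []
            if x > 0:
                neighbours.append(dc[y, x - 1])   # block to the left
            if y > 0:
                neighbours.append(dc[y - 1, x])   # block above
            pred = int(np.mean(neighbours)) if neighbours else 0
            residuals[y, x] = dc[y, x] - pred

    # Small residuals cost far fewer bits to code than the raw DC values.
    print(residuals)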
Re: (Score:1)
Re: (Score:1)
For starters, he's a liberal cuck.
Re: (Score:2)
Indeed. Sometimes it takes two fucking Pi boards and an Arduino!
Where am I? (Score:4, Funny)
Re: (Score:2)
Re: (Score:3)
I think whoever wrote that was confused by the screen cap of Silicon Valley (TV show on HBO) in the article, which is of the fictional "Pied Piper" company and not of Dropbox.
It's been known for many years that you can losslessly compress JPEGs by about 20%, because after the DCT stage JPEG only uses Huffman coding (with run-length encoding of zero coefficients) rather than, say, arithmetic coding. In fact I seem to recall an app called StuffIt that could do this in the late 90s. Their improvements seem to be some kind of prediction to make the coding
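You can still see the "swap the entropy coder" half of that idea with stock libjpeg tooling: jpegtran can losslessly re-encode a JPEG's Huffman layer as arithmetic coding. A sketch, assuming jpegtran is installed and built with arithmetic coding support (few viewers can open the result, so this is for size comparison only, and the file names are placeholders):

    import os
    import subprocess

    # Lossless transcode: same DCT coefficients, different entropy coding.
    subprocess.run(
        ["jpegtran", "-arithmetic", "-outfile", "photo_arith.jpg", "photo.jpg"],
        check=True,
    )
    print(os.path.getsize("photo.jpg"), os.path.getsize("photo_arith.jpg"))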
Re: (Score:2)
From TFA:
For those familiar with Season 1 of Silicon Valley, this is essentially a “middle-out” algorithm.
Re: (Score:2)
Except it isn't. "Middle-out" was a fictional name until the advent of the show; nobody researching compression had a "middle-out" algorithm.
Also, the Pied Piper algorithm offered lossless compression of just about anything at a ridiculously high ratio (something like 10x what HEVC is capable of, with no loss). They also had a distributed storage platform that used the drive space of everyone's phones to store files.
Re: (Score:2)
That's what I mean: the writer seems to think that was some kind of documentary and that "middle-out" is a real thing. It's just a meaningless phrase they came up with for the show.
Re: (Score:2)
Re: (Score:2)
Why even use JPEG?? JPEG2000 has been out there for a while, professional photographers and digital cinema use it for a reason..
Re: (Score:3)
How about BPG [github.io]? Looks better than JPEG2000 to me.
Evil patents (Score:2)
It is because the idiots on the JPEG 2000 committee did everything to keep people, especially web browser development teams, away from that excellent format.
Now, with 4K monitors and ultra-resolution phones around, watching web developers struggle with 5-6 different files of the same photo, I really feel pity. That was a solved problem, both the multiple bandwidth/resolution issue and the compression rate.
There is a reason we deal with JPEG files today; ask the JP2 committee. Even MS stayed away from it, fearing the patents.
Re: (Score:2)
Exactly! The patents don't just underlie JPEG2000; they affect everything from multi-resolution analysis to the algorithms of DWTs. I speculate that's the reason why MPEG stayed away from it, even though the DWT is clearly superior to the DCT.
Today it is not just UHD resolutions that would benefit from JPEG2000. JPEG 2000 also supports arbitrary-precision coding, which is good for the HDR that is starting to pop up in consumer displays and cinema.
Anyway, this is a lost cause because everybody is moving to
Re: (Score:2)
JPEG2000 never really took off outside certain niches because the processing overhead was too high. WebP is a better general-purpose option for the web, and JPEG is so universal that nothing else has made any inroads.
Re: (Score:2)
Why even use JPEG?? JPEG2000 has been out there for a while, professional photographers and digital cinema use it for a reason..
My DSLR spits out JPGs... it could spit out RAW as well, but then I need to develop it in, say, Canon Digital Photo Professional, which, well, spits out JPG.
But if you use Darktable [darktable.org] then you can also output JPEG 2000, PNG, OpenEXR, TIFF, and a few more file formats when developing your RAW photo.
Re: (Score:2)
Re: (Score:2)
Not "likely". Absolutely. In the show, they have a compression algorithm that compresses _ANY_ data some ridiculously high percentage.
Real world example: Put data through compression.. then put the resulting compressed data through compression again... and so on and so on.. To get impossibly good compression...
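The joke is easy to demonstrate: re-compressing already-compressed data stops paying off almost immediately. A quick sketch with Python's zlib:

    import os
    import zlib

    # Half incompressible, half highly redundant input.
    data = os.urandom(1 << 16) + b"A" * (1 << 16)
    for i in range(4):
        data = zlib.compress(data, 9)
        print(f"pass {i + 1}: {len(data)} bytes")

    # Pass 1 shrinks the redundant half; later passes only add overhead.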
Re: (Score:3)
This headline sounds a lot like a press release from Pied Piper, the fictional company in the TV show "Silicon Valley".
derp
Re: (Score:2, Funny)
I'm confused... is this a box, or is this a platform?
Re:Where am I? (Score:4, Funny)
Re: (Score:2)
Re: (Score:3)
After reading their blurb, it looks like the middle-out thing was a bit of a joke. Their use of the term 'middle-out' is not unreasonable, but it refers to something much more specific, and less fundamental, than what seemed to be depicted in the TV show. Their 'middle' is just the place where two squares of the image meet.
Re: About time (Score:1)
comparison (Score:2)
Does anyone know how this compares to other compression algorithms? I also wonder if they are releasing this because they have Lepton 2 or whatever now?
Re: (Score:1)
They're releasing it because it has no commercial value. Probably costs them more in energy doing all the compression and decompression than it would to just put more storage in their datacenters. Nice technically, but the niche of useful applications is probably pretty small.
That's a very valid point; what's the cost in cpu-power versus storage costs?
Now, the issue is that storage is permanent, in the sense that the file occupies your disk/SAN/tape space for as long as you keep it. Compression happens only once, and the quicker decompression only happens when someone accesses the file. So a 22% storage savings on JPGs across TBs may be worthwhile.
It's not totally clear how much of their space is being used up by JPGs. Also, tiered storage may have been an option. Generic compression using alr
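Back-of-envelope, the trade looks something like this (every number below is a made-up placeholder, not a Dropbox figure):

    # Hypothetical fleet: how much does a 22% JPEG saving free up?
    jpeg_storage_pb = 100                 # petabytes of stored JPEGs (made up)
    cost_per_tb_month = 10.0              # USD per TB-month (made up)
    savings_fraction = 0.22               # from the summary

    saved_tb = jpeg_storage_pb * 1000 * savings_fraction
    print(f"{saved_tb:,.0f} TB freed, ~${saved_tb * cost_per_tb_month:,.0f}/month,")
    print("to be weighed against one-time encode CPU plus decode CPU on reads.")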
Re: (Score:2)
This is specifically for compressing JPEG (lossy) with an extra layer of lossless compression to bring file sizes down further. It would only be useful if you have a large collection of JPEG images to archive and not enough disk space. In my own quickie test:
Source image was 2560x1440 TGA at 32MB
PNG (lossless, level 9) took that down to 6,912KB
WebP (lossless) took it down to 5,868KB
JPEG (lossy, quality 100) took it down to 3,402KB
JPEG (lossy, quality 95) took it down to 1,995KB
They are claiming a 22% furth
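If you want to reproduce a quickie test like the one above, Pillow makes it a one-screen script (the source file name is a placeholder, and WebP support depends on how your Pillow was built):

    import os
    from PIL import Image

    img = Image.open("source.tga")

    img.save("out.png", compress_level=9)                 # lossless PNG
    img.save("out.webp", lossless=True)                   # lossless WebP
    img.convert("RGB").save("out_q95.jpg", quality=95)    # lossy JPEG

    for path in ("source.tga", "out.png", "out.webp", "out_q95.jpg"):
        print(path, os.path.getsize(path))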
Re:comparison (Score:4, Informative)
They are basically just bringing the entropy coder from JPEG2000 into JPEG... Why the heck not just fully re-encode the images in lossless JPEG2000 instead? There is a good reason why the DWT was used instead of the DCT in JPEG2000: it yields higher coding efficiency. It is also why JPEG2000 is the standard format for digital cinema (yes, movies are coded intra-only with JPEG2000).
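The core advantage of an arithmetic/range coder over plain Huffman is fractional bits per symbol, which matters a lot for the heavily skewed symbols left after quantization. A two-line illustration (the 0.95 probability is just an example):

    import math

    p = 0.95  # probability of the common symbol, e.g. a zero coefficient
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    print(f"{entropy:.3f} bits/symbol achievable vs Huffman's 1-bit-per-symbol floor")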
Re: (Score:3)
JPEG didn't have any competitors, and the growth of the Internet and Web made smaller-size picture files very important in the coming years. Couple th
Re: (Score:2)
The article you point to is a bit misinformed; there are several papers out there specifying how to do (multi-resolution) motion compensation in a lifting scheme, and the implementation is not more complex. From my point of view, the main reason the DWT is not attractive and only remains a topic of interest in academia is all the patents surrounding it.
Middle-out? (Score:3)
Re: (Score:2)
Is this actually a "middle-out" compression, or is that just a joke? Do we know what the Weissman score is?
Meh. I'm just going to wait for Pied Piper to hit open beta. Their Weissman scores are unbelievable.
Re: (Score:1)
Re: Open Sources is NOT TWO WORDS! (Score:2)
It is two words, but most people apply middle-out compression to make it one word.
Wow (Score:3)
It can both compress *and* decompress.
Re: Wow (Score:1)
Just the other day, I developed a slightly lossy compression algorithm with an infinite Weissman score and 100% compression.
Still working on the decompression step.
Re: (Score:2)
Your awesome compression algorithm works so well, I can paste all of the decompressor's code into this post. Maybe it's helpful. The code (without the " of course): ""
All you need to do is to decompress it once manually.
Re: (Score:2)
Re: (Score:2)
It can both compress *and* decompress.
That is actually very important. I know from first hand experience that compression can be much faster if later decompression is not a requirement.
Great Name... Everyone is using it. (Score:2)
I'm all for companies open-sourcing cool algorithms. But not a great choice on the name. There are already several products out there called 'Lepton'. There's a software CMS [lepton-cms.org], and also FLIR's thermal sensors [flir.com] are branded 'Lepton'. (Worth noting - Lepton IS an actual word [dictionary.com] so it probably won't qualify for Trademark protection. But an Apple Music vs. Apple Computer like scenario is not impossible to conceive.)
Huffman alternative (Score:2, Informative)
JPEG is a lossy compression algorithm. It does not preserve the image. It creates these blocks of image data and then compresses them using Huffman encoding. The same encoding is used in zip files.
Dropbox's algorithm uses the same blocks the JPEG algorithm produces (meaning that the information is still lost in compression), but uses a clever way to compress them and ditches the Huffman enco
Re: (Score:3)
Re: (Score:1)
"Encoding the image into coefficients" is imprecise, but if you want to split JPEG encoding into two steps, the lossless Huffman coding and the part before that, then the first part is the lossy step in which the majority of the compression is achieved. The technical term for the conversion from the (in principle arbitrary precision) floating point coefficients to the more compact integer coefficients is "quantization". The quality factor controls the amount of information which is lost by setting the granu
Re: (Score:2)
Just about every step in JPEG quantization is lossy, even if you're using a floating-point DCT.
If you're subsampling your chroma, you need to be shot.
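To make the quantization point concrete: the rounding below is where JPEG throws information away; everything after it (the Huffman layer Lepton replaces) is lossless. The coefficient values here are just illustrative, with the divisors taken from the first row of the standard luminance quantization table:

    import numpy as np

    coeffs = np.array([-415.4, 30.2, -61.2, 27.2, 56.1, -20.1, -2.4, 0.5])
    q_steps = np.array([16, 11, 10, 16, 24, 40, 51, 61], dtype=float)

    quantized = np.round(coeffs / q_steps)   # these integers get entropy-coded
    restored = quantized * q_steps           # what the decoder can reconstruct
    print(restored - coeffs)                 # this error is gone for good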
Re: (Score:1)
The primary use case for JPEG compression is storing digital photos. With few exceptions, the chroma information in digital photos is interpolated from Bayer pattern sensors, so the chroma information is naturally lower resolution than the luma information. The interpolated information is often reduced in the camera hardware, before the data is even written to main memory, where it is stored in a subsampled YUV format. You would first have to interpolate it in order to expensively store it at "full resoluti
Re: (Score:3, Insightful)
So you are an idiot. If you run a JPEG through Lepton, the ORIGINAL file (from Lepton's point of view) is the JPEG, not the Nikon raw file, which it has no knowledge of.
Re:Huffman alternative (Score:5, Informative)
This isn't about restoring a JPEG file back into its original RAW format. The information lost from converting RAW to JPEG is gone. There is no way to get that back.
This is about storing JPEG files more efficiently. DropBox is in the business of providing cloud storage, and it is in their best interest to keep their costs as low as possible. The more they can compress data for their customers, the more efficiently they use their infrastructure. Some files such as text documents are easy to compress. Some files such as JPEG files are difficult to compress, especially with lossless algorithms.
For DropBox, this allows them to store the LEP representation of a JPEG file instead of the actual JPEG file. This saves them approximately 22% of their storage needs. They can then decompress it on the fly whenever a user tries to read the original JPEG file, essentially trading savings in storage costs for a bit of extra CPU demand. As long as the compression is lossless and the user sees acceptable performance, there is no user impact.
Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
Re: (Score:2, Interesting)
If they're smart (and they are) the decompression will happen on the user's computer, in the web browser/native client.
Making it (almost) a free lunch for dropbox.
Re: (Score:2)
Or more likely they'll build it into their clients and do the compression on the user's side, saving them on both disk space and bandwidth.
Re: (Score:3)
Depending on the cost of extra CPU cycles vs. the cost of reduced storage, and the relative mix of JPEG files vs. other data files, this could save DropBox quite a bit of money.
Better yet, do it in the client at no CPU cycle cost to Dropbox, and also reducing data transport. Dropbox controls the desktop, mobile, and web clients, so this would be easy to do, and could revert to server-side translation from LEP to JPG for e.g. API clients etc.
Re: (Score:2)
This saves them approximately 22% of their storage needs.
Correction: It saves them 22% of the storage taken up by jpegs.
Re: (Score:2)
This isn't about restoring a JPEG file back into its original RAW format.
I know what it is really about, thanks. What I pointed out is that "bit-for-bit perfectly" of "the original file" is nonsense and is just marketing hype.
ANY lossless compression will return "the original file" "bit-for-bit perfectly" when "the original file" is considered to be what the lossless compressor starts with. That's a tautology. It's a useless statement. When someone says the result of their lossless compression/decompression is a "bit-for-bit perfect" copy of "the original file", it is reasonab
Re: (Score:1)
Assume that the engineers behind this aren't morons. Failing that, read the article. For every newly compressed image, Dropbox does a decompression and a bit-for-bit comparison w
Re:Huffman alternative (Score:5, Insightful)
Look, they clearly state that they operate at the level of JPEG files. So, where is the confusion coming from? They are analyzing JPEG files and using features of that format to compress the already-compressed files further.
Which I, honestly, find very impressive.
They reproduce JPEG files in a bit-for-bit faithful fashion. And they have tested it on 16 million (or was it billion?) files, where it worked without problems; plus, they don't replace user files unless they have checked that they decode correctly. I presume that the process is actually transparent to the Dropbox user.
I don't see the problem that you have with this, sorry.
Good work lads!
Re: (Score:2)
Which I, honestly, find very impressive.
Not to belittle their achievement, but what do you find impressive about someone beating a compression algorithm that is 23 years old? In terms of image storage, JPEG is old hat, beaten by many in absolute terms.
JPEG also screws up the image quite a lot (but does so in an eye-pleasing way), which certainly leads to better lossless compression afterwards than on the original image. Take a look at the blue channel of a JPEG image, for instance, and you'll see why that channel in particular would be trivial to get good l
Re: (Score:2)
First, the claim that it reproduces the original file "bit-for-bit perfectly".
By "original file," they mean a JPEG.
Re: (Score:2)
Re: (Score:2)
Pfft.. too little, too late. JPEG is "good enough" and I don't want a huge clusterfuck of incompatibility problems with my libraries.
In terms of widespread adoption, I think you're right, Joe's Image Viewer is unlikely to ever come with Lepton support. But I wouldn't dismiss this so quickly, as large sites might force the issue into the browser space.
Take Facebook as an example: think of the trillions of photos they store (they claim 2 billion are uploaded each day). Facebook archives older, infrequently-accessed photos to Blu-ray and has an army of jukeboxes ready to swap in discs when someone actually tries to load that family reunion
Again. (Score:2)
We've been here before. JPEG2000, webp, BPG, JPEG XR. There are many formats that are superior to JPEG. And look - none of them caught on!
Why? Because JPEG, though far from the best modern algorithms could offer, is still 'good enough' for most purposes. It's also supported by every web browser, photo viewer, image editor, mobile phone, camera, digital picture frame, slideshow maker and every other thing that might need to process an image. A new format, no matter how superior, cannot offer the same ubiquit
Re: (Score:3)
Re:Again. (Score:4, Insightful)
This isn't a "better than JPEG" format. It's a "store existing JPEG files your users upload & use more efficiently" format. Flickr, for instance, could theoretically save 22% of its disk space using this.
Re:Again. (Score:5, Insightful)
It's not a file format, it's a compression algorithm that happens at the data storage level. This is similar to compressing a hard drive -- the files are individually compressed, but the file formats are the same, and the OS handles the compression/decompression seamlessly so that the applications don't even know they're accessing compressed versions of the file formats they normally use.
You can keep all your JPEGs, and with the open-source license, compress the contents of a drive or partition with this algorithm and save maybe 20% or so of the space the JPEG files took up. Not worth it for most people but photographers and image sites might save a lot of money using this.
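As a rough sketch of what "seamless at the storage layer" could look like (everything here, including the helper name, the paths, and the assumption that a lepton binary converts a .lep back to a .jpg given input and output paths, is hypothetical):

    import os
    import subprocess
    import tempfile

    def read_jpeg(stored_path: str) -> bytes:
        """Return JPEG bytes whether the object is stored as .jpg or as a recoded .lep."""
        if not stored_path.endswith(".lep"):
            with open(stored_path, "rb") as f:
                return f.read()
        # Decode the recoded file back to a temporary JPEG on the fly.
        tmp = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
        tmp.close()
        try:
            subprocess.run(["lepton", stored_path, tmp.name], check=True)
            with open(tmp.name, "rb") as f:
                return f.read()
        finally:
            os.unlink(tmp.name)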
Re: (Score:2)
Not worth it for most people but photographers and image sites might save a lot of money using this.
I would think most serious photographers keep the RAW files which are much bigger and will dominate their storage. And even MP monsters only produce ~20MB jpegs so ~200,000 photos on a $99 4TB drive. Pretty sure you won't bother with this unless you're Dropbox, Facebook or some other big image site with many, many millions of photos.
Re: (Score:1)
Re: (Score:2)
The point is that this can be implemented server side to save storage of existing JPEGs. This is a better way to store existing images, not a better way to compress images.
Re: (Score:2)
I wonder if there's a way to come up with a format that decoders would process as a JPEG containing only a preview-quality image, while the rest of the file holds a higher-quality version of the image in a more advanced format for a format-aware decoder. And do it all in a total file size better than high-quality JPEG.
You'd get backwards compatibility (albeit with degraded quality) but higher quality than existing JPEG.
Although with storage continually getting better and cheaper, you have to work mir
Lepton under an Apache (Score:2)
So a program named literally "Lepton under an Apache" that happens to also, confusingly, be an open source license (*and* a program)?
Okaaaaaaay.... Took me like a minute to figure out it was saying
"...dubbed 'Lepton,' under an Apache open-source license..."
Lepton vs. Leptonica (Score:1)
Dropbox's software is called "lepton". There is an image-processing library called Leptonica [leptonica.org] — could someone comment on the relationship, if any?
So... (Score:2)
So this means instead of getting 5 GB free storage, I should get 22% more if I'm storing JPEGs, so I get 6.1 GB free storage now? ;)
abandon standard format numbers (Score:2)