Media Bug

Technical Objections To the Ogg Container Format

Posted by timothy
from the do-these-objections-matter-on-modern-hardware? dept.
E1ven writes "The Ogg container format is being promoted by the Xiph Foundation for use with its Vorbis and Theora codecs. Unfortunately, a number of technical shortcomings in the format render it ill-suited to most, if not all, use cases. This article examines the most severe of these flaws."
This discussion has been archived. No new comments can be posted.

  • by godrik (1287354) on Wednesday March 03, 2010 @02:28PM (#31349144)

    I don't see any comment and the website is already down. gg /.

  • by Nemyst (1383049) on Wednesday March 03, 2010 @02:35PM (#31349232) Homepage
    Must be a hardware bug.
  • Re:Just complaining (Score:5, Interesting)

    by TheRaven64 (641858) on Wednesday March 03, 2010 @02:45PM (#31349342) Journal
    Wow, did you copy that criticism of TFA from the last section, where he says:

    More commonly, the Ogg proponents will respond with hand-waving arguments best summarised as Ogg isn’t bad, it’s just different. My reply to this assertion is twofold:

    • Being too different is bad. We live in a world where multimedia files come in many varieties, and a decent media player will need to handle the majority of them. Fortunately, most multimedia file formats share some basic traits, and they can easily be processed in the same general framework, the specifics being taken care of at the input stage. A format deviating too far from the standard model becomes problematic.
    • Ogg is bad. When every angle of examination reveals serious flaws, bad is the only fitting description.

    And he's right. Unless the technical details of Ogg are not as he represented them, the format is stupid. I've not looked at Ogg in detail, but I have written multimedia apps and his complaints are right on the mark. Even if most of them are untrue, the point about timestamps would have been a show stopper. There is absolutely no excuse for not encoding timestamps as rationals in a fixed format in the container. Without that, you are just inviting synchronisation problems between audio and video CODEC formats that aren't explicitly designed to work together.
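
    To make the point about rational timestamps concrete, here is a minimal sketch in Python. The tick sizes (NTSC-rate video at 30000/1001 fps and 44.1 kHz audio) and the function name are assumptions picked for illustration; nothing here is taken from the Ogg specification or from the article.

```python
from fractions import Fraction

# Assumed example streams: one tick is one video frame or one audio sample.
VIDEO_TICK = Fraction(1001, 30000)  # seconds per frame at 30000/1001 fps
AUDIO_TICK = Fraction(1, 44100)     # seconds per sample at 44.1 kHz


def timestamp(ticks: int, tick: Fraction) -> Fraction:
    """Exact presentation time, in seconds, of the given tick count."""
    return ticks * tick


# 30000 frames and 44144100 samples both land on exactly 1001 seconds,
# so a player can compare the two streams without any rounding.
video_time = timestamp(30000, VIDEO_TICK)
audio_time = timestamp(44144100, AUDIO_TICK)
print(video_time, audio_time, video_time == audio_time)
```

    With a rational per stream stored in the container, any two streams can be compared exactly, whether or not their codecs were designed together.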

    Which may, of course, be intentional. Vorbis and Theora are designed to work together. But if you have a Theora video stream with MP3 or AAC audio, what happens? An H.264 video stream with Vorbis? Obviously the solution is to just use Xiph formats in the Xiph container. And that's fine. I don't have a problem with Ogg as a container for Xiph formats (other than the latency issues he mentions), but claiming that it is a general purpose format is misleading.

    Ogg is like XML. It defines just enough to let you define something useful, but it's not useful by itself.

  • The great thing about Matroska is that it supports (or at least can support) absolutely everything.
    The main drawback of Matroska is that it supports (or at least can support) absolutely everything.

    Matroska is a great container format, but unless you have a program like mplayer or vlc you can't guarantee that a Matroska file is going to be playable on your system. You can't reasonably expect browser makers to standardise on Matroska if it will mean having to include 30+ different codecs in their software, which from a practical standpoint it will. The unfortunate reality is that most of the world's population still doesn't have access to a comprehensive library of software like apt, and while our current software IP regime reigns, they never will.

  • by aflag (941367) on Wednesday March 03, 2010 @02:59PM (#31349522)
    However, do you disagree? Why? I hope it's not because of this world you talk about.
  • by sylvandb (308927) on Wednesday March 03, 2010 @03:08PM (#31349642) Homepage Journal

    There's at least one obvious flaw in his reasoning. He talks about removing the 8-bit version field in the header and replacing that with a 1-bit portion of the flags field to distinguish it from a hypothetical future version. That only works if one assumes there will only *ever* be two versions (v1 and v2).

    No, the flaw is yours. The 1 bit merely says "this is not the original version" and anyone that only knows the original version just stops there. Anyone that knows the 2nd version has enough smarts to look at the 2nd version bit (or field).
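
    A minimal sketch of the scheme described in this reply, in Python. The bit position `NOT_V1_FLAG` and the one-byte extended version field are hypothetical, chosen only for illustration; they are not taken from the Ogg specification or from the article.

```python
NOT_V1_FLAG = 0x08  # hypothetical flag bit meaning "not the original version"


class UnsupportedVersion(Exception):
    """Raised when the flag is set but no extended version field is available."""


def read_version(flags: int, extension: bytes = b"") -> int:
    """Return the format version implied by a header's flags byte.

    If the flag is clear, the header is version 1. If it is set, the real
    version number lives in a (hypothetical) extension byte that only
    newer parsers know how to locate.
    """
    if not flags & NOT_V1_FLAG:
        return 1
    if not extension:
        raise UnsupportedVersion("flag set, but no extended version field read")
    return extension[0]


def v1_only_parser_accepts(flags: int) -> bool:
    """A parser that only knows version 1 simply stops when the flag is set."""
    return not flags & NOT_V1_FLAG
```

    The single bit never has to count versions; it only tells an old parser to stop, which is why the scheme is not limited to two versions.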

  • by drtsystems (775462) on Wednesday March 03, 2010 @03:11PM (#31349682)

    This argument is what will kill HTML5 and ensure a new era of the reign of Flash, Silverlight, etc. The choices are not H.264 or Theora. It's H.264 through an open HTML5 spec, or H.264 through Silverlight and Flash. All major operating systems have support for H.264 built in as it is (not to mention all the portable devices with hardware acceleration for it, including now many netbooks). The whole debate is stupid; Firefox needs to just use the operating system's built-in codecs to play H.264. Problem solved.

  • In the long run (Score:5, Interesting)

    by istartedi (132515) on Wednesday March 03, 2010 @03:44PM (#31350154) Journal

    "In the long run, all file formats become programming languages."

    From this I draw a number of conclusions, the first being that when designing a format you need to bring a "language sensibility" to it. If you don't, it's only a question of *when*, not if, your format will become a poorly designed language. OK, "language" may not be the right word. I'd also accept, "byte code" or "executable file", but it's the same idea. JMHO.

  • Re:Just complaining (Score:3, Interesting)

    by evilviper (135110) on Wednesday March 03, 2010 @03:58PM (#31350306) Journal

    "I would have done it differently" does not mean that the format is bad.

    Every open source multimedia developer outside of Xiph.org, who has had to do anything with Ogg, will tell you that Ogg is a flaming pile of crap. This notably includes Moritz Bunkus, the author of Ogmtools. Quotes of such are easy to find.

    For a real challenge, just try to find ANYONE saying Ogg is a well-designed and well thought-out container format...

  • by Hurricane78 (562437) <deleted@slashd o t .org> on Wednesday March 03, 2010 @04:36PM (#31350796)

    Besides it being EBML (a binary and efficient kind of XML), I’ve yet to see a feature that it can’t do. Even a complete 3D TV series with multiple perspectives, languages, subtitles, additional content, hull cover... streamed over the net in one file? No problem.

    Also, it’s already the format of choice for HD video and multichannel audio format rips on the net.

    A competitor would be nice. Unfortunately, OGG can’t hold a candle to it. But if they manage to catch up, they will be very welcome.

  • by Korin43 (881732) on Wednesday March 03, 2010 @04:46PM (#31350928) Homepage
    As opposed to a standard that you can't standardize on (because only Google can afford the licensing fees). They're pushing back when the fees start, but that doesn't change the fact that small businesses and individual people would have to be insane to start a website using H.264.
  • by bbn (172659) <baldur.norddahl@gmail.com> on Wednesday March 03, 2010 @05:03PM (#31351124)

    Random access
    You've got somewhat of a point there, maybe somebody will find a solution for that. The issue with indexing, however, is that seeking within a stream is already possible. HTTP servers allow you to start/stop downloading from different points in time, and QuickTime is one of those formats that uses this feature.

    He is trying too hard to make an issue out of it. Read this:

    In a large file (sizes upwards of 10 GB are common), 50 seeks might be required to find the correct position.

    A typical hard drive has an average seek time of roughly 10 ms, giving a total time for the seek operation of around 500 ms, an annoyingly long time.

    Now being a binary search, each seek halves the size of the search domain. The minimum size a harddrive can read is 512 bytes, so there will be no further seeks after we find a search domain less than that.

    So 50 seeks is only needed with a filesize of 512 * 2^50 = 576460 terabytes.

    For a 10 GB file you would need no more than half that many seeks (25). Take into account that most filesystems do not distribute the file at random over the whole harddrive, which makes the average seek time for the file much smaller than 10 ms. We are probably looking at no more than 100 ms to do this random access search on a 10 GB file.

    100 ms might still be enough to complain about. But he is not making his point stronger by exaggerating the problem.
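
    The arithmetic above can be checked with a short Python sketch. It assumes, as the comment does, a pure binary search that stops once the search domain fits in one 512-byte sector; the function name is made up for illustration.

```python
SECTOR_BYTES = 512  # smallest read unit assumed in the comment above


def max_bisection_seeks(file_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Upper bound on halving steps needed to narrow a file down to one sector."""
    sectors = -(-file_bytes // sector_bytes)  # ceiling division
    # (sectors - 1).bit_length() equals ceil(log2(sectors)) for sectors >= 1,
    # and avoids floating-point rounding near exact powers of two.
    return (sectors - 1).bit_length()


ten_gb = 10 * 10**9
print(max_bisection_seeks(ten_gb))                # seeks for a 10 GB file
print(max_bisection_seeks(SECTOR_BYTES * 2**50))  # file size that needs 50 seeks
```

    At the full 10 ms per seek, 25 seeks would come to 250 ms; the estimate of roughly 100 ms relies on seeks within a single file being shorter than the drive-wide average.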

  • by pslam (97660) on Wednesday March 03, 2010 @05:29PM (#31351450) Homepage Journal

    Your objection is to armchair engineers? OK, well, I'm not an armchair engineer. I've written my own Ogg/Vorbis decoder from scratch in the past (here [mooo.com]). I've worked on codecs for about 10 years. I'm a fan of Vorbis and Theora, but Ogg needs to die a horrible death.

    Ogg was by far the most bug-inducing part of the code. It's just AWFUL. It's ill-designed. It's incredibly complicated. It's inherently inefficient (copy sometimes required).

    In short, it's the worst container format I've used in any serious application, and I've used pretty much all the common ones.

    The irony of what you're saying, is that actually Ogg is what you'd end up with if an armchair engineer designed an audio codec container from scratch.

  • by KonoWatakushi (910213) on Wednesday March 03, 2010 @05:56PM (#31351826)

    NUT is another alternative, which is open, simple, and well designed. Along with Matroska, it is also capable of containing Ogg Theora and Vorbis streams, so there really is no good reason to use the Ogg container anymore. The author of the article is correct--the Ogg container is an awful format.

    The main complaints about Matroska are two-fold. One, the EBML encoding is overly complicated. It requires a considerable amount of code to parse, and also imposes an unnecessary degree of overhead. The second is a much more serious problem: a Matroska file can only contain one timebase. Thus, in order to mux streams with different timebases, approximation is required. To accurately represent the converted timebases, it is necessary to use a much finer granularity, and then you also lose the exact timestamps.
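
    A rough Python sketch of the single-timebase problem. The 1 ms file timebase, the NTSC-rate video stream and the 44.1 kHz audio stream are assumptions for illustration, and the helper names are made up; this is not code from any muxer.

```python
from fractions import Fraction
from math import gcd

FRAME = Fraction(1001, 30000)      # assumed video tick: one frame, in seconds
SAMPLE = Fraction(1, 44100)        # assumed audio tick: one sample, in seconds
FILE_TIMEBASE = Fraction(1, 1000)  # assumed single file-wide timebase of 1 ms


def to_ticks(seconds: Fraction, timebase: Fraction) -> int:
    """Nearest whole number of timebase ticks for an exact time."""
    return round(seconds / timebase)


def rounding_error(seconds: Fraction, timebase: Fraction) -> Fraction:
    """Difference between the stored (rounded) time and the exact time."""
    return to_ticks(seconds, timebase) * timebase - seconds


exact = 7 * FRAME  # timestamp of the eighth frame
print(to_ticks(exact, FILE_TIMEBASE), rounding_error(exact, FILE_TIMEBASE))

# A timebase fine enough for both streams: 1 / lcm of the two denominators.
common_den = 30000 * 44100 // gcd(30000, 44100)
common = Fraction(1, common_den)
print(common_den, rounding_error(exact, common), rounding_error(5 * SAMPLE, common))
```

    The common timebase removes the rounding error, but the stored values are then large tick counts in the new unit rather than the original frame and sample numbers.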

    The NUT specification and code is available from svn://svn.mplayerhq.hu/nut, and the (de)muxers are included in MPlayer/FFmpeg, VLC, and probably elsewhere.

  • by Anonymous Coward on Wednesday March 03, 2010 @06:05PM (#31351916)

    Overall, most experts would agree that Theora is still a good codec but it seems like the latest talk is all about Dirac: http://en.wikipedia.org/wiki/Dirac_%28codec%29 and http://diracvideo.org which is a very strong contender. It has suddenly gained backing from a number of the major corporations who were previously in favor of H.264... This is good news since Dirac offers much better quality than either of the other codecs, is royalty-free, and released under either GPL2, LGPL2, or MPL.

  • by pslam (97660) on Wednesday March 03, 2010 @06:19PM (#31352096) Homepage Journal

    I'll do some analysis for you:

    Generalities/codec mapping

    The complaint is that there's no up-front header declaring all the streams contained. This is actually absurd - in theory you need to scan the entire file in case someone's just concatenated a video file with an audio file. This was, also absurdly, one of the aims of the Ogg container spec: concatenation. It's awesome to ask implementations to do this.

    Overhead

    One of Ogg's aims was to try to be less than 1% of the total stream space. It does achieve that, but the 'lacing values' end up looking pretty stupid for anything with large packets. It's like the article says: you end up with long strings of '255' summing up to 32-64KB packets, and hey, just for extra complexity's sake, you'll have to split them across multiple not-quite-64KB pages. And then figure out where in that mess you're supposed to stick a timestamp: and here's a hint, the first page in that sequence has timestamp 0xffffffff, which is nice if you randomly seek to that place to find a position. God, what a mess that is to implement.

    Then there's decode CPU overhead: the above basically means you end up copying the bitstream, which is a significant few percent overhead when you're talking about video.
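
    For readers who have not met lacing values, a small Python sketch of the rule being complained about. The rule itself (segments of at most 255 bytes, a packet ending on the first lacing value below 255, at most 255 segments per page) is my reading of the Ogg framing spec; the function names are made up, and a real muxer may pack several packets per page or flush pages earlier.

```python
MAX_SEGMENTS_PER_PAGE = 255  # a page's segment table holds at most 255 entries


def lacing_values(packet_len: int) -> list:
    """Lacing values for one packet: runs of 255, then a final value below 255.

    A packet whose length is an exact multiple of 255 still needs a
    trailing 0 so the decoder can tell where it ends.
    """
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def pages_needed(packet_len: int) -> int:
    """Minimum number of pages one packet spans if it starts on a fresh page."""
    segments = len(lacing_values(packet_len))
    return -(-segments // MAX_SEGMENTS_PER_PAGE)  # ceiling division


print(lacing_values(600))
print(len(lacing_values(48 * 1024)), pages_needed(48 * 1024))
print(len(lacing_values(64 * 1024)), pages_needed(64 * 1024))
```

    The largest payload a single page can carry is 255 * 255 = 65025 bytes, which is the "not-quite-64KB" limit mentioned above.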

    Latency

    You didn't understand his point. The latency is inherent in Ogg due to the large pages (not packets) required to reduce its size overhead, and in the position of the CRC (at the front of the page rather than the end). Reducing the page size makes the page headers start taking significant percentages of size if it's a low bit rate stream, e.g. internet audio.

    Random Access

    Try pre-caching a 2GB video file. Or try pre-caching a 2GB video stream coming off the internet where the other end of the pipe is the other side of the world. Random access in these two realistic cases (if you'll admit that) requires a lookup table, and it's precisely why many containers DO have one.

    Complexity

    The lacing values crossing pages, packets crossing pages, position of CRC, position of timestamp between packets/pages especially when cross-page, timestamps between logical streams (elementary streams), and other oddities/idiocies all ADD UP to make it a bloody mess to deal with. You end up just making copies of packets out of the stream, which is inefficient. In fact, that's exactly what the official Xiph codecs do: they make ugly copies. On real world MP3 players (and I've worked on some) that accounts for about 10% of your battery play time right there. I kid you not.

    What this guy is expressing is what everyone who's worked on the Ogg container format itself has found out: it's just BAD at EVERYTHING. It needs replacing with something that doesn't suck, and there are free/open alternatives around. Maybe Vorbis 2 should switch container.

  • Re:Just complaining (Score:3, Interesting)

    by evilviper (135110) on Wednesday March 03, 2010 @07:23PM (#31352738) Journal

    Are we to believe that they have no clue about container formats?

    YES! From the same link:
    Ogg was designed to stream audio, specifically Vorbis. Ogg was not designed to handle video, or any other type of audio.

    Ogg is so tightly coupled to Vorbis, and has only the minimal features required for streaming. Its shortcomings become clear when you try to do ANYTHING ELSE. Even just playing a local file, you find seeking horrible, no way to do an accurate progress-bar, etc, etc.

    And when you try to stick anything else in an Ogg, forget it... Even Theora. It's a mess. Wonder why it took so many years after VP3 went open source before Theora-1.0 was released? A big chunk of time was spent squeezing it into an Ogg. Meanwhile, every other container had no problem holding VP3 video.

  • by horza (87255) on Wednesday March 03, 2010 @07:41PM (#31352900) Homepage

    "That is why MP3 stomped Vorbis and FLAC, because it was easy"

    Because it was first, and gained momentum. At the end of the day, MP3 gained popularity because of pirates and they aren't exactly known for caring about patents. Those that were ripping from their personal collection often chose FLAC or Vorbis. A codec is just a codec, there is nothing more 'easy' about any one of them.

    "can't say about Vorbis as I've honestly never come across a Vorbis player"

    Any Samsung player, but then if Vorbis became popular it would only take a firmware update for every player to support it.

    "The average Joe really doesn't give a shit about "free as in freedom" all he gives a shit about is does it work and is it easy."

    I'm sorry, why do I care about average Joe? If he is prepared to fork out hundreds for XP, then again for Vista, then again for Win7, etc, why would I care if he forks out for yet another bunch of proprietary rubbish? However, on the distribution side things are different. An Internet 'video tax' is unacceptable.

    Phillip.

  • by pslam (97660) on Wednesday March 03, 2010 @07:43PM (#31352912) Homepage Journal

    What the fuck are you talking about? There is absolutely no "latency" harm caused by the CRC, at least not on any hardware actually able to decode the formats much less encode. If performing the CRC on decode is so burdensome, you can stop checking it once you obtain sync and only check it if you obviously lose sync.

    There may be, for example, 64KB pages, containing many packets. None of the packets can be decoded until the entire 64KB page is received and its CRC checked. This may sound small, but for a 32-64 kbit/s stream, that's on the order of 10 seconds of latency right there. Alternatively, you can have 1 page per packet, but on 32-64 kbit/s streams you end up with about 5-10% overhead from the container. It is a REAL problem.
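
    A back-of-the-envelope Python sketch of both numbers. The 27-byte fixed page header plus one byte per lacing value is my reading of the Ogg page layout; the 400-byte packet size is purely an assumed example for a low-bitrate audio stream, and the function names are made up.

```python
FIXED_HEADER_BYTES = 27  # fixed part of an Ogg page header (assumed from the spec)


def page_fill_seconds(page_bytes: int, bitrate_bps: int) -> float:
    """Time needed to receive one full page at a constant bitrate."""
    return page_bytes * 8 / bitrate_bps


def one_packet_page_overhead(packet_bytes: int) -> float:
    """Fraction of a page spent on the header when it carries a single packet."""
    lacing_bytes = packet_bytes // 255 + 1  # one segment-table byte per lacing value
    header = FIXED_HEADER_BYTES + lacing_bytes
    return header / (header + packet_bytes)


for bitrate in (32_000, 64_000):
    print(bitrate, page_fill_seconds(64 * 1024, bitrate))

# Hypothetical 400-byte audio packet, one packet per page.
print(round(one_packet_page_overhead(400) * 100, 1), "% header overhead")
```

    Under these assumptions a full 64KB page takes between about 8 and 16 seconds to arrive, and one-packet pages land in the same 5-10% overhead range quoted above.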

    So 5 times the decoding complexity, correctly masking out the right bit, just to save 7 bits out of a half million. Yea, I'll get right on it.

    There is a version field on every page header, and it's 32 bits. It's a tiny waste, but it's still a waste. It's not so tiny a waste on the above mentioned low-latency, low-bitrate streams.

    Ugh! so the amount of data that you must read in order to obtain a framing lock is then infinite?

    Yes. Why not? Packets aren't infinite unless you're deliberately malforming a stream. Codecs generally have 'profiles' defining what the limits are. For example, Vorbis has a soft-limit of 8KB. Framing lock serves a purpose in some transports, but for on-disk, on-disc or WAN transport, it's not a big issue.

    Or if at least the container had simplified framing that could be placed throughout large packets. There are huge advantages to packet streaming being as simple as possible. Copying the packets out of a video stream is a bad thing for CPU and power consumption.

    Did you even bother to spend five minutes thinking before posting this crap? The designers of Ogg obviously spent a lot more time.

    I've spent a very long time with the Ogg container format, as well as most of the others in common use. That's why I can recognize the problems with it, as can the ffmpeg developers, as can all the other developers I've worked with at various companies. It's universally hated by anyone who's had to deal with integrating it into a project already supporting other containers/codecs.

    If you're reading this, Monty - it's not just bad blood with ffmpeg. I can't think of anyone I've worked on Ogg with who would admit to liking it, and who hasn't had to spend hours re-working their nice A/V streaming designs to work around its oddities.

  • by moonbender (547943) <moonbender@NoSPaM.gmail.com> on Thursday March 04, 2010 @06:02AM (#31356348)

    You care about the average Joe because he seemingly gets to decide which codec is hardware accelerated and which codec is used by major web sites. Even if you (or I) find his choice unacceptable.

  • by peppepz (1311345) on Thursday March 04, 2010 @08:27AM (#31357120)

    You care about the average Joe because he seemingly gets to decide which codec is hardware accelerated

    Yes, but "hardware acceleration" means cheap devices which only play media when encoded with certain codecs, only at some bitrates, only at certain resolutions and colour depths, and only when in certain containers: all of this, together with obscure, unfixable hardware bugs which result in defective playback when the media was encoded a hair differently from the streams the device was tested on.

    I suppose fixed function decoders can be power efficient, but before my battery life, I care about a device that DOES play my stuff. Usually you can't know the details of the media supported by a device before buying it, because its box will probably read "plays h.264" instead of "plays h.264 baseline profile only, with resolutions from 128x128 to 512x512 and a framerate multiple of 16".

    The right thing to do for video acceleration is to extend CPUs (GPUs, DSPs) instruction sets in order to provide general acceleration for any kind of media. This is perfectly possible with today's devices (my 2006 phone plays PAL-encoded DivX movies just fine, without any assistance from its manufacturer). Fixed function hardware is a thing of the past.
