Open Source Graphics Intel Software Linux

Intel Develops Linux 'Software GPU' That's ~29-51x Faster

An anonymous reader writes: Intel is open-sourcing their work on creating a high-performance graphics software rasterizer that originally was developed for scientific visualizations. Intel is planning to integrate this new OpenSWR project with Mesa to deploy it on the Linux desktop as a faster software rasterizer than what's currently available (LLVMpipe). OpenSWR should be ideal for cases where there isn't a discrete GPU available or the drivers fail to function. This software rasterizer implements OpenGL 3.2 on Intel/AMD CPUs supporting AVX(2) (Sandy Bridge / Bulldozer and newer) while being 29~51x faster than LLVMpipe and the code is MIT licensed. The code prior to being integrated in Mesa is offered on GitHub.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Tuesday October 20, 2015 @07:40PM (#50769747)

    That's the really interesting question, since on board graphics just tend to work nowadays and the only real use case of such software for a consumer is as a fall back for when it doesn't and in that case the fancy graphics tend to get turned off anyway.

    • Re: (Score:3, Interesting)

      by edxwelch ( 600979 )

      Even more interesting question, how does something like Half-Life run on it? Is it a slide show, or is it playable?

      • Probably not optimally since it's for scientific visualization not gaming.

      • by TheRaven64 ( 641858 ) on Wednesday October 21, 2015 @04:02AM (#50771453) Journal
        I've not tried it, but to give a few data points:

        I have a slow AMD E-350 (1.6GHz, dual core, low power chip) in the machine I use as a media centre. With the original MESA software fallback, I got about 3 frames per second in the UI. It was totally unusable. After FreeBSD gained support for the GPU, I tried it again and got about 20-30fps. This seemed a bit low, and I discovered that I'd misconfigured it and it was still using the software fallback, only now it was using LLVMPipe. I don't know how much faster the GPU actually is, because at 60fps it hits vsync and doesn't try to go faster, though CPU usage drops from 100% to around 10% (of one core). Of course, this CPU doesn't have AVX, so won't benefit from this code.

        The release announcement has a couple more details. This new back end is optimised for workloads with large vertex counts but simple shaders and for machines with a lot of cores. There are a lot of ways that you can make OpenGL shaders faster if you're optimising for the simple case. Half Life 1 works on a fixed-function pipeline, so should be fast on this. Half Life 2 uses a bit more by way of shaders, but possibly not enough to cause it to struggle. Stuff that runs well on a GPU is likely to also run well on multiple cores, if you get the synchronisation right, but the existing LLVM pipe uses a single thread.

        My main interest in this is whether you can turn off the GPU entirely for normal compositing desktop workloads and whether it will make a big difference to the power consumption if you do. Compositing desktops generally do a lot of simple compositing, but have very simple shaders and quite simple geometry. I'd be very interested to see whether doing this on the AVX pipelines is cheaper than having an entire separate core doing the work, especially given that the GPU core is generally optimised for more complex graphical workloads.

    • by Kjella ( 173770 )

      That's the really interesting question, since on board graphics just tend to work nowadays and the only real use case of such software for a consumer is as a fall back for when it doesn't and in that case the fancy graphics tend to get turned off anyway.

      For the scientific visualizations they use it for, it's probably quite good, since they're not using a GPU. For gaming and whatever else a consumer might want, it's probably bad, since it is built for high vertex complexity and low shader complexity. So lots of detail using primitives, but not all the shader work needed to make realistic graphics. It seems like a special case though.

      • by mikael ( 484 )

        Scientific visualization applications use data sets that are in the gigabyte range - a volume cube of 1024x1024x1024 representing a supernova and streamed off a supercomputer, or maybe they have a high-resolution grid of an aircraft and jet engines and want to see where the areas of turbulence are. With all the CPU cores available, it's easier to render on the CPUs than it is to funnel the data into GPU memory just to render one frame.
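
To put a number on the "gigabyte range" above, here is a minimal sketch of the arithmetic for a 1024x1024x1024 volume. The bytes-per-voxel values are assumptions for illustration; the comment does not say how the data is stored.

```python
def volume_bytes(nx, ny, nz, bytes_per_voxel):
    """Raw size in bytes of a dense nx * ny * nz volume."""
    return nx * ny * nz * bytes_per_voxel


GIB = 1024 ** 3

# Assumed storage formats: 1 byte (8-bit scalar), 4 bytes (float32),
# 8 bytes (float64). None of these are stated in the thread.
for bytes_per_voxel in (1, 4, 8):
    size = volume_bytes(1024, 1024, 1024, bytes_per_voxel)
    print(f"{bytes_per_voxel} byte(s) per voxel: {size / GIB:.0f} GiB")
```

Even a single float32 field at this resolution is several GiB per time step, which is the kind of transfer cost the comment is weighing against rendering in place on the CPU.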

    • That's the really interesting question, since on board graphics just tend to work nowadays and the only real use case of such software for a consumer is as a fall back for when it doesn't and in that case the fancy graphics tend to get turned off anyway.

      Not good at all! Boy, this older CPU is just not cutting it, come to think of it.

      I wonder if a nice shiny new i7 extreme edition would be in order for decent performance?

    • There are a lot of applications I can think of:

      1) Virtualization which often doesn't support GPU sharing. I would love this on HyperV guests.
      2) Remote Desktop scenarios where the GPU won't load.
      3) Servers with no on-board GPU at all but rely on OpenGL for specific UI elements.

  • by itamihn ( 1213328 ) on Tuesday October 20, 2015 @07:42PM (#50769767) Homepage

    Not 29-51x faster, but _up to_ 29-51x faster (in a specific use case -for which it was developed-)

    • -22 (Score:2, Funny)

      by Tablizer ( 95088 )

      I did the math, and got -22x faster. I'll pass on it, Thank You.

      • by dfsmith ( 960400 )
        Are you kidding!? Negative 22x faster means it will produce the image before you asked for it! Please excuse me while I go and render the current NYSE prices....
    • by hattig ( 47930 )

      In this case, running on a 22-core Xeon chip. You won't get 29x-51x faster on your quad-core Skylake.

      I.e., most of the speed-up is from multi-threading and use of AVX. Which I'm a little surprised that LLVMPipe didn't have - but then again, it probably wasn't too important at the time, and correctness was most important.

  • by CajunArson ( 465943 ) on Tuesday October 20, 2015 @07:47PM (#50769797) Journal

    Despite the ignorance (or perhaps intentional clickbaityness) of the post, nobody at Intel expects this to replace a GPU to do regular graphics or play games. They haven't been investing big money in going from effectively zero GPU power in 2010 to beating AMD's best solutions in 2015 to replace it with a software gimmick now.

    This renderer is designed to do all kinds of graphical visualization that doesn't make sense to do with a traditional GPU, just like running POVRay or rendering complex images in scientific applications.

    It is NOT going to replace a real GPU for what a real GPU does.
    Nobody at Intel ever said it would replace a GPU.
    The Internet, however, isn't so smart.

    • "They haven't been investing big money in going from effectively zero GPU power in 2010 to beating AMD's best solutions in 2015 to replace it with a software gimmick now."...quit bogarting bro, share some of that crazy smoke weed you been puffing. after all if you think Intel graphics are gonna beat Radeons? That be some damned good shit now!
      • He's comparing the Iris Pro 6200 to AMD's offerings. And he is somewhat right: [].
        • by Anonymous Coward

          AMD's on-chip offerings are badly obsolete, because AMD has skipped a generation in the jump to Zen (so all its current CPU line-up are basically dinosaurs).

        • The Iris Pro 6200 parts beating AMD's APUs here cost at least twice as much for just the processor.

          • by Bengie ( 1121981 )
            Intel CPUs are actually quite cheap. I paid less for my CPU than my PSU, GPU, SSD, case, and monitor. Just because AMD is cheaper doesn't mean I'm going to be saving much money relative to the whole.
      • by hattig ( 47930 )

        Only because AMD stopped at 512 shaders on their APUs, because of memory bandwidth limitations making it pointless to include more. Additionally, being stuck on 28nm didn't help with scaling or power use (although Carrizo does a very good job to be competitive with 14nm Intel chips).

        Intel bypassed that by including a very large on-die memory so they could expand their GPU further and get more performance. This comes at a cost - price.

    • by thechao ( 466986 )

      One of the original authors, here: this is exactly correct. This thing was a toy that we wrote for our own entertainment that grew rapidly out of scale. As a joke, we'd implemented display-lists on OpenGL 1.2 and began playing Quake III. (This required monstrous multi-socket Xeon workstations, with all the fans going flat-out.) It just happens to turn out that (at the time) regular old top-of-the-line GPUs were crashing under the TACC workloads. Weirdly, our rasterizer was both faster than the GPUs (even th

      • by thechao ( 466986 )

        Apparently, I'm not supposed to call SWR a "toy"; it's all grown up. SWR really shines through in terms of performance due to its (nearly) linear scaling on cores. When other rasterizers begin to suffer from communication overhead, SWR keeps going. Thus, if you've only got a few threads, you're not going to see really seriously awesome numbers; if you've got 16+ threads, that's where SWR is going to shine.
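
As a rough illustration of what "(nearly) linear scaling" would imply on smaller machines, the sketch below rescales the 29x to 51x figure quoted elsewhere in this thread for a 36-core system. It assumes perfectly linear scaling and that llvmpipe's own throughput does not change with core count; both are simplifications, and the second is disputed further down the thread. The function name is hypothetical.

```python
def projected_speedup(measured_speedup, measured_cores, target_cores):
    """Rescale a measured speedup to another core count.

    Assumes perfectly linear scaling with cores, which is an idealisation.
    """
    return measured_speedup * target_cores / measured_cores


# Quoted range from the announcement: 29x to 51x on 36 cores.
for cores in (4, 8, 16, 36):
    low = projected_speedup(29, 36, cores)
    high = projected_speedup(51, 36, cores)
    print(f"{cores:2d} cores: {low:.1f}x to {high:.1f}x")
```

Under these assumptions a quad-core machine lands at very roughly 3x to 6x, which is consistent with the point that the headline numbers belong to high core counts.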

  • TempleOS (Score:1, Funny)

    by Anonymous Coward

    It's not faster than mine. Mine is optimal. I am the best programmer, given divine intellect. We do not allow different drivers for different people. Everybody uses the same driver. We do not allow hidden logic in the GPU. All logic must be in the CPU.

  • ' On a 36 total core dual E5-2699v3 we see performance 29x to 51x that of llvmpipe. '
    Clearly some improvement happens going from single to multithreaded, but I suspect very few desktops have >4 cores, and a vanishingly small number >16.

    • by markus ( 2264 )

      Have you been buying any new hardware recently? My laptop has four physical cores and eight cores when using hyper threading (4/8). My desktop is 6/12 and my always-on home server has 12/24. The newest one of these devices is about two years old; the oldest one closer to five years.

      Multiple cores are a pretty standard feature for a lot of hardware these days. Heck, even cellphones have between four and eight cores these days. So, no, you won't see many home PCs running on 36 cores. But there clearly is a tr

      • Hyperthreading does not meaningfully improve performance of compute-bound tasks.

        • by Bengie ( 1121981 )
          It does enhance performance of compute that is memory bandwidth limited and/or the code is not pipelined well enough to take advantage of OoO.
          • Unless it's a Pentium 4, then a cache miss stalls the entire pipeline for both threads.

            • by Bengie ( 1121981 )
              From the beginning, one of the benefits of hyperthreading is hiding the cost of cache misses. Intel has a proof of concept program that uses one thread to prefetch data, which means it gets hit with a cache miss, and the other thread doesn't get the cache miss because the data is ready just in time. Hyperthreading has a 1 cycle context switch overhead, allowing it to quickly change threads for even the shortest stalls.

              The biggest issue HT has against it is split resources, especially the L1 cache. Many typ
      • by fnj ( 64210 )

        My laptop has four physical cores and eight cores when using hyper threading (4/8).

        No it doesn't. It has 4 real cores plus 4 hyperthreads that are sorta kinda like cores but not as good.
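
The distinction between physical cores and hyperthreads can be checked on a Linux machine. This is a minimal sketch that parses `/proc/cpuinfo`-style text; it assumes the x86 Linux layout in which each logical CPU block lists `physical id` before `core id`. The function name is hypothetical.

```python
def core_counts(cpuinfo_text):
    """Return (physical_cores, logical_cpus) from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    package = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1          # one "processor" entry per logical CPU
        elif key == "physical id":
            package = value       # socket this logical CPU belongs to
        elif key == "core id":
            physical.add((package, value))  # unique (socket, core) pairs
    return len(physical), logical


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(core_counts(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

On a 4-core CPU with hyperthreading enabled this would be expected to report 4 physical cores and 8 logical CPUs.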

        • Kind of like an AMD core then?

          • by Bengie ( 1121981 )
            Except when your OS sleeps an HT "core", it effectively deactivates HT for that real core. One of the reasons Win7 runs better than XP on HT cpus is better thread scheduling. It will attempt to use only one HT core per physical core, then place the virtual core into a deep sleep mode, allowing the CPU to run as if HT is disabled.
    • by Anonymous Coward

      Wait, wait, wait, woa, hold everything... You're telling me, let me be sure I'm sitting down for this... you're telling me, they ran a highly parallel problem (graphics rendering) on a 36-core machine, and it ran roughly 36 times faster than a single-threaded code?

      As Iago put it, "I think I'm going to fall over and DIE from NOT SURPRISE!"

      • by Anonymous Coward
        Just in case you didn't realize, LLVMpipe has been at least partially multi-threaded since, like, 2010...

        The Gallium3D LLVMpipe driver does not touch the GPU, so it can be run with any graphics card. However, for efficient performance, you will want to be running a 64-bit operating system and a CPU that supports SSE2.0 or better. LLVM can take advantage of SSE3 and SSE4 extensions too, which will result in even greater performance. To no surprise, the better the CPU you have, the better LLVMpipe will perform. The more cores that the CPU has, the better the performance will be too, as the rasterizer supports threading and tiling.

    • Bullshit. Clearly some improvement happens going from single to multithreaded, but I suspect very few desktops have >4 cores, and a vanishingly small number >16.

      Well, for one, it's 36 threads, which means 18 cores. Two, on the types of computers this is useful for (multi-processor renderfarms, virtualized guest OSes and compute clusters), you often do have 40+ threads.

      If you want to play Doom 4, go buy a GPU; llvmpipe isn't useful for you either. So there was a specific group of people who had one solution (llvmpipe), and those same people now have a much faster solution.

      This is like an announcement that MRI machines have been made 100% more

    • by mikael ( 484 )

      Just go to the gaming PC websites, and you'll see that every OS is 64-bit now, and that even the laptop CPUs have at least 4 cores while desktop CPUs can have as many as 8 or more cores. Intel have some CPUs with 6 cores which are hyperthreaded, so that looks like 12 cores to the OS. AMD have at least 8-core CPUs. Then you have dual-socket Intel Xeon servers supporting 18 cores per socket.
      Even an old laptop from 2005 is dual-core and hyperthreaded.

      Then there are so many ways you can par

  • Holy shit (Score:1, Insightful)

    by Anonymous Coward

    the comments on this story pretty much put the nail in the coffin for this website. News for nerds? Not based on the replies of people who think this is for desktops. Seriously, what the fuck?

    • Intel is planning to integrate this new OpenSWR project with Mesa to deploy it on the Linux desktop

      What hardware do I need?

      * Any x86 processor with at least AVX (introduced in the Intel
      SandyBridge and AMD Bulldozer microarchitectures in 2011) will

      * You don't need a fire-breathing Xeon machine to work on SWR - we do
      day-to-day development with laptops and desktop CPUs.

      TL;DR: it's for whoever wants to use it on whatever hardware they want to use.
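
For anyone wanting to check the AVX requirement quoted above on their own Linux box, here is a minimal sketch. It assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line listing `avx` and `avx2`; the function names are hypothetical and not part of OpenSWR or Mesa.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 system)


def simd_support(cpuinfo_text):
    """Report whether the AVX and AVX2 flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"avx": "avx" in flags, "avx2": "avx2" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(simd_support(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

A CPU like the AMD E-350 mentioned earlier in the thread would be expected to report both as absent.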

  • by friedmud ( 512466 ) on Tuesday October 20, 2015 @11:40PM (#50770779)

    This is seriously useful for massive scientific visualization... where raw rendering speed isn't always the bottleneck (but of course, faster is better).

    We do simulations on supercomputers that generate terabytes of output. You then have to use a smaller cluster (typically 1000-2000 processors) to read that data and generate visualizations of it (using software like ParaView). Those smaller clusters often don't have any graphics cards on the compute nodes at all... and we currently fall back to Mesa for rendering frames.

    If you're interested in what some of these visualizations look like... here's a video of some of our stuff: []

    • by robi5 ( 1261542 )

      The simulation part is very performance intensive, but the visualizations themselves look like something you could do with WebGL, or often, just some SVG and CSS. What are the thousands of cores used for? Not even a super-high resolution seems warranted, because of the continuity of material properties etc. Apparently the result is some 3D model which can be interactively rotated and zoomed, likely on a single local machine that takes direct input from the user, i.e. the thousands of cores don't even seem t

      • by Anonymous Coward

        Classic use of "just" to completely minimize the argument and set up your straw man.

        Hint: if it's $$$ to render and they still do so, it's probably because they need to.

      • Like I mentioned... the actual drawing is NOT the bottleneck (but every little bit helps).

        Those images you see on the screens are backed by TB of data that has to be read in and distilled down before being renderable. That's what the thousands of cores are doing.

        Also: those rotateable ones you see in the beginning are small. If you skip forward to the 2:30 mark you can see some of the larger stuff (note that we're not interactively rotating it). That movie at 2:30 took 24 hours to render on about 1,000 c

    • by jabuzz ( 182671 )

      Hum, rather than dicking about with a software renderer, get some GPUs onto your cluster. I find it hard to believe that HPC clusters exist in 2015 with zero GPU nodes, and if they do the solution is to add the GPU nodes.

      Heck, our next cluster (in planning stages) is going to have all the login nodes with GPUs (Dell Precision R7910, four GPUs per login node) rather than dedicated visualization login nodes, because life is too short. That is in addition to compute nodes being equipped with GPUs.

      • Like I said: raw rasterizing isn't the main bottleneck... reading the data and transforming the data is.... both things better done on the CPU. Drawing frames takes up a very small amount of the overall runtime... but it's always nice to speed it up!

        GPUs wouldn't help much in this scenario... and our CPU clusters are used for many things other than visualization.

        Yes, we do have some dedicated "viz" clusters as well... but we typically don't use them because they are too small for loading many TB of data.

  • If Intel is doing this, they might as well do it for BSD versions of this as well, something that might be leveraged across the vanilla UNIX board
    • It plugs into MESA in the same place as LLVMPipe and is entirely userspace. It will probably just work on *BSD.
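
Since the renderer sits where LLVMpipe does, selecting it would presumably look like selecting any other Mesa software driver. The sketch below builds an environment using Mesa's `LIBGL_ALWAYS_SOFTWARE` and `GALLIUM_DRIVER` variables. The helper name is hypothetical, and the driver name to use for OpenSWR is not stated in this thread, so it is left as a parameter defaulting to `llvmpipe`.

```python
import os


def software_gl_env(driver="llvmpipe", base_env=None):
    """Return a copy of the environment that asks Mesa for software rendering.

    `driver` is passed through unchanged as GALLIUM_DRIVER; which names are
    valid depends on how the installed Mesa was built.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"  # skip hardware drivers
    env["GALLIUM_DRIVER"] = driver      # choose the Gallium software driver
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching an OpenGL application, leaving the parent process's environment untouched.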
  • Come on - people here should know better. It's 2015 and the "Oooh! Linux sounds cool, so let's use that word for everything!" fad should be over now.

    Everything open source is NOT Linux. Linux is a friggin' kernel. This is open source software. It coincidentally gets used with GNU/Linux often. BUT IT'S OPEN SOURCE SOFTWARE.

    Repeat after me: open source does not mean Linux. Linux does not mean open source.

  • I have access to a few Linux boxes but my main work desktop is Windows 7. I use the Cygwin X Window server to run X clients on the Linux boxes. The X server is implemented in software and it does not support anything above OpenGL1.8. I think it uses Mesa as the rasterizer. If it gets a faster rasterizer and higher OpenGL support it would help users. This would beat Windows Remote Desktop, which also does not support anything above OpenGL1.8, but I am sure someone will port Mesa support for remote desktop too, if it hel
