Intel Develops Linux 'Software GPU' That's ~29-51x Faster (phoronix.com)
An anonymous reader writes: Intel is open-sourcing their work on creating a high-performance graphics software rasterizer that was originally developed for scientific visualization. Intel is planning to integrate this new OpenSWR project with Mesa to deploy it on the Linux desktop as a faster software rasterizer than what's currently available (LLVMpipe). OpenSWR should be ideal for cases where there isn't a discrete GPU available or the drivers fail to function. This software rasterizer implements OpenGL 3.2 on Intel/AMD CPUs supporting AVX(2) (Sandy Bridge / Bulldozer and newer) while being 29~51x faster than LLVMpipe, and the code is MIT licensed. Prior to being integrated into Mesa, the code is offered on GitHub.
How does it compare to a low-end graphics card? (Score:5, Insightful)
That's the really interesting question, since on-board graphics just tends to work nowadays, and the only real consumer use case for software like this is as a fallback for when it doesn't; in that case the fancy graphics tend to get turned off anyway.
Re: (Score:3, Interesting)
Even more interesting question, how does something like Half-Life run on it? Is it a slide show, or is it playable?
Re: (Score:3)
Probably not optimally, since it's for scientific visualization, not gaming.
Re:How does it compare to a low-end graphics card? (Score:4, Interesting)
I have a slow AMD E-350 (1.6GHz, dual core, low power chip) in the machine I use as a media centre. With the original Mesa software fallback, I got about 3 frames per second in the UI. It was totally unusable. After FreeBSD gained support for the GPU, I tried it again and got about 20-30fps. This seemed a bit low, and I discovered that I'd misconfigured it and it was still using the software fallback, only now it was using LLVMpipe. I don't know how much faster the GPU actually is, because at 60fps it hits vsync and doesn't try to go faster, though CPU usage drops from 100% to around 10% (of one core). Of course, this CPU doesn't have AVX, so won't benefit from this code.
The release announcement has a couple more details. This new back end is optimised for workloads with large vertex counts but simple shaders, and for machines with a lot of cores. There are a lot of ways that you can make OpenGL shaders faster if you're optimising for the simple case. Half-Life 1 uses the fixed-function pipeline, so it should be fast on this. Half-Life 2 uses a bit more by way of shaders, but possibly not enough to cause it to struggle. Stuff that runs well on a GPU is likely to also run well on multiple cores, if you get the synchronisation right, but the existing LLVMpipe uses a single thread.
My main interest in this is whether you can turn off the GPU entirely for normal compositing desktop workloads and whether it will make a big difference to the power consumption if you do. Compositing desktops generally do a lot of simple compositing, but have very simple shaders and quite simple geometry. I'd be very interested to see whether doing this on the AVX pipelines is cheaper than having an entire separate core doing the work, especially given that the GPU core is generally optimised for more complex graphical workloads.
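To make the parallelism argument above concrete, here is a minimal tile-based rendering sketch, assuming a static tile-to-thread assignment; it is my own illustration of the general technique, not OpenSWR's or LLVMpipe's actual architecture. Each worker owns a disjoint set of framebuffer tiles, so the only synchronisation is joining the workers at the end of the frame (real rasterizers also need per-tile triangle binning, omitted here).

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

struct Framebuffer {
    int width = 1920, height = 1080;
    std::vector<std::uint32_t> pixels = std::vector<std::uint32_t>(width * height);
};

// Hypothetical "simple shader": fill one tile with a flat colour.
void shade_tile(Framebuffer& fb, int x0, int y0, int tile, std::uint32_t colour) {
    for (int y = y0; y < y0 + tile && y < fb.height; ++y)
        for (int x = x0; x < x0 + tile && x < fb.width; ++x)
            fb.pixels[y * fb.width + x] = colour;
}

void render_frame(Framebuffer& fb, int tile = 64) {
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&fb, tile, w, workers] {
            unsigned index = 0;
            for (int y = 0; y < fb.height; y += tile)
                for (int x = 0; x < fb.width; x += tile, ++index)
                    if (index % workers == w)        // static tile ownership, no locks
                        shade_tile(fb, x, y, tile, 0xFF202020u + w);
        });
    }
    for (auto& t : pool) t.join();                   // one sync point per frame
}

int main() {
    Framebuffer fb;
    render_frame(fb);
    return 0;
}
```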
Re: (Score:3)
That's the really interesting question, since on-board graphics just tends to work nowadays, and the only real consumer use case for software like this is as a fallback for when it doesn't; in that case the fancy graphics tend to get turned off anyway.
For the scientific visualizations it was built for, it's probably quite good, since they're not using a GPU. For the gaming and such that a consumer might want, it's probably bad, since it targets high vertex complexity and low shader complexity. So lots of detail using primitives, but not all the shader work needed to make realistic graphics. It seems like a special case, though.
Re: (Score:3)
Scientific visualization applications use data sets that are in the gigabyte range - a volume cube of 1024x1024x1024 representing a supernova and streamed off a supercomputer, or maybe a high-resolution grid of an aircraft and its jet engines where they want to see where the areas of turbulence are. With all the CPU cores available, it's easier to render on the CPUs than it is to funnel the data into GPU memory just to render one frame.
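For a sense of scale, a quick back-of-the-envelope calculation (my own illustrative numbers, not from the comment above): a single 1024^3 scalar field of 32-bit floats already occupies 4 GiB, and real data sets carry several fields per timestep.

```cpp
#include <cstdio>

int main() {
    const double voxels = 1024.0 * 1024.0 * 1024.0;   // 1024^3 volume cube
    const double gib    = 1024.0 * 1024.0 * 1024.0;   // bytes per GiB
    const double bytes_per_field = voxels * sizeof(float);
    std::printf("one float field : %.1f GiB\n", bytes_per_field / gib);
    std::printf("five fields     : %.1f GiB\n", 5 * bytes_per_field / gib);
    return 0;
}
```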
Re: (Score:2)
That's the really interesting question, since on-board graphics just tends to work nowadays, and the only real consumer use case for software like this is as a fallback for when it doesn't; in that case the fancy graphics tend to get turned off anyway.
Not good at all! Boy, this older CPU is just not cutting it, come to think of it.
I wonder if a nice shiny new i7 extreme edition would be in order for decent performance?
Re: (Score:2)
There are a lot of applications I can think of:
1) Virtualization, which often doesn't support GPU sharing. I would love this on Hyper-V guests.
2) Remote Desktop scenarios where the GPU won't load.
3) Servers with no on-board GPU at all that rely on OpenGL for specific UI elements.
Re: (Score:2)
Because game engines get run for automated builds, that's why.
Build remote gaming platform and sell out to Sony (Score:2)
Why are you running software with graphical UI elements on a server to begin with?
Have you ever heard of Remote Desktop? Two companies even did this with games (Gaikai and OnLive), and both ended up acquired by PlayStation.
Up to (Score:3)
Not 29-51x faster, but _up to_ 29-51x faster (in a specific use case, the one it was developed for).
Re: (Score:2)
Good point.
Re:Up to (Score:5, Insightful)
Then it should have just said "up to 51x faster" ... or more ...
"Up to" is just a weasel word way of saying "less than".
I have up to a billion dollars in my pocket.
Re: (Score:2)
Not exactly. It usually means that under some plausible but very rare scenario, it can and will achieve that level of performance.
If there's a plausible scenario under which you'll find a billion real (preferably US) dollars in your pocket, I think many of us would love to hear it.
Re: (Score:2)
The Intel software has a far better chance of achieving its max performance than of that check being cashable.
Re: (Score:2)
If there's a plausible scenario under which you'll find a billion real dollars in your pocket, I think many of us would love to hear it.
Here you go [shopify.com].
Re: (Score:2)
That's exactly why I'd specified both real and USD in my original comment.
Spend it wisely.
Re: (Score:2)
You're comparing apples and oranges.
Either you *do* have a billion dollars OR you *don't*. There is only _one_ use case.
If this rasterizer is 51x faster at stencil operations, but only 4x faster than Mesa everywhere else, then it is perfectly fine to say "up to 51x faster".
What is the _context_? That's the crux of the issue.
-22 (Score:2, Funny)
I did the math, and got -22x faster. I'll pass on it, Thank You.
Re: (Score:1)
In this case, running on a dual 18-core Xeon (36 cores total). You won't get 29x-51x faster on your quad-core Skylake.
I.e., most of the speed-up comes from multi-threading and use of AVX, which I'm a little surprised LLVMpipe didn't make more of; but then again, it probably wasn't too important at the time, and correctness was what mattered most.
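As a rough sanity check on where a 29x-51x figure can come from, here is a naive ceiling computed from thread count times AVX2 lane count; the model is my own simplification, and since llvmpipe itself already uses LLVM-generated SIMD code, the published number is measured against a much stronger baseline than a single scalar thread.

```cpp
#include <cstdio>

int main() {
    const int cores      = 36;  // dual E5-2699v3 from the announcement, 18 cores per socket
    const int avx2_lanes = 8;   // eight 32-bit floats per 256-bit AVX2 register
    std::printf("naive upper bound vs. one scalar thread: %dx\n", cores * avx2_lanes);
    return 0;
}
```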
No, it's not for playing games (Score:5, Informative)
Despite the ignorance (or perhaps intentional clickbaitiness) of the post, nobody at Intel expects this to replace a GPU for regular graphics or playing games. They haven't invested big money going from effectively zero GPU power in 2010 to beating AMD's best solutions in 2015 just to replace it all with a software gimmick now.
This renderer is designed for the kinds of graphical visualization that don't make sense to do on a traditional GPU, much like running POV-Ray or rendering complex images in scientific applications.
It is NOT going to replace a real GPU for what a real GPU does.
Nobody at Intel ever said it would replace a GPU.
The Internet, however, isn't so smart.
Re: (Score:1)
AMD's on-chip offerings are badly obsolete, because AMD skipped a generation in the jump to Zen (so its current CPU line-up is basically all dinosaurs).
Re: (Score:2)
The Iris Pro 6200 parts beating AMD's APUs here cost at least twice as much for just the processor.
Re: (Score:1)
Only because AMD stopped at 512 shaders on their APUs, because memory bandwidth limitations made it pointless to include more. Additionally, being stuck on 28nm didn't help with scaling or power use (although Carrizo does a very good job of staying competitive with 14nm Intel chips).
Intel bypassed that by including a very large on-die memory so they could expand their GPU further and get more performance. This comes at a cost - price.
Re: (Score:2)
One of the original authors, here: this is exactly correct. This thing was a toy that we wrote for our own entertainment that grew rapidly out of scale. As a joke, we'd implemented display-lists on OpenGL 1.2 and began playing Quake III. (This required monstrous multi-socket Xeon workstations, with all the fans going flat-out.) It just happens to turn out that (at the time) regular old top-of-the-line GPUs were crashing under the TACC workloads. Weirdly, our rasterizer was both faster than the GPUs (even th
Re: (Score:1)
Apparently, I'm not supposed to call SWR a "toy"; it's all grown up. SWR really shines through in terms of performance due to its (nearly) linear scaling on cores. When other rasterizers begin to suffer from communication overhead, SWR keeps going. Thus, if you've only got a few threads, you're not going to see really seriously awesome numbers; if you've got 16+ threads, that's where SWR is going to shine.
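To illustrate why near-linear scaling at 16+ threads is the interesting claim, here is an Amdahl's-law sketch; the serial fractions are values I picked for illustration, not anything measured from SWR. Even a small serial or communication share caps how far extra cores can take you.

```cpp
#include <cstdio>

int main() {
    const double serial_fraction[] = {0.01, 0.05, 0.20};  // 1%, 5%, 20% non-parallel work
    const int    threads[]         = {4, 16, 36, 72};
    for (double s : serial_fraction) {
        std::printf("serial %2.0f%%:", s * 100.0);
        for (int n : threads)
            // Amdahl's law: speedup = 1 / (s + (1 - s) / n)
            std::printf("  %2d threads -> %5.1fx", n, 1.0 / (s + (1.0 - s) / n));
        std::printf("\n");
    }
    return 0;
}
```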
TempleOS (Score:1, Funny)
It's not faster than mine. Mine is optimal. I am the best programmer, given divine intellect. We do not allow different drivers for different people. Everybody uses the same driver. We do not allow hidden logic in the GPU. All logic must be in the CPU.
Err - bullshit. (Score:2, Troll)
'On a 36 total core dual E5-2699v3 we see performance 29x to 51x that of llvmpipe.'
Clearly some improvement happens going from single to multithreaded, but I suspect very few desktops have >4 cores, and a vanishingly small number >16.
Re: (Score:2)
Have you been buying any new hardware recently? My laptop has four physical cores and eight logical cores with hyperthreading (4/8). My desktop is 6/12 and my always-on home server is 12/24. The newest of these devices is about two years old; the oldest closer to five.
Multiple cores are a pretty standard feature for a lot of hardware these days. Heck, even cellphones have between four and eight cores these days. So, no, you won't see many home PCs running on 36 cores. But there clearly is a tr
Re: (Score:1)
Hyperthreading does not meaningfully improve performance of compute-bound tasks.
Re: (Score:2)
Unless it's a Pentium 4, where a cache miss stalls the entire pipeline for both threads.
Re: (Score:2)
The biggest issue HT has against it is split resources, especially the L1 cache. Many typ
Re: (Score:2)
https://en.wikipedia.org/wiki/... [wikipedia.org]
Hyperthreading in the P4 killed the whole pipeline.
Re: (Score:1)
No it doesn't. It has 4 real cores plus 4 hyperthreads that are sorta kinda like cores but not as good.
Re: (Score:2)
Kind of like an AMD core then?
Re: (Score:1)
Wait, wait, wait, whoa, hold everything... You're telling me, let me be sure I'm sitting down for this... you're telling me they ran a highly parallel problem (graphics rendering) on a 36-core machine, and it ran roughly 36 times faster than single-threaded code?
As Iago put it, "I think I'm going to fall over and DIE from NOT SURPRISE!"
Re: (Score:1)
The Gallium3D LLVMpipe driver does not touch the GPU, so it can be run with any graphics card. However, for efficient performance, you will want to be running a 64-bit operating system and a CPU that supports SSE2 or better. LLVM can take advantage of SSE3 and SSE4 extensions too, which will result in even greater performance. Unsurprisingly, the better the CPU you have, the better LLVMpipe will perform. The more cores the CPU has, the better the performance will be too, as the rasterizer supports threading and tiling.
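If you want to know which of those vector extensions your own CPU advertises, one quick way on Linux is to read /proc/cpuinfo. This is just a convenience sketch; llvmpipe (and OpenSWR) detect CPU features themselves at runtime.

```cpp
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0) {  // the first "flags" line is enough
            for (const char* ext : {"sse2", "sse4_1", "avx", "avx2"})
                std::cout << ext << ": "
                          << (line.find(ext) != std::string::npos ? "yes" : "no")
                          << '\n';
            break;
        }
    }
    return 0;
}
```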
Re: (Score:2)
Bullshit. Clearly some improvement happens going from single to multithreaded, but I suspect very few desktops have >4 cores, and a vanishingly small number >16.
Well, for one, it's 36 threads, which means 18 cores. Two, on the types of computers this is useful for (multi-processor render farms, virtualized guest OSes and compute clusters) you often do have 40+ threads.
If you want to play Doom 4, go buy a GPU; llvmpipe isn't useful for you either. So a specific group of people who had one solution (llvmpipe) now have a much faster solution.
This is like an announcement that MRI machines have been made 100% more
Re: (Score:2)
Err - no.
http://ark.intel.com/products/... [intel.com] - 18 core.
" a 36 total core dual E5-2699v3 we see performance 29x to 51x that of llvmpipe"
Two processors, with 18 cores each.
Re: (Score:2)
Just go to the gaming PC websites and you'll see that every OS is 64-bit now, and that even laptop CPUs have at least 4 cores while desktop CPUs can have as many as 8 or more. Intel has some 6-core CPUs which are hyperthreaded, so that looks like 12 cores to the OS. AMD has at least 8-core CPUs. Then you have dual-socket Intel Xeon servers supporting 18 cores per socket.
Even an old laptop from 2005 is dual-core and hyperthreaded.
Then there are so many ways you can par
fanbois with a pottymouth (Score:2)
Yes, because you had a valid point to make, I'm sure, but couldn't articulate it without going into the toilet.
As a result the only one covered in the brown stuff is you, and not those "mysterious GPL fanbois" for fear of whom you wear your aluminum-foil hat.
Go gently into the night, and bring toilet paper, and don't come back.
Re: (Score:2)
He said most and he's correct. Most open-source software (of any license) goes nowhere. I'd expect most of it never even gets finished. I don't even license any of the code I release - I just give it away without any license at all. It's not good enough to steal and usually does just one thing that I had to do. Heh. I've not done that in a while.
Anyhow, no... I'm pretty sure I've read several actual studies (well, extracts) that show an absurdly high number of open-source projects go nowhere. You're not reading properly and assuming
Re: (Score:2)
I don't even license any of the code I release - I just give it away without any license at all
These two are contradictory, unless you live in a jurisdiction where you can explicitly place things in the public domain and you are doing so. Without a license, no one who receives the code has the right to do anything with it.
Re: (Score:2)
I do. I live in the US. We can place things in the public domain at will. All of it includes a "Public Domain" bit of gibberish in it. You can steal it (I guess) and sell it, you can change it, you can throw sticks at it. I don't care. It's not good enough to do anything worth paying for - often it's simply scripts to do something I needed. Hell, not even often, I've not done so in years. I'm a much more passive consumer than I used to be.
If you're curious about the legality, it's called "dedicating" and th
Holy shit (Score:1, Insightful)
the comments on this story pretty much put the nail in the coffin for this website. News for nerds? Not based on the replies of people who think this is for desktops. Seriously, what the fuck?
Re: (Score:3)
Intel is planning to integrate this new OpenSWR project with Mesa to deploy it on the Linux desktop
What hardware do I need?
* Any x86 processor with at least AVX (introduced in the Intel SandyBridge and AMD Bulldozer microarchitectures in 2011) will work.
* You don't need a fire-breathing Xeon machine to work on SWR - we do day-to-day development with laptops and desktop CPUs.
TL;DR: it's for whoever wants to use it on whatever hardware they want to use.
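For completeness, here is how a program could force Mesa's software path and, hypothetically, pick a specific Gallium software driver by setting environment variables before any GL context is created. LIBGL_ALWAYS_SOFTWARE and GALLIUM_DRIVER=llvmpipe are existing Mesa variables; the "swr" value is an assumption about how OpenSWR might be exposed once it lands in Mesa, not something the post confirms.

```cpp
#include <cstdlib>   // setenv (POSIX, available on Linux)

int main() {
    // Tell Mesa to skip hardware drivers and use a software rasterizer.
    setenv("LIBGL_ALWAYS_SOFTWARE", "1", /*overwrite=*/1);
    // Pick the Gallium software driver: "llvmpipe" exists today; "swr" is an
    // assumed name for OpenSWR once integrated into Mesa.
    setenv("GALLIUM_DRIVER", "llvmpipe", 1);
    // ... create the GL context and render as usual ...
    return 0;
}
```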
Massive Scientific Visualization (Score:4, Informative)
This is seriously useful for massive scientific visualization... where raw rendering speed isn't always the bottleneck (but of course, faster is better).
We do simulations on supercomputers that generate terabytes of output. You then have to use a smaller cluster (typically 1000-2000 processors) to read that data and generate visualizations of it (using software like Paraview ( http://www.paraview.org/ [paraview.org] ) ). Those smaller clusters often don't have any graphics cards on the compute nodes at all... and we currently fall back to Mesa for rendering frames.
If you're interested in what some of these visualizations look like... here's a video of some of our stuff: https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
The simulation part is very performance intensive, but the visualizations themselves look like something you could do with WebGL, or often, just some SVG and CSS. What are the thousands of cores used for? Not even a super-high resolution seems warranted, because of the continuity of material properties etc. Apparently the result is some 3D model which can be interactively rotated and zoomed, likely on a single local machine that takes direct input from the user, i.e. the thousands of cores don't even seem t
Re: (Score:1)
Classic use of "just" to completely minimize the argument and set up your straw man.
Hint: if it's $$$ to render and they still do so, it's probably because they need to.
Re: (Score:2)
Exactly right.
Re: (Score:2)
Like I mentioned... the actual drawing is NOT the bottleneck (but every little bit helps).
Those images you see on the screens are backed by TB of data that has to be read in and distilled down before being renderable. That's what the thousands of cores are doing.
Also: those rotatable ones you see in the beginning are small. If you skip forward to the 2:30 mark you can see some of the larger stuff (note that we're not interactively rotating it). That movie at 2:30 took 24 hours to render on about 1,000 c
Re: (Score:2)
Thanks, got it!
Re: (Score:2)
Hum, rather than dicking about with a software renderer, get some GPUs onto your cluster. I find it hard to believe that HPC clusters exist in 2015 with zero GPU nodes, and if they do, the solution is to add GPU nodes.
Heck, our next cluster (in the planning stages) is going to have all the login nodes equipped with GPUs (Dell Precision R7910, four GPUs per login node) rather than dedicated visualization login nodes, because life is too short. That is in addition to the compute nodes being equipped with GPUs.
Re: (Score:2)
Like I said: raw rasterizing isn't the main bottleneck... reading the data and transforming it is... and both are things better done on the CPU. Drawing frames takes up a very small amount of the overall runtime... but it's always nice to speed it up!
GPUs wouldn't help much in this scenario... and our CPU clusters are used for many things other than visualization.
Yes, we do have some dedicated "viz" clusters as well... but we typically don't use them because they are too small for loading many TB of data.
Any BSD equivalents? (Score:2)
This isn't Linux... (Score:2)
Come on - people here should know better. It's 2015 and the "Oooh! Linux sounds cool, so let's use that word for everything!" fad should be over now.
Everything open source is NOT Linux. Linux is a friggin' kernel. This is open source software. It coincidentally gets used with GNU/Linux often. BUT IT'S OPEN SOURCE SOFTWARE.
Repeat after me: open source does not mean Linux. Linux does not mean open source.
Would it help Cygwin X Windows? (Score:2)