AMD's CUDA Implementation Built On ROCm Is Now Open Source (phoronix.com) 29
Michael Larabel writes via Phoronix: While there have been efforts by AMD over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, it still requires work on the part of developers. The tooling has improved, such as with HIPIFY to help auto-generate code, but it isn't a simple, instant, and guaranteed solution -- especially if striving for optimal performance. Over the past two years, though, AMD has quietly been funding an effort to bring binary compatibility so that many NVIDIA CUDA applications could run atop the AMD ROCm stack at the library level -- a drop-in replacement without the need to adapt source code. In practice, for many real-world workloads, it's a solution for end-users to run CUDA-enabled software without any developer intervention. Here is more information on this "skunkworks" project that is now available as open-source along with some of my own testing and performance benchmarks of this CUDA implementation built for Radeon GPUs. [...]
For those wondering about the open-source code, it's dual-licensed under either Apache 2.0 or MIT. Rust fans will be excited to know the Rust programming language is leveraged for this Radeon implementation. [...] Those wanting to check out the new ZLUDA open-source code for Radeon GPUs can do so via GitHub.
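To make the "drop-in replacement at the library level" idea concrete, here is a minimal, hypothetical Rust sketch of the general pattern -- not ZLUDA's actual code. A replacement shared library exports the same C symbols that applications already resolve from NVIDIA's CUDA driver library and routes the calls to ROCm underneath. The cuInit/cuDeviceGetCount names and signatures follow the public CUDA driver API; everything in the function bodies is assumed for illustration.

    // Hypothetical sketch only -- not taken from ZLUDA. A drop-in replacement
    // ships a shared library exporting the same C symbols as NVIDIA's libcuda,
    // so unmodified binaries load it and the calls get routed to ROCm instead.

    pub type CUresult = i32;              // the driver API returns plain C ints
    pub const CUDA_SUCCESS: CUresult = 0;

    #[no_mangle]                          // export the symbol name exactly as "cuInit"
    #[allow(non_snake_case)]
    pub extern "C" fn cuInit(_flags: u32) -> CUresult {
        // A real implementation would bring up the ROCm/HIP backend here.
        CUDA_SUCCESS
    }

    #[no_mangle]
    #[allow(non_snake_case)]
    pub extern "C" fn cuDeviceGetCount(count: *mut i32) -> CUresult {
        if count.is_null() {
            return 1;                     // CUDA_ERROR_INVALID_VALUE in the real API
        }
        unsafe { *count = 1 };            // pretend one Radeon GPU was found
        CUDA_SUCCESS
    }

Built as a cdylib and placed where the dynamic loader picks it up instead of the real library, something shaped like this is what lets unmodified CUDA binaries run without recompilation.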
"CUDA implementation built for Radeon GPUs" (Score:5, Interesting)
Nope. Don't fall for it. The ROCm platform is targeted at DATACENTER GPUs. As soon as any consumer GPU becomes affordable it's quietly dropped from the next ROCm release.
If you go down this road expect to spend mega $$$$ on Instinct datacenter GPUs or buying a new high-end GPU every year.
Re:"CUDA implementation built for Radeon GPUs" (Score:5, Interesting)
Just another ad...
Not really. This is important. CUDA is very widely used for GP-GPU programming, including TensorFlow, PyTorch, and most other deep-learning libraries. If AMD's CUDA implementation is source-code compatible, then it can break Nvidia's near monopoly.
Re: "CUDA implementation built for Radeon GPUs" (Score:4, Interesting)
Binary compatible. E.g.:
https://github.com/vosen/ZLUDA... [github.com]
#[repr(C)] basically tells the Rust compiler to disable structure layout optimization, and Rust-specific types like tuples and complex enums aren't allowed, effectively making it C-ABI compatible. No recompiling of the source should be necessary; existing binaries should just work.
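For anyone who hasn't touched Rust FFI, a minimal sketch of what that buys (the Dim3 struct, LaunchStatus enum, and launch_grid function below are made up for illustration, not taken from ZLUDA): with #[repr(C)] the fields are laid out in declaration order under C's padding rules, so callers compiled against the matching C header see the same bytes, while tuples and data-carrying enums have no guaranteed C layout and stay off the FFI surface.

    // Hypothetical types for illustration only.
    #[repr(C)]
    pub struct Dim3 {        // same layout as a C struct { unsigned x, y, z; }
        pub x: u32,
        pub y: u32,
        pub z: u32,
    }

    #[repr(C)]               // C-style enum: plain integer discriminants, no payloads
    pub enum LaunchStatus {
        Ok = 0,
        InvalidValue = 1,
    }

    #[no_mangle]             // export under this exact symbol name, C calling convention
    pub extern "C" fn launch_grid(grid: Dim3, block: Dim3) -> LaunchStatus {
        if grid.x == 0 || block.x == 0 {
            return LaunchStatus::InvalidValue;
        }
        // A real implementation would forward these dimensions to HIP/ROCm here.
        LaunchStatus::Ok
    }

That's the sense in which "existing binaries should just work": what the caller sees is the C ABI, not the source language behind it.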
Re: (Score:2)
it can break Nvidia's near monopoly.
Not until AMD stop playing stupid games and support their consumer hardware. I say this as somebody who bought 3 Intel Skull Canyon NUCs with Vega M (HBM2) discreet graphics only to discover that ROCm had arbitrarily dropped support for the chipset. After 2 days of screwing around with custom binaries I was able to get tensorflow and pytorch running but the effort was draining and I basically lost interest. 6 months later they also dropped driver support in Catalyst for this specific GPU (because AMD and
Re: (Score:2)
Consumer-grade GPUs are underpowered for training ANNs but are plenty good enough for running them.
Re: (Score:2)
Depends what you're talking about. You're not going to be building an LLM foundation model on them, but with a high-end consumer card or two, you can train pretty much any diffusion model, full finetune small LLMs, and make LoRAs and QLoRAs for mid-sized ones.
I know someone who's messing around with a consumer-grade GPU cluster (e.g. 8x 3090s on PCIe x16), though I don't know yet how it's going. My suspicion is that those PCIe bandwidth limits will be frustrating for most tasks. But it'll be interesting to see.
I'
Re: (Score:2)
"Tablet grade GPU" says guy with no product knowledge or context; but sure thing smug internet dude!
I wasn't expecting miracles nor are the machines only for ML. Just wanted better brute-force KNN feature matching than CPU can do. If I needed heavy training I'd just migrate workload to the cloud. Doesn't excuse the miserable state of AMD GPU support compared to both of their rivals or really invalidate anything I said. I've done plenty of training/testing on consumer Nvidia processors with zero fucking arou
Re: (Score:2)
toyboxes with tablet grade GPUs
Intel NUC8i7HVK - (Hades Canyon) w/ Vega M GH discreet graphics
"24 CUs and 1536 stream processors. It has a base clock speed of 1063 MHz, a boost clock of 1190 MHz, peak theoretical compute of 3.7 TFLOPS. Memory bandwidth is 204.8 GB/s. 4GB of HBM2. ROP throughput at 64 pixels/clock."
"The Core i7-8809G is the highest performing option and has a base of 3.1 GHz, Turbo lock of 4.2 GHz, and 8MB of cache."
"Output to 6 4K displays simultaneously"
"Total War: Warhammer: 70FPS @ 1080P"
Sources:
https://www.pcworld.co [pcworld.com]
Re: (Score:3)
"Discreet graphics."
I think I found your problem. Discreet graphics are only good for watching porn.
Re: (Score:3)
Like most people, I can't drop car-money on a training server.
It greatly matters to me whether their CUDA implementation works with consumer cards. That could pry me away from NVIDIA if they e.g. offer a *consumer* card with more VRAM or a NVLink equivalent or whatnot.
Re: (Score:2)
The ROCm platform is targeted at DATACENTER GPUs. As soon as any consumer GPU becomes affordable it's quietly dropped from the next ROCm release.
This doesn't seem right at all. I'm using ROCm drivers for OpenCL applications right now on an RX 6600, an affordable consumer GPU. ROCm is open source (see the Wikipedia link in the summary) so there isn't an immediate danger of support being dropped for given hardware; you can always fork it and backport things etc.
AMD also provides closed source drivers so perhaps you're referring to them?
Re: (Score:2)
The ROCm platform is targeted at DATACENTER GPUs. As soon as any consumer GPU becomes affordable it's quietly dropped from the next ROCm release.
This doesn't seem right at all. I'm using ROCm drivers for OpenCL applications right now on an RX 6600, an affordable consumer GPU. ROCm is open source (see the Wikipedia link in the summary) so there isn't an immediate danger of support being dropped for given hardware; you can always fork it and backport things etc.
AMD also provides closed source drivers so perhaps you're referring to them?
No. I'm referring to your card (and most others) having no official ROCm support from AMD and only working (if at all) via various hacks and black magic;
https://github.com/ROCm/ROCm/i... [github.com]
Like I said I got my Vega M chipset kind-of working too but only through frustrating hours of trial and error and even then the result was flaky and highly dependent on very specific driver and library versions, custom source compiles and/or undocumented settings found in long and rambling stackoverflow threads and help foru
Re: (Score:2)
https://github.com/ROCm/ROCm/i... [github.com]
Ah, I must have been lucky with my applications and the GPU. I actually started with Mesa OpenCL which seemed fine at first, but there were timeouts on my longer-running kernels, and ROCm has none of that.
I do fairly simple but heavy numerical stuff, and it turns out AMD cards are much better for these uses. For example, double precision float speed is only half of single precision, whereas DP on Nvidia consumer cards is much slower. It's easy to check this as Nvidia also runs OpenCL, so in my experience
Leverage (Score:2)
I'm glad they leveraged Rust!
Isn't that a weasel word for "I don't actually know anything"?
Re: (Score:2)
AFAICT this announcement isn't a ROCm release, it's just a compiler that sits on top of it. Normal ROCm/HIP requires changing source code headers to load HIP headers instead of CUDA ones. This project appears to be a workaround for that.
The device support in ROCm itself is a sliding window depending on the version. It seems to only support a couple of generations in each release and not necessarily the latest. Last I checked (ROCm 5) they removed Polaris/Vega support and the only consumer cards supported we
Re: (Score:2)
It still doesn't make sense: Nvidia has consumer (cheap), pro (expensive) and datacentre (very expensive) tiers where the capabilities vary. Often the consumer ones are pretty hard hitters, and you can do CUDA on anything from a 2GB 1050 up to their latest H100.
Being able to run the compute on any of their cards then pick the appropriate one for your needs does not appear to have done them any harm.
AMD being way behind must have observed NVidia minting it because having no real competition especially fo
This is dead in the water pretty much (Score:2)
>>With neither Intel nor AMD interested, we've run out of GPU companies. I'm open though to any offers that could move the project forward.
>>Realistically, it's now abandoned and will only possibly receive updates to run workloads I am personally interested in (DLSS).
Unfortunately AMD is not interested in running CUDA applications. Companies these days are all about lock-in. Which is fine but it only benefits the one with the largest user base. In ML field it
Re: (Score:2)
Someone needs to write a tender for a sizable GPU system that mandates CUDA support to get AMD's attention.
Re: This is dead in the water pretty much (Score:2)
I'd wager that they would love to, but are afraid of a legal battle with Nvidia.
Now there's precedent that should favor interface cloning, but every time it happens it still has involved an expensive, protracted legal dispute. So they may not want to get tangled up in that even if they should feel confident in the ultimate result.
Fan boys here (Score:4, Insightful)
Re: (Score:3)
Seems to be a lot of Nvidia fan boys here.
Does there? There seem to be many more people just shaking their heads at AMD's crushing incompetence and disdain for their users.
I buy NVidia, of course I do, because I want to do deep learning. I can get anything from a cheapass 1050, to a consumer monster like the 4090 or deploy to a cloud server with an H100 and that shit just works. I can tell non computer science students to use pytorch and it will work without them having to learn how to use fucking docker, or
Re: (Score:3)
I really wish that NVidia had some competition, but AMD just flat out won't do the legwork and are somehow worried about cheap consumer cards cannibalising their nonexistent market for datacentre cards, even though this clearly is not the case for NVidia.
Yeah, nvidia solved this problem by making all their cards expensive. (I now have a 4060 16GB. It was $450. The absolute most I've ever spent on a GPU before was $200, and I was there at the beginning of the consumer GPUs... VooDoo 1 and 2, Riva TNT and TNT2, Permedia 2, PowerVR, Matrox, I tried them all. Nvidia was the best even then, although 3dlabs was pretty good. AMD wasn't even worth screwing with until they got OSS drivers.)
I hope this thing works out for them, I really do, because nvidia needs some c
Re: (Score:3)
Yeah, nvidia solved this problem by making all their cards expensive.
Ha! Their cards are now expensive, stupid expensive and if-you-have-to-ask-you-can't-afford-it expensive. Still, though, the merely expensive ones appear not to cannibalise the stupid expensive ones.
Though some of that is companies doing shitty deals with Dell who only offer overpriced and often slower Quadro cards on their workstations...
I was there at the beginning of the consumer GPUs... VooDoo 1 and 2, Riva TNT and TNT2, Permedia 2, P