CUDA, But Make It AMD

July 16, 2024

Compute Unified Device Architecture, or CUDA, is a software platform for doing big parallel calculation tasks on NVIDIA GPUs. It’s been a big part of the push to use GPUs for general purpose computing, and in some ways, competitor AMD has thusly been left out in the cold. However, with more demand for GPU computation than ever, there’s been a breakthrough. SCALE from [Spectral Compute] will let you compile CUDA applications for AMD GPUs.

SCALE allows CUDA programs to run as-is on AMD GPUs, without modification. The SCALE compiler is also intended as a drop-in swap for nvcc, right down to the command line options. For maximum ease of use, it acts like you’ve installed the NVIDIA Cuda Toolkit, so you can build with cmake just like you would for a normal NVIDIA setup. Currently, Navi 21 and Navi 31 (RDNA 2.0 and RDNA 3.0) targets are supported, while a number of other GPUs are undergoing testing and development.

The basic aim is to allow developers to use AMD hardware without having to maintain an entirely separate codebase. It’s still a work in progress, but it’s a promising tool that could help break NVIDIA’s stranglehold on parts of the GPGPU market.

40 thoughts on “CUDA, But Make It AMD”

Kryptylomese says:

July 16, 2024 at 11:33 am

Games and graphical applications run on both nVIDIA and AMD, on all the major operating systems. An abstraction layer for CUDA doesn’t seem impossible!

Report comment

Reply
1. jpa says:
  
  July 16, 2024 at 11:52 am
  
  OpenCL exists and works. It has historically been slower than CUDA, but newest benchmarks put them very close to each other performance-wise. Library availability is another matter, where CUDA has some advantage.
  
  Report comment
  
  Reply
  1. combinatorylogic says:
    
    July 17, 2024 at 4:06 am
    
    OpenCL was never “slower”, it was less performance-portable than CUDA (because CUDA only ran on a single line of hardware and did not need to be performance-portable).
    
    Report comment
    
    Reply
2. TG says:
  
  July 16, 2024 at 12:19 pm
  
  If you’re using CUDA you want to squeeze as much speed out of it as possible (i.e. if you are using CUDA you are probably trying to make money in one way or another)
  
  Report comment
  
  Reply
  1. GeneralFault says:
    
    July 16, 2024 at 2:14 pm
    
    Maybe. Though I was using it recently to experiment with image segmentation to apply background blur to a webcam feed. In that case, I would rather have something that supported a larger user population and didn’t use up all available CPU resources. Faster than CPU would be fast enough in this application.
    
    Report comment
    
    Reply
  2. Jon H says:
    
    July 16, 2024 at 3:21 pm
    
    “probably trying to make money in one way or another”
    
    Or save money, if CUDA means less cloud GPU time to pay for. Especially if you’re doing research in an academic context. Gotta stretch those grant dollars.
    
    Report comment
    
    Reply
  3. spaceminions says:
    
    July 17, 2024 at 1:08 pm
    
    Even if you weren’t trying to make any money, plenty of the free, low-entry-barrier things you might want to make use of funnel you towards cuda. For example pytorch forced me to borrow a nvidia gpu in the past, in order to mess with some voice related machine learning stuff. I later messed with the image synthesis stuff once that became popular, but it can be fairly slow on CPU.
    
    Report comment
    
    Reply
3. tester says:
  
  July 17, 2024 at 9:56 am
  
  Do they have something equivalent to cuBLAS or cuFFT? Without that they are not going anywhere near cuda core users.
  
  Report comment
  
  Reply
paulvdh says:

July 16, 2024 at 11:46 am

Any numbers on performance comparisons?

This triggers a memory from a floating point emulator I had running on my 80386 to do AutoCAD (No not an autocrat,you silly spelling thing) at home back in the ’90-ies. It worked, but it was terribly slow. After a while you learn exactly how far you can zoom in and out without triggering a “regenerate”, because that was a two minute wait.

Report comment

Reply
1. alialiali says:
  
  July 16, 2024 at 12:59 pm
  
  This is what it will ultimately come down to.
  
  But I don’t think this is intended for big computations where it’s likely even small performance differences will translate to significant sums (or hours).
  
  I bet this will be more useful for the I just want to run this tool locally for some reason.
  
  Still it will add pressure to Nvidia which is helpful.
  
  Report comment
  
  Reply
2. T says:
  
  July 16, 2024 at 10:27 pm
  
  Look into MIGraphX and the ROCm stack. There is already support for drop in replacement features for AMD GPUs
  
  Report comment
  
  Reply
3. ziggurat29 says:
  
  July 17, 2024 at 7:09 am
  
  Some numbers would indeed be useful.
  I suspect that the metaphor of an emulator is perhaps not the best, and I’d suggest instead that of a cross-compiler.
  So some interesting numbers might be found from running:
  * test algorithm implemented in CUDA and ROCm running on same silicon (presumably AMD)
  * test algorithm implemented in CUDA running through this ‘cross compiler’
  This might show what the abstraction penalty is, if any.
  I’m sure there is some room for variability just in the quality of code that the ‘cross compiler’ generates.
  It is my (admittedly limited) understanding of the subject that the majority of the performance gains comes from the underlying compute architecture, the volume of memory, and memory bandwidth, and with FLOPs having a smaller contribution. The user code defines the compute pipeline which is then expected to scorch through a truly large amount of data. So, will SCALE know how to do this effectively for the cross-target target silicon, or will it do it naively?
  
  Report comment
  
  Reply
scott_tx says:

July 16, 2024 at 12:14 pm

Nvidia’s lawyers are warming up their brief cases.

Report comment

Reply
1. TG says:
  
  July 16, 2024 at 12:21 pm
  
  Eh.. I think they want to keep a pet competitor around for antitrust purposes. Kind of funny that nvidia and intel both have the same duopoly prop-up operation.
  If AMD was failing too hard, they would pay to keep it alive so they didn’t have to risk being broken up
  
  Report comment
  
  Reply
  1. GeneralFault says:
    
    July 16, 2024 at 2:19 pm
    
    Have you noticed that AMD has approx double the market cap of Intel right now? And AMD has been growing whereas Intel has been sinking. NVIDIA on the other hand…. wow!
    
    Report comment
    
    Reply
    1. Conor Stewart says:
      
      July 18, 2024 at 9:24 am
      
      Yeah AMD and Intel is a more fair competition and AMD is crushing Intel it seems. However AMD vs Nvidia is something different altogether where AMD will seriously struggle to compete due to everything working better using Nvidia’s proprietary technology and most things being made to use Nvidia GPUs. AMD can mostly compete with Nvidia in low and mid range gaming but for professionals especially for AI or high end gaming you still need Nvidia.
      
      Report comment
      
      Reply
      1. spaceminions says:
        
        July 22, 2024 at 10:03 am
        
        High end gaming has gotten so high end that the terms have lost their meaning; maybe you could term the range AMD doesn’t offer as ultra high end. Very few people should care that AMD’s fastest option is not as fast as Nvidia’s since even AMD’s is $800 and Nvidia’s is double that. For most people those cards are just for the halo effect. People should care about the differences in ray tracing and upscaling and such, just like support for the compute that’s the subject of this article. But figuratively, just because the racing version of your camaro is faster than the racing version of some guy’s mustang doesn’t mean your base model 4cyl is necessarily any better than his base model 4cyl.
        
        Report comment
    2. Miles says:
      
      July 18, 2024 at 10:53 am
      
      I for one like having an underdog. Built a 5ghz (P-Core) 20-core system for $500 (yes that’s a used Z690 and a 14700 non-K), and that includes an Antec case, RGB and a good OEM Gold 600w PSU. But I’ve also built a 5700G in an HP 25L case for ~$400 so YMMV. Rumours are there may be a 12 P-core only chip coming to LGA1700, that I’d like to see.
      
      Got an RX 6800 for $190. So it would be nice if more software like Photoshop would work on AMD
      
      Report comment
      
      Reply
      1. spaceminions says:
        
        July 22, 2024 at 10:09 am
        
        Nice! Have you experienced the instability / possible degradation that’s been going on and under investigation with 13th-14th gen intel chips? I’ve avoided them where I can because I liked the AMD options in my price range better than the way Intel did their P and E cores, and from the sounds of it I’m glad I did.
        
        Report comment
2. Kafka says:
  
  July 16, 2024 at 12:21 pm
  
  Yes, hopefully they will be brief cases.
  
  Report comment
  
  Reply
3. Truth says:
  
  July 16, 2024 at 1:13 pm
  
  Or NVIDIA is dissecting it and looking for new CUDA commands to add that it will have lower performance on AMD hardware. Basically AMD now have to play catch-up with a standard fully controller by NVIDIA. NVIDIA internally will know the standard from day one, issue a newer revision of their software and there will be a delay before AMD can add any new commands. NVIDIA will always have the latest revision and AMD will be at best no versions behind after a delay.
  
  Report comment
  
  Reply
  1. GeneralFault says:
    
    July 16, 2024 at 2:18 pm
    
    Have you noticed that AMD has approx double the market cap of Intel right now? And AMD has been growing whereas Intel has been sinking. NVIDIA on the other hand…. wow!
    
    Report comment
    
    Reply
    1. GeneralFault says:
      
      July 16, 2024 at 2:19 pm
      
      oops, wrong reply. See reply to TG.
      
      Report comment
      
      Reply
  2. Gravis says:
    
    July 16, 2024 at 2:57 pm
    
    That would be true if AMD were the one writing the software. However, this is completely third-party and thus not a threat.
    
    Report comment
    
    Reply
Jon H says:

July 16, 2024 at 12:44 pm

Now if they could do a version for Metal on Apple chips.

Report comment

Reply
Linux user says:

July 16, 2024 at 1:08 pm

there is project https://github.com/vosen/ZLUDA Cuda drop in for amd

Report comment

Reply
1. Gravis says:
  
  July 16, 2024 at 2:55 pm
  
  Ugh… Rust. Pass.
  
  Report comment
  
  Reply
  1. pelrun says:
    
    July 16, 2024 at 11:00 pm
    
    Thank you for contributing absolutely nothing of value to this conversation.
    
    Report comment
    
    Reply
    1. scott_tx says:
      
      July 17, 2024 at 6:07 pm
      
      keep off his lawn too.
      
      Report comment
      
      Reply
ericmoderbacher says:

July 16, 2024 at 3:06 pm

https://github.com/ROCm/HIP

Report comment

Reply
deL says:

July 16, 2024 at 4:02 pm

NVIDIA – New Videocard Inve$tment Defunct Instant Applied.

Report comment

Reply
UT says:

July 16, 2024 at 6:32 pm

Are the results of this on AMD numerically identical to results from nVidia hardware?

Report comment

Reply
1. D says:
  
  July 16, 2024 at 9:07 pm
  
  Speaking for the machine learning engineers: we don’t care. Stochastic Gradient Descent is pretty resilient. You can truncate our floats brutally and we won’t even notice.
  Speaking for the crypto folks: oh noes my hashes might get rejected!!!
  
  Report comment
  
  Reply
Geo says:

July 16, 2024 at 10:29 pm

Going to leave this here: https://github.com/ROCm/HIP

Report comment

Reply
1. Greg A says:
  
  July 17, 2024 at 7:02 am
  
  “The header file contains mostly inlined functions and thus has very low overhead”
  
  bah humbug. it may be very good. the readme didn’t really answer my questions, and i’m not going to dig further. but inlining is not worth thinking about. function calls are cheap. everyone these days uses a good ABI with cheap function calls. “inline makes function calls cheap” is C++’s original sin. by the time Stroustrup was working on C++, Guy Steele had already debunked this fallacy. so much of C++ practice — both at the language committee and in the wild — is meditating on inlining.
  
  inlining isn’t fast. it just destructures your code. i’m pretty pessimistic about the output of anyone who has spent their day meditating on inlining.
  
  Report comment
  
  Reply
  1. Geo says:
    
    July 17, 2024 at 10:56 am
    
    They do have a TensorRT equivalent too
    
    https://github.com/ROCm/AMDMIGraphX
    
    Report comment
    
    Reply
coru says:

July 17, 2024 at 2:01 am

Development in this direction makes me realise, that the work I am currently doing to get sycl up and running might be in vain. I honestly cant fathom how many different projects and ways this is implemented.

Report comment

Reply
Greg A says:

July 17, 2024 at 6:51 am

an anecdotal report…i was curious about all this OpenCL stuff so i wanted to see what it’s like from a programmer perspective, just to dip a toe in and see if the water is warm.

it’s a neat API. at runtime you pass it a “kernel” as a string, and it compiles it to whatever its internal representation is. the kernel looks like a bit of C code but obviously has a bunch of restrictions and a few special features.

started out running on my Celeron N4000 (UHD 600) and its performance was abysmal. just like OpenGL, i struggled to know: is it accelerated at all or has it silently fallen back on host CPU software emulation? does the idiom in my kernel match the features of the accelerator? is the compilation overhead swamping my test case? is copying overhead swamping my test case?

i don’t like how opaque OpenGL and OpenCL are. i don’t understand the interfaces between kernel and userland, and the libraries are so deeply layered and are designed to mask a lot of it. like, i’m surprised as heck to say that acceleration seems to work under lxc / docker — how does it allocate this resource?? just like the late 90s, i know if hardware GL works by whether or not my videogames are fast. unlike the 90s, i can’t be sure my CPU isn’t fast enough to do glxgears in software.

so i was puzzling this out and decided to try it on my AMD Ryzen 3 2200G (Radeon Vega 8). same example that ‘worked’ on my laptop. the program crashed immediately. it was unkillable. after running the program, clinfo and radeontop would also crash. i got fed up quickly. my load average was pegged at 4.00 for months until i rebooted, from all the zombie processes.

so i really don’t know but overall i’m not impressed. it seems like people are hacking without a thought for anything but performance. if i had a real goal, i’d bother to go through the debugging process but i just wanted to know if the water is warm. it isn’t.

Report comment

Reply
reg says:

July 17, 2024 at 1:26 pm

The sad bottom line is if you need a heavy duty GPU is is going to be expensive. I have been playing with a bunch of AI stuff and have been looking into things and it is not a poor mans hobby. You need a machine that it will physically fit in and interface with, than you need a much larger power supply, and then the card itself.

If you already have the compatible AMD hardware and have some time on your hands, it might be something to play with. If you are a serious user in a production environment, I would be looking for the most “plug and play” solution, At this point in time I would not make promises based on a new project.

As a hobbyist and it really pains me to say this, but you are probably better off just renting gpu/tpu time from Google via colab. You get high end hardware to play with, you do not have to worry about it depreciating when you are not using it, you do not have to deal with power or cooling, and I suspect at my level of usership, I would never in the rest of my life use near the amount of time it would cost to get a mid scale gpu and house, power, and cool it.

But I do with the project success. I suspect there are a lot of people with AMD hardware that would like to be able to run CUDA transparently.

Report comment

Reply
Simon Schäfer says:

July 18, 2024 at 7:14 am

Can’t wait for NVIDIA to shut it down again like the last project

Report comment

Reply

Hackaday

CUDA, But Make It AMD

40 thoughts on “CUDA, But Make It AMD”

Leave a ReplyCancel reply

Search

Never miss a hack

If you missed it

Ore Formation: Introduction And Magmatic Processes

Remembering James Lovell: The Man Who Cheated Death In Space

Smartphone Hackability, Or, A Pocket Computer That Isn’t

VRML And The Dream Of Bringing 3D To The World Wide Web

Australia’s Space Program Finally Gets Off The Pad, But Only Barely

Our Columns

FLOSS Weekly Episode 842: Will The Real JQ Please Stand Up

The World’s First Podcaster?

Design Review: LattePanda Mu NAS Carrier

Neon Bulbs? They’re A Gas!

Hackaday Links: August 10, 2025

40 thoughts on “CUDA, But Make It AMD”

Leave a ReplyCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns