Import GPU: Python Programming With CUDA

February 25, 2025 by Bryan Cockfield 17 Comments

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development for computing, and if you want to develop some new Python skills to take advantage of the modern technology take a look at this introduction to CUDA which allows developers to use Nvidia GPUs for general-purpose computing.

Of course CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch which allows a Python developer to quickly get up-to-speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers in computer science. The guide describes how threads are created, how they travel along within the GPU and work together with other threads, how memory can be managed both on the CPU and GPU, creating CUDA kernels, and managing everything else involved largely through the lens of Python.

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage with almost everything related to computers these days. It’s worth noting that strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a GPU computing platform called ROCm but despite it being open-source is still behind Nvidia in adoption rates and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool which illustrates some of the specific problems GPUs excel at.

Learn GPU Programming With Simple Puzzles

September 25, 2024 by Dave Rowntree 9 Comments

Have you wanted to get into GPU programming with CUDA but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is to utilise the excellent NUMBA python JIT compiler to allow easy-to-understand code to be deployed as GPU machine code. Working on these puzzles is even easier if you use this linked Google Colab as your programming environment, launching you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but that’s not detailed.

The puzzles start, assuming you know nothing at all about GPU programming, which is totally the case for some of us! What’s really nice is the way the result of the program operation is displayed, showing graphically how data are read and written to the input and output arrays you’re working with. Each essential concept for CUDA programming is identified one at a time with a real programming example, making it a breeze to follow along. Just make sure you don’t watch the video below all the way through the first time, as in it [Sasha] explains all the solutions!

Confused about why you’d want to do this? Then perhaps check out our guide to CUDA first. We know what you’re thinking: how do we use non-nVIDIA hardware? Well, there’s SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?

Continue reading “Learn GPU Programming With Simple Puzzles” →

CUDA, But Make It AMD

July 16, 2024 by Lewin Day 40 Comments

Compute Unified Device Architecture, or CUDA, is a software platform for doing big parallel calculation tasks on NVIDIA GPUs. It’s been a big part of the push to use GPUs for general purpose computing, and in some ways, competitor AMD has thusly been left out in the cold. However, with more demand for GPU computation than ever, there’s been a breakthrough. SCALE from [Spectral Compute] will let you compile CUDA applications for AMD GPUs.

SCALE allows CUDA programs to run as-is on AMD GPUs, without modification. The SCALE compiler is also intended as a drop-in swap for nvcc, right down to the command line options. For maximum ease of use, it acts like you’ve installed the NVIDIA Cuda Toolkit, so you can build with cmake just like you would for a normal NVIDIA setup. Currently, Navi 21 and Navi 31 (RDNA 2.0 and RDNA 3.0) targets are supported, while a number of other GPUs are undergoing testing and development.

The basic aim is to allow developers to use AMD hardware without having to maintain an entirely separate codebase. It’s still a work in progress, but it’s a promising tool that could help break NVIDIA’s stranglehold on parts of the GPGPU market.

NeRF: Shoot Photos, Not Foam Darts, To See Around Corners

June 22, 2022 by Michael Shaub 9 Comments

Readers are likely familiar with photogrammetry, a method of creating 3D geometry from a series of 2D photos taken of an object or scene. To pull it off you need a lot of pictures, hundreds or even thousands, all taken from slightly different perspectives. Unfortunately the technique suffers where there are significant occlusions caused by overlapping elements, and shiny or reflective surfaces that appear to be different colors in each photo can also cause problems.

But new research from NVIDIA marries photogrammetry with artificial intelligence to create what the developers are calling an Instant Neural Radiance Field (NeRF). Not only does their method require far fewer images, as little as a few dozen according to NVIDIA, but the AI is able to better cope with the pain points of traditional photogrammetry; filling in the gaps of the occluded areas and leveraging reflections to create more realistic 3D scenes that reconstruct how shiny materials looked in their original environment.

If you’ve got a CUDA-compatible NVIDIA graphics card in your machine, you can give the technique a shot right now. The tutorial video after the break will walk you through setup and some of the basics, showing how the 3D reconstruction is progressively refined over just a couple of minutes and then can be explored like a scene in a game engine. The Instant-NeRF tools include camera-path keyframing for exporting animations with higher quality results than the real-time previews. The technique seems better suited for outputting views and animations than models for 3D printing, though both are possible.

Don’t have the latest and greatest NVIDIA silicon? Don’t worry, you can still create some impressive 3D scans using “old school” photogrammetry — all you really need is a camera and a motorized turntable.

Continue reading “NeRF: Shoot Photos, Not Foam Darts, To See Around Corners” →

What Kind Of GPU Are You?

July 22, 2021 by Al Williams 9 Comments

In the old days, big computers often had some form of external array processor. The idea is you could load a bunch of numbers into the processor and then do some math operations on all of the numbers in parallel. These days, you are more likely to turn to your graphics card for number crunching support. You’ll usually use some library to help you do that, but things are always better when you understand what’s going on under the hood. That’s why we enjoyed [RasterGrid’s] post on GPU architecture types.

If you can tell the difference between IMR (immediate mode) and TBR (tile-based) rendering this might not be the post for you. But while we knew the terms, we found a lot of interesting detail including some graphics and pseudo code that clarified the key differences.

Continue reading “What Kind Of GPU Are You?” →

Deep Learning Enables Intuitive Prosthetic Control

May 27, 2021 by Lewin Day 7 Comments

Prosthetic limbs have been slow to evolve from simple motionless replicas of human body parts to moving, active devices. A major part of this is that controlling the many joints of a prosthetic is no easy task. However, researchers have worked to simplify this task, by capturing nerve signals and allowing deep learning routines to figure the rest out.

The prosthetic arm under test actually carries a NVIDIA Jetson Nano onboard to run the AI nerve signal decoder algorithm.

Reported in a pre-published paper, researchers used implanted electrodes to capture signals from the median and ulnar nerves in the forearm of Shawn Findley, who had lost a hand to a machine shop accident 17 years prior. An AI decoder was then trained to decipher signals from the electrodes using an NVIDIA Titan X GPU.

With this done, the decoder model could then be run on a significantly more lightweight system consisting of an NVIDIA Jetson Nano, which is small enough to mount on a prosthetic itself. This allowed Findley to control a prosthetic hand by thought, without needing to be attached to any external equipment. The system also allowed for intuitive control of Far Cry 5, which sounds like a fun time as well.

The research is exciting, and yet another step towards full-function prosthetics becoming a reality. The key to the technology is that models can be trained on powerful hardware, but run on much lower-end single-board computers, avoiding the need for prosthetic users to carry around bulky hardware to make the nerve interface work. If it can be combined with a non-invasive nerve interface, expect this technology to explode in use around the world.

[Thanks to Brian Caulfield for the tip!]

NVIDIA Announces $59 Jetson Nano 2GB, A Single Board Computer With Makers In Mind

October 5, 2020 by Tom Nardi 47 Comments

NVIDIA kicked off their line of GPU-accelerated single board computers back in 2014 with the Jetson TK1, a $200 USD development system for those looking to get involved with the burgeoning world of so-called “edge computing”. It was designed to put high performance computing in a small and energy efficient enough package that it could be integrated directly into products, rather than connecting to a data center half-way across the world.

The TK1 was an impressive piece of hardware, but not something the hacker and maker community was necessarily interested in. For one thing, it was fairly expensive. But perhaps more importantly, it was clearly geared more towards industry types than consumers. We did see the occasional project using the TK1 and the subsequent TX1 and TX2 boards, but they were few and far between.

Then came the Jetson Nano. Its 128 core Maxwell CPU still packed plenty of power and was fully compatible with NVIDIA’s CUDA architecture, but its smaller size and $99 price tag made it far more attractive for hobbyists. According to the company’s own figures, the number of active Jetson developers has more than tripled since the Nano’s introduction in March of 2019. With the platform accessible to a larger and more diverse group of users, new and innovative applications for machine learning started pouring in.

Cutting the price of the entry level Jetson hardware in half was clearly a step in the right direction, but NVIDIA wanted to bring even more developers into the fray. So why not see if lightning can strike twice? Today they’ve officially announced that the new Jetson Nano 2GB will go on sale later this month for just $59. Let’s take a close look at this new iteration of the Nano to see what’s changed (and what hasn’t) from last year’s model.

Continue reading “NVIDIA Announces $59 Jetson Nano 2GB, A Single Board Computer With Makers In Mind” →