Import GPU: Python Programming With CUDA

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development for computing, and if you want to develop some new Python skills to take advantage of the modern technology take a look at this introduction to CUDA which allows developers to use Nvidia GPUs for general-purpose computing.

Of course CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch which allows a Python developer to quickly get up-to-speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers in computer science. The guide describes how threads are created, how they travel along within the GPU and work together with other threads, how memory can be managed both on the CPU and GPU, creating CUDA kernels, and managing everything else involved largely through the lens of Python.

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage with almost everything related to computers these days. It’s worth noting that strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a GPU computing platform called ROCm but despite it being open-source is still behind Nvidia in adoption rates and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool which illustrates some of the specific problems GPUs excel at.

A screen capture from Portal 2 running in Asahi Linux. The Asahi Linux logo is in the bottom right of the image as a watermark. The environment is a concrete and glass building with elements of nature taking over the room on the other side of the glass from the character. A red circle with a grey cube above it is in the foreground.

Asahi Linux Brings Better Gaming To Apple Silicon

For those of you longing for better gaming on an Apple Silicon device, Asahi Linux is here to help.

While Apple’s own line of CPUs are relatively new kids on the block, they’ve still been around for four years now, giving hackers ample time to dissect their innards. The team behind Asahi Linux has now brought us “the only conformant OpenGL®, OpenCL™, and Vulkan® drivers” for Apple’s M1 and M2.

The emulation overhead of the system means that most games will need at least 16 GB of RAM to run. Many games are playable, but newer titles can’t yet hit 60 frames per second. The developers are currently focused on “correctness” and hope to improve performance in future updates. Many indie titles are reported to already be working at full speed though.

You can hear more about some of the fiddly bits of how to “tessellate with arcane compute shaders” in the video below. Don’t worry, it’s only 40 minutes of the nine hour video and it should start right at the presentation by GPU dev [Alyssa Rosenzweig].

If you want to see some of how Linux on Apple Silicon started or some of the previous work on hacking the M1 GPU, we have you covered.

Continue reading “Asahi Linux Brings Better Gaming To Apple Silicon”

C64 Gets A Graphics Upgrade Courtesy Of Your Favorite Piano Manufacturer

The Commodore 64 was quite a machine in its time, though a modern assessment would say that it’s severely lacking in the graphical department. [Vossi] has whipped up a bit of an upgrade for the C64 and C128, in the form of a graphics expansion card running Yamaha hardware.

As you might expect, the expansion is designed to fit neatly into a C64 cartridge slot. The card runs the Yamaha V9958—the video display processor known for its appearance in the MSX2+ computers. In this case, it’s paired with a healthy 128 kB of video RAM so it can really do its thing. The V9958 has an analog RGB output that can be set for PAL or NTSC operation, and can perform at resolutions up to 512×212 or even 512×424 interlaced. Naturally, it needs to be hooked directly up to a compatible screen, like a 1084, or one with SCART input. [Vossi] took the time to create some demos of the chip’s capabilities, drawing various graphics in a way that the C64 couldn’t readily achieve on its own.

It’s a build that almost feels like its from an alternate universe, where Yamaha decided to whip up a third-party graphics upgrade for the C64. That didn’t happen, but stranger team ups have occurred over the years.

[Thanks to Stephen Walters for the tip!]

Learn GPU Programming With Simple Puzzles

Have you wanted to get into GPU programming with CUDA but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is to utilise the excellent NUMBA python JIT compiler to allow easy-to-understand code to be deployed as GPU machine code. Working on these puzzles is even easier if you use this linked Google Colab as your programming environment, launching you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but that’s not detailed.

The puzzles start, assuming you know nothing at all about GPU programming, which is totally the case for some of us! What’s really nice is the way the result of the program operation is displayed, showing graphically how data are read and written to the input and output arrays you’re working with. Each essential concept for CUDA programming is identified one at a time with a real programming example, making it a breeze to follow along. Just make sure you don’t watch the video below all the way through the first time, as in it [Sasha] explains all the solutions!

Confused about why you’d want to do this? Then perhaps check out our guide to CUDA first. We know what you’re thinking: how do we use non-nVIDIA hardware? Well, there’s SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?

Continue reading “Learn GPU Programming With Simple Puzzles”

Hackaday Links Column Banner

Hackaday Links: September 15, 2024

A quick look around at any coffee shop, city sidewalk, or sadly, even at a traffic light will tell you that people are on their phones a lot. But exactly how much is that? For Americans in 2023, it was a mind-boggling 100 trillion megabytes, according to the wireless industry lobbying association CTIA. The group doesn’t discuss their methodology in the press release, so it’s a little hard to make judgments on that number’s veracity, or the other numbers they bandy about, such as the 80% increase in data usage since 2021, or the fact that 40% of data is now going over 5G connections. Some of the numbers are more than a little questionable, too, such as the claim that 330 million Americans (out of a current estimate of 345.8 million people) are covered by one or more 5G networks. Even if you figure that most 5G installations are in densely populated urban areas, 95% coverage seems implausible given that in 2020, 57.5 million people lived in rural areas of the USA. Regardless of the details, it remains that our networks are positively humming with data, and keeping things running is no mean feat.

Continue reading “Hackaday Links: September 15, 2024”

Startup Claims It Can Boost CPU Performance By 2-100X

Although Moore’s Law has slowed at bit as chip makers reach the physical limits of transistor size, researchers are having to look to other things other than cramming more transistors on a chip to increase CPU performance. ARM is having a bit of a moment by improving the performance-per-watt of many computing platforms, but some other ideas need to come to the forefront to make any big pushes in this area. This startup called Flow Computing claims it can improve modern CPUs by a significant amount with a slight change to their standard architecture.

It hopes to make these improvements by adding a parallel processing unit, which they call the “back end” to a more-or-less standard CPU, the “front end”. These two computing units would be on the same chip, with a shared bus allowing them to communicate extremely quickly with the front end able to rapidly offload tasks to the back end that are more inclined for parallel processing. Since the front end maintains essentially the same components as a modern CPU, the startup hopes to maintain backwards compatibility with existing software while allowing developers to optimize for use of the new parallel computing unit when needed.

While we’ll take a step back and refrain from claiming this is the future of computing until we see some results and maybe a prototype or two, the idea does show some promise and is similar to some ARM computers which have multiple cores optimized for different tasks, or other computers which offload non-graphics tasks to a GPU which is more optimized for processing parallel tasks. Even the Raspberry Pi is starting to take advantage of external GPUs for tasks like these.

Retrogadgets: The Ageia PhysX Card

Old computers meant for big jobs often had an external unit to crunch data in specific ways. A computer doing weather prediction, for example, might have an SIMD (single instruction multiple data) vector unit that could multiply a bunch of numbers by a constant in one swoop. These days, there are many computers crunching physics equations so you can play your favorite high-end computer game. Instead of vector processors, we have video cards. These cards have many processing units that can execute “kernels” or small programs on large groups of data at once.

Awkward Years

However, there was that awkward in-between stage when personal computers needed fast physics simulation, but it wasn’t feasible to put array processing and video graphics on the same board. Around 2006, a company called Ageia produced the PhysX card, which promised to give PCs the ability to do sophisticated physics simulations without relying on a video card.

Keep in mind that when this was built, multi-core CPUs were an expensive oddity and games were struggling to manage everything they needed to with limited memory and compute resources. The PhysX card was a “PPU” or Physics Processor Unit and used the PCI bus. Like many companies, Ageia made the chips and expected other companies — notably Asus — to make the actual board you’d plug into your computer.

Continue reading “Retrogadgets: The Ageia PhysX Card”