Using Integer Addition To Approximate Float Multiplication

Once the domain of esoteric scientific and business computing, floating point calculations are now practically everywhere. From video games to large language models and kin, it would seem that a processor without floating point capabilities is pretty much a brick at this point. Yet the truth is that integer-based approximations can be good enough to hit the required accuracy. One example is approximating floating point multiplication with integer addition, something [Malte Skarupke] recently had a poke at, based on the integer-addition-only LLM approach suggested by [Hongyin Luo] and [Wei Sun].

As for how this works, it does pretty much what it says on the tin: the two floating point inputs are added as integer values, and then the exponent is adjusted. This adjustment factor is what gets you close to the right answer, but as the article and the comments on it illustrate, there are plenty of issues and edge cases to concern yourself with. These include under- and overflow, but also certain special floating point inputs.
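For the curious, here is a minimal Python sketch of the bit-twiddling involved. It is a simplified illustration rather than the exact scheme from the paper: it assumes positive, normal 32-bit floats and ignores signs, NaNs, and under- or overflow entirely.

```python
import struct

def f32_to_bits(x: float) -> int:
    """Pack a value as a 32-bit float and return its raw bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit integer bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b by adding bit patterns and fixing up the exponent.

    Adding the raw bit patterns adds the biased exponents, so subtracting
    the bit pattern of 1.0 (0x3F800000) removes the doubled bias.
    """
    one = 0x3F800000
    return bits_to_f32(f32_to_bits(a) + f32_to_bits(b) - one)

for x, y in [(1.5, 2.0), (3.1, 0.4), (12.7, 9.3)]:
    approx, exact = approx_mul(x, y), x * y
    print(f"{x} * {y}: approx {approx:.4f}, exact {exact:.4f}, "
          f"error {abs(approx - exact) / exact:.2%}")
```

Since a float's bit pattern is, roughly speaking, a scaled approximation of its base-2 logarithm, adding bit patterns approximates adding logarithms, which is multiplication.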

Unlike in scientific calculations, where even minor inaccuracies tend to propagate and cause much larger errors down the line, graphics and LLMs do not care that much about floating point precision, so the roughly 7.5% error of the integer approach is good enough. The question is whether it’s truly more efficient, as the paper suggests, rather than merely a fallback, as seen with e.g. integer-only audio decoders for platforms without an FPU.

Since one of the nice things about FP-focused vector processors like GPUs and their derivatives (tensor, ‘neural’, etc.) is that they can churn through a lot of data quite efficiently, expecting (energy) improvements from shifting this work onto the ALU of a CPU seems quite optimistic.

Musings On A Good Parallel Computer

Until the late 1990s, the concept of a 3D accelerator card was something generally associated with high-end workstations. Video games and kin would run happily on the CPU in one’s desktop system, with later extensions like MMX, 3DNow!, and SSE providing a significant performance boost for games that supported them. As 3D accelerator cards (colloquially called graphics processing units, or GPUs) became prevalent, they took over almost all SIMD vector tasks, but one thing they’re not good at is being a general-purpose parallel computer. This really ticked [Raph Levien] off, and it inspired him to air his grievances.

Although the interaction between CPUs and GPUs has become tighter over the decades, with PCIe in particular being a big improvement over AGP and PCI, GPUs are still terrible at running arbitrary computing tasks, and even PCIe links remain glacial compared to communication within the GPU and CPU dies. The introduction of asynchronous graphics APIs has only widened this divide. [Raph]’s proposal is to invert this relationship.

There’s precedent for this already, with Intel’s Larrabee and IBM’s Cell processor merging CPU and GPU characteristics on a single die, though developers struggled with such a new kind of architecture, and Sony’s PlayStation 3 was ultimately forced to add a GPU due to these issues. There is also the DirectStorage API in DirectX, which bypasses the CPU when loading assets from storage, effectively adding CPU features to GPUs.

As [Raph] notes, so-called AI accelerators share these characteristics, often sporting multiple SIMD-capable, CPU-like cores. Maybe the future is Cell after all.

Import GPU: Python Programming With CUDA

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development of that kind, and if you want to pick up some new Python skills to take advantage of it, take a look at this introduction to CUDA, which allows developers to use Nvidia GPUs for general-purpose computing.

Of course, CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met, it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch, which allows a Python developer to quickly get up to speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers in computer science. The guide describes how threads are created, how they move through the GPU and work together with other threads, how memory is managed on both the CPU and GPU, how CUDA kernels are written, and how everything else involved fits together, largely through the lens of Python.
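To give a taste of how little ceremony that involves, here is a minimal sketch (not taken from the guide itself, and assuming a working PyTorch install with a CUDA-capable card) that pushes a matrix multiply onto the GPU and pulls the result back:

```python
import torch

# Use the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate two matrices directly on the chosen device.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# This matrix multiply runs as a CUDA kernel when the device is "cuda".
c = a @ b

# Kernel launches are asynchronous; wait for the GPU before timing or printing.
if device.type == "cuda":
    torch.cuda.synchronize()

# .item() copies the scalar result back to host memory.
print(c.sum().item())
```

The same code runs unchanged on a machine without a GPU, which is much of PyTorch’s appeal: the device abstraction hides the CUDA plumbing until you actually need to care about it.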

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage in almost everything related to computers these days. It’s worth noting that, strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a GPU computing platform called ROCm, but despite being open source it still lags behind Nvidia in adoption and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool, which illustrates some of the specific problems GPUs excel at.

A screen capture of Portal 2 running in Asahi Linux, with the Asahi Linux logo watermarked in the corner.

Asahi Linux Brings Better Gaming To Apple Silicon

For those of you longing for better gaming on an Apple Silicon device, Asahi Linux is here to help.

While Apple’s own CPUs are relatively new kids on the block, they’ve still been around for four years now, giving hackers ample time to dissect their innards. The team behind Asahi Linux has now brought us “the only conformant OpenGL®, OpenCL™, and Vulkan® drivers” for Apple’s M1 and M2.

The emulation overhead of the system means that most games will need at least 16 GB of RAM to run. Many games are playable, but newer titles can’t yet hit 60 frames per second. The developers are currently focused on “correctness” and hope to improve performance in future updates. Many indie titles are reported to already be working at full speed though.

You can hear more about some of the fiddly bits of how to “tessellate with arcane compute shaders” in the video below. Don’t worry, it’s only 40 minutes of the nine-hour video, and it should start right at the presentation by GPU dev [Alyssa Rosenzweig].

If you want to see some of how Linux on Apple Silicon started or some of the previous work on hacking the M1 GPU, we have you covered.


C64 Gets A Graphics Upgrade Courtesy Of Your Favorite Piano Manufacturer

The Commodore 64 was quite a machine in its time, though a modern assessment would say that it’s severely lacking in the graphical department. [Vossi] has whipped up a bit of an upgrade for the C64 and C128, in the form of a graphics expansion card running Yamaha hardware.

As you might expect, the expansion is designed to fit neatly into a C64 cartridge slot. The card runs the Yamaha V9958—the video display processor known for its appearance in the MSX2+ computers. In this case, it’s paired with a healthy 128 kB of video RAM so it can really do its thing. The V9958 has an analog RGB output that can be set for PAL or NTSC operation, and can perform at resolutions up to 512×212 or even 512×424 interlaced. Naturally, it needs to be hooked directly up to a compatible screen, like a 1084, or one with SCART input. [Vossi] took the time to create some demos of the chip’s capabilities, drawing various graphics in a way that the C64 couldn’t readily achieve on its own.

It’s a build that almost feels like it’s from an alternate universe, where Yamaha decided to whip up a third-party graphics upgrade for the C64. That didn’t happen, but stranger team-ups have occurred over the years.

[Thanks to Stephen Walters for the tip!]

Learn GPU Programming With Simple Puzzles

Have you wanted to get into GPU programming with CUDA, but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is to utilise the excellent Numba Python JIT compiler, which allows easy-to-understand Python code to be compiled down to GPU machine code. Working through the puzzles is even easier if you use the linked Google Colab as your programming environment, which drops you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but setting that up isn’t covered.

The puzzles start by assuming you know nothing at all about GPU programming, which is totally the case for some of us! What’s really nice is the way the result of each program run is displayed, showing graphically how data are read from and written to the input and output arrays you’re working with. Each essential CUDA programming concept is introduced one at a time with a real programming example, making it a breeze to follow along. Just make sure you don’t watch the video below all the way through the first time, as in it [Sasha] explains all the solutions!
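To give a flavour of what the solutions end up looking like, here is a minimal Numba CUDA kernel in the spirit of the first puzzles (our own example, not one lifted from the notebook): each GPU thread adds ten to a single element of an array.

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_ten(inp, out):
    # cuda.grid(1) gives this thread's global index across all blocks.
    i = cuda.grid(1)
    if i < inp.shape[0]:          # guard threads that land past the array end
        out[i] = inp[i] + 10.0

inp = np.arange(32, dtype=np.float32)
out = np.zeros_like(inp)

# One-dimensional launch: enough 32-thread blocks to cover the whole array.
threads_per_block = 32
blocks = (inp.shape[0] + threads_per_block - 1) // threads_per_block
add_ten[blocks, threads_per_block](inp, out)

print(out)   # [10. 11. 12. ... 41.]
```

Numba quietly copies the NumPy arrays to the GPU and back for you, which is exactly the kind of hand-holding that makes these puzzles approachable.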

Confused about why you’d want to do this? Then perhaps check out our guide to CUDA first. We know what you’re thinking: how do we use non-Nvidia hardware? Well, there’s SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?



Hackaday Links: September 15, 2024

A quick look around at any coffee shop, city sidewalk, or, sadly, even at a traffic light will tell you that people are on their phones a lot. But exactly how much is that? For Americans in 2023, it was a mind-boggling 100 trillion megabytes, according to the wireless industry lobbying association CTIA. The group doesn’t discuss its methodology in the press release, so it’s a little hard to judge that number’s veracity, or the other numbers it bandies about, such as the 80% increase in data usage since 2021, or the claim that 40% of data now goes over 5G connections. Some of the numbers are more than a little questionable, too, such as the claim that 330 million Americans (out of a current estimate of 345.8 million people) are covered by one or more 5G networks. Even if you figure that most 5G installations are in densely populated urban areas, 95% coverage seems implausible given that in 2020, 57.5 million people lived in rural areas of the USA. Regardless of the details, the fact remains that our networks are positively humming with data, and keeping things running is no mean feat.
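If you want to check the back-of-envelope arithmetic behind that skepticism, it only takes a few lines (note that this mixes a 2020 rural headcount with a current population estimate, so treat it as rough):

```python
covered = 330e6        # Americans CTIA claims are covered by at least one 5G network
population = 345.8e6   # current US population estimate quoted above
rural_2020 = 57.5e6    # people living in rural areas of the USA in 2020

print(f"claimed 5G coverage: {covered / population:.1%}")               # about 95.4%
print(f"rural share of the population: {rural_2020 / population:.1%}")  # about 16.6%
```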
