CUDA, But Make It AMD

Compute Unified Device Architecture, or CUDA, is NVIDIA's software platform for running massively parallel compute workloads on its GPUs. It has been a major driver of the push to use GPUs for general-purpose computing, and competitor AMD has largely been left out in the cold as a result. However, with more demand for GPU computation than ever, there's been a breakthrough. SCALE from [Spectral Compute] will let you compile CUDA applications for AMD GPUs.

SCALE allows CUDA programs to run as-is on AMD GPUs, without modification. The SCALE compiler is also intended as a drop-in replacement for nvcc, right down to the command-line options. For maximum ease of use, it behaves as though you've installed the NVIDIA CUDA Toolkit, so you can build with CMake just as you would for a normal NVIDIA setup. Currently, Navi 21 and Navi 31 (RDNA 2.0 and RDNA 3.0) targets are supported, while a number of other GPUs are undergoing testing and development.
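To give a sense of what "as-is" means here, below is a minimal, generic CUDA program of the sort SCALE is meant to handle. This is our own illustrative sketch, not code from the SCALE documentation; per Spectral Compute's claims, the same unmodified source, built with the usual nvcc-style invocation or a stock CUDA CMake project, should compile for a supported RDNA 2/3 card.

#include <cstdio>
#include <cuda_runtime.h>

// Plain, ordinary CUDA kernel: y = a*x + y, one element per thread.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device buffers and copies -- standard CUDA runtime calls throughout
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);   // expect 4.0

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}

We haven't run this through SCALE ourselves, but code like this, with no HIP porting pass, is exactly the workflow being promised.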

The basic aim is to allow developers to use AMD hardware without having to maintain an entirely separate codebase. It’s still a work in progress, but it’s a promising tool that could help break NVIDIA’s stranglehold on parts of the GPGPU market.


The ’80s Multi-Processor System That Never Was

Until the early 2000s, the computer processors available on the market were essentially all single-core chips. There were some niche layouts that used multiple processors on the same board for improved parallel operation, but it wasn't until the POWER4 processor from IBM in 2001, and later chips like the AMD Opteron and Intel Pentium D, that we got multi-core processors. If things had gone just slightly differently with this experimental platform, though, we might have had multi-processor systems available for general use as early as the '80s instead of two decades later.

The team behind this chip was from the University of California, Berkeley, a place known for such other innovations as RAID, BSD, SPICE, and some of the first RISC processors. This processor architecture would be based on RISC as well, and would be known as Symbolic Processing Using RISC. It was specially designed to integrate with the Lisp programming language, but its major feature was a set of parallel processors sharing a common bus, which allowed parallel operations to be computed at much greater speed than comparable systems of the time. The use of RISC also allowed a smaller group to develop something like this, and although a RISC design has to execute more instructions, it can often execute them faster than other architectures.

The linked article from [Babbage] goes into much more detail about the architecture of the system as well as some of the things about UC Berkeley that made projects like this possible in the first place. It’s a fantastic deep-dive into a piece of somewhat obscure computing history that, had it been more commercially viable, could have changed the course of computing. Berkeley RISC did go on to have major impacts in other areas of computing and was a significant influence on the SPARC system as well.

Early “Computer Kit” Really Just A Fancy Calculator

We’re big fans of calculators, computers, and vintage magazines, so when we see something at the intersection of all three we always take a look. Back in 1966, Electronics Illustrated included instructions in its November issue on building, in its words, a “Space-Age Decimal Computer!” using neon lamps, a couple of tubes, and lots of soldering. The article starts on page 39, and the magazine makes it fairly clear that this will be an expensive and complicated project, but that you will be paid back many times over in use and experience!

Our modern idea of a computer differs greatly from the definitions used in the past. As many readers likely know, “computer” was actually a job title for a long time. The job of a computer was to sit with pen, paper, and later on electromechanical devices, and compute and tabulate long lists of numbers. Imagine doing payroll for large companies completely by hand, every month. The opportunity for error was large, and mistakes were simply part of doing business. As electromechanical and later electronic computers were developed, they took over the work of human computers in calculating and tabulating numbers. This is why IBM was originally called the Computing-Tabulating-Recording Company!


Startup Claims It Can Boost CPU Performance By 2-100X

As Moore’s Law slows, with chip makers reaching the physical limits of transistor size, designers are having to look beyond cramming ever more transistors onto a chip to increase CPU performance. ARM is having a bit of a moment by improving the performance-per-watt of many computing platforms, but other ideas will need to come to the forefront to make any big pushes in this area. A startup called Flow Computing claims it can improve modern CPUs by a significant amount with a slight change to their standard architecture.

Flow hopes to make these improvements by adding a parallel processing unit, which it calls the “back end”, to a more-or-less standard CPU, the “front end”. The two units would sit on the same chip, connected by a shared bus that lets them communicate extremely quickly, with the front end rapidly offloading tasks that are better suited to parallel processing to the back end. Since the front end keeps essentially the same components as a modern CPU, the startup hopes to maintain backwards compatibility with existing software while letting developers optimize for the new parallel unit when needed.

While we’ll take a step back and refrain from claiming this is the future of computing until we see some results, and maybe a prototype or two, the idea does show some promise. It’s similar to ARM designs that pair cores optimized for different tasks, or to systems that offload non-graphics work to a GPU better suited to parallel processing. Even the Raspberry Pi is starting to take advantage of external GPUs for tasks like these.

The World’s First DIY Minicomputer Was Almost Australian

The EDUC-8, a DIY minicomputer design that came out in “Electronics Australia” magazine in August 1974, was almost the world’s first. And it would have been tied for first place if inventor [Jamieson “Jim” Rowe] hadn’t held back publication to rework the design and expand the memory to a full 256 bytes. The price of perfectionism?

Flash forward 50 years, and [Gwyllym Suter] has taken on the job of recreating the EDUC-8 using modern PCBs, but otherwise staying true to the all-TTL design. He has all of his schematics up on the project’s GitHub, but has also sent us a number of beauty shots that we’re including below. Other than the progress of PCB tech and the very nice 3D-printed housing, they look identical. We have to admit that we love those wavy hand-drawn traces on the original, but we wouldn’t be sad about not having to solder in all those jumpers.


Comparing X86 And 68000 In An FPGA

[Michael Kohn] started programming on the Motorola 68000 architecture and then, for work reasons, moved over to Intel’s x86, and he was not exactly pleased by the latter chip’s perceived shortcomings. In the ’80s, the 68000 was a very popular chip, powering everything from personal computers to arcade machines, and looking at its architecture and ease of programming, you can see why.

Fast-forward a few years, and [Michael] decided to implement both cores in an FPGA to compare them on real applications, you know, for science. As an extra bonus, he also compares the performance of a minimal RISC-V implementation on the same hardware, taken from an earlier RISC-V project (which you should also check out!).

Utilizing his ‘Java Grinder’ application (also pretty awesome, especially the retro console support), a simple Mandelbrot fractal generator was used as a non-trivial workload to produce binaries for each architecture, and the results were timed. Unsurprisingly for CISC architectures, the 68000 and x86 code sizes were practically identical and significantly smaller than the equivalent RISC-V binary. Still, looking at the execution times, the 68000 beat the x86 hands down, with the newer RISC-V speeding along to take pole position. [Michael] admits that these implementations are minimal, with no pipelining, so they could all be sped up a little.

Also, it’s not a totally fair race. As you’ll note from the RISC-V implementation, a custom RISC-V instruction was implemented to perform the Mandelbrot generator’s inner iteration. This computes the complex operation Z = Z² + C, which, as fellow fractal nerds will know, is where a Mandelbrot generator spends nearly all of its compute time. We suspect that’s the real reason RISC-V came out on top.
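For reference, the escape-time loop for a single point looks something like this generic C-style sketch (ours, not [Michael]’s actual benchmark code). Every pass through the loop is one Z = Z² + C step, which is exactly the work that custom instruction collapses into hardware.

// Generic Mandelbrot escape-time iteration for one point C = cr + ci*i.
// Returns the iteration count at which |Z| exceeds 2, or max_iter if it never does.
static int mandel_iters(float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;   /* Z starts at 0 */
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0f) {
        float t = zr * zr - zi * zi + cr;   /* real part of Z*Z + C */
        zi = 2.0f * zr * zi + ci;           /* imaginary part of Z*Z + C */
        zr = t;
        i++;
    }
    return i;   /* iteration count determines the pixel's colour */
}

Run that over every pixel of the image and you can see why a single instruction that handles one whole iteration dominates the benchmark.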

If actual hardware is more your cup of tea, you could build a minimal 68k system pretty easily, provided you can find the chips. The current ubiquitous x86 architecture, as odd as it started out, is here to stay for the foreseeable future, so you’d just better get comfortable with it!


The Amiga We All Wanted In 1993

To be an Amiga fan during the dying days of the hardware platform back in the mid-1990s was to have a bleak existence indeed. Commodore had squandered what was to us the best computer ever, with dismal marketing and a series of machines that were essentially just repackaged versions of the original. Where was a PCI Amiga with fast processors, we cried!

Now, thirty years too late, here’s [Jason Neus] with just the machine we wanted, in the shape of an ATX form factor Amiga motherboard with those all-important PCI slots and USB for keyboard and mouse.

What would have been unthinkable in the ’90s comes courtesy of an original or ECS Amiga chipset for the Amiga functions, plus an FPGA and a microcontroller for PCI and USB respectively. Meanwhile, there’s also a PC floppy drive controller, based on work from [Ian Steadman]. The processor and RAM live on a daughter card, and both 68040 and 68060 processors are supported.

Here in 2024, of course, this is still a 1990s-spec board, and misty-eyed speculation about what might have happened aside, it’s unlikely to become your daily driver. But that may not be the point; instead we should evaluate it for what it is. Implementing a PCI bus, even a 1990s one, is not without its challenges, and we’re impressed with the achievement.

If you’re interested in Amiga post-mortems, here’s a slightly different take.