Until the late 1990s, the concept of a 3D accelerator card was something generally associated with high-end workstations. Video games and kin would run happily on the CPU in one’s desktop system, with later extensions like MMX, 3DNow!, SSE, etc. providing a significant performance boost for games that supported them. As 3D accelerator cards (colloquially called graphics processing units, or GPUs) became prevalent, they took over almost all SIMD vector tasks, but one thing they’re not good at is being a general parallel computer. While working on a software project, this really ticked [Raph Levien] off and inspired him to write up his grievances.
Although the interaction between CPUs and GPUs has become tighter over the decades, with PCIe in particular being a big improvement over AGP & PCI, GPUs are still terrible at running arbitrary computing tasks, and PCIe links are still glacial compared to communication within the GPU & CPU dies. With the introduction of asynchronous graphics APIs, this divide became even more pronounced. The proposal thus is to invert this relationship.
There’s precedent for this already, with Intel’s Larrabee and IBM’s Cell processor merging CPU and GPU characteristics on a single die, though developers struggled to write software for such a new kind of architecture. These issues forced Sony to add a GPU to the PlayStation 3. There is also the DirectStorage API in DirectX, which bypasses the CPU when loading assets from storage, effectively adding CPU features to GPUs.
As [Raph] notes, so-called AI accelerators also have these characteristics, with often multiple SIMD-capable, CPU-like cores. Maybe the future is Cell after all.
2 thoughts on “Musings On A Good Parallel Computer”
The Super Nintendo already used co-processors in the early 90s.
Or “Mappers” in NES or Gameboy terminology.
The SuperFX chip might be the most popular, but there were also 6502 derivatives (or 65C816 derivatives, rather) doing the work of the main processor.
https://en.wikipedia.org/wiki/List_of_Super_NES_enhancement_chips
“Until the late 1990s, the concept of a 3D accelerator card was something generally associated with high-end workstations.
Video games and kin would run happily on the CPU in one’s desktop system, with later extensions like MMX, 3DNow!, SSE, etc.
providing a significant performance boost for games that supported it.”
That’s true, I remember MMX being hyped at the time Windows 98 was new.
It was neat, because it was mapped onto the existing x87 registers.
Seems like a waste, but it solved an issue: context-switching.
Many older multitasking environments such as Windows 3.x saved and restored the x87 state when multiple applications were running and trying to use the math co-processor.
So sharing the math co-processor didn’t cause a mess.
MMX registers mapped onto x87 registers were saved and restored the same way, so MMX could be used in such environments without any changes.
(SSE could be used by applications under older single-tasking systems such as DOS, too.)
Back when MMX was new, there were visions of MMX replacing 3D accelerators and modems.
MMX was being considered for implementing software-based DSPs, which could do a lot of things: high-speed modems, voice recognition, 3D sound, rendering, virtual reality etc.
Photoshop and other drawing programs used MMX-optimized filters.
The positive side was that the processor doing 3D calculations had fast access to caches and main memory,
while 3D accelerators such as 3dfx Voodoo or NEC PowerVR had to hog the PCI bus.
Data had to be copied between main memory and graphics memory, slowing things down.
Unfortunately, MMX was integer-only and could have benefited greatly from working in tandem with something like an x87 FPU (the 80x87 was also capable of integer math, btw).
AMD’s 3DNow! instructions sorta combined the strengths of both MMX and the x87 FPU on top of the old math co-processor registers, but weren’t adopted by Intel (VIA and Cyrix did adopt 3DNow!).
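To illustrate what "integer" SIMD meant for something like an image filter, here is a minimal Python sketch that emulates one MMX instruction, PADDUSB (packed add of eight unsigned bytes with saturation), on a 64-bit value. The function name `paddusb`, the sample pixel values and the brightness offset are made up for illustration; this is an emulation of the instruction's behaviour, not how Photoshop or any real filter was implemented.

```python
def paddusb(a: int, b: int) -> int:
    """Emulate MMX PADDUSB: add eight unsigned bytes lane by lane,
    clamping each lane to 255 instead of wrapping around."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF          # byte from the first operand
        y = (b >> shift) & 0xFF          # byte from the second operand
        result |= min(x + y, 0xFF) << shift
    return result


# Hypothetical example: brighten eight 8-bit pixels at once.
pixels = bytes([10, 100, 200, 250, 0, 255, 128, 127])
boost = bytes([60] * 8)

packed = paddusb(int.from_bytes(pixels, "little"),
                 int.from_bytes(boost, "little"))
brightened = packed.to_bytes(8, "little")

print(list(brightened))  # [70, 160, 255, 255, 60, 255, 188, 187]
```

On real hardware the whole loop is a single instruction operating on one 64-bit MMX register, which is where the speed-up for filters came from; the saturation avoids bright pixels wrapping around to dark ones.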
Btw, also interesting were so-called “Transputer” cards for parallel processing.
They had their own memory and processor and were being programmed by the host system.
LISP systems had been built on Transputer technology, too, I think.
That’s interesting because LISP used to be related to A.I. (things like neural nets, expert systems etc).
Transputer was both a series of INMOS processors (T800 etc.) and a concept, I think.
https://en.wikipedia.org/wiki/Transputer
https://www.abortretry.fail/p/inmos-and-the-transputer
