If you have a curiosity about how fancy graphics cards actually work, and why they are so well-suited to AI-type applications, then take a few minutes to read [Tim Dettmers] explain why this is so. It’s not a terribly long read, but while it does get technical there are also car analogies, so there’s something for everyone!
He starts off by saying that most people know that GPUs are scarily efficient at matrix multiplication and convolution, but what really makes them most useful is their ability to work with large amounts of memory very efficiently.
Essentially, a CPU is a latency-optimized device while GPUs are bandwidth-optimized devices. If a CPU is a race car, a GPU is a cargo truck. The main job in deep learning is to fetch and move cargo (memory, actually) around. Both devices can do this job, but in different ways. A race car moves quickly, but can’t carry much. A truck is slower, but far better at moving a lot at once.
To extend the analogy, a GPU isn’t actually just a truck; it is more like a fleet of trucks working in parallel. When applied correctly, this can effectively hide latency in much the same way as an assembly line. It takes a while for the first truck to arrive, but once it does, there’s an unbroken line of loaded trucks waiting to be unloaded. No matter how quickly and efficiently one unloads each truck, the next one is right there, waiting. Of course, GPUs don’t just shuttle memory around, they can do work on it as well.
The usual configuration for deep learning applications is a desktop computer with one or more high-end graphics cards wedged into it, but there are other (and smaller) ways to enjoy some of the same computational advantages without eating a ton of power and gaining a bunch of unused extra HDMI and DisplayPort jacks as a side effect. NVIDIA’s line of Jetson development boards incorporates the right technology in an integrated way. While it might lack the raw horsepower (and power bill) of a desktop machine laden with GPUs, they’re no slouch for their size.