Here’s Why GPUs Are Deep Learning’s Best Friend

If you’re curious about how fancy graphics cards actually work, and why they’re so well suited to AI-type applications, take a few minutes to read [Tim Dettmers]’ explanation of why that’s the case. It’s not a terribly long read, and while it does get technical, there are also car analogies, so there’s something for everyone!

He starts off by pointing out that most people know GPUs are scarily fast at matrix multiplication and convolution, but that their real advantage is their ability to move large amounts of memory around very efficiently.

Essentially, a CPU is a latency-optimized device while GPUs are bandwidth-optimized devices. If a CPU is a race car, a GPU is a cargo truck. The main job in deep learning is to fetch and move cargo (memory, actually) around. Both devices can do this job, but in different ways. A race car moves quickly, but can’t carry much. A truck is slower, but far better at moving a lot at once.

To extend the analogy, a GPU isn’t actually just a truck; it is more like a fleet of trucks working in parallel. When applied correctly, this can effectively hide latency in much the same way as an assembly line. It takes a while for the first truck to arrive, but once it does, there’s an unbroken line of loaded trucks waiting to be unloaded. No matter how quickly and efficiently one unloads each truck, the next one is right there, waiting. Of course, GPUs don’t just shuttle memory around, they can do work on it as well.
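
For a hands-on feel of the difference, here’s a minimal sketch of our own (not from [Dettmers]’ post): timing the same large matrix multiplication on the CPU and, if you have one, on a CUDA GPU via PyTorch. The exact numbers depend entirely on your hardware, and it assumes a working PyTorch install.

```python
# Rough illustration of the race-car-vs-truck point: time the same big matrix
# multiplication on the CPU and on a CUDA GPU. Numbers vary wildly by hardware.
import time
import torch

N = 4096
a = torch.randn(N, N)
b = torch.randn(N, N)

t0 = time.perf_counter()
_ = a @ b                          # the latency-optimized CPU does it alone
t_cpu = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()       # wait for the host-to-GPU copies to finish
    t0 = time.perf_counter()
    _ = a_gpu @ b_gpu              # the same job, spread over thousands of GPU threads
    torch.cuda.synchronize()       # kernels launch asynchronously, so wait for the result
    t_gpu = time.perf_counter() - t0
    print(f"CPU: {t_cpu:.3f} s   GPU: {t_gpu:.3f} s")
else:
    print(f"CPU: {t_cpu:.3f} s   (no CUDA device found)")
```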

The usual configuration for deep learning applications is a desktop computer with one or more high-end graphics cards wedged into it, but there are other (and smaller) ways to enjoy some of the same computational advantages without eating a ton of power and gaining a bunch of unused HDMI and DisplayPort jacks as a side effect. NVIDIA’s line of Jetson development boards incorporates the right technology in an integrated way. While they might lack the raw horsepower (and power bill) of a desktop machine laden with GPUs, they’re no slouch for their size.

20 thoughts on “Here’s Why GPUs Are Deep Learning’s Best Friend”

    1. When we don’t have the time to really understand something, we use analogies. We explain something new by using an already understood concept. To some degree everything is a representation, but when we use representations of representations ad infinitum, we at some point willingly trade understanding for simplicity, which can result in a perceived understanding that bypasses any factual basis. I see this a lot in AI, and in the attempts to easily make it explainable. So to amend your above statement, I would say it is not so much container shipping as the collection of individual containers on a particular ship before they have entered the harbor. Get it? No?

  1. He still doesn’t explain the deep learning connection, does he? He explains the 3 significant benefits of GPU architecture over CPUs, but doesn’t explain why the problems associated with deep learning are better solved by having those 3 benefits.

    I still don’t know what the fundamental problems are with deep learning. But at least I do now know that high memory bandwidth, thread parallelism and fast register access are the basis of the solutions to them.

    1. It’s because deep learning, artificial neural networks, etc. are essentially emulating another “computer” in the computer’s memory. It’s like a physics simulation where you compute how the parts of the machine operate instead of directly computing the logic that it implements – largely because you don’t know what the logic is. You simply run the machine over and over, changing and varying its internal rules by trial and error, until it does the job that you want – and you don’t care how it does that just as long as it does.

      So what you have is a massive matrix of data that represents the internal state of the virtual computer, and you’re trying to “run” it as fast as possible to test its response, and that sort of a job is best suited for vector processors like GPUs. The computation itself is very simple, just some addition or multiplication etc. without any branching, but the matrix is so big that you also need extremely fast memory access to churn through it – otherwise your thousands of parallel CPUs would be sitting idle waiting on memory most of the time.
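
      As a toy sketch of that “run it over and over, varying its internal rules” idea – pure illustration, since real training uses gradients rather than blind guessing – note that the inner loop is nothing but simple multiply-adds over arrays, exactly the kind of work a vector processor eats up:

```python
# Toy version: random search over tiny weight matrices until the "machine"
# maps inputs to the outputs we want. We never care *how* it does it.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # inputs
y = np.array([0., 1., 1., 0.])                           # desired outputs (XOR)

def run_machine(W1, W2):
    """The 'virtual machine': two layers of multiply-adds plus a squashing function."""
    h = np.tanh(X @ W1)
    return np.tanh(h @ W2).ravel()

best_err, best = np.inf, None
for _ in range(20_000):                       # keep re-running the machine...
    W1 = rng.normal(size=(2, 4))              # ...with randomly varied internal rules
    W2 = rng.normal(size=(4, 1))
    err = np.mean((run_machine(W1, W2) - y) ** 2)
    if err < best_err:
        best_err, best = err, (W1, W2)

print(f"best squared error found: {best_err:.3f}")
```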

      1. “Computer emulating a computer” tells people nothing.

        Neural networks are composed of layers of neurons, with connections in between. The outputs of the neurons of one layer get passed to the next, but the neurons in each layer are pretty much independent of each other. Thus, you can compute the result for each neuron in a layer in parallel with the layer’s other neurons, at least in theory.

        A CPU has a single thread (or perhaps a handful, but let’s assume one) and has to chug through the neurons one-by-one regardless of whether they could be handled in parallel. A GPU with 1000 cores can compute the results of a 1000-neuron layer in parallel in one step, then move on to the next layer. That yields a 1000x speedup compared to a single CPU core.
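
        To make that concrete, here is a small sketch (sizes picked arbitrarily): a whole layer boils down to one matrix–vector product, i.e. a pile of independent dot products that a GPU can hand out to its cores all at once.

```python
# Each neuron's output is an independent dot product, so the whole layer
# is a single matrix-vector multiply that parallelizes trivially.
import numpy as np

n_in, n_out = 512, 1000
x = np.random.randn(n_in)            # activations from the previous layer
W = np.random.randn(n_out, n_in)     # one row of weights per neuron
b = np.random.randn(n_out)

# Serial view: a single CPU thread chugging through the neurons one by one.
serial = np.array([W[i] @ x + b[i] for i in range(n_out)])

# Parallel view: the same 1000 dot products as one matrix-vector product,
# which a GPU (or a vectorized CPU library) evaluates all at once.
parallel = W @ x + b

assert np.allclose(serial, parallel)
```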

        1. We’re not talking about just neural networks, but all sorts of systems that can be represented in similar ways. What the “AI” is doing is running a simulation of some hypothetical mechanism or computing machine that is supposed to resemble how brains process information. To understand the point, we don’t need to go into the details of how an SNN works – it could be any such “cellular automaton”. We could just as well be iterating through different versions of the Game of Life to find one that can drive a car, or buy you napkins off of Amazon.

          Seeing that such computing machines and their representations all necessarily reduce to the Turing Machine, what you are doing IS just running a virtual computer in the memory of your actual computer, by simulating the hardware of said machine “piece by piece”. The state of each piece is represented by data in memory, and you shuffle the data around to make the virtual computer run.

          >That yields a 1000x speedup compared to a single CPU core.

          Not if you don’t have the memory bandwidth and speed to keep those 1000 cores fed with data. Ordinary CPUs are designed to handle complex computational problems of the size that fit in the CPU cache. A GPU is optimized to perform simple computational routines on massive amounts of data, which fits AI simulations well – or rather, our AI simulations have been formulated to run well on GPUs.
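
          A back-of-envelope version of that, with round numbers made up for illustration rather than any particular card’s specs:

```python
# How much memory bandwidth would it take to keep 1000 simple cores busy
# if every operand had to come straight from DRAM? (Invented round numbers.)
cores          = 1000        # hypothetical simple GPU cores
flops_per_core = 2e9         # ~2 GFLOP/s each (one fused multiply-add at ~1 GHz)
bytes_per_flop = 4           # worst case: stream one fresh float32 per operation

needed = cores * flops_per_core * bytes_per_flop   # bytes per second
print(f"~{needed / 1e12:.0f} TB/s needed to stream everything from memory")
# => ~8 TB/s, far beyond even fast GDDR/HBM, which is why on-chip caches,
#    registers and data reuse matter as much as raw bandwidth.
```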

          1. Though most AI simulations are not Turing complete. It’s far more likely that you’ll find a simpler solution that performs only the task you require of it, and nothing else.

          2. I totally agree with your summarisation of what is being achieved in the GPU: all intelligent systems are just simulations of other systems. Further to this, the more accurate and granular the simulation is, the more ‘understanding’ it has of the external system it is trying to simulate. There is also a problem with the overarching managing system, though, or what I like to call the operator. The operator chooses the system to simulate and the rules of how the simulator operates. In doing so, the top-level operator or manager could ask the system to simulate something that is not true, like a deep fake or a belief in the flat-earth theory, and the underlying system then ‘believes’ in a reality that does not exist. Further to this, we should ask who or what is the overarching operator of the operator – this can be another intelligent system, or just a component of the environment.

      1. Yeah, he didn’t even mention compute cards like Nvidia’s Tesla line, and the ending paragraph seems to imply he doesn’t even know about them, because he mentions ‘display outputs’, which Tesla cards don’t even have.

    1. That’s, like, so yesterday! Now we get our facts from TikTok, our investment advice from Reddit’s r/wallstreetbets and our cyberchondriac medical diagnosis from herbalist FB pages.

      Just think how people are making their election choices…well, you don’t have to actually…like…think….

  2. “move cargo (memory, actually) around” and “Of course, GPUs don’t just shuttle memory around, they can do work on it as well”

    The more apt term would be data: the memory itself stays in place; only the memory contents are moved and operated upon.

    Maybe this usage comes from C’s memcpy, but the “mem” there refers to main memory as opposed to hard disk or other physical storage media – not that memory is actually copied (which it isn’t).

    Copying a disk makes sense when you actually have physical media to move around, as opposed to a mere data transfer.

  3. If the ‘Maxwell’ architecture is enough for you, there is an Nvidia card with 24GB of GDDR5 memory for around $110: the Tesla M40. I confess to not knowing what the current hobbyist Large Language Models have in the way of hardware requirements.
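
    For a rough idea, counting weights only (activations, KV cache and overhead come on top), and using an arbitrary 7-billion-parameter model as the example rather than a recommendation:

```python
# Back-of-envelope VRAM needed just to hold the weights of a 7B-parameter model.
params = 7e9

for name, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name:>5}: ~{gb:.1f} GB of weights")
# fp16 ~14 GB, 8-bit ~7 GB, 4-bit ~3.5 GB -- all of which would fit in a
# 24 GB M40, though the old Maxwell part will be slow next to newer cards.
```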

    1. I almost forgot: there may be some BIOS hacks for AMD CPUs with integrated graphics (APUs) to give them a larger ‘dedicated’ portion of system memory for use with a large language model. This way you can use less expensive system memory (64GB DDR4 ~$80 or 64GB DDR5 ~$100).
