Learn GPU Programming With Simple Puzzles

Have you wanted to get into GPU programming with CUDA but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is to utilise the excellent Numba Python JIT compiler, which lets easy-to-understand Python code be compiled into GPU machine code. Working on the puzzles is even easier if you use the linked Google Colab as your programming environment, which drops you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but that route isn’t covered.
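To give a flavour of what Numba makes possible, here’s a minimal, hypothetical sketch of a CUDA kernel in the same spirit as the early puzzles (not code taken from them). It assumes you have numba installed with CUDA support and an NVIDIA GPU, or you can set NUMBA_ENABLE_CUDASIM=1 to run it on Numba’s built-in simulator:

```python
# Hypothetical example: each GPU thread adds 10 to one element of the array.
import numpy as np
from numba import cuda

@cuda.jit
def add_ten(out, a):
    i = cuda.threadIdx.x      # this thread's index within the block
    if i < a.size:            # guard against out-of-range threads
        out[i] = a[i] + 10

a = np.arange(8, dtype=np.float32)
out = np.zeros_like(a)
add_ten[1, 8](out, a)         # launch 1 block of 8 threads
print(out)                    # [10. 11. 12. 13. 14. 15. 16. 17.]
```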

The puzzles start by assuming you know nothing at all about GPU programming, which is totally the case for some of us! What’s really nice is the way the result of each program run is displayed, showing graphically how data are read from and written to the input and output arrays you’re working with. Each essential CUDA programming concept is introduced one at a time with a real programming example, making it a breeze to follow along. Just make sure you don’t watch the video below all the way through the first time, as in it [Sasha] explains all the solutions!

Confused about why you’d want to do this? Then perhaps check out our guide to CUDA first. We know what you’re thinking: how do we use non-NVIDIA hardware? Well, there’s SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?

8 thoughts on “Learn GPU Programming With Simple Puzzles”

  1. I’m still discouraged because my first attempt to learn OpenCL (last year) revealed a bunch of kernel/driver bugs that resulted in unkillable tasks. It’s definitely still an uphill challenge, a “steep learning curve”. I hope tutorials like this bring a greater amount of foot traffic to shine a light on these low-hanging bugs.

  2. The biggest thing with a GPU is that, unlike a CPU, one instruction can modify more than one memory location, or several different instructions can work on different memory locations at the same time: in parallel, not serially like a CPU.

    So things like branching, conditionals, and boolean logic may work differently. Though GPU instructions are simpler than CPU instructions, GPUs also have hardware support for floating point operations.

    You can think of the GPU instruction set as something like RISC, but with the instructions working in parallel and in a somewhat non-linear fashion.
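    In the article's own Numba terms, here is a rough, hypothetical sketch of that idea (not code from the puzzles): every thread runs the same kernel body on its own element, and a branch is taken or not per thread, which is where warp divergence comes from. It assumes Numba with CUDA support, or the CUDA simulator.

    ```python
    # Hypothetical illustration: one kernel, many elements at once,
    # with a per-thread branch (threads in a warp may diverge here).
    import numpy as np
    from numba import cuda

    @cuda.jit
    def clamp_negatives(out, a):
        i = cuda.grid(1)              # this thread's global index
        if i < a.size:                # guard against extra threads
            if a[i] < 0:              # taken or not per thread,
                out[i] = 0.0          # not once for the whole program
            else:
                out[i] = a[i]

    a = np.array([-2.0, 3.0, -1.0, 4.0], dtype=np.float32)
    out = np.zeros_like(a)
    clamp_negatives[1, 32](out, a)    # 1 block, 32 threads (one warp)
    print(out)                        # [0. 3. 0. 4.]
    ```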

      1. But you technically can run an entire OS on a GPU.

        Take my old AMD Fiji card with third-party overclock support and 4 GB of HBM RAM, running at 1.4 GHz with a slight memory overclock. Using a few shaders, some floating point, and an MMU, you could get Windows running on the GPU inside a Windows PC, both natively and in real time.

        Won’t be playing GTA V though.

          1. This reminds me of a thought I’ve had brewing for a while now. I was imagining a CPU with very basic, stripped-down cores, but loads of them. Use a segment of them for software-defined fetching and scheduling.
          You could strip them down really far and, for example, do multiplication as addition spread across a bunch of cores.
          I wonder if you could get them small enough to fit enough of them, with a high enough clock, that with compiler optimisation for it, it would be performant.

          1. I think you’re reinventing bitslice design. It’s a thing with a long history. I designed a hybrid structure of FPGA (for routing and data munging tasks) and bitslice ‘cores’ back in college 25 years ago. Well, I wrote it down anyway; it never made it to reality. I did come across an EE researcher who was studying 1-bit CPU cores, optimised for super high clock rates. I didn’t hear much more about that either.
