A Real GPU On The Raspberry Pi — Barely.

[Jeff Geerling] saw the Raspberry Pi Compute Module 4 and its exposed PCI-Express 1x connection, and just naturally wondered whether he could plug a GPU into that slot and get it to work. It didn’t. There were a few reasons why, such as the limited Base Address Register space, and drivers that just weren’t written for ARM hardware. A bit of help from the Raspberry Pi software engineers and other Linux kernel hackers and those issues were fixed, albeit with a big hurdle in the CPU. The Broadcom chip in the Pi 4, the BCM2711, has a broken PCIe implementation.
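To make the Base Address Register limitation a little more concrete: on Linux, each PCI device lists the address windows it has been given in a sysfs `resource` file, one `start end flags` line per region. The following is a minimal Python sketch, assuming that layout, which adds up how much address space a device occupies. The function name `bar_sizes` and the sample values are hypothetical and are not taken from the cards tested in the article.

```python
def bar_sizes(resource_text):
    """Return the size in bytes of each populated region in a sysfs 'resource' dump."""
    sizes = []
    for line in resource_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue  # unused region slot
        sizes.append(end - start + 1)
    return sizes


# Hypothetical sample: one large window and one small window, plus an empty slot.
SAMPLE = """\
0x0000000600000000 0x000000060fffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000610000000 0x000000061001ffff 0x0000000000140204
"""

if __name__ == "__main__":
    sizes = bar_sizes(SAMPLE)
    print("regions (bytes):", sizes)
    print("total MiB:", sum(sizes) // (1024 * 1024))
```

On a real system the text would come from a file such as `/sys/bus/pci/devices/<address>/resource`, with the device address as reported by `lspci`. A GPU that asks for a window larger than the host can map is the kind of mismatch the paragraph above describes.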

There has finally been a breakthrough: thanks to the dedicated community that has sprung up around this topic, a set of kernel patches manages to work around the hardware issues. It’s now possible to run a Radeon HD 5000/6000/7000 card on the Raspberry Pi 4 Compute Module. There are still glitches, and the kernel patches to make this work will likely never land upstream. That said, it’s possible to run a desktop environment on the Radeon GPU on a Pi, and even a few simple benchmarks. The results… aren’t particularly inspiring, but that wasn’t really ever the point. You may be asking what the real-world use is for a full-size GPU on the Pi. Sure, maybe crypto-mining or emulation, or being able to run more monitors for digital signage. More than that, it might help ensure the next Pi has a working PCIe implementation. But like many things we cover here, the real reason is that it’s a challenge that a group of enthusiasts couldn’t leave alone.

36 thoughts on “A Real GPU On The Raspberry Pi — Barely.

  1. Don’t get me wrong, it’s amazing overall but there are still things to improve it in the next release. The Raspberry Pi 4 has had several broken things. At least they fixed it by moving to version 4b but why is there a 4b version? Why does the CM4 only use USB 1.0 and it defaults to off? Why is there only PCIe 1x OR USB 3.0 and not both? Why are they all basically out of stock and have been for months and months now? Why do they not have any M.2 NVMe ability? At least OS wise there are other options now.

    Raspberry Pi 3, some Amazon Fire tablets, Roku media players and the Nintendo Switch use Cortex A53. The Cortex A73 is a newer model, often found as the big cores in big.LITTLE arrangements from Qualcomm and Samsung. It is the successor (with 30% greater performance) to the Cortex A72, which is used in the Raspberry Pi 4. Interestingly, the A72 supports out-of-order execution, while the in-order Cortex A53 does not. These are all basically higher end Arduinos, right? Obviously not only slightly better but seem to be evolutions of that type of hardware?

    What version microprocessor is the RP5 going to use? Can they offer them actually for sale at a decent price and can they include something with both USB C as well as M.2 and/or PCIe as a standard offering?

    1. >At least they fixed it by moving to version 4b but why is there a 4b version? Why does the CM4 only use USB 1.0 and it defaults to off? Why is there only PCIe 1x OR USB 3.0 and not both? Why are they all basically out of stock and have been for months and months now?

      The “B” is the format, not the version. “A” boards have the USB and Ethernet ports lopped off to create a smaller footprint. There was a 3A, but never a 4A.

      Normally the PCIe lane is used for the PCIe-to-USB3 chip, so using the USB3 ports also occupies the PCIe. The chip natively only has one USB 2 (?) port.

      They’ve made a couple of news posts about the shortages: basically the Pi4 is getting hit by the chip shortage like anything else, and for now they’re prioritising commercial buyers because that actually affects people’s jobs in other companies. I’d bet there’s also a degree of not wanting to give commercial users a reason to jump ship to another device.

      1. https://hackaday.com/2019/07/16/exploring-the-raspberry-pi-4-usb-c-issue-in-depth/
        They did have to add some components to resolve the issues with the original 4th version. It’s been covered at length before.

        The currently used RPi 4b chip does only have one lane and you get PCIe or USB 3 and that’s it. Would like a new RPi version with a chip that offers both so you can have USB 3 and at least some type of PCIe which you can easily convert with inexpensive hardware to use M.2 if you want to do so since M.2 is basically using a PCIe lane. Not exactly ideal but faster than SATA and much faster than USB 3 for that matter.

        1. A minor hardware revision so it plays nice with some chargers doesn’t make it a 4B though… It’s just ver 1.1 or something like that of the 4B model.

          USB3 vs direct use of the same gen2 PCIe x1 lane doesn’t make a great deal of difference (I’ve tried it), it is there but really USB3 is able to run so close to native PCIe in read/write IFF it’s the only device in use that it doesn’t actually make a real world difference, and I’d generally rather have USB3 as you can then trivially split the bus up between more devices…

          The real thing that tends to kill USB3 drives performance on Pi’s etc is that they are not really USB3 drives, but crappy flash packaged with a blue USB tip, that maybe just maybe does actually have all the connections to be USB3, rather than just USB2 but in blue, but either way won’t actually run that fast. But use a USB3 to M.2 Caddy and you get basically the same performance you can get natively using the PCIe lane, slower but by nothing much…

          Maybe the next CM module will offer both, but I highly doubt a normal Pi ever will – the form factor doesn’t really allow it, and with the primary function of the Pi’s normal model I can’t see them changing that. To me it’s far more likely to get the next SOC having a gen 3 lane, or perhaps more gen 2 lanes so you get more bandwidth to the SOC to run the NVME/USB 3 etc, but probably only able to make full use of it with the CM.
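The claim that USB 3.0 runs close to a native gen 2 PCIe x1 lane follows from the line rates: both signal at 5 gigatransfers per second with 8b/10b encoding, while PCIe gen 3 uses 8 GT/s with 128b/130b. A short Python sketch of that arithmetic is below. The helper name `payload_mb_per_s` is made up for illustration, and the figures are theoretical ceilings per direction that ignore protocol overhead, so real throughput will be lower.

```python
def payload_mb_per_s(gigatransfers_per_s, data_bits, coded_bits, lanes=1):
    """Line rate minus encoding overhead, in megabytes per second per direction."""
    bits_per_s = gigatransfers_per_s * 1e9 * data_bits / coded_bits * lanes
    return bits_per_s / 8 / 1e6


# Theoretical ceilings only; packet and protocol overhead are not modelled.
LINKS = {
    "PCIe gen 2 x1": payload_mb_per_s(5.0, 8, 10),
    "USB 3.0 (5 Gbit/s)": payload_mb_per_s(5.0, 8, 10),
    "PCIe gen 3 x1": payload_mb_per_s(8.0, 128, 130),
}

if __name__ == "__main__":
    for name, rate in LINKS.items():
        print(f"{name}: {rate:.0f} MB/s")
```

Because every device behind a PCIe switch on the CM4 shares that single gen 2 lane, the first figure is also the combined ceiling for anything hung off it.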

          1. There are a lot of terrible USB 3.x control chips coupled with terrible NVRAM chips, packaged into cheap USB flash drives. They usually have decent read speeds but write speeds are abysmal, sometimes *slower than a middling USB 2.0 flash drive*. They’ll often put a chunk of RAM cache on them so that small writes of a few megabytes go quickly but if you try to throw one large file or a lot of small files at them, the cache will fill up then the write speed crashes to single digit megabytes per second – and you’re looking at hours to fill up a 128 gig drive.

            ALL the manufacturers make this crap. PNY, Kingston, Sandisk… They also make garbage usb 2.0 flash drives that are even slower writing. The only way to tell if one is slowpoke writing if there’s no reviews online is to buy and try.

            If you want guaranteed high speed USB write performance and don’t mind more bulk, get a PCIe x4 NVMe SSD and a USB-C external case for it. I got a free 256 gig one then spent $24 on an EZCAST model S8000 for it. Over 430 megabytes/second sustained read *and write* on original USB 3.0. Should be faster on USB 3.1 or C.
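For the "buy and try" step, one rough way to spot a drive whose write speed collapses once its cache fills is to write more data than the cache can hold and watch the per-chunk rate. This is a minimal Python sketch under that assumption. The function name and default sizes are illustrative, and it is not a substitute for a proper benchmarking tool.

```python
import os
import time


def sustained_write_mb_per_s(path, total_mb=64, chunk_mb=4):
    """Write total_mb of random data in chunk_mb pieces, syncing each one.

    Returns the measured MB/s for every chunk so a drop-off is visible.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    rates = []
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            start = time.perf_counter()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the OS page cache
            elapsed = max(time.perf_counter() - start, 1e-9)
            rates.append(chunk_mb / elapsed)
    return rates
```

Point `path` at a file on the drive under test and raise `total_mb` well past a few hundred megabytes; a rate that starts high and then falls to single digits matches the cache behaviour described above. For scale, 128 GB at 8 MB/s works out to roughly four and a half hours.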

    2. If you want M.2 NVMe it’s trivial to find a CM4 carrier board that gives you that, even one that can split the PCIe lane to many extra devices if you must have more PCIe…

      As for offering them for sale at a decent price, well that is part of why the Pi4 doesn’t use a chip with more than 1 lane of gen 2 PCIe (if memory serves), which is actually quite a lot for these small sort of systems, many of the SOC type devboard/sbc approaches have no PCIe lane in the SOC at all, nor native USB3 in the SOC, as that also costs more.

      As it stands right now I use a Pi4 for probably 90 maybe 95% of my computer use, it’s hooked up to my nice monitor, keyboard, mouse and sound system and works just as well as my big desktop – to the point that the big power hog is largely just a gaming machine now and the only time the difference jumps out is when dragging around or full screening a video, where the Pi4 does make you wait a lot more than a modern GPU/CPU combo. So they are way more than jumped up Arduino, being actually a very useful computing package, able to run things like opencv at usable frame rates, they just happen to have GPIO so you can often use them like an Arduino if you want to, and don’t have to have something like an Arduino or teensy on the USB to interface with your own electronics.

      Also Arduino are usually used to run only the program of interest, and so will get ‘real-time’ timings, and are better for some tasks where running Linux your Pi GPIO related programs are at the mercy of the task scheduler – might be fine, and the massively more potent CPU/GPU combo in the Pi lets it do things the Arduino can’t even hope to, but they are very different really.

      While Pi’s are definitely hit by the supply shortage I’ve seen plenty of them available at my usual suppliers, just not all the models.

      1. The CM4 IO board does technically work with a CM4, by design. You lose USB 3 though. USB 2.0 is turned off by default (https://www.jeffgeerling.com/blog/2020/usb-20-ports-not-working-on-compute-module-4-check-your-overlays) but you can work around that but it will still always be USB 2.0 due to a chip issue in the sense that the chip used on the RPi 4b literally will not support both. Though at least one person has been able to work around that and add USB 3.0 to a CM4 device while removing PCIe 1x so it can sort of technically be worked around on CM4 boards but you still only get one or the other. https://blog.zakkemble.net/rpi4-pci-express-bridge-chip/

        What you don’t get is a decent form factor with the CM4 IO board. You get a M.2 sticking straight up, way beyond the enclosure. Other companies are working on or have released something cosmetically and physically better at least such as Oratek’s TOFU though it needs some kind of case and not just something 3D printed or with no way of installing a heatsink on the M.2 device.

        But you still lose USB 3.0 and yes, the CM4 IO board is technically just to try things out rather than be something you regularly use but sheesh even the power supply has no basic protection so you better give it the right voltage or your PCIe or M.2 device is going to be very unhappy very quickly.

        Combining an Arduino plus a RPi CM4 is interesting as well https://blog.adafruit.com/2020/12/10/this-is-piunora-a-carrier-for-the-raspberry_pi-cm4-in-an-adafruit-metro-arduino-form-factor-by-timonsku-oshw/ though not sure how that works offhand.

        The GPIO on a RPi 4b is very nice but it’s still a computer and not a microcontroller. At least there is the RP2040 which is technically a dual core microcontroller and that will very likely continue to offer better and faster hardware when the next version is released.

        https://rpilocator.com/ is showing basically no stock of anything and what shows up almost immediately sells.

        If you can find anything at your usual suppliers, be sure to let people know because many RPi products have been largely out of stock for months now or show up and are very quickly sold. Not looking for stock personally and have used the RPi 4b for a while now and it’s a fantastic device. Just looking for a few upgrades for the eventual future version is all.

        1. You can have both USB3 and NVME, or anything else PCIe you want, it isn’t even hard to find boards that do so – You really don’t want use the official RPI dev board for the CM for anything but developing – use one of the many many many carrier boards for it, for all sorts different purposes!

          You get one that pretends to be a Pi4 in layout very successfully, some have many PCIe slots so you CAN have USB 3 and NVME, and external wifi, and now even GPU (sort of at least) – just note they are sharing the same single lane of PCIe bandwidth, so you can’t get full performance out of all of them all at once.

          1. If you know of any product that can do both 1x PCIe and USB 3 to a RPi 4b at the same time, please let me know!

            Maybe it is technically possible to add a 1x PCIe USB 3 output device combined with a NVMe reader that is all using the same x1 port? I have seen multiple USB 3 devices added but was under the impression that turned off the use of any kind of NVMe at the same time through the 1x PCIe slot which is mostly really only available with a CM4 anyway.

            Plus USB 3 was designed to let you use multiple devices. PCIe sort of caps out in terms of bandwidth and that’s less of a concern anyway because this is going to be fairly slow to begin with but did not think you could really add it all to a single 1x port is all.

            Agree with you about the CM4 IO board being mostly about testing and development. Still could use a bit more built in basic voltage protection though rather than just let you send 24V and fry internals.

          2. There are chips that let you split up PCIe in many ways – turn a 16x into many 8x/4x/1x or share a 1x etc, so you can put the USB3 chip behind one of those and get other PCIe. (I had a design concept in mind so have dived into the PCIe rabbit hole, but those chips were way too pricey or unobtainable with the chip shortage (perhaps resolved now, not looked))

            For instance
            No idea if its any good, but if Jeff has mentioned it with the diligence he puts in I’d expect it to be solid – it has 2 USB3.0 ports (I assume just one USB3 controller) and NVMe on the same board.

          3. And if you really really must use a Pi 4B model rather than the identical SOC CM4 – start by desoldering the USB3 chip, solder on the clever PCB to bridge the PCIe lane straight to the usb 3 connector (featured on HAD somewhere), and there is a crypto miner board that uses USB3 cables (but not signalling) to take the PCIe lane to a PCIe to 16x slot, from there you can put a PCIe splitter board in (making sure it’s the sort that can share lanes – which I suspect is all of them, but I don’t know), then you plug in the USB3 PCIe card, and your NVMe…

            Way way more work, ugly and probably delicate result – but it amounts to the same thing eventually..

            Which is why so many folks are over the moon with the CM4 containing a PCIe lane! All these fun things like Jeff and friends trying to make GPU work is suddenly possible!

  2. Well, there are better-suited CPUs than this Broadcom piece-o-sh.. line of CPUs the Raspberry Pi foundation is relentlessly using. Take some 6-core Rockchip for example:
    The RK3399 has 4 A53 cores + 2 A72 cores. Some very capable boards using this CPU exist like the Rock Pro 64:

    For 80USD, you get something that blows a Pi4 away … and it’s in stock.
    Sure, some will whine “but the documentatiooooon! the old kerneeeeel!” except they are wrong: Much of the documentation for Pi* can be easily applicable to such a board, and that board is also in mainline kernel.

    1. Agreed, but there is a reason why pine SBCs (or any of the other alternatives, and there are a few) have not taken the market by storm: they are not a Raspberry Pi. Being slightly faster, or slightly more efficient, or having this or that extra hardware is not that important. Community, assured availability and support are.

      1. There’s nothing like having a large community that all support doing the wrong thing, helps that cosy feeling of belonging when outsiders call out the bullcrap. … there’s a word for that dynamic… oh yah… cult.

      2. It amazes me how some can discount the importance of software and platform for a system. This entire exercise to get a GPU running shows the importance of that. It was only possible because of the community and support.

        1. I agree. I got “another” brand for network storage, and the wiki was so out of date that its files no longer worked. You had to wrestle through things and recompile the os. When all you want to do is plug it in, and just have it work (except maybe your expected customized tasks), “other brands” are hit and miss. Older Pis are guaranteed, and newest pi always works within a short time.

      3. i think you’ve hit a core fact that there is a very real attraction to raspi just because it’s raspi. no matter how good or not it is, there is just so much of it.

        but i think “Community, assured availability and support” is a poor summary of what raspi brings to the table.

        the raspi community is large and the ground is well-travelled but the community is the same as the windows community, they’re all outsiders looking in…you don’t get the advantage of an open source community…you just see evidence of a thousand other people having banged their head on this wall before you did. they never crossed the wall and whatever fix they came up with was already obsoleted by the time you found out about it because of all the things that changed on the other side of the wall.

        availability is a laugh, raspi has awful availability. that’s a funny thing because it’s true to say, well, everyone is having availability problems. it’s true. i’m inclined to cut raspi a lot of slack. but one piece of slack i won’t cut is pretending that it isn’t happening. raspi availability is, at this point, subpar even as linux ARM SBCs go. 6 months ago, i would’ve believed it, but now it’s basically been 6 months since they’ve been in stock??

        support is … man! i mean! raspi does provide a large amount of support in terms of poorly-designed poorly-documented buggy closed-source driver / bootloader / monitor software. other vendors don’t provide that support because *they don’t need to*. i know some people have had different experiences depending on what they’re trying to do and what their relationship is to raspi foundation but overall my exposure to raspi support has been a bunch of “no you can’t get there from here” with a heaping dose of defensive “and you shouldn’t want to” condescending rhetoric thrown in. it has been really weird to me to see the contortions raspi fans and company representatives go through to avoid admitting that there are serious flaws in raspi.

        everything has flaws but the specific rhetoric that’s sprung up around raspi is really off-putting to me. describe it as pretty good for the price and i’m nodding my head but then people start calling it the most open or so on and i’m just bewildered, why mislead?

    2. Does the GPU actually work on that? I never got the GPU working on my pinebook. It’s the only reason I’m using a Pi for specific projects, the GPU actually works, for 3D and video.

    3. Bit exaggerated. The Broadcom SOC is pretty damn capable; yes, it’s not the most powerful, or the cheapest, but the Pi’s are decently powerful and pretty cheap, while also having great community, long term availability, and support that means you can just use an original now stupidly obsolete Pi and the only thing slowing you down is that it’s rather slower than a new one…

      That is pretty much where all the others fall down, we made 10000 of them in batch one, and batch 2 will be different with changes that require a new device tree, kernel etc. The support for the gen 1 boards will be zip, as we are on gen 3 or some entirely new project now.

      It’s not that you can’t use such things, but they are massively less useful than the Pi as they require massively more work, usually without the documentation you need to easily compile your own up to date kernel etc…

      1. there’s a grain of truth to what you say, in that raspi has been raspi for a very long time as these things go.

        but i’ve been a pi user for just over a year now and they’ve already obsoleted the one pi-specific program i wrote, by changing the kernel interface. so now if i update the raspi os, if i want to use any of its new glorious features (which i very much do, it’s the only reason i’d update), i have to rewrite my program from scratch again. good news is they switched to a standard interface so this time around would be a *LOT* easier than the first time around. but then, once i switch to the standard interface, the program simply won’t work on older pis that don’t support the new interface.

        and every time i look up some problem, i find 5 year old community posts where people are trying to figure out why some hack doesn’t work anymore. thanks to the “community”, i can see that a dozen people got together and did the detective work and found out which firmware update broke their hack. how useful!

        different strokes, for sure. a lot of people find that kind of community useful. if you aren’t trying to do anything too particular, like, say, play media files, it’s good enough. but i think a lot of what you’re saying is simply FUD. you’re just saying that something you haven’t done is much harder than something you have done, and features you haven’t experienced aren’t worth anything, and walls you haven’t run into shouldn’t bother anyone else.

        1. You run into walls in places with all of them, but the vast difference with the Pi is you never ever have to spend a week trying to figure out how to boot it to a newish kernel, let alone boot and actually have all the hardware working with no documentation at all (seriously most Android phones are easier to deal with than other SBC I’ve seen) you only need to do your own project(s) whatever they are.

          And as a vast amount of the Pi is open there are so few things you will have trouble with that you can’t just solve with some effort. Sure it’s not all open, and not perfectly documented, and it can be a pain to find bits you know are documented, as the documentation is so often being updated. But then the BIOS of most computers, and their GPU’s VBIOS, is entirely closed source; there isn’t anything out there where everything is open that actually has performance useful for general computing.

          And if the Pi wasn’t so open, stuff like making a GPU play nice at all wouldn’t be possible at all!

          1. in another thread, you told me you didn’t remember which “other boards” you’ve used. has that changed?

            if you’re talking about problems with boards you don’t even know what are, that’s pretty definitionally FUD

            and the pi is not open. it’s extraordinarily closed. “real” GPU support is brand new to pi within the last 18 months but it’s common with all the other boards. every board i’ve looked at features some sort of open bootloader…it can be a pain in the butt, you can brick it, you have to follow a certain process to update it. i’m not saying it’s perfect. but the closed bootloader on the pi is unusual, and the enormous number of interfaces that it blocks is also unusual.

            when i say the pi is closed, it’s because i’ve beat on it personally. i don’t know why you keep saying all these other anonymous boards are closed when you can’t even name one.

    4. I think you miss an important thing. If one wants to pay $80, there are plenty of options. At $35, not so much. Especially with the support it has.

      There is a very simple reason the Pis use Broadcom chips. Eben works at Broadcom.

      Most people also miss the entire point of the Pi. It’s a cheap board to get people access to computers at a cheap price. If what you need/want is something with betters specs, then you shouldn’t even consider a Pi.

    5. The rock pro 64 is actually slower than the pi4b despite the more powerful rk3399 chip, and that’s also true for most other boards using the rk3399. I mean, if raspberry pi wanted to make an sbc that’s much more powerful they could, but that’s not their goal. Their goal is to get inexpensive computers in the hands of people who wouldn’t otherwise have access so they can learn programming. They use broadcom chips not cause broadcom makes the best chips, they use broadcom cause Eben Upton worked there as an architect. Yes, the pi could be more powerful and most likely more expensive, but if that doesn’t fit the goal of the raspberry pi foundation it doesn’t make sense for them to.

    1. What goes around comes around. An unexpanded TI-99/4A was essentially running its software on the Video Display Processor and its 16K RAM while the CPU with its 256 *bytes* of RAM was handling pointers and registers and generally acting as a data traffic cop.

  3. I first flashed on the idea of an absolute minimum h/w Folding@Home rig. While shoehorning GPUs onto every available PCI-e slot on the SuperMicro mobos I’ve been using to fold with over the last two years, ‘ntop’ made it painfully clear just how much of a bottleneck PCI-e x1 is for this use case. On the flip side, if you’ve already got the h/w and want to give it a go, it’s not useless.

    1. Having RTFA, “CUDA cores would still be inaccessible”, rats. It’s not clear to me whether Nouveau now offers OpenCL access to Nvidia GPUs, beyond what’s possible with Coriander. Perhaps better luck w/ AMD, but I don’t have any of their cards.

      1. I am under the impression it would be easier with AMD due to more open sourciness and years of experience in Linux with it. You can probably fiddle around with a cheaper used GCN 1.0 card to get the rudiments down and translate that to later stuff.

  4. what the pi needs is a bus, a fully operational PCIe bus would open a lot of opportunities. i have seen stacks for pi motherboards but a bus that can have pi add on cards and standardize upgrades i feel sounds nice. yes you can always step up to a mainframe what have you, but the idea is to make a small footprint programmable system. a small bus would allow better robotics. :)

  5. The reason I want a gpu on a pi is for artificial intelligence and to run lots of servo and actuators on a robot. This is why it seems nvidia jetson is a superior board than the raspberry pi. But seems for diy board it needs a gpu.
