Turing Pi 2: The Low Power Cluster

We’re not in the habit of recommending Kickstarter projects here at Hackaday, but when prototype hardware shows up on our desk, we just can’t help but play with it and write it up for the readers. And that is exactly where we find ourselves with the Turing Pi 2. You may be familiar with the original Turing Pi, the carrier board that runs seven Raspberry Pi Compute Modules at once. That one supports the Compute Module versions 1 and 3, but a new design was clearly needed for the Compute Module 4. Not content with just supporting the CM4, the developers at Turing Machines have designed a 4-slot carrier board based on the NVIDIA Jetson pinout. The entire line of Jetson devices is supported, and a simple adapter makes the CM4 work. There’s even a brand new module planned around the RK3588, which should be quite impressive.

One of the design decisions of the TP2 is to use the mini-ITX form-factor and 24-pin ATX power connection, giving us the option to install the TP2 in a small computer case. There’s even a custom rack-mountable case being planned by the folks over at My Electronics. So if you want 4 or 8 Raspberry Pis in a rack mount, this one’s for you.

The Appeal — And the Risks

“Wait, wait”, I hear you say, “There’s plenty of ways to rack-mount Raspberry Pis!” Certainly. The form factor options are handy, but the real magic is the rest of the board. Individually controlled power supply for all four boards from a single ATX power supply makes for a very clean solution. Need to reboot a hung Pi remotely? There’s the Baseboard Management Controller (BMC) that will do full power control over the network. That’s the real killer feature: the BMC is going to run Open Source firmware, and will power some very clever functions. Want UART to troubleshoot a boot problem? It’s available from all four nodes on the BMC. Need to push a new image to a CM4? The BMC will include image flashing functions. Built into the board is a Gigabit network switch linking the Pis, the BMC, and two external Ethernet ports, all supporting VLANs.

On the other hand, not much of the BMC wizardry is actually implemented yet on the review units. This is the project’s biggest promise and the place it could go awry. Putting together a stable firmware with all the bells and whistles in the three months before scheduled ship date may be a bit optimistic. I’m expecting a working firmware, with updates to refine the experience in the months following launch.

Then there’s the expanded IO. The board comes with a pair of Mini PCIe ports, 4 USB3 ports, and a pair of SATA ports. This works via the PCIe lanes exposed by the various compute modules. Nodes 1 and 2 are connected to the mini PCIe ports, node 3 to the SATA, and node 4 to the USB3 ports. On top of that, a switchable USB2 port can be dynamically assigned to any of the existing nodes. Oh, and there’s an HDMI output from node 1, so even more options, like running a Pi CM4 8GB as a desktop machine. A late option added to the Kickstarter bolts four NVMe ports to the bottom of the board, one per slot, though not every compute module has the PCIe lanes to support it.

Now keep in mind that I’m testing a pre-production unit (more on that later), and not all of the above is actually working yet. Quite a few changes are slated for the production boards vs my unit, and the BMC firmware on this board is absolutely minimal. There are also the supply-chain issues we’ve continued to cover here on Hackaday, but the TP2 has the advantage of being designed during the shortage, so it should be able to avoid using hard-to-source parts.

Use-Case

Now let’s talk about what this *doesn’t* do. This may seem obvious, but the Turing Pi 2 doesn’t give you a single ARM machine with 16+ processing cores. There isn’t enough magic onboard to make the devices act like a unified multi-processor computer. I’m not sure there’s enough magic anywhere to really pull that off. However, what you do get is four easily-managed machines that are perfect for running light-weight services or Docker images.

Looking for a platform for learning Docker and Kubernetes? Or a place to host GitLab, Nextcloud, and a file server? Maybe you want to run Nginx as a front-end proxy, with several devices running services behind it? The Homelab-in-a-box nature of the TP2 makes it a useful choice for all of the above. And even though you can’t reasonably do all the above on a single Raspberry Pi, a programmable cluster of 4 of them does the job quite nicely. The VLAN support means that you can add virtual NICs to your nodes, and create an internal network. With the two physical Ethernet ports, you could even use your TP2 as your primary router, on top of everything else it can do.

Real-World Testing

So what’s the actual state of the project? I have my pre-production board currently booting a Raspberry Pi CM4, a Pine64 SOQuartz module, an NVIDIA Jetson Nano, and the Jetson TX2 NX. The Jetson Xavier NX had a quirk requiring a minor board modification, but ran like a champ once that was done. There are the normal warts of a pre-production board, like extra DIP switches all over the place, and a few quirks, like Ethernet only coming up at 100M for some devices. These are known issues, and a good example of why you do a test run of rev 0 boards. The final product should have all the kinks worked out.

I’ve been monitoring power draw, and the most I’ve managed to pull is a mere 30 watts of power. This suggests a real-world use case, an off-grid compute cluster. The mini-PCIe ports should allow for an LTE modem (or you can use Starlink if you’re *way* off grid). Add a couple cameras and install the Zoneminder docker images, and you have a low-power video monitoring solution. Add an RTL-SDR dongle, and the rtl_433 software listening to a solar-powered weather station, and you can track the weather at your remote location, too. Just for fun, I ran a Janus docker image on one of the Raspberry Pi CM4s on my TP2. Janus is the WebRTC server we’ve integrated into Zoneminder, and I was able to live stream 12 security cameras at 1080p, only using around 25% of the available processor power, or a load of 1 on a four-core Pi. It’s a testament to how lightweight Janus is, but also a great example of something useful you could do with a TP2.
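For anyone sizing a solar or battery setup around those numbers, here is a minimal Python sketch of the arithmetic. It assumes the 30 watt peak is drawn continuously, which is a worst case, and it treats load average divided by core count as a rough stand-in for CPU utilization. On Linux the load average also counts tasks stuck waiting on I/O, so that figure is only an approximation. The function names are made up for illustration.

```python
def utilization_from_load(load_average: float, cores: int) -> float:
    """Rough CPU utilization (0.0 to 1.0) implied by a load average.

    Assumption: every task counted in the load average is CPU-bound.
    Linux also counts tasks in uninterruptible I/O wait, so treat the
    result as an estimate, capped at fully busy.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return min(load_average / cores, 1.0)


def daily_energy_wh(watts: float, hours: float = 24.0) -> float:
    """Energy in watt-hours for a constant draw over the given hours."""
    return watts * hours


# A load of 1 on a four-core Pi, as in the Janus test above.
pi_utilization = utilization_from_load(1.0, 4)
# Worst case: the whole board pulling its observed 30 W peak all day.
cluster_wh_per_day = daily_energy_wh(30.0)

print(f"{pi_utilization:.0%} of a four-core Pi")          # 25% of a four-core Pi
print(f"{cluster_wh_per_day:.0f} Wh per day at 30 W")     # 720 Wh per day at 30 W
```

In other words, a battery and panel budget of roughly 720 Wh per day would cover the board even if it sat at its measured peak around the clock.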

What’s Next

The Kickstarter is over, with better than two million dollars raised, but don’t sweat it, because you will soon be able to purchase a Turing Pi 2. Ordering will be handled through the Turing Pi website itself, so stay tuned for the details. It will be a few months until the final revision of the board is finished and shipped, hopefully with some killer firmware and everything working exactly as advertised. Then finally there’s the alluring RK1 compute board, with up to 32 GB of RAM and eight cores of Arm goodness from the RK3588. That’s a little further out, and may be a second Kickstarter campaign. I asked about mainline support for the RK1, and was told that this is a primary goal, but they’re not exactly sure on the timing. There is quite a bit of excitement around this particular chip, so look forward to the community working together to get all the needed bits in place for mainline support.

There may be an unexpected consequence of the Turing Pi 2 and RK1 using the NVIDIA Jetson SO-DIMM connector. Imagine a handheld device built on the Antmicro open source Jetson Baseboard that works with multiple compute modules. I mentioned the Pine64 SOQuartz: That’s not an officially supported board in the TP2, but because Pine64 built it to the CM4 specifications, it clicks right into the adapter card and works like a champ. There’s an interesting possibility that one or two of these compute module interfaces will gain enough of a critical mass that they get widely used in devices. And if anyone wondered, using the TP2 CM4 adapter doesn’t magically allow booting a CM4 in a Jetson Nano carrier board. Yes, we checked.

So is the Turing Pi 2 for you? Maybe. If you don’t mind juggling multiple single-board computers, and the mess of cabling required, then maybe not. But if the ability to slot four SBCs in a single mini-ITX case, with a BMC that makes life way easier, sounds like a breath of fresh air, then give it a look. The real test will be when the finished product ships, and what shape the support is in. I’m cautiously optimistic that it won’t be terribly late, and that it will have working OSS firmware. I’m looking forward to getting my hands on the final product. Now if you’ll excuse me, I think I need to go set up an automated system for building aarch64 docker images.

36 thoughts on “Turing Pi 2: The Low Power Cluster”

  1. I love this idea, this is clearly a top-notch project, and I was looking for something like this very recently. But the economics make it a bit tricky to justify right now: a 16-core AMD Ryzen 9 with an inexpensive motherboard is currently only ~$700, which isn’t far off from populating this $200 carrier board with 4 Jetson Nanos, but the 16-core processor will be vastly more capable. It’d be interesting to see what could be done with very low cost compute modules.

    1. If you want raw power, it does not make sense for a bunch of Pis like that, you’re right.

      But if you want hardware HA at home for a bunch of things you run, they are cheap enough that if you acquire 3 or 4 of them over time that it makes them easier on your pocket.

      I have a DL 360 G7 I picked up on eBay for $100 (+another $100 for shipping). Much more powerful and capable than my Pi cluster (I do not use one of these boards, they’re just Pi4 stacked with some screws), but it by itself offers 0 HA. If it dies, whatever is running on it is out of commission.

      With the multi pi cluster, if one of them dies, whatever containers I have running on it will just start up on another pi. Add 2 UPS from amazon, and an old phone as a backup for your broadband provider and you have a fully capable HA system at home.

      Would I ever use one of these things at work? No.

      1. Why would the answer to high-availability, in your use-case, not be simply buying another $200 DL 360 G7?

        I get it that a cluster of 3 or 4 things is better than a cluster of 2 things, but a HA cluster of Pis seems like something that is academic in nature.

        Not that there’s anything wrong with academics: I’m writing this from a Windows VM that is running on a Linux box with QEMU/KVM with a bunch of hardware passed-through, and that is backed by ZFS. It works well, and I built it just to learn about some of these things (and stuck with it because it does work well). It’s my only “real” computer at home, and it’s a VM-running beast that I’m very pleased with myself for having taken the time to understand.

        Except: Is this product even useful for learning about HA?

        “The board comes with a pair of Mini PCIe ports, 4 USB3 ports, and a pair of SATA ports. This works via the PCIe lanes exposed by the various compute modules. Nodes 1 and 2 are connected to the mini PCIe ports, node 3 to the SATA, and node 4 to the USB3 ports. On top of that, a switchable USB2 port can be dynamically assigned to any of the existing nodes. Oh, and there’s an HDMI output from node 1, so even more options”

        I parse from this that if one node dies, and it was doing important IO for the whole of the system, then everything dies that relied on that IO. That’s not what I consider high-availability: When different nodes have very different IO capabilities, then there are tasks that simply can’t transfer from one to another when one node fails.

        More succinctly: How is this even useful?

        1. All those Pis in my cluster up and running at the same time doing their jobs, consume less power than 1 of those DL 360s. Just because those servers can be obtained cheaply, does not mean they are the right candidate for the job.

          But to answer your question, sure, as it is setup it can be a limitation depending on what you need it to do. But here again, it comes down to is it the right thing for the job? For example, it has HDMI, but do you really need it? Maybe for setting it up, but after that you may always connect to it remotely (as I do with all of my Pis)

    2. Something like the Ryzen 7 5700G APU would be a good alternative and most likely a winner in terms of power consumption per unit of computation. I haven’t tried it but I suspect that it can emulate the entire RPi cluster as VMs and still have resources left over.

      1. Hard to say, as the recent AMD offerings are all computation per watt pretty damn good, and the APU graphics are at least to me rather astonishing BUT the idle power draw just can’t get close to as low the Pi’s and their calc per watt is good too. So real world loading may well make the Pi better for some folks, and the Ryzen for others.

        It also has the downside of being a VM – which has security concerns and performance limitations, where this cluster sounds like you can drop in a beefier ‘VM’ pretty trivially if you need to, and each computer is, from what I can tell, isolated from the other pretty much entirely, so escaping from one to run on another node is almost impossible, much smaller attack surface and with an opensource BMC that ought to work out safe enough in practice quickly.

  2. Can we please stop claiming something like this is a good platform to learn Kubernetes, HPC etc.? It’s not. It’s way too expensive, too slow, too inflexible and adds a lot of RPi specifics. At least this board here has an actual BMC like the real things.

    The easiest way to learn Kubernetes is Microk8s, the easiest way to simulate a Cluster of something are VMs or containers. That’s why you never see RPi clusters in education. They’re just expensive gimmicks.

    1. Sorry, I disagree.

      I learned k8s on a 7 pi cluster, not on a board like this though, just 7 plain P4B because I could acquire them at the time, 1/month for $35. I had a couple, I got another one, then another one, then another one and so on.

      Those are now still used as a cluster and serve very well as hardware HA. They’re connected to dual UPS, dual networks (regular internet provided by my broadband provider and the backup is an old cell phone with 4G). They do other Pi like stuff at the same time too besides the cluster thing, but that’s another topic.

      If one of those Pis die, the services on it will just magically pop up on another one.

      Meanwhile the DL 360 G7 I got on eBay for $100 (and another $100 shipped) in my rack can run tons of VMs for the same purpose, but provides no hardware HA, even if I put it on UPS, dual NICs, etc., etc.

      It is a very cheap way to acquire and build HA into your home lab.

      So yes, if you have 3 Pis laying around, and a lot of people do, they’re a great way to get into K8S.

      Even if you have just 1, getting two more is way more economical than the alternative of getting servers on eBay.

      Would I use a Pi K8S cluster at work? Obviously not.

          1. @Greg A. That’s insane. The purpose of HA is to keep things running if something fails, not discard it because nothing fails.

            Besides, if you want to learn it’s easy enough to kill one of them on purpose. Pull a power cord, remove an SD card, cause a kernel panic, etc.

            You set up HA hoping you’ll never need it. Especially when you’re running all actives. One of them fails, now at least 2 of your nodes will have to share resources.

      1. Not to mention the power consumption is reduced compared to an x86 server.
        I own a Dell Precision T5500 with two CPUs (6 cores each) … It sucks 250 Watts idle, and near 1000 Watts running full power.
        So I agree with you: you’ve found a use case for which using RPIs is the right solution.
        But I found those types of solutions useful for teaching purposes. Yes, you could learn distributed work, HPC, or Kubernetes with a single computer running virtual machines, but I can tell that an RPI cluster will definitely catch the attention and the interest of the students. With motivation and fun, learning is better!

    2. You don’t need to learn pi specifics. The author has boards from two other brands currently mounted. IIRC the SOQuartz is marching quickly towards a mainlined kernel, and the Jetsons are properly interesting devices.

    3. I have to disagree on your last point.

      After having deployed a dozen or so production K8s clusters (some on managed services from AWS, Linode, and DigitalOcean, others bare metal on both beefy servers and low-power ‘edge’ compute)…

      Learning on _real hardware_, especially on multiple separate nodes, with physical networking layers in between them, exposes tons of little quirks in the way Kubernetes (and really any clustering software) behaves with more than one physical system.

      Things like how storage, networking, ‘self-healing’, full-cluster upgrades, and backups work are _vastly_ different on a real cluster than on a VM/microk8s/k3s/single-server setup.

      So there is value in using real hardware, and using some Pis (or other ARM boards) for this purpose takes a lot less energy and space for a ‘lab cluster’. Not to mention I don’t have four Intel NUCs blowing hot air on me all day!

      So for many things, the Pi cluster is a terrible fit (compared to even a modest AMD or Intel system). But for learning Kubernetes, it’s about ideal IMO, just as long as you can get Pis (or an alternative that works well) for close-to-MSRP.

        1. Usually the things that matter are identical (to the point I’ve deployed my production cluster on a set of Pis), the main difference is sometimes people don’t build arm64 images, only x86, but since the M1 and BuildX came out, thankfully that’s getting quite rare.

  3. the kickstarter raised $2M?? so 10,000 people pre-ordered this thing? somehow people are gonna buy 40,000 CM4s??? i don’t get it. what are they going to do with it? the absurdly low price per unit is attractive for a certain class of problem but i simply don’t understand the use case for a cluster of them. once you start buying a stack of them, your price goes up high enough that you could get much much much higher performance.

    i simply don’t understand. won’t these things just be powered on once or twice for the novelty and then get ignored??

    1. The Turing Machines guys are hoping to have enough numbers to buy some CM4s as distributors, which should help.

      Let’s see, the board at $210 and 4 CM4s, let’s say the 4gb boards with 8gb flash, $55 a piece. Add the convert boards at $10 a piece: $470 for the cluster. I hope to eventually have a follow-up about what you can do with that, but that’s definitely enough power to run useful services. Zoneminder, nextcloud, gitlab, Wireguard, pi-hole, etc. Sure, you could do that with an old desktop or cheap server, but at 30 watts?

      *Shrugs* I’m sure some of them will be novelty items. I literally intend to do automated docker builds and testing with mine.

        1. But your ‘more capable’ ‘*new*’ setup won’t have the resilience of many pi’s (or alternatives), almost certainly is actually LESS capable at some workloads while on average being similar for others, and will never consume as little power in the real world for almost all users! -The idle of Intel/AMD just isn’t that good and in calc per watt atm Intel isn’t even in the running, and AMD stuff while good is mostly good for amd64 arch rather than actually competitive with most arm architecture devices.

          If you really want a single all powerful workstation type server you are shopping for a very different use than a genuine cluster computer, and there are many things you would rather have on isolated hardware that requires so little compute power even a Pi 1 is overkill – there are many different requirements out there, some of which can sort of overlap…

          1. i was astonished to learn every thing in your comment is counterfactual. certainly there are intel processors that are power hogs but for 5 years at least, low-end intel laptop processors (celeron n4000, etc) go into an idle mode that is quite effective.

            two years ago, i had never used a pi, and i had not used an intel laptop processor made within the last decade, and i would have found these claims you just made totally compelling. but now i have used them and i have learned two astonishing facts: intel cpu idle mode counts for something these days, and pi doesn’t have an idle mode! pi doesn’t have an idle mode! oh. my. goodness. pi doesn’t have an idle mode! it’s an arm, just like your phone is an arm, BUT IT DOES. NOT. HAVE. AN. IDLE. MODE. in ambient at my house, with passive cooling on each, my laptop is 40C (which i am using) and my idle pi is 50C. it’s insane! it’s unbelievable!

            pi has many strengths, no need to make up its weaknesses as strengths. it’s not a particularly power-efficient chip.

          2. The pi pulls while under no load – as in ‘at idle’ perhaps as high as few 100mw, though I have seen less than 100mw on the one I really studied for prolonged periods, the only reason a pi will be hot is the tiny heatsinking potential compared to your beefier processor that MUST have real cooling to run at all, and is attached to a much more massive PCB for more passive heatsink area – if you haven’t done anything to help the Pi you have no heatsink and a tiny PCB vs a heatsink capable of dissipating hopefully around 70-100W (or maybe even more as that is the sort of standard power draw at full load for most laptop/desktop CPU) – just massively massively better cooling as it really has to have it to work at all…

            I right now have a Pi4 massively overclocked (for a Pi4 anyway – not claiming its got anything like modern AMD64 CPU performance but its got way more than its ‘supposed’ to) and when 100% loaded for days it won’t get over 30C, at least if my room is a comfortable temperature for me… Because guess what its got a massive (and passive salvaged) heatsink on it! (bit of heatwave here right now, so the ambient is something getting close to 30 far too often and the Pi obviously has to be hotter than ambient)

            And a Pi is more than just the CPU even if the Intel CPU could idle that low, which it can’t – its got low power states that might rival it, but those are ‘suspended’ more than ‘idle’ as its not actually ready to go instantly, and any Intel system also has its memory which is a massive power hog in comparison (especially to the stacked CPU-memory SOC SBC’s), the motherboard chipset also has meaningful power draw – the whole system required to make the processor run has a meaningful increase in power demands for the Intel/AMD systems, and all that extra IO potential even when its unused does consume some power!

            Didn’t say a Pi was magically power efficient, but it does as a whole system idle at very very low power consumed, and have a good calc to watt, its not the best at either by a long margin, though neither are the common consumer laptop AMD/Intel’s, but its still really damn impressive, more so for being so cheap! (well in normal times when its easy to source as MSRP)

          3. Oh also I was comparing a server capable of really pretending to do all the many nodes something like this cluster can, vs the actual hardware cluster – which means it needs a great deal more powerful components than your low end ‘efficient’ netbook, its playing in a different game entirely.

          4. Foldi-One, i’m always impressed by your interest in having this same conversation, and presenting the same counterfacts. if you’ll recall, last time you told me about pi power consumption, i quoted reports at you that said a different number. i thought i had made an impression on you but i see i was wrong about that, as perhaps you do not remember the conversation at all?

            “The pi pulls while under no load – as in ‘at idle’ perhaps as high as few 100mw,”

            anyways, i just found two resources that report pi 4b draws 2.7W at idle. for comparison, my n4000 whole laptop draws 2.3W at idle WITH THE BACKLIGHT ON.

            this great article took the extra step of comparing pi to an intel processor, finding that the pi and the intel had comparable idle consumption but the intel would draw more under load.

            https://uni.hi.is/helmut/2021/06/07/power-consumption-of-raspberry-pi-4-versus-intel-j4105-system/

            fwiw i would estimate that my n4000 is 3x as fast as a pi, and the j4105 is purportedly 2x as fast as the n4000. so that j4105 might deliver 6x the processing power for only 2x the electricity, and with similar idle consumption. and if we’re looking at a cluster of 4 pis, the j4105 will still be marginally faster than the ideal performance of the full cluster, and the cluster will have 4x the idle consumption of the intel.

            and the laptop is just as good at everything as the pi is, including i/o…the laptop’s usb-c port is a big bottleneck but the pi is just as plagued with bottlenecks…it doesn’t even have full-bandwidth gige.

          5. Oops I did put the wrong unit in meant AMPS not watts, which at 5v does matter, 2.7w is still a fair bit higher than I’ve personally observed but not unreasonably so – after all there are so many variables as to what a system is really doing ‘at idle’, and how long you measure it over can make a big difference to your measurement 2.6w for me is still higher than the peak I ever saw ‘at idle’ but with background tasks creating significant spikes over the lowest it can actually go while running how you measure can make a difference, I wonder how much difference your choice of desktop environment (if any) and such makes to idle…

            And yes some modern x86-64 systems can do ok at idle but the COMMON x86-64 systems do not, even in the ‘efficient’ end, as the Intel/AMD CPU market is very dominated not by their one or two ultra efficiency offering but by the desire to drive 4K and up of pixels, with snappy responsive feel – the more than it can open a few web pages, maybe watch a video and a word processor type system, usually with good IO expansion via PCIe/USB3 bus not ultimate efficiency, and all the supporting hardware that actually lets that system work has meaningful power draw! Making any conclusion about an Intel/AMD processor stand alone vs the entire system of a Pi pointless.

            And when it comes to hardware the most cut down efficient end of the Intel/AMD can’t do everything this cluster type machine can do, as it doesn’t have all the threads, PCIe, memory bandwidth etc of its bigger brothers, which could at least pretend to – its massively cut down so while the performance at some things is going to be better, its still a faster single CPU those super cutdown ultra efficiency targeted machines are still limited, not to mention less useful for many of the tasks such a cluster might be put to with the different IO and thread counts – its a perfectly useable single user computer no doubt, but its not a cluster replacement or a good candidate to fit into most cluster IMO, mostly as those super low end efficient processors are usually not available as anything other than soldered to a laptop mainboard – that is a lot of extra unneeded e-waste to make the cluster – where the lowest end of the socketed assemble your own system power draw isn’t that good – as it needs so much more internal hardware to meet the potential needs…

    2. Quote: “i simply don’t understand. won’t these things just be powered on once or twice for the novelty and then get ignored??” Answer: Does it ‘have’ to be used 24×7? Does it have to be ‘practical’? Even if it cost thousands? Shoot, I just built a TOS Startrek computer that blinks lights and talks. Spent a few C bills on parts for it. Useful? Used? Not really, but I did/do have fun building it and I’ve learned quite a bit from it. Now going to learn more with a decent microphone (give it some ears) for speech recognition that I’ve stuck in it… Thinking about a camera too for sight projects… Some day it will get shoved into a corner too as just a ‘novelty’ :) . But so what?

    3. I’m using mine (Turing pi 1) to drive an array of SDRs with an LTE router / GPS NTP on one of the nodes. far from a toy….

      that being said, yeah its going to be a major PITA to get Pi4CM…but thankfully you can use them, Nvidia SOMs, and Turing pi is offering their own SOM package.

    4. If I got one of these it would be running 24/7, I’ve got many pi’s doing things that could be put in such a system, and some of them are running 24/7 because they are in use 24/7, others are on 24/7 because going to them for a power cycle is more effort than its worth, plus being able to use the jetson boards with their rather specialist processing goals seems like it would be very very useful to me.

    5. Aren’t Jetson mainboards basically $200 anyway? I don’t know how much use Jetson modules actually get in business, but you could easily use this for burn-in testing.

      The Jetson things always struck me as wacko-expensive, but what do I know.

      1. They are very expensive as small dev board type ‘computer’ things go now, but also very special in what they can do from a very small form factor and reasonably power efficient at it too. So if your needs match what it is good for they are to my knowledge the only really valid choice right now, they just don’t have rivals.

        You can use a small ‘real’ PC/laptop perhaps but its not like those are cheap either…
