Raspberry Pi Server Cluster In 1U Rack-Mount Case

[Paul Brown] wants to take advantage of off-site server colocation services. But the providers within [Paul]’s region typically place a limit of 1A @ 120V on each server. Rather than search out commercial low-power solutions, [Paul] embraced the hacker spirit and built his own server from five Raspberry Pi 4b single board computers.
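To put that limit in perspective, here is a rough power-budget sketch. The 1A @ 120V limit and the count of five Pis come from the article; the power supply efficiency and the switch's draw are illustrative assumptions, not measurements from [Paul]'s build.

```python
# Power budget under the colocation limit quoted in the article.
LIMIT_AMPS = 1.0    # from the article
LINE_VOLTS = 120.0  # from the article
NUM_PIS = 5         # from the article

# Assumptions for illustration only (not measured on the real build):
ASSUMED_PSU_EFFICIENCY = 0.85  # fraction of wall power that reaches the loads
ASSUMED_SWITCH_WATTS = 5.0     # draw of the Ethernet switch


def budget_watts(amps: float, volts: float) -> float:
    """Total power available at the wall socket."""
    return amps * volts


def per_pi_watts(total: float, efficiency: float,
                 switch_watts: float, num_pis: int) -> float:
    """Power left for each Pi (including its SSD) after losses and the switch."""
    usable = total * efficiency - switch_watts
    return usable / num_pis


total = budget_watts(LIMIT_AMPS, LINE_VOLTS)
share = per_pi_watts(total, ASSUMED_PSU_EFFICIENCY,
                     ASSUMED_SWITCH_WATTS, NUM_PIS)
print(f"Total budget: {total:.0f} W")
print(f"Per-Pi share (incl. SSD): {share:.1f} W")
```

Under these assumed numbers, each Pi and its SSD get a share of just under 20 W, so the limit is workable but leaves little room for anything power-hungry.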

The task involves a little bit more than just mounting five Pi4s in a chassis and calling it done. There is an Ethernet switch connecting all the modules to the network, and each Pi has a comparatively bulky SSD drive + enclosure attached. By far the most annoying part of the assembly is the power supply and distribution cabling, which is further complicated by remote controlled power switching relays (one of the computers is dedicated to power management and can switch the other four modules on and off).
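As a rough idea of what the power-management role could look like in software, here is a minimal sketch. It is not [Paul]'s code: the `PowerController` class, the node names, and the pin numbers are all hypothetical, and the `set_pin` callable stands in for whatever GPIO library actually drives the relay board, so the example runs without hardware.

```python
from typing import Callable, Dict, List, Tuple


class PowerController:
    """Tracks and switches relay-controlled nodes (hypothetical design)."""

    def __init__(self, node_pins: Dict[str, int],
                 set_pin: Callable[[int, bool], None]) -> None:
        self._node_pins = dict(node_pins)
        self._set_pin = set_pin  # injected so real GPIO can be swapped in
        self._state = {name: False for name in node_pins}

    def switch(self, node: str, on: bool) -> None:
        """Drive the relay for one node and remember its state."""
        if node not in self._node_pins:
            raise KeyError(f"unknown node: {node}")
        self._set_pin(self._node_pins[node], on)
        self._state[node] = on

    def status(self) -> Dict[str, bool]:
        """Return a copy of the last commanded state of every node."""
        return dict(self._state)


# Stand-in for real GPIO access: record each write instead of touching pins.
log: List[Tuple[int, bool]] = []


def fake_set_pin(pin: int, level: bool) -> None:
    log.append((pin, level))


# Pin numbers below are made up for the example.
controller = PowerController(
    {"node1": 5, "node2": 6, "node3": 13, "node4": 19}, fake_set_pin
)
controller.switch("node2", True)   # power node2 on
controller.switch("node2", False)  # and off again
print(controller.status())
```

Passing the pin-writing function in as a parameter keeps the switching logic testable on any machine; on the real management Pi it would be replaced by a call into the GPIO library of choice.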

Even if you’re not planning on building your own server, check out the thoroughly documented assembly process and parts list — we particularly liked the USB connector to screw terminal breakout connector that he’s using for power distribution. For all the detailed information, assembly instructions and photos, we think a top-level block diagram / interconnection drawing would be very helpful for anyone trying to understand or replicate this project.

There are a lot of connections in this box, and the final result has a messy look-and-feel. But in fairness to [Paul]’s craftsmanship, there aren’t many other ways to hook everything together given the Raspberry Pi form-factor. Maybe a large and costly PCB or using CM4 modules instead of Raspberry Pi boards could help with cable management? In the end, [Paul] reckons he shelled out about $800 for this unit. He compares this expense with some commercial options in his writeup, which shows there are some cheaper and more powerful solutions. But while it may be cheaper to buy, we understand that strong urge to roll your own.

We’ve written about many Pi cluster projects in the past, including this one which contains a whopping 750 Raspberry Pis. Have you ever used a colocation service, and if so, did you use a DIY or an off-the-shelf server?

28 thoughts on “Raspberry Pi Server Cluster In 1U Rack-Mount Case”

  1. I have two servers being colocated now, both of them I built from Supermicro parts. Both cost more than $800, but the extra processing and storage oomph is worth it for me. If this has the power he needs, sounds like a reasonably good solution.

    I do wonder about his relay board. It looks very similar to one I’ve used for low current switching. The relays all eventually became unreliable, and I eventually switched to a solid state alternative. I hope that won’t eventually bite him in the same way.

    1. Are the plan prices comparable? And if not, why wouldn’t they offer 500mA @ 240V as a cheaper option? Ultimately it boils down to watts, but the economics of each country and provider work out a bit differently. 120W is clearly smaller than 240W, and it certainly costs more to run two light bulbs than to run one. And I’m of the mind that successful businesses will pass their costs onto their customers. (I pay retail electrical rates for a constant 600W pull in my greenhouse, so I may be overly sensitive to power costs)

      The US offers both 120V and 240V service, even in colocation. Large data centers in the US get 480V service (and I doubt there is an exception to this case). What differs is that US and CA residential wiring is a bit convoluted when compared to UK (and I assume AU). Industrial service is more similar due to practical reasons. Not that it is easy to adapt a motor made for 50 Hz 3-phase to 60 Hz, but there is more to motors. (I used to work with an electric arc furnace; it was made in Germany but runs on US power)

      1. Yeah, but the space and administrative cost of your 20W bulb is simply not lower than that of your 40W bulb. There’s a minimum granularity at which you can do business. Especially for something that involves people actively running around and putting things in racks, I don’t think the electricity costs are the main cost factor.

        Your 1 HU that uses half the power is still 1 HU. It’s still one invoice. It’s still one guy installing stuff. It’s still insurance. The list goes on. So, that’s probably why they don’t offer a 500 mA option – the overhead gets too dominant, and the cost-per-customer simply doesn’t justify the effort. Gotta ask yourself: how many people will afford the 500 mA tier, at what profit to you, that wouldn’t afford the 1 A tier? If the answer is “less than 1A customers that then switch to a lower-margin tier”, then hey, don’t offer that tier.

  2. I’m thinking: why has nobody created a network over GPIO?
    4 connected Raspberry Pis with fast connections (faster than memory, like in supercomputers) will be more efficient than one faster machine. An every-to-every LAN is more important than power.

    1. > 4 connected Raspberry Pis with fast connections (faster than memory, like in supercomputers)

      This cannot exist: the Raspberry Pi SoC has no high-speed external bus. Plus, the GPIO headers are completely unsuited for high-bandwidth digital signals.

      > will be more efficient than one faster machine.

      No! A 240W thermal design power Intel Xeon is way better in calculations (of the type you care about) per Watt than a Raspberry Pi. Honestly, embedded hardware devs regularly ridicule the Raspberry Pi’s TV set-top-box SoC for being so old (read: low performance per watt) and having no proper power-saving modes.

      This is a commonly perpetuated myth! So let me clarify once more: A Raspberry Pi is *not* a power efficient computing machine. It might be more efficient to run a Pi off a USB supply than a PC to check on something once a second or so. But 4 or 5 Pis on a server ATX supply? Hell no, this is a waste of energy. A small µATX with a modern x86 will probably use less in idle, and be capable of WAY more things than the badly coupled, in themselves old-generation, low memory-bandwidth SoCs of the Pis, especially with the Pi’s inefficient power-converter architecture.

      > An every-to-every LAN is more important than power.

      70 years of high-performance computing research would like to point out that this has never proven to be right, and we’ve had *a lot* of small tiling CPU architectures (Tilera, Tile64, Larrabee), and even more clustering architectures, and it shows that there’s a use case for many *non-communicating* very power-efficient single-trick horses (GPU shaders!) and for mixed-bandwidth architectures (what you’ll find in “clusters” these days: something like a rack of 20 boards, with two CPU slots each, with each CPU having 16 cores, of which pairs of 2 share the most local caches, and groups of 4 share the next higher layer of caching).

      Never, ever, do you need a bunch of weak CPUs with low interface bandwidths to build something better and faster than the next bigger machine with fewer cores but a better memory bandwidth architecture.

      And to repeat: the Raspberry Pi does not use a modern CPU. This is really the fallacy to begin with. You want compute per € in capex and opex? You sure as hell won’t buy RPis for that.

      Take the RPi for what it is – a cheap embedded computer with a large community and a focus on enabling. It’s not anything you can build a highly efficient machine out of.

      1. So incredibly 100% agree. The only real question is: what is the target workload? Are we sim-ing out some random network architecture? Are we setting up a target for backups? 120 watts is paltry in the colo world. I guess what we have achieved here is a cluster of slow systems that can be remoted into to do some small amount of work. Like you said, a uATX board with an SSD running a hypervisor can do everything that needs to be done with these boards, with greater flexibility. Love the ingenuity, love the spirit. Keep hacking, my friends!

      2. Technically the Pi does have such high speed bus(ses), but it’s Broadcom proprietary, undocumented and not originally meant for networking, though probably could do some arcane token ring with it with sufficient brute force and ignorance coupled up with cleverness.

        But will that be worth it? No, but it’ll have the “hold my beer and watch this” vibe on lockdown.

      3. Hear, hear. Pis and other SBCs are great but they are not part of a high performance or high performance-per-power solution (except maybe as a “glue” or embedded control component, I guess.) A pi cluster is an educational project which takes up less room than buying a bunch of older 1U servers and learning how to build a cluster with them.

        1. Exactly. An educational tool. I’ll be honest, though, with the ability to run arbitrary VMs on any server and set up networking, I don’t really know what you’d teach with that, aside from the actual “plugging things in” experience, which honestly isn’t that great with RJ45/Ethernet.

          Everything, from netboot to MPI that would be relevant to a cluster use case can be demonstrated with software – and it’s more realistic, too. Datacenters use virtualization extensively, that’s why it is developed.

          Seeing that you’d probably pay per public IP address, too, I must admit, 4 Pis in a box – not that useful to me, but maybe someone sublets these to people who don’t understand VMs, and then it’s fine.

      4. Some applications run more efficiently on ARM. Typically, those small ARM chips are weak on floating point but reasonably good at integers. For example, when it comes to mining Perk, a cluster of cheap smartphones does way better both in performance per watt and performance per dollar than any x86 chip at the time.

        1. Hey, could you hand me a benchmark where a Raspberry Pi 4 CM (namely, an A72, that’s an early-generation ARMv8-A with the bare minimum NEON extensions) outperforms an x86 in integer ops per Joule, and a non-synthetic application that can make use of that? Never heard of “Perk”, and I’m sure that by now there’s a dozen cryptocurrencies that are optimized for mining on mobile hardware, so, ummmmm, that really does not convince me. I don’t think you’ll mine any cryptocurrency with 80€ hardware in a 30€ rackspace that includes energy cost, sorry.

          My DSP experience simply says that both memory bandwidth and CPU speed on the Pi are very limiting factors: a quad-core 1.2 GHz thing can, even at full SIMD utilization, not really compete with many modern x86 CPUs, and also not with modern smartphone octa-cores; the difference between a 2015 A72 and a 2017 A75 in “math operations of any kind per cycle” is *roughly* (very roughly) a factor of 2.

          Let’s face it: If companies were fighting like crazy to produce millions of set-top boxes with the BCM2711, then Broadcom wouldn’t be willing to sell them for low prices to the RPi Foundation. And that’s fair:

          Very much like Bryce [says](https://hackaday.com/2021/07/21/raspberry-pi-server-cluster-in-1u-rack-mount-case/#comment-6366543):

          > Hear, hear. Pis and other SBCs are great but they are not part of a high performance or high performance-per-power solution (except maybe as a “glue” or embedded control component, I guess.)

          The mission of the RPi is to make embedded compute accessible. They sell the CM4 to engineers who never made the jump to more capable or more low-power devices, either out of convenience or simply because their application doesn’t need anything more specialized/better. I doubt there’s many things the CM4 is the “optimal” choice for – but it really does not have to be.

          You can optimize a chip for throughput in a server application, or you can optimize it for usage in low-cost set top boxes. These goals are kind of contradictory, so, guess what the BCM2711 did.

    2. That’s the issue though, they don’t have fast connections. The GPIO caps out at 125MHz if you’re very lucky, and bitbanging something sensible over GPIO on the CPU is going to be pretty intense in its own right.

  3. How about installing a high power USB power bank to provide UPS support for safe shutdown in case of a power failure, especially since MicroSD cards are prone to failure when power is suddenly removed?

  4. If compute modules weren’t missing hardware, the PCB for holding a bunch of them could be mostly passive. That said, I would pay a fair bit to have a carrier board in mATX form factor for 4-8 modules (CM4). The ATX IO plate could either be a bunch of 9-pin serial connectors (not ideal, but easy), or perhaps route out the USB ports of each Pi, which would at least streamline the installation/update even if it is a bit cumbersome. The most powerful option would be to take the PCIe of one or two CM4s and fit a riser card where the IO shield would go. Then one of the nodes can host a 10GbE interface.

    1. May I ask for what purpose you’d use that? ARM server boards are not that expensive, and would outperform a bunch of CM4s in every way (I can perceive). What would you do with 10 GbE on a RPi? (I mean, seriously, nothing in the RPi4 is made to do anything with that amount of data. 10 Gbit/s is about half of the realistic read rate from the RAM, meaning that your CM4 would be fully occupying the memory bus just with reading/writing RAM in DMA transfers, if you had any CPU left to coordinate these).

      1. Just for comparison: an 8-core, 16-thread AMD Ryzen 7 at 3GHz (3.7 GHz max) costs about 180€ (incl. German VAT), a µATX mainboard for that around 50€, and 32 GB RAM 130€. For a total of 360€ incl. taxes, you get a machine that idles at about 10W and makes a rather decent server if you want it to.

        Your four CM4s with 8 GB RAM are 70€ each, so that’s cheaper by 80€, indeed, but you get actually pretty bad performance (a quad-core ARMv8 at 1.5 GHz really doesn’t play in the same league as four threads of a Zen2 architecture core), and you’d still need your carrier board, would have zero possibility to attach any reasonable-speed storage or networking (you get _1_ external PCIe Gen 2 lane, that’s 5 Gbps in total, not even sufficient for one SSD), your RAM has a total bandwidth that’s lower than the USB speed of the x86 board…

        Upsides: each CM4 will only take about 2.5 W idling, so you’re not worse than the x86 board!

        What I’m trying to say: Your 4- to 8-CM4 board competes with lower- to at best midrange x86 hardware in terms of pure computing, if there was any storage, external or memory bandwidth to even compete. I’m sorry if I fail to see the use case of CM4 clusters!
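The price and bandwidth arithmetic in that last comment thread can be checked with a short script. All figures below are the commenter's own quoted numbers (prices in EUR, idle power in watts, link rates in Gbit/s); nothing here is independently measured, and the variable names are just for illustration.

```python
# Figures quoted in the comment thread (not independently verified).
X86_PARTS_EUR = {
    "Ryzen 7 CPU": 180,
    "micro-ATX mainboard": 50,
    "32 GB RAM": 130,
}
CM4_PRICE_EUR = 70         # per 8 GB CM4, as quoted
CM4_COUNT = 4
X86_IDLE_WATTS = 10.0      # "idles at about 10W"
CM4_IDLE_WATTS = 2.5       # "about 2.5 W idling" per module
PCIE_GEN2_LANE_GBPS = 5.0  # single external lane on the CM4, as quoted
TEN_GBE_GBPS = 10.0

x86_total = sum(X86_PARTS_EUR.values())
cm4_total = CM4_PRICE_EUR * CM4_COUNT
savings = x86_total - cm4_total
cm4_idle_total = CM4_IDLE_WATTS * CM4_COUNT

# Fraction of a 10 GbE link that one PCIe Gen 2 lane could carry at best.
link_fraction = PCIE_GEN2_LANE_GBPS / TEN_GBE_GBPS

print(f"x86 build: {x86_total} EUR, four CM4s: {cm4_total} EUR")
print(f"CM4 modules are cheaper by {savings} EUR (before a carrier board)")
print(f"Idle power: x86 {X86_IDLE_WATTS} W vs. CM4s {cm4_idle_total} W")
print(f"One PCIe Gen 2 lane covers at most {link_fraction:.0%} of 10 GbE")
```

The numbers agree with the comment: the four modules come out 80€ cheaper, idle at roughly the same total power as the x86 board, and a single PCIe Gen 2 lane cannot feed a 10GbE interface at full rate.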
