Many readers will have had their first taste of experimentation with cluster computing through the medium of the Raspberry Pi. The diminutive Linux-capable boards can easily be hooked up as a group via a network hub, and given the right software they become a whole that is greater than the sum of its parts. None of us, however, will have reached the heights of the Raspberry Pi cluster shown by Oracle at their Oracle OpenWorld conference: a mighty rack packing a cluster of no fewer than 1060 Pi 3 B+ boards. This machine is touted as a supercomputer and it’s worthy of the name, though perhaps not in the same class as the elite in that field.
Getting that number of individual 3 B+ boards into a human-sized rack is no easy feat, and they have gone for custom 3D-printed racks to hold them. PoE would have resulted in too much heat dissipation, so instead they use USB power from an array of large multi-way USB power supplies. A set of switches provides the networking, and a conventional server sits in the middle to provide storage and network booting.
It’s certainly a cool way to wow the crowds at a conference, but we’re unsure whether it delivers the best bang for your supercomputing buck or whether it’s more useful as a large room heater. Meanwhile you can take a look at a few more modest Pi clusters, with unusual operating systems, or slightly more adherence to convention.
Thanks [Frisco] for the tip.
Honest question: aside from the “cool” factor and/or learning about cluster computing, what benefit would there be to having a cluster of RasPis? Will it make typical computing tasks – web browsing, video/audio/image editing – faster? What programs would be able to take advantage of the cluster? And is all that doable by the typical home tinkerer?
Clusters are often used for highly parallel scientific computing workloads. They are not useful for “typical” computing tasks like browsers/editors.
You could use them to render a bunch of different videos/codecs simultaneously, but it’s not going to help you with the editing part.
I would use this for something like a test/dev HPC cluster, booted with xCAT and job-controlled by either Moab or Slurm. Not for production jobs, mind you — just to prove that the proposed programmatic setup will boot and accept jobs as expected. Once that proved the process out, I’d run production HPC on a room full of Dell PowerEdge 740s or equivalent COTS servers.
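For that sort of smoke test you don’t need much: just ask the scheduler to run `hostname` on every node and count the answers. A rough sketch in Python, assuming Slurm’s srun is on the path and a partition called “debug” (both placeholders for whatever the real setup uses):

```python
#!/usr/bin/env python3
"""Smoke test for a freshly provisioned cluster: have the scheduler run
`hostname` on every node and check that they all answer."""
import subprocess

NODES = 4            # how many nodes to touch; adjust to the cluster size
PARTITION = "debug"  # hypothetical partition name, change to match your setup

result = subprocess.run(
    ["srun", "--partition", PARTITION, "--nodes", str(NODES),
     "--ntasks-per-node", "1", "hostname"],
    capture_output=True, text=True, check=True,
)

hosts = sorted(set(result.stdout.split()))
print(f"{len(hosts)} distinct nodes answered:")
for h in hosts:
    print(" ", h)
```

If every node you asked for reports back, the provisioning and the scheduler are at least wired together correctly.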
Nobody would use R740s for a compute cluster. It would be C6420s (i.e. four nodes in a 2U chassis). All the other vendors have similar machines. Also, nobody would sensibly use Pi 3Bs for a compute cluster due to the lack of real gigabit Ethernet – that cripples the machine performance-wise. The lack of RAM is also a huge issue. At the time Oracle built this, a Banana Pi would have made infinitely more sense. The Pi 4 fixes those issues if you use the 4GB RAM model. My day job is administering an actual HPC system.
No one, except the company I work for that regularly buys R7*0s in bulk. Of course, our needs are specific — we need 10T of local scratch space for the image processing we do in HPC.
I do agree that your solution is better for most “normal” HPC needs. Very similar to the US Govt. solution I toured last year, except they went with a COTS solution with one node per 2U.
Only a Sith deals in absolutes
Nope, pretty useless for “user-style” usage. You can’t distribute your surfing workload across loosely coupled machines (and that’s what these are – there’s no direct DMA between CPUs on different Pis at all).
Video editors might have something like a “cluster backend”, where you might be able to benefit from automatic background handoff to a compute cluster for e.g. rendering and compression jobs. But honestly, when it comes to BAM! per buck, this cluster isn’t even remotely competitive, and when it comes to watts per BAM!, even less so.
In short: Unless you’re really having a workload where you don’t actually want a proper cluster, but an array of small, well-isolated machines, RPis are definitely **NOT** the way to go, sorry.
There is one situation where a cluster of Pis might be the most efficient – scaling with workload or power availability (say, running off solar cells). Normal clusters of similar physical dimensions can only be minimally controlled in this way – you only have a handful of segments you can turn properly off (and it usually takes much longer to get past the BIOS stages of a reboot on the bigger machines).
Like your array of well-isolated machines, not the most common use case, I’d think.
Though a smaller cluster like that could do well here. I do have solar, and my big workstation under load doesn’t match the best calculations-per-watt of whichever model Pi I tested against (I think it was an original Pi B or a Pi 2, as this was a while ago). So if the network hardware is efficient enough, it could actually let me do most of my work off my little laptop, farming the loads out to the cluster at whatever rate my solar cells are overproducing – both more efficient in calculations per watt and better for my wallet, since there should be a minimal electric bill (I assume I would still run it overnight if there was a load for it). A newer workstation might manage the same calc per watt, but it would cost just as much if not more, and it wouldn’t gain such vast dynamic scaling potential to match the spare power.
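The bookkeeping for that kind of power-following scheduling is pretty simple, which is part of the appeal. A toy sketch – the per-board wattage and the switch overhead here are guesses, not measurements:

```python
def nodes_to_power(surplus_watts, watts_per_node=5.0, switch_overhead_watts=10.0):
    """How many Pi nodes the current solar surplus can carry.

    watts_per_node is a rough full-load guess for a Pi 3B+;
    switch_overhead_watts covers the network switch that stays on regardless.
    """
    usable = surplus_watts - switch_overhead_watts
    if usable <= 0:
        return 0
    return int(usable // watts_per_node)

# e.g. 150 W of overproduction leaves room for about 28 boards
print(nodes_to_power(150))
```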
https://www.raspberrypi.org/blog/raspberry-pi-clusters-come-of-age/
I would guess that computer time on a real supercomputer is still a scarce commodity. So having a ‘smaller’ supercomputer that uses the same architecture would be useful as a testing platform to see if your code runs fine. It runs slower, but it gives you the chance to debug your application without you having to take any time from the real supercomputer.
You can do the same thing with a couple of old desktop PCs that you probably already have. No value in the Pi in this instance. This is just a case of somebody spending a lot of money. Pis do occasionally have places they are perfect for; IMHO this is not one of them.
It’s the ideal machine for hosting a dial-up BBS that could support 142.6 million simultaneous 9600 baud dial-in users. You would need a larger conference room to host the modem rack however.
Have everyone just Telnet in!
If your workload depends _only_ on memory bandwidth, and not on latency between nodes, then the Pis may provide the best bandwidth/euro.
The only thing I can think of, aside from education, is rendering or any type of encoding for which the Pi’s processor (or a different ARM board’s) has a dedicated engine to speed things up. H.264, for instance.
I’d find a small render cluster of cheap, codec-specific ARM systems in parallel practically useful.
I have mechanical engineering simulations that often take a work day or more to solve. I would love to have results in minutes. Maybe in 20 years.
Tremendous benefit if you are Oracle. 1060 × 4 cores, and Oracle charges per core.
I could see this being useful for doing mass compiles for an ARM distro, or running tests and fuzzing on ARM binaries. Essentially, anything that actually requires the ARM arch.
I remember when some people tried to mine Perk in VMs running on a PC. It didn’t work very well (if at all) since the mining code was native ARM and the emulation overhead completely decimated performance. And even if it hadn’t, there wasn’t a PC that could beat the performance per dollar or performance per watt of those really cheap smartphones.
You seen one Pi cluster,
you seen them all.
Power use vs. reality…
Loading this to the limit, and being generous at 2 A per Pi, that’s 2120 A at 5 volts.
Even with a nice 240 volts as in Europe, it’s over 10 kW, or about 44 A.
That is what a larger house would need in a dead-cold winter.
It will for sure keep the room it’s standing in warm.
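Back-of-envelope for anyone who wants to poke at the numbers (the 2 A per board is a deliberately generous guess, not a measurement):

```python
BOARDS = 1060
AMPS_PER_BOARD = 2.0   # generous worst-case guess per Pi 3B+
VOLTS = 5.0

total_amps_5v = BOARDS * AMPS_PER_BOARD     # 2120 A on the 5 V rails
total_watts = total_amps_5v * VOLTS         # ~10.6 kW
amps_at_240v = total_watts / 240            # ~44 A on a European feed

print(f"{total_amps_5v:.0f} A at 5 V = {total_watts / 1000:.1f} kW = "
      f"{amps_at_240v:.0f} A at 240 V")
```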
Nice! I’ll get myself a Pi cluster instead of an electric heater for this winter. Although I’d want to stay below the 13A limit for a regular outlet’s power use in the UK.
13A per socket. 30A per ring, just saying ;-)
I have other things to power in my home! :-P
I really do like the idea of a little Pi cluster for some experimentation with parallel computing techniques, but it’s going to be at the bottom of the priority list for quite some time.
Hey Shannon –
I grabbed one of these recently to experiment. It’s been running some BOINC jobs smoothly for a week now. Worth a look!
https://clusterhat.com/
Yup, I do see the point of small clusters for “playing with”. I am continually tempted by the idea of a “laptop supercomputer”: take an old luggable laptop, put 7-12 Pi Zeros in the lead-acid battery bay, and use it as a terminal to them. Probably running off wall power only, but there’s a lot of dead space in those old beasts; one might even get 10,000 mAh of lithium-whatever in flat packs under the motherboard, or taking the space of the 3.5 inch HDD which you replaced with CompactFlash.
For comparison, the top ranked super computer in the world, the Summit IBM Power9 system at Oak Ridge National Labs draws 10,096 kW of power, or 10 mW, and runs a $7 million per year electric bill.
The #100 ranked system, the Mihir Cray XC40, draws 950 kW of power.
Clustering isn’t cheap, no two ways about it. This is only exacerbated for a pure educational setup with no viable commercial expectation of recouping costs (or government funding).
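For what it’s worth, the quoted draw and the quoted bill roughly line up if you assume something like $0.08/kWh – a guess at an industrial rate, nothing more:

```python
SUMMIT_KW = 10_096        # power draw quoted above for Summit
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.08      # assumed $/kWh, purely a guess

annual_kwh = SUMMIT_KW * HOURS_PER_YEAR      # ~88 million kWh
annual_cost = annual_kwh * PRICE_PER_KWH     # ~$7 million
print(f"{annual_kwh / 1e6:.1f} million kWh/year, about ${annual_cost / 1e6:.1f} million")
```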
every. single. time…. every time I see a discussion in metric, I see an order of magnitude error…
>10,096 kW of power, or 10 mW
10 milli Watt ??
https://en.wikipedia.org/wiki/Metric_prefix
Also, from what I find, the max power consumption for the 3B+ is 5W, not 10W as the OP states.
This seems relevant:
https://www.jeffgeerling.com/blog/2019/everything-i-know-about-kubernetes-i-learned-cluster-raspberry-pis
If the individual jobs were small enough, a set of Pi’s might actually be good enough for the job. However, I’ve never seen devops types keep their jobs small.
My spitballing says you get about the same MIPS as five quad-Xeon boards, which would be half the power and, I think, half the price.
But you would not get the free article advertisement. ;)
I was just asking myself the question you just answered. This is cute, but if I was building a cluster I would start with other building blocks. So what “advantage” does this Pi cluster offer over a cluster of Xeons or other candidates? You would think that Oracle would have pondered this before investing in this — and likely they did.
Perhaps it is just for publicity.
Maybe it helps research scalability issues with gruntier kilo-CPU systems. Or it suits a class of simple integer problems that care more about clock ticks, where you get the same speed (per core) on a P4 at 3 GHz or the latest i7 at 3 GHz.
I suspect it was built just for getting attention at the trade shows – half of parallel processing is the interconnect, and the Pi 3B’s ethernet is limited to USB2 speeds.
Yeah, it must be a tradeshow/tech demo system – it’s a great portable demo of their software’s scalability. So many nodes and it still works well (I assume).
And if it works on heaps of Pis, it can work for your probably smaller heaps of Threadripper/Xeon-based systems.
There’s some classes of problem that are minimal data, maximum crunch, pretty much the kind of things done by distributed computing projects.
One can buy a used dual Xeon hexcores server with dual power supplies and tons of ECC for around $600, sometimes less.
A comment made elsewhere was that this can help with distribution testing, though with ‘pay per time’ hosting it may not be the cheapest option.
E.g.,
want to test if your program will distribute across many nodes, and test software for stability;
see how a load balancing system distributes and monitors 1000 systems.
It’s not good performance, but it is a lower-cost way to see how something works when split across many machines, not counting virtualization and the limits there.
Maybe it runs multi-version of Oracle’s Database product. I think that you’d need another supercluster to figure out the billing / licences :-)
@Alan Hightower — thanks for the smile of a BBS with millions of BBS users. Wonder if in an alternate timeline that would be the successful version of AOL?
AOL failed in your timeline?
With another 20 raspberries they could have demonstrated 1080p ray tracing with one Pi per row of pixels.
Now that’s an interesting thought. A single line of 1920 ray-traced pixels per Pi. Maybe its weak GPU could handle it as well. I wonder how fast a single Pi could render that line – probably not 24 fps, but it would be interesting to see nonetheless.
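The decomposition is the easy part, at least; a toy sketch of the scanline split, with the actual rendering left as an exercise:

```python
HEIGHT = 1080  # scanlines in a 1080p frame

def rows_for_node(node_id, nodes, height=HEIGHT):
    """Round-robin scanline split: node k renders rows k, k + nodes, k + 2*nodes, ..."""
    return list(range(node_id, height, nodes))

print(rows_for_node(0, nodes=1080))  # one scanline each with 1080 boards: [0]
print(rows_for_node(0, nodes=1060))  # with only 1060 in the rack, board 0 picks up a second row: [0, 1060]
```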
Is this thing why I cannot purchase a Pi 3 B+ anywhere on the net?
That rack frame needs a flashing red light on top and an audio circuit to produce the vrorp vrorp sound to complete the Tardis look.
235 GFLOPS Raspberry Pi 4B