We’ve seen the supercomputer cluster work of [Nick Smith] from the UK before, but his latest build is quite lovely. This time around, he put together a 96-core supercomputer using the NanoPi Fire3, a Raspberry Pi alternative that has double the number of cores. His post takes you through how he built the supercomputer cluster, from designing the laser-cut acrylic case to routing the power cables.
The best part of it, though, is the neat way he built the front panel health display. Built around a Pimoroni pHAT driven over a single SPI cable from one of the NanoPi boards, the display shows CPU load, temperature, and disk and network activity for each of the 12 boards. That board runs a Mosquitto MQTT broker that receives status info from each board once a second and updates the display, so it should be quick to spot any problems. That's important because [Nick] wanted to keep the build quiet, so he used two almost-silent fans that are throttled by controlling their voltage. It's not a greedy cluster, either: [Nick] found that it drew only 55 W when running flat-out and 24 W when idling. It's another neat build, and the write-up helps you see why [Nick] did things a certain way, making it a good read for anyone interested in how you plan and build SBC systems. You should also check out previous builds from [Nick], such as his laser-cut cases for SBCs and his Raspberry Pi cluster.
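The once-a-second status reporting could be sketched roughly like this. To be clear, this is a minimal sketch, not [Nick]'s actual code: the topic name, broker hostname, payload format, and the paho-mqtt usage are all assumptions.

```python
import json
import os
import time


def read_status(board_id):
    """Collect the per-board stats the display shows. Only load average is
    read here; temperature and disk/network counters would come from
    /sys and /proc on the real boards."""
    load1, _, _ = os.getloadavg()  # 1-minute load average (Unix only)
    return {"board": board_id, "load": load1, "ts": time.time()}


# On each NanoPi, a loop like this would publish to the head node's
# Mosquitto broker once a second (paho-mqtt package and hostname are
# assumptions, not from the write-up):
#
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.connect("head-node.local", 1883)
#   while True:
#       client.publish("cluster/status/3", json.dumps(read_status(3)))
#       time.sleep(1)

payload = json.dumps(read_status(3))
```

The display board would simply subscribe to `cluster/status/#` and redraw whichever board's row just reported.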
And as always, there is nothing super about his “supercomputer” cluster other than cost and effort. Linpack numbers are in line with an average dual-core laptop’s.
He has built a supercomputer cluster which knocks the socks off an average dual-core laptop.
You restricted your comment to metrics such as Linpack, etc.; I selected metrics such as coolness, desirability, and craft.
Well “clustering” is cool, but that doesn’t make it a supercomputer any more than “tall” makes me a tree.
Actually it does. “Supercomputer” refers to the architecture of the system: massively parallel compute and storage nodes running in parallel. It doesn’t refer to speed or capability beyond that.
An Intel laptop is not going to be massively parallel even if you had one that ran at a 999 GHz clock speed. At best you get a single CPU with 4 or perhaps 6 cores.
The Cray-1 supercomputer ran its processing nodes at 80 MHz and had 1 mega-word of memory, a word being 64 or 72 bits depending on the hardware or software view (8 parity bits per word).
It’s like claiming a 1 MHz logic analyzer isn’t actually a logic analyzer because you can’t debug a PCI Express bus :P
Move a bit to the left, I need some shade on my screen… It may not be an enterprise solution, but for teaching/learning these SBC clusters are available to a lot more people than anything else.
Thanks for the article, I didn’t know about the NanoPi before at all, now I just have to buy one :-)
It may be bad for Linpack, but maybe it’s good for other algorithms that parallelize well. It has 96 cores. Sure, it’s not going to beat or come close to a GPU, or even a mid-range CPU, but if you need to test the robustness of an algorithm across a network, this is a nice cheap way to do it.
It’s surprisingly similar to the Perk mining clusters that many people built back in the day, except those were based on cheap smartphones, because they were cheaper than the dev boards available back then. Some tried just running a bunch of VMs on a PC, but found that even if they could get the mining software to run, it performed very poorly.
Computationally it’s probably not any better than a 4 GHz x86 quad-core, but in terms of parallel operation it could probably run Erlang nicely. At the very least it is useful for experimentation and learning (which is really the goal of these sorts of projects), even if the raw horsepower isn’t there.
This is purely for fun.
If you want performance, you’d go with GPUs instead.
E.g., my GPU server runs at ~70 TFLOPS at full precision. It contains 8 GPUs, costs over $6k, and draws 1400 W.
For similar performance, you’d need to spend over $100k on these boards (approx. 6000 of them), drawing well over 18 kW.
Alternatively, you could run a bunch of cheap Xeon E5-2650L v2 CPUs: each has 20 threads, costs about $60 plus $100 for the motherboard, and runs at 90 W.
You’d need something like 1000 of them ($160k @ 90 kW).
So compared with the server-CPU alternative, these single boards are lots more affordable!
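As a rough back-of-the-envelope check, the ballpark figures from this comment work out as follows (all numbers approximate, taken from the comment above and normalized to the same ~70 TFLOPS):

```python
# Rough cost- and power-per-TFLOP comparison using the comment's
# ballpark figures (all approximate; 70 TFLOPS assumed for each option).
options = {
    "GPU server (8 GPUs)": {"cost": 6_000, "watts": 1_400, "tflops": 70},
    "NanoPi Fire3 x ~6000": {"cost": 100_000, "watts": 18_000, "tflops": 70},
    "Xeon E5-2650L v2 x ~1000": {"cost": 160_000, "watts": 90_000, "tflops": 70},
}

for name, o in options.items():
    print(f"{name}: ${o['cost'] / o['tflops']:.0f}/TFLOP, "
          f"{o['watts'] / o['tflops']:.0f} W/TFLOP")
```

By these numbers the GPU server comes out around an order of magnitude cheaper per TFLOP than either alternative, which is the commenter's point: the SBCs only win against the Xeon option, not against GPUs.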
FriendlyARM has a kit of the plastic parts and standoffs for 8 of their NanoPi NEO boards for $15. I think there are NEOs with the quad-core Allwinner chips… yeah, the H5, which has four A53 cores. So, $175 for 8 of them and the rack.
For people teaching networking and parallel processing or experimenting, that is a pretty good deal. Plus I believe the Allwinners have Armbian support, as well as Ubuntu Core and all the usual.
I also saw they have been running Tensorflow and a bunch of other cool stuff on their boards.
When you absolutely, positively need to train a Recurrent Neural Network at speed and don’t have any good desktop real estate, accept no substitutes.
Seriously, were I not a pleb, I would want something like this to train deep-learning sensor-integration networks before loading them into simpler processors for use in sensor networks. For example, an RNN running on a microcontroller like an ESP8266 can monitor a batch of sensors and process the outputs with very little power. Under normal conditions, it would send a journal of several hours’ worth of data every now and then, keeping the radio off otherwise. Under unusual conditions, like one or more of the sensors giving data outside of what is normal for a given time or situation, the processor can wake up the radio at that moment and start sending live information.
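The wake-the-radio-on-anomaly pattern described there could be sketched like this. Note this is a hypothetical illustration, not anyone's deployed code: it substitutes a simple statistical baseline for the trained RNN, and all names and thresholds are made up. A real ESP8266 port would use MicroPython's `machine` and `network` modules for the radio control.

```python
from collections import deque

WINDOW = 60        # samples kept for the running baseline (made-up value)
THRESHOLD = 3.0    # standard deviations that count as "unusual" (made-up)

history = deque(maxlen=WINDOW)  # recent readings, the "normal" baseline
journal = []                    # readings to upload in occasional batches


def is_unusual(value):
    """Flag readings far outside the recent baseline.

    A stand-in for the trained RNN the comment describes: here,
    "unusual" just means more than THRESHOLD std-devs from the mean."""
    if len(history) < WINDOW:
        return False  # not enough data for a baseline yet
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / len(history)
    std = var ** 0.5 or 1e-9  # avoid division by zero on flat data
    return abs(value - mean) / std > THRESHOLD


def on_sample(value):
    """Journal every reading; return True if the radio should wake now."""
    wake_radio = is_unusual(value)
    history.append(value)
    journal.append(value)
    return wake_radio
```

Under normal conditions `on_sample` keeps returning False and the journal just accumulates for the periodic batch upload; an outlier reading returns True, which is the cue to power up the radio and stream live data.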
Even before deep learning was a big thing, RNNs were successfully used with surveillance cameras so that they would only notify a human operator if something unusual happened. “Unusual” wasn’t defined by anything concrete, but rather by the general patterns in the data that had been trained in over time. Examples included a camera watching a door that didn’t react to people walking in front of the door or people exiting, but would react to someone trying to enter via that door. There were also installations that monitored fences and could distinguish a person moving on the far side of the fence from one moving between the fence and the camera.
This is the only way that the sheer amounts of sensory data that IoT-style networks generate will ever be useful to humans.