64 Raspberry Pis turned into a supercomputer

In retrospect, it was only a matter of time before someone turned a bunch of Raspberry Pis into a supercomputer.

The Raspi supercomputer is the result of a project headed up by University of Southampton professor [Simon Cox]. Included in the team are a gaggle of grad students and [Simon]’s 6-year-old son who graciously provided the material, design, and logistics for the custom LEGO case.

The Iridis-Pi supercomputer, as the team calls their creation, consists of 64 Raspberry Pis, all configured for parallel processing using a lightweight version of MPI. [Simon] was kind enough to put up an excellent guide for turning two (or more) Raspberry Pis into a supercomputer.

The machine has a full 1 TB of storage, provided by a 16 GB SD card in each node. Although the press release doesn’t go over the computational capabilities of the Iridis-Pi, the entire system can be powered from a single 13 A supply.

If you’re wondering what it would take to get a Raspberry Pi supercomputer into the TOP500 list of supercomputers, here’s a bit of back-of-the-envelope computation: given the Raspi’s performance and the fact the 500th fastest computer can crank out about 60 TeraFLOPS/s, we’ll estimate about 1.4 Million Raspis would be needed. At least it’s a start.
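The estimate above can be checked with a few lines of arithmetic. This is only a sketch: it uses the two figures quoted in the post (about 60 TFLOPS for the 500th-ranked machine and about 1.4 million Raspis) and derives the per-board throughput they imply. The function names are made up for illustration, no measured benchmark is assumed, and perfect scaling across boards is taken for granted.

```python
import math

TOP500_ENTRY_FLOPS = 60e12   # ~60 TFLOPS, figure quoted in the post
ESTIMATED_PIS = 1.4e6        # ~1.4 million boards, figure quoted in the post


def implied_flops_per_pi(target_flops: float, boards: float) -> float:
    """Per-board throughput implied by a target and a board count."""
    return target_flops / boards


def boards_needed(target_flops: float, flops_per_board: float) -> int:
    """Boards required to reach the target, assuming perfect scaling."""
    return math.ceil(target_flops / flops_per_board)


per_pi = implied_flops_per_pi(TOP500_ENTRY_FLOPS, ESTIMATED_PIS)
print(f"Implied per-Pi throughput: {per_pi / 1e6:.1f} MFLOPS")
print(f"Share of the target covered by 64 boards: {64 / ESTIMATED_PIS:.4%}")
```

Swapping a different per-board figure into `boards_needed` shows how sensitive the headline number is to that single assumption.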

56 thoughts on “64 Raspberry Pis turned into a supercomputer”

  1. Some time ago I had a dream of making this happen with BeagleBoards. Raspis hadn’t been invented yet.

    I’d say this micro – super – computer is one great educational tool to teach parallel processing, at a low cost.

    1. yeah… that’s right.

      I’m still waiting for the day I can buy the not_yet_existing $25 version from major distributors as a device that is in stock all the time.

  2. “The team wants to see this low-cost system as a starting point to inspire and enable students to apply high-performance computing and data handling to tackle complex engineering and scientific challenges as part of our on-going outreach activities.”

    Wouldn’t an average i5/i7 desktop do just fine for this at a lower cost than *2500* GBP?

    1. The complex engineering and scientific challenges they’re talking about are the problems specific to parallel computing. Designing algorithms that work well in parallel, handling communication and dealing with communication bottlenecks, sharing loads, etc. The actual power of the system is not important.

    2. I completely agree. You can teach the principles of high-performance computing and data handling with any multi-core machine effectively (and more cheaply). There are plenty of available libraries that will handle IPC across an IP framework (DBUS and zmq, to name a few I’ve used…), which can be used for programming complex engineering/scientific challenges.

    3. It would be even cheaper to go to a local electronic scrap yard and get a bunch of Pentium III/IV/AMD K7 mainboards, buy some PCI gigabit Ethernet cards and a many-port gigabit switch, and connect them as a Beowulf/OpenMPI cluster. => More GFlops/s for sure, cheaper, recycling of scrap electronics.

  3. The problem is the Ethernet throughput: with my IGEPv2 (a Beagle Board clone) I can’t exceed 5 MB/s transferring a file from it to my notebook.

    The maximum computing power is limited too much by the speed of data exchange between devices.

  4. I don’t understand the 13A current number. If 64 RPis can be powered by 13A, that would be 203 mA each, much lower than the recommended 700mA rating. OK, 3.5-fold headroom, I can kind of see that (sort of). But looking at the original article, their cluster “runs off a single 13 Amp mains socket.” In which case, that number means nothing. The UK uses 220V, which at 13A is 2860W. Assuming 80% efficiency going down to 5V, that means it could supply 457 A at 5V, which should be enough to power a lot more than 64 RPis.

    1. the 700mA figure includes both usb ports being run to the limit, along with hdmi or composite going
      those are all un-used, so that leaves some current free

      1. Looking into this further, I don’t think the 13A refers to the power supply but rather to the outlet. Look at http://en.wikipedia.org/wiki/BS_1363#Fuses

        which describes the electrical outlet standard used in the UK. It turns out that 13A is one of the outlet current ratings.

        I agree with you that the current consumption _might_ be that low (or even lower) without using USB/HDMI but I think the summary “entire system can be powered from a single 13 A supply” part is an incorrect reading of “runs off a single 13 Amp mains socket” from the original article.

      2. Furthermore, a 13 A plug in the UK has a 13 amp fuse incorporated into the plug, which will blow if you take more than that over time.
        By blow, of course, I don’t mean it pops with the first overcurrent; it’s a function of time, fuse type, overcurrent demanded, etc. But you get the picture: if it plugs into one outlet, it’s obeying the 13 amp rating, because that 13 A fuse in the plug is mandatory.

    2. From my observations over the last few days, I’d say ~250 mA is the standby current for an idle RasPi, with around 400 mA being normal for one under high load (e.g. compiling programs).

      The only thing connected at the time was a network cable, but the only thing it was being used for was SSH.

      Therefore, I’d say 200 mA is too low. 13 A almost definitely refers to the mains supply, and this is yet another reminder that the unit they should have used is watts, esp. since with that number of nodes, it might be more efficient to have a single 3.3 V regulator for the entire cluster…
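The arithmetic in this thread can be laid out side by side. The sketch below is illustrative only: it takes the figures quoted in the comments above (220 V mains, 80% conversion efficiency, and 250 mA / 400 mA / 700 mA per board) at face value, and the function names are hypothetical.

```python
NODES = 64
MAINS_VOLTS = 220.0   # mains voltage as quoted in the comment above
SOCKET_AMPS = 13.0    # rating of the mains socket
PI_VOLTS = 5.0        # supply voltage of each board
EFFICIENCY = 0.80     # conversion efficiency assumed in the comment above


def amps_per_node_if_5v_supply(total_amps: float, nodes: int) -> float:
    """Reading 1: the 13 A is a 5 V supply shared evenly by all nodes."""
    return total_amps / nodes


def amps_available_at_5v(mains_volts: float, socket_amps: float,
                         efficiency: float, out_volts: float) -> float:
    """Reading 2: the 13 A is the mains socket rating, converted down to 5 V."""
    return mains_volts * socket_amps * efficiency / out_volts


def cluster_amps(per_node_amps: float, nodes: int) -> float:
    """Total 5 V current drawn by the cluster at a given per-node draw."""
    return per_node_amps * nodes


print(f"Reading 1: {amps_per_node_if_5v_supply(SOCKET_AMPS, NODES) * 1000:.0f} mA per node")
print(f"Reading 2: {amps_available_at_5v(MAINS_VOLTS, SOCKET_AMPS, EFFICIENCY, PI_VOLTS):.1f} A available at 5 V")
for label, amps in (("idle", 0.25), ("high load", 0.40), ("rated", 0.70)):
    print(f"{label}: {cluster_amps(amps, NODES):.1f} A at 5 V for {NODES} nodes")
```

Under these numbers, even the idle estimate for 64 boards is more than 13 A at 5 V, which fits the mains-socket reading better than the 5 V supply reading.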

  5. Seriously? I have resisted the urge to razz you guys about diluting your content with worthless Raspberry Pi posts as it seemed you were doing better about that, but this one is bad.

    When all the comments are negative, you have a problem.

    1. I disagree with your statement. This is an interesting build and you’re upset because it has the Raspberry PI “buzzword” in it. The people who developed the Raspberry PI are trying to further the education of this field. Most of the negative comments are about shipping of personal units and not about the build.

      Is hackaday not about making or modifying your own equipment (making a supercomputer)? This site has a broad audience and isn’t tailored to an esoteric group that wants to exclude others because some high standard isn’t met.

      How about elaborating on your comment so people can debate it? A passive-aggressive remark doesn’t prove anything.

      My comment on the build:
      It is interesting and the experience of making one is probably worth it, but I don’t think of it as an alternative in terms of money/dollar value for FLOPS. I would love to get a couple of RPIs and start getting my nephew interested in this field!!!

  6. Pissed. This is the reason why mine’s delayed another 8 weeks? Now if they could only manage to cluster more plants manufacturing them we’d all be happy.

  7. I don’t find it at all surprising that a tiny number of units were provided to a university to do this.. it’s good PR for a start.. well done to all, especially the lego builder :)

      1. So find me an R-Pi story that has a comment section that doesn’t have..
        1) A list of vaguely similar products mentioned as better, where better = not an R-Pi.
        2) Endless “But mine hasn’t come yet” posts.
        3) Some twit complaining that this is getting too much coverage.

        Reality.. The R-Pi is a good idea. If you don’t see why, that is your problem.

        And remember.. Real engineers concede there is more than one solution to any problem.

  8. If anyone remembers the FAWN cluster, there are actually real-world problems that this type of build can excel at, like hash lookup. Basically, compute needs are negligible, and what you need is memory capacity and speed per node.

  9. Please correct me if my calculation is wrong. With 64 Raspberry Pis, assuming every board is based on the 700MHz ARM1176JZF-S, does the math mean 64 RPis x 700MHz = a 44.8 GHz supercomputer?

    1. Nope. Clustering won’t increase the clock speed of the CPUs. It increases the processing power and lowers the load. It’s like saying, a 1.5 GHz Dual-Core CPU can process more than a 2 GHz Single-Core CPU.

      1. Wait a minute R
        ” It’s like saying, a 1.5 GHz Dual-Core CPU can process more than a 2 GHz Single-Core CPU.”

        You have to be careful about making statements like this – two 1.5GHz cores working concurrently CAN do more work than a single 2GHz CPU.

        The differentiating factor is that you need to be able to split the processing between the two cores. A single-threaded application will perform better on a single 2GHz CPU. But if it were a two-process application, the dual core would perform much better.

        Some graphics cards have over 1000 cores so they can divide a large problem into much smaller problems which can be distributed amongst the cores.
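The point made in this thread can be put into a toy model. This is a deliberately idealized sketch, not a benchmark: it assumes every task keeps exactly one core busy, tasks are independent, all cores do the same work per clock cycle, and communication costs nothing. `effective_ghz` is a made-up helper name.

```python
def effective_ghz(clock_ghz: float, cores: int, runnable_tasks: int) -> float:
    """Aggregate clock rate actually in use under the idealized assumptions.

    A core only contributes if there is a task to run on it, so the number
    of busy cores is capped by the number of runnable tasks.
    """
    busy_cores = min(cores, runnable_tasks)
    return clock_ghz * busy_cores


for tasks in (1, 2):
    single = effective_ghz(2.0, cores=1, runnable_tasks=tasks)
    dual = effective_ghz(1.5, cores=2, runnable_tasks=tasks)
    print(f"{tasks} task(s): 2 GHz single-core -> {single} GHz, "
          f"1.5 GHz dual-core -> {dual} GHz")

# 64 boards at 0.7 GHz: the aggregate figure only appears when the job
# is actually split 64 ways; a single task still sees one 0.7 GHz core.
print(effective_ghz(0.7, cores=64, runnable_tasks=64))
print(effective_ghz(0.7, cores=64, runnable_tasks=1))
```

In a real cluster the network sits between the cores, so the split-64-ways figure is an upper bound rather than something to expect in practice.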

  10. Missing the obvious bonus – it has 64 HDMI outputs. Video wall!

    The total bandwidth is pretty significant. I’ve often toyed with the idea of using HDMI/DVI-D interfaces for inter-FPGA comms. Cables are cheap, and some FPGA boards have two or more interfaces.

  11. oh the horror, if someone did this as a PR stunt i’m sure they’re getting fired or sued.

    it’s one thing to pull off a technical feat to get people to buy, but when some people have been waiting over 6 months for ONE unit PREordered, it makes people cancel their orders and buy something else, something they can own without having to “pray” or “hope for” (*%*&% sakes, this isn’t a date, lottery, or even a mystery).

    if you cant fucking ship them, dont fucking take orders and then deliver to someone else.

    that’s called being an ass and i’m sure there are people canceling their orders en masse right now.

    and someone getting fired

    EITHER THAT OR THE PERSON THAT MADE THIS SUPER-PI ROBBED A TRUCK FULL OF EM!

  12. “the 500th fastest computer can crank out about 60 TeraFLOPS/s, we’ll estimate about 1.4 Million Raspis would be needed. At least it’s a start.”

    Yet again somebody used an acronym they didn’t understand. FLOPS = FLoating point OPerations per Second. Thus FLOPS/s becomes floating point operations per s^2, which is an acceleration.

  13. This is not so nice. As agreed, no need to use RPi’s for such a project, and furthermore, I still have not gotten my Pi which I ordered months ago, scheduled to be delivered around weeks ago.

    1. To all the people whining about how you still haven’t got your Pis, how about naming and shaming your suppliers?

      My first RasPi arrived months back. Recently I ordered a 2nd one, and it arrived within 6 weeks, as promised. (Both times via element14.)

      Don’t be angry at the people who already have RasPis – we’re not the problem. Be angry at the ones who have failed to deliver yours in the time promised.

  14. Anyone else want to have a sook about not getting their RPi yet? Coz you know.. this is the best place to whinge about it :)
    Well done to all involved. What a great project!

  15. Quit whining. Yes you could do this with virtual machines, but nothing beats the hands on experience of physical hardware. For example, virtual machines don’t overheat but as hot as my pi gets, I’m sure 64 units can have a few issues.

    It’s a college, you know, that place where you learn stuff. This is about the cheapest way to build such a parallel system and I am sure that class is learning a lot. Great project.

  16. What’s next, 64 Casio calculators connected in one “supercomputer”… Raspberry Pi is not a high performance computer, it’s for network/multimedia. If they wanted a “supercomputer”, they could get 4 Intel i7 processors and the required motherboards and probably get much higher performance and lower power consumption for much less money. I’m sorry, but this is just a stupid waste of resources and time, and that would be ok if they had done it with their own money and time, and not wasted public money.

  17. Don’t forget about the 26 GFlops capability of the graphics unit that can be used for Gen Purpose number processing. A rough order of magnitude comparison is an i7 980X gets about 109 GFlops. So about 5 Raspis running a fully parallel benchmark could equal an i7. Assuming 100% efficient use of GPU processing would only require 2308 Raspis to make 60 TFlops. Derate that to 50% and you are still fewer processors and less heat than some of the other massively parallel processors in the top 500. Many of these are also GPU based.
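The GPU-based estimate in the comment above can be reproduced as follows. This is a sketch built only on the figures the commenter quotes (26 GFLOPS per Raspi GPU, 109 GFLOPS for an i7 980X, a 60 TFLOPS target); `pis_needed` is a hypothetical helper, and the efficiency factor is a simple multiplier rather than a measured value.

```python
import math

GPU_GFLOPS_PER_PI = 26.0   # per-board GPU figure quoted in the comment
I7_980X_GFLOPS = 109.0     # desktop CPU figure quoted in the comment
TARGET_GFLOPS = 60_000.0   # 60 TFLOPS expressed in GFLOPS


def pis_needed(target_gflops: float, per_pi_gflops: float,
               efficiency: float = 1.0) -> int:
    """Boards needed to hit the target at a given fraction of peak throughput."""
    return math.ceil(target_gflops / (per_pi_gflops * efficiency))


print("Boards to match one i7 980X:", pis_needed(I7_980X_GFLOPS, GPU_GFLOPS_PER_PI))
print("Boards for 60 TFLOPS at 100%:", pis_needed(TARGET_GFLOPS, GPU_GFLOPS_PER_PI))
print("Boards for 60 TFLOPS at 50%:", pis_needed(TARGET_GFLOPS, GPU_GFLOPS_PER_PI, 0.5))
```

Halving the efficiency doubles the board count, so the result depends almost entirely on how much of the quoted peak a real workload could reach.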

  18. That’s exactly what I wanted to do. I’ve got 2 RasPis at the moment (one as a media player, one as a “server”), but I think I’ll order 6 to 8 more. I hoped the Model A would get an Ethernet port; then they would be even cheaper for that “supercomputer” purpose. But they don’t have Ethernet, so I’ll go for the Model B.

  19. Hey guys.
    Does anyone have experience with this kind of cluster and truecrypt? Could I buy a second (or third) Pi and run a reasonable fileserver (fully encrypted HDD) with VPN etc., or do I need more Pis for this task?

    Thanks,
    B_S

  20. Hello,
    I’m really interested in this clustering stuff. I want to build a home data center of 10 Pis so I can run a web server accessible to the outside world through a domain and a static IP address, so I need your help. If each Pi has 700mb speed and 512 ram, does that mean the web server will act like 700 x 10 = 7000mb + 512 x 10 = 5120 mb ram? If not, how can you evaluate the server’s performance? I mean, what do I get when I cluster 10 Pis, or 20, or 64?

    Thank you
