Turtles All The Way Down, 40 Propeller MCU Skyscraper

Why bother interconnecting 40 Propeller microcontrollers, one on top of the other? For the power that comes from parallel processing, of course! [Humanoido] put the setup together for a total of 1280 ports, 640 counters, and more, all running at 6.4 billion instructions per second for the low, low price of $300-500 by our count. The “skyscraper” even comes complete with software and schematics, promising developers the ability to expand or adapt it for any venture. Why would we need such a setup in the first place? For any of the following: vision tracking/modification, artificial intelligence, advanced robotic control, or more.
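
For the curious, here is a rough sketch of where those totals likely come from. The per-chip figures are our assumptions taken from the Propeller P8X32A's published specs, not from [Humanoido]'s write-up: 8 cogs and 32 I/O pins per chip, 2 counters per cog, an 80 MHz clock, and 4 clocks per typical instruction. The names `tower_totals` and `peak_gips` are made up for illustration.

```python
# Back-of-the-envelope totals for a stack of Propeller chips.
# All per-chip constants are assumptions based on the P8X32A specs.
COGS_PER_CHIP = 8            # assumed: cores ("cogs") per chip
IO_PINS_PER_CHIP = 32        # assumed: I/O pins per chip
COUNTERS_PER_COG = 2         # assumed: counter modules per cog
CLOCK_HZ = 80_000_000        # assumed: 80 MHz system clock
CLOCKS_PER_INSTRUCTION = 4   # assumed: typical assembly instruction


def tower_totals(chips: int) -> dict:
    """Return aggregate resource counts for a stack of `chips` Propellers."""
    cogs = chips * COGS_PER_CHIP
    # Peak per-cog rate in millions of instructions per second.
    mips_per_cog = CLOCK_HZ // CLOCKS_PER_INSTRUCTION // 1_000_000
    return {
        "cogs": cogs,
        "ports": chips * IO_PINS_PER_CHIP,
        "counters": cogs * COUNTERS_PER_COG,
        "peak_gips": cogs * mips_per_cog / 1000,
    }


totals = tower_totals(40)
print(totals)
```

Under those assumptions the numbers line up with the quoted 1280 ports, 640 counters, and 6.4 billion instructions per second. Keep in mind this is a best-case peak: it assumes every cog on every chip is busy running assembly, and it says nothing about how fast the chips can talk to each other.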

Related: [Humanoido] loves putting MCUs together; check out one of his other creations, the Basic Stamp supercomputer.

[Thanks Logan996]

29 thoughts on “Turtles All The Way Down, 40 Propeller MCU Skyscraper”

  1. Except it’s a poor design for *any* of those uses. But I guess that comes with the territory when Parallax is your only hammer. Try a pile of XMOS XK-1 for something usable?

  2. He seems to have tons of projects like that, and he never shows a real use or application for them (except for having several BASIC Stamps beep in sync), or did I miss something?

  3. Hear ya, Delta.. My hoax-bells started ringing on seeing the video of a previous project mentioned above. I always like to not-judge-too-quickly (although i sometimes do) so I tried finding anything that would prove authenticity. All I found was more bells going off. On the most recent page he has a link to the schematics. But it never was meant to be a link; “..”
    I wouldn’t buy one before demanding (and getting) concrete proof

  4. The interconnect is the really tricky part — A few years ago, as a for-fun project to learn Verilog, I implemented something like the CM1 direct from Danny Hillis’ specifications in his dissertation, in a Xilinx FPGA, but I didn’t finish the interconnect. I added in a picoblaze processor as a serial console to access the processors, but always thought it’d be neat to feed it over ethernet or USB. I was able to fit about 256 of the 1-bit processors with about 50% utilization on a spartan 3e starter kit, but intuitively since the processors in Hillis’ dissertation are essentially 1-bit ALU’s with some addressing, I felt as though the density could be vastly improved and much of the gate count was likely going into the memory addressing and block ram.

    i love to see folks build these microcontroller clusters, and it’s even more impressive when they build some simple software demonstration to go along with them. i’m not sure that it has any particularly useful applications other than as an academic pursuit, but that’s more than enough! :)

  5. this reminds me of the time i interconnected 42 pentium cpus and ok bullshit but: 6.4 billion instructions per second
    what is that like Gips or Bips?
    and how does this differ from 6.4ghz cuz no comprende señor

  6. The real question is how many FLOPS does this thing put out?

    @Jeditalian
    6.4 billion instructions per second is 6.4GIPS.
    Hertz are cycles per second as you’re probably aware. In computers only a certain number of operations can be executed in any one cycle of the clock, dictated by their bit rate(32 or 64 for personal computers). Newer desktops operate in 64-bits at their clock speed eg; 3.6GHz 64-bit.
    Now each segment of code has a size associated with it, so moving data from one section of the RAM to another may require 4 bits to accomplish it. So you can do this operation 16 (64/4) times per cycle. Since your CPU goes at 3.6 billion hertz you can move that data around 57.6 billion times per second. Moving it counts as one instruction, so if all the computer did was that move operation it would operate at 57.6GIPS.

  7. @Leigh @peter Maybe you could use the COGs for IO and the interconnects and use the same hypercube design that the CM-1 used.
    Just for fun mind you. Not really a good system.
    PS peter wow on the CM1 on an FPGA Well part of one anyway. Very cool. Right up there with the CRAY-1 on an FPGA.

  8. @leigthoa

    Wtf is a bitrate?? The number of instructions per clock is determined by design… i.e. pipeline design, availability of execution units…

    @article

    Makes no sense. Has zero real world use.

  9. @Slanesch: by schematic do you mean the Verilog code? I just used a stock Spartan 3E starter kit (although I had hoped of one day making a board full of FPGAs for some fun connection-machine-like and other experiments).

    if you’re a verilog and xilinx person (the control processor is a picoblaze), feel free to e-mail me. the code is a few years old and it’s somewhere in the middle of development (by a person learning verilog), and i’m in the last few weeks of writing my own dissertation so i can’t devote any time to it/cleaning it up, but i might be able to just zip it up and send it off for purely academic interest?

  10. @cantido
    I’m assuming that comment was directed to me, so:
    Bit rate probably isn’t an accurate description, but what I was referring to was the chip architecture. From a simple perspective, personal computers are either 32 or 64-bit machines depending on age. So again in a simple model every clock cycle they can execute 32 or 64 bits worth of instructions, 4 or 8 instructions respectively. This model is ignoring hyper-threading, pipelines, memory tricks and other speed enhancing techniques.

  11. @Anonymouse:
    Did you read the datasheet on those prop chips? Each of them has 8 cores with simultaneous instruction processing. This thing can move a buttload of data.

    It can only run at a max of 80MHz, but it can process more than 53 times the instructions per cycle of the highest-end Intel i7 6-core monster.

    Even allowing for 4-opcode-per-cycle SIMD operations on all 6 cores, you still perform more than twice as many operations on this prop tower.

  12. Anon

    The individual processors on the Prop are actually not that powerful by themselves. Due to their design they top out at around 5 MIPS at 80MHz if you are running a C program. If you code an assembly program (which has to be less than 2k total size) you can achieve 20 MIPS performance. A $6 ARM basically beats the Prop like a cheap drum.

    If you want to play around with real parallel processing check out the Xmos processors.

  13. He called it UltraSpark o_0 … seriously??? I wonder if he’ll call his OS Solariss…
    Interesting – but I would be more impressed if he had built it with some actual purpose or task in mind…
