The 13.5 Million Core Computer

Having a dual- or quad-core CPU is not very exotic these days, and CPUs with 12 or even 16 cores aren’t that rare. The Andromeda from Cerebras is a supercomputer with 13.5 million cores. The company claims it is one of the largest AI supercomputers ever built (but not the largest) and that it can perform 120 petaflops of “dense compute.”

We aren’t sure about the methodology, but they also claim more than one exaflop of “AI computing.” The computer has a fabric backplane that can handle 96.8 terabits per second between nodes. According to a post on Extreme Tech, the core technology is a three-plane wafer-scale processor, the WSE-2. One plane is for communications, one holds 40 GB of static RAM, and the math plane has 850,000 independent cores and 3.4 million floating point units.
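
As a quick back-of-envelope check, the quoted numbers hang together: the headline core count is consistent with roughly sixteen WSE-2 wafers (that wafer count is our inference, not a figure from the article), and the fabric bandwidth converts to about 12 terabytes per second.

```python
# Back-of-envelope sanity check on the quoted Andromeda figures.
cores_per_wafer = 850_000      # WSE-2 cores, per the article
total_cores = 13_500_000       # headline core count

# The headline figure implies roughly 16 WSE-2 wafers:
wafers = total_cores / cores_per_wafer
print(f"implied wafer count: {wafers:.1f}")  # ~15.9, i.e. about 16

# 96.8 terabits/s of fabric bandwidth, expressed in bytes:
fabric_bits_per_s = 96.8e12
print(f"fabric bandwidth: {fabric_bits_per_s / 8 / 1e12:.1f} TB/s")  # 12.1 TB/s
```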

Data is sent to the cores and collected by a bank of 64-core third-generation AMD EPYC processors. Andromeda is optimized to handle sparse matrix computations. The company claims that performance scales “almost linearly.” That is, as you double the number of cores used, you roughly halve the total run time.
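
The point of optimizing for sparse matrices is that most of the arithmetic in a sparse problem would be multiplication by zero, and a sparse storage format lets the hardware skip that work entirely. Here is a toy pure-Python sketch of the idea using the common CSR (compressed sparse row) layout; this illustrates the concept only and is not anything specific to Cerebras hardware.

```python
# Toy CSR (compressed sparse row) matrix-vector product,
# illustrating why sparse workloads skip most of the arithmetic.

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a CSR matrix: only the stored nonzeros are touched."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# A 3x3 matrix with just 3 nonzeros:
#   [[2, 0, 0],
#    [0, 0, 1],
#    [0, 3, 0]]
data, indices, indptr = [2.0, 1.0, 3.0], [0, 2, 1], [0, 1, 2, 3]
print(csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0]))  # [2.0, 1.0, 3.0]

# "Almost linear" scaling: doubling the units roughly halves the run time.
t1 = 100.0  # hypothetical single-node run time, in seconds
for n in (1, 2, 4, 8):
    print(f"{n} nodes -> ~{t1 / n:.1f} s under ideal linear scaling")
```

A dense multiply here would do nine multiply-adds; the CSR version does three, and that ratio is what pays off when real matrices are 99% zeros.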

The machine is available for remote use and cost about $35 million to build. Since it draws 500 kW at peak, it isn’t free to operate, either. Extreme Tech notes that the Frontier computer at Oak Ridge National Laboratory is both larger and more precise, but it cost $600 million, so you’d expect it to be more capable.

Most homebrew “supercomputers” we see are more for learning how to work with clusters than trying to hit this sort of performance. Of course, if you have a modern graphics card, OpenCL and CUDA will let you do some of this, too, but at a much lesser scale.
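
The pattern behind both cluster computing and GPU programming is the same scatter/compute/gather idea: split the data, apply one operation to every chunk in parallel, and combine the results. Here is a minimal stand-in using only the Python standard library (the function names are invented for illustration).

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    # The "kernel": one operation applied identically to every chunk.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Scatter the data across workers, run the kernel on each chunk,
    # then gather and reduce the partial results.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # A thread pool keeps this sketch portable; a real cluster would use
    # processes, MPI ranks, or GPU lanes for the same scatter/gather shape.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))

data = list(range(1_000))
print(parallel_sum_of_squares(data))  # -> 332833500, same as the serial sum
```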

55 thoughts on “The 13.5 Million Core Computer”

  1. This oldie can remember a time when the Cray-1 seemed this exotic, this expensive, this power-hungry.

    Now I carry a device around in my pocket that is orders of magnitude faster, has orders of magnitude more memory and runs off a battery.

    I wonder if I will ever carry around something more powerful than Andromeda in my pocket?

      1. What’s logically next? Literally diamond-style carbon? A “living” solid state build? Some more exotic combination rather than primarily elements? There are already structural limitations to how atomically close CPUs and GPUs can get, and we could go more 3D with them, but only to a point, and that has its own limitations.

        The next question is what is it going to be able to do, specifically? Most supercomputers are extremely good at very specific things, most of which we wouldn’t need for daily or simple tasks. Heck, a basic ARM can now at least do a pretty good job of things like turning on a light bulb.

          1. Fair play, but my counterpoint would be that every time we design an optical input system capable of solving CAPTCHAs, they change the CAPTCHAs, and suddenly I’m left trying to find which of six images has two identical hieroglyphics in it, not seeing any, and wondering if this is how I discover that I’m an android.

    1. Not ‘static’ RAM though :) . I too have 16GB ‘dynamic’ DDR4 RAM in my laptop…. And 64GB in my home workstation…. 16GB seems to be the sweet spot now for most machines whether Windoze or Linux. I remember the time I ‘upgraded’ from 64K to 256K of RAM in my DEC Rainbow… I also remember when a 10MB HDD was a ‘want’ but too expensive… so stuck with floppies…. Those were the days.

      1. Skipping 3 steps…

        Luxury!

        We had to thread our magnetic core by hand. After drawing the fine wire in the Hull makerspace forge from discarded Lucas electric parts found on the side of the road.

    1. Dual core is not really that exotic; it isn’t used much now, but it isn’t exotic. What are you on about Steam for, anyway? Are you on about the Steam Deck? If so, that has 4 cores and 8 threads.

      If you mean that you don’t need 4 cores and only need 2, and it is just a marketing gimmick, then you are just wrong. There is a huge difference in performance with extra cores, and lots of applications and games are now able to make use of multiple cores.

  2. I guess it depends on how you define a core. E.g., if you count GPU cores as cores, then the IBM Summit has 23.6 million GPU cores (plus a paltry handful of 200k POWER9 CPU cores). If you want to complain that a GPU core is not a ‘real’ general purpose CPU core, then neither are the ‘cores’ in Cerebras’ WSE wafers.

    1. If we look at nVidia’s naming of their CUDA “cores”, then one very quickly realizes that they actually talk about floating point units, not actual cores.

      If we look at AMD’s stream “processors”, then guess what: these are also just floating point units.

      Neither of these “cores”/“processors” has the associated parts that would make it qualify as a core, or a processor for that matter; each is just an execution unit within a larger processor/core.

      A generalized definition of a core would be a unit capable of executing arbitrary code, i.e., having execution units, registers/memory, and most importantly an instruction decoder.

      One can make the argument that if multiple decoders share execution units, they count as “multiple cores,” though simultaneous multithreading is often the more correct name. This is rather debatable, though. AMD’s Bulldozer CPUs tested the limits of the definition, ending up in court over their processors arguably having half as many cores as advertised, since each pair of cores shared so many resources that the two significantly limited each other’s performance.

  3. I went into software because I was on a team that built a supercomputer, and while it was big and sexy, and fun to get pictures next to the racks of servers doing a burn-in, it struck me that most compute power in the world sits idle because it lacks software to really push it. I now ruthlessly push big iron to automate things that waste humanity’s time, while lamenting that the bigger and more connected I make it, the more time I have to waste with layers of obfuscation, virtualization, and containerization so a bug in a logger (actually many bugs at every layer) doesn’t make me violate the trust my customers put in us.

    Supercomputers are inspiring in their potential, but I usually find the work assigned to them tends to lack any luster at all.

    1. I couldn’t find a definitive answer. The Wikipedia entry for magnetic-core memory mentions nothing larger than 256K 36-bit words — about 9 megabits in a huge cabinet.

      If you consider ferroelectric RAM chips to be magnetic-core memory, then 16 megabit chips are available from Infineon.

  4. Just y’all wait; some clever gear-head is gonna micro-machine a fleet of Babbage Engines and put them on a chip.
    A microprocessor can only do so much heavy-lifting per instruction cycle, a Babbage can do oodles of math in a single stroke.

    1. I’ve thought the most practical way to build the first Analytical Engine would be to use wristwatch technology, getting the resulting system down to the size of a typical ATX sized PC case.

    1. Lol, much farther. That brought up a thought for me:
      Imagine what we could accomplish, if every gaming PC and mining rig ran these for an hour a day.
      Sell it as a way to personally participate in bettering humanity, by donating resources to support research. Hell, make it a tax break; it’ll be worth it.

      1. A lot of people did that at the start of 2020. Lots of stories about it here. IIRC there are only so many computing tasks that work well with this distributed model, and the queue got emptied pretty quickly…

  5. 500 kW at 1 exaflop… is this the computing version of “Rollin’ Coal”?
    Brah, do you even COMPUTE?! Hahaha!

    Well, things can only get so small and still remain useful. We ran out of small, so now we’ll go more. More has its drawbacks and power consumption is one of them (for now).
