New Cray Will Reach 1.5 ExaFLOPS

It wasn’t that long ago that hard drives boasting a terabyte of capacity were novel. But impressive though the tera- prefix is, beyond it lies peta, and further still exa, as in petabyte and exabyte. A common i7 CPU currently clocks in at about 60 gigaflops (billions of floating point operations per second). Respectable, but today’s supercomputers routinely turn in sustained rates in the petaflop range, with some even faster. The Department of Energy announced it is turning to Cray to provide three exascale computers, that is, computers that can reach an exaflop or more. The latest of these, El Capitan, is slated to reach 1.5 exaFLOPS and will reside at Lawrence Livermore National Laboratory.
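
To make those prefixes concrete, here is a quick back-of-the-envelope comparison in Python, using only the figures quoted above (peak claims, not benchmark results):

```python
# Rough scale comparison using the figures quoted in the article.
GIGA, EXA = 1e9, 1e18

i7_flops = 60 * GIGA           # a common desktop i7, ~60 gigaflops
el_capitan_flops = 1.5 * EXA   # El Capitan's 1.5 exaflops target

print(f"El Capitan ~ {el_capitan_flops / i7_flops:,.0f} desktop i7s")  # -> 25,000,000
```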

The $600 million price tag for El Capitan seems pretty reasonable for a supercomputer. After all, a Cray-1 could only do 160 megaflops and cost nearly $8 million in 1977, or about $33 million in today’s money. So roughly 18 times the cost gets them over nine billion times the compute power.
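
Spelling out that arithmetic (all inputs are the quoted figures, so the ratios are rough at best):

```python
# The arithmetic behind the Cray-1 comparison; all inputs are the quoted figures.
cray1_flops = 160e6           # Cray-1 (1977): 160 megaflops
cray1_cost = 33e6             # ~$8M in 1977 dollars, ~$33M today
el_capitan_flops = 1.5e18     # 1.5 exaflops
el_capitan_cost = 600e6       # $600M price tag

print(f"cost ratio:        {el_capitan_cost / cray1_cost:.0f}x")    # ~18x
print(f"performance ratio: {el_capitan_flops / cray1_flops:.2e}x")  # ~9.38e+09x
```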

The computers use Cray’s Shasta architecture. Of course, at some point it isn’t the computation but the communication that becomes the limiting factor. Cray’s Slingshot interconnect ties the pieces of the computer together. The information about it on Cray’s website isn’t very technical, but we were struck by this passage:

Additionally, Shasta supports processors well over 500 watts, eliminating the need to do forklift upgrades of system infrastructure to accommodate higher-power processors.

We know we hate it when we want to upgrade our desktop and have to start up the forklift. Cray, of course, has a long history with supercomputers. You probably have a pretty good supercomputer hiding in your graphics card, by the way.

61 thoughts on “New Cray Will Reach 1.5 ExaFLOPS”

    1. Does that mean I now have to do those Java 8 tests that all recruiters try to force me to do, ignoring the fact that I already passed uni and have 20 years of coding experience in Java?

    1. From the April 2018 press release on the DoE site:

      · Identifying next-generation materials

      · Deciphering high-energy physics data

      · Combating cancer

      · Accelerating industrial product design and reducing cost-to-market

      · Evaluating options for nuclear security

      Realistically, it’s up to the individual research facilities that operate them.

      https://www.energy.gov/articles/secretary-energy-rick-perry-announces-18-billion-initiative-new-supercomputers

    2. The US complies with the Comprehensive Nuclear-Test-Ban Treaty, so test explosions are simulated rather than carried out. Simulating the explosions requires ever more computing power to improve accuracy.

    1. I’m pretty sure these machines run some very custom OS designed to compile the program to be run, deliver it to all the processing nodes, and then just wait for interrupts and display error messages. Probably some version of UNIX developed over the past 40 years. I wonder if that supercomputer would proudly proclaim that it’s not a teletype…

  1. Literally NO ONE has a supercomputer “…hiding in your graphics card…”

    The word “Supercomputer” has a definition. You aren’t being cute when you intentionally misuse it.

    The top 500 fastest computers are Supercomputers. Period. End of statement. No exceptions. (Based on a variety of metrics.)

    If you were to use “Olympics” to refer to a children’s game at summer camp, it would be clear you weren’t talking about the ACTUAL Olympics. No confusion.

    But if you used “Olympics” to refer to some World-Tier sporting event that was NOT part of the Olympics, it would be wrong. You would be deceiving your audience and spreading/perpetuating confusion.

    This EXACT type of use is why most people have no clue what terms like “Cloud”, “Crypto”, “Blockchain”, and others ACTUALLY mean. People misuse them to make their statements sound more impressive.

    This is not a statement against languages changing over time. That happens. We all have to deal with it.
    This is about people misusing words to make their statements sound more important/impressive.

    1. Not to get into an argument or anything like that, but for a long time I thought that a supercomputer was a computer that deviates from a normal computer by being massively parallel and/or aggressively overclocked and cooled… sort of like a supercharger on an internal combustion engine, forcing the machine to run at “unnatural” power and paying for it by aggressively creating the environment needed for such “magic” to happen. When one goes to Wikipedia, it seems clear that the definition is a bit vague. :-) https://en.wikipedia.org/wiki/Supercomputer

      1. There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable.

        There is another theory which states that this has already happened.

    2. There are many Cray systems that don’t appear on the Top500. Some for secrecy, and others because the Top500 benchmark is an arguably bad measure of real-world performance. The only reason the Top500 exists is for owners to give their investors bragging rights for their big outlay and to encourage future funding. HPL runs are only good for post-installation stress testing and turning electricity into heat.

      1. The Top500 list can be thought of as largely application-specific, and the Linpack benchmark it uses is better than no benchmark at all, even though it practically only looks at floating point operations.

        If one’s application is largely float-heavy, then it can be a decent estimate, and the ratio between the Linpack score and the performance of a given float-heavy application would be expected to stay roughly in the same neighborhood. (I.e., if the program runs on a system with a 10 times higher Linpack score, it isn’t unfair to expect a similar increase in program performance, if nothing else bottlenecks it.)

        If one’s application leans more heavily on other operations, like bitwise logic, conditional branches, vector work, or anything else under the sun, the Linpack score tells you much less. For example, if your application mostly executes a multiply followed by an addition, a bit shift, and an XOR, application-specific hardware can likely run that whole sequence in a single cycle, greatly improving performance.

        There are more application-specific benchmarks around, but they aren’t generally as prestigious as the Linpack run used by Top500.org.

        1. That’s why some sites boycott HPL. Workloads like genomics and NN training depend less on single-thread FLOPS than on node-to-node bandwidth and I/O bandwidth, and systems tailored to those workloads can perform worse on HPL than other systems that are less effective at those specialized workloads.

    3. You’re correct, but I think the point that they are making is that GPGPU architectures are the notional descendants of 70s to 90s-era parallel-vector supercomputer architectures.

      It was just poorly phrased.

      Certainly as someone who programmed Y-MPs (former employee of SGI/Cray here), when I look at CUDA it’s hard not to notice the similarities.

      As for the TOP500 list, if you actually think that represents the biggest supercomputers in the world, you’re mistaken. In the 90s, as a rule of thumb, the real #1 (always owned by government) was around 10x the size/speed/capacity of the #1 on the list. I can’t speak to what it is now as I’m out of that industry.

  2. So, set it to bitcoin mining; when is the breakeven?

    (Probably hard to say. A few seconds on Google makes it clear that mining is all integer work and this is a specialized floating point machine, but taking a number from the internet (1,300 FLOP/hash) suggests 45 BTC per day, or $450,000 per day, so it pays for itself in about 4 years. All numbers from an online calculator; I have not checked anything, including how many zeros are in an exaflop.)
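
    A quick sanity check of that estimate in Python, using only the commenter’s own numbers; every input here is an assumption from the comment, not a measured figure:

```python
# Reproducing the commenter's breakeven estimate from their own numbers.
# The FLOP-per-hash equivalence is especially dubious, since mining is
# integer work, not floating point.
machine_flops = 1.5e18
flop_per_hash = 1300                       # internet-sourced conversion figure
hashrate = machine_flops / flop_per_hash   # ~1.15e15 hashes/s
usd_per_day = 450_000                      # the comment's 45 BTC/day figure
machine_cost = 600e6

print(f"~{hashrate:.2e} hash/s")
print(f"breakeven: {machine_cost / usd_per_day / 365:.1f} years")  # ~3.7
```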

    1. They claim 1 TFLOP for the Core i9 Extreme; that’s 1 × 10^12 FLOPS.
      This machine should do 1.2 EFLOPS, i.e. 1.2 × 10^18.

      So you are a bit out: it is 1,200,000 × the performance of the fastest desktop CPU,
      or 4,000,000 × the Core i5 that I have here.

      Did you perhaps forget that there are petaflops in the spectrum too?
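
      For what it’s worth, redoing that ratio in Python with the figures given here (note the article itself quotes 1.5 exaflops rather than 1.2):

```python
# Checking the ratio with the commenter's own figures.
i9_flops = 1e12         # claimed Core i9 Extreme peak: 1 TFLOP
machine_flops = 1.2e18  # the commenter's 1.2 EFLOPS figure
print(f"{machine_flops / i9_flops:,.0f}x")  # -> 1,200,000x
```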

    2. Even if you assume the number you used is accurate, you probably need to increase the number of home PCs by two or three orders of magnitude to reach the same performance as the Cray, just because of the massive latency difference between having all your processors in one room and having them scattered around the world.

    3. A ‘botnet’ such as Charity Engine, which just two days ago cracked the number 42 (as a sum of three cubes): https://youtu.be/zyG8Vlw5aAw?t=1m44s

      @Andy: I’m ashamed that I indeed skipped an SI prefix in my mental calculations. I based it on an i5 of ~10 GFLOPS. I didn’t include the GPU in the calculation on purpose, because almost no one has a GPU with meaningful 64-bit float performance.

  3. “Forklift upgrade” is datacenter-speak for having to rip out and replace an existing system in order to upgrade. The alternative is a modular system that allows replacement of individual components throughout the system’s lifecycle. It needn’t involve a literal forklift, and the term is often applied to software projects too.

  4. That factor deserves a closer look: 1.5 exaflops versus 160 megaflops is a ratio of roughly 9.4 billion.

    1.5 exaflops = 1,500 petaflops = 1,500,000 teraflops = 1,500,000,000 gigaflops = 1,500,000,000,000 megaflops.

    But even the raw factor isn’t the main problem: you can’t meaningfully compare flops between architectures. If you could, benchmarking wouldn’t be such a big deal. How many bits in the instruction word? How many bits in the floating point format? There are a thousand variables besides flops.
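
    The prefix chain is easy to check mechanically; a one-line sketch:

```python
# exa/mega = 10^18 / 10^6 = 10^12, so 1.5 exaflops is 1.5e12 megaflops.
print(f"{1.5e18 / 160e6:.3e}")  # ratio to a Cray-1's 160 megaflops: ~9.375e+09
```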
