Help [Chris] Boot His Cray-1 Supercomputer

[Chris Fenton] needs your help. After constructing a 1/10th scale, cycle-accurate Cray-1 supercomputer and finding a disk with Cray software on it, he’s ready to start loading the OS. There’s a small problem, though: no one knows how to boot the thing.

[Chris] posted a disk image for a Cray-1/X-MP with the help of the people at archive.org. Now he needs your help – if you think you can reverse engineer the file system, [Chris] will pay handsomely with a miniature model of a Cray printed on his MakerBot. In any case, it seems like a fun challenge.

From our quick glance at the disk image with a hex editor, it looks like [Chris] has something special on his hands. We see a few references to “Cray memory and registers,” as well as “IOP-0 Kernel, Version 4.2.2” in the header along with a few dates referencing July of 1989. This is consistent with the history of the source disk pack. If you think you’ve got what it takes to reverse engineer the file system of a Cray-1, this is your chance.

72 thoughts on “Help [Chris] Boot His Cray-1 Supercomputer”

    1. I know with a lot of the old computers, they had ferrite cores which held magnetic charges (similar to magnetic core memory in operation, but there would only be like 20 or so cores spread out randomly on each board, not a big grid of them). These held configuration bits, and special pre-boot instructions, even firmware that allowed the machines to even read from disk…

      If this has any of those, and they don’t have the configuration bits set properly… well, good luck with that…

  1. This is above my level of expertise by several levels, but I do hope to see this thing get up and running! It’s exciting to see how much old technology can be reduced in size when using the newer technologies of today.

    1. Actually, it could be reduced much more than that if there was a proper emulator that could be ported to many operating systems. Even an iPhone or Android phone could do it much faster, and of course many, many CPUs today could do it rather easily.

  2. Someone posted this in a previous article:

    “Charlie says:
    September 8, 2011 at 5:29 am
    They may wish to contact the smithsonian or the National Crypto Museum. They have actual Cray’s. And they may have copies of the software as well. The National Crypto Museum is at Ft. Meade, MD and is run by the NSA.”

    I wonder if Chris has seen it or attempted to contact them.

  3. “3. This disk was probably used to boot the Cray’s I/O Subsystem (IOS) …”

    Sue the pants off Cisco and Apple for infringing the trademark, then pay an army of developers to do the work:)

  4. Did a quick wikipedia browse.
    The Cray-1 was a 64 bit system from 1975 running at 80 MHz (single core) and did 136 MFLOPS.

    By comparison, an intel core i7 980 XE does 109 GFLOPS (with six cores) and a Radeon 5970 will do 928 GFLOPS.
    So a 5970 is about 7000 times faster and the i7 is 800 times faster (133 per core).

    Personally, I think only being beat by a factor of 133 per core is not bad considering the other guys have had 35 years to work on their systems.
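The ratios in this comment are easy to re-derive. Here is a minimal sketch that uses only the peak figures quoted above (they are the commenter's Wikipedia numbers, not benchmark results); the helper name `speedup` is just for illustration.

```python
# Peak figures as quoted in the comment above; not measured benchmarks.
CRAY1_MFLOPS = 136.0
I7_980XE_GFLOPS = 109.0
I7_CORES = 6
RADEON_5970_GFLOPS = 928.0


def speedup(gflops: float, baseline_mflops: float = CRAY1_MFLOPS) -> float:
    """Return how many times faster `gflops` is than the baseline in MFLOPS."""
    return gflops * 1000.0 / baseline_mflops


i7_total = speedup(I7_980XE_GFLOPS)
i7_per_core = i7_total / I7_CORES
gpu_total = speedup(RADEON_5970_GFLOPS)

print(f"i7 980 XE:   {i7_total:.0f}x total, {i7_per_core:.0f}x per core")
print(f"Radeon 5970: {gpu_total:.0f}x")
```

This lands at roughly 800x for the i7 (about 134x per core) and roughly 6,800x for the GPU, in line with the rounded figures in the comment.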

    1. Scary to think that one can buy so much computing power for so little, isn’t it?
      Now if we could just get some of the old apps that they ran on those Crays and compile them for an i7.

      1. Recompile old Cray codes on a modern PC? You might be in for a surprise: the PC would still surely be fast, but don’t expect it to be by the ratio of clock speeds.

        Crays were amazing pieces of hardware, designed to run CERTAIN codes extremely fast. The system was optimized for this, sometimes even having instructions which were for specific customer needs (eg. POPCOUNT, the “NSA Canonical Instruction”).

        The source you’d see would be in Fortran 66, and would be full of Cray compiler directives giving hints to the compiler about which bits to focus on optimizing. Modern compilers do an excellent job of general optimization, but the Cray ones knew the exact config of the machine down to the memory layout and reservation bits on individual execution units, so with sufficient hints and good coding, you could keep the execution units close to completely used.

        You might also find some bizarre code due to the floating point format these old machines used, which preceded IEEE 754 by many years. As I recall, it had two different zeros, because Seymour Cray didn’t want to spend a single gate delay fixing that. This results in… interesting code. :)

        If Chris wants to contact me directly, I still have friends in or close to Cray, and might be able to find an old timer with software or information to assist.

        He will also find some of the original system manuals on bitsavers.org, if he doesn’t already have them.

        1. Cray’s floating point format didn’t need to be fixed — it wasn’t broken. At least, not that way. The problem Cray wasn’t willing to “spend a single gate delay fixing” was that his floating point arithmetic implementation was less precise than many would have liked. He believed, correctly, that people were more interested in getting a good answer than waiting around for a slightly more precise one.

          The IEEE floating point standard (IEEE 754) has two representations of zero: 0x00000000 and 0x80000000, sometimes referred to as “plus zero” and “minus zero”. Cray switched to two’s complement representation of integers from one’s complement (after he understood it) precisely because 2’s complement DOESN’T have two representations of zero, unlike 1’s complement.
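The two IEEE 754 zeros mentioned here can be inspected directly. A small sketch using Python's standard `struct` module to show the single-precision bit patterns (the helper name `float32_bits` is made up for this example):

```python
import struct


def float32_bits(x: float) -> int:
    """Return the raw 32-bit pattern of x encoded as an IEEE 754 single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


plus_zero = float32_bits(0.0)
minus_zero = float32_bits(-0.0)

print(f"+0.0 -> 0x{plus_zero:08X}")
print(f"-0.0 -> 0x{minus_zero:08X}")
print("compare equal:", 0.0 == -0.0)
```

The two patterns differ only in the sign bit, yet the values compare equal, which is why the duplicate zero rarely bothers IEEE 754 code.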

      2. Ian does not have it quite right with respect to arithmetic. The CDC 6000/7000 series computers were one’s complement, with two zeros. The Cray computers were two’s complement. When asked about this, Seymour replied that he just didn’t understand two’s complement when he designed the 6600.

        The hard part about implementing floating point is that you need the logic diagrams, not merely the reference manual, in order to get an exact result in every case. (Or at least a software simulation written by someone who knew how it worked.)
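The one's complement versus two's complement point from this part of the thread can be shown with a toy word size. This is only a sketch: the 8-bit `WIDTH` is an illustrative assumption and does not match any real CDC or Cray word length.

```python
WIDTH = 8                  # illustrative word size, not a real CDC or Cray width
MASK = (1 << WIDTH) - 1


def negate_ones_complement(x: int) -> int:
    """Negate by inverting every bit."""
    return ~x & MASK


def negate_twos_complement(x: int) -> int:
    """Negate by inverting every bit and adding one, discarding the carry."""
    return (~x + 1) & MASK


zero = 0
print(f"one's complement -0: {negate_ones_complement(zero):0{WIDTH}b}")
print(f"two's complement -0: {negate_twos_complement(zero):0{WIDTH}b}")
```

In one's complement, negating zero yields an all-ones word, a second zero the hardware has to treat as equal to the first. In two's complement the negation wraps back to the same all-zeros pattern.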

    2. Cray (along w/all supercomputers) are designed for parallel processing and vector processing. Desktop P.C.’s, even today, are still designed for scalar processing. Even multi-core cpu’s are still doing scalar processing. No one would buy time on a system made of i7’s that can only do scalar and nothing else.

      Of course symmetric multi-processing on a linux machine works if the apps are written specifically for that purpose.

      I guess what I’m saying is, the Cray was always known for running parallel and vector processing. To port code onto an i7 would not even work. It would have to be re-written from scratch to solve the same problem in a different (read scalar) way.

      1. The other thing to bear in mind is that talking about the Cray CPU exclusively misses the point. It was the memory system on the Cray which was truly an amazing piece of engineering, and the ability to take large chunks of data and stream them through the vector registers efficiently. This is why I said clock speed comparisons are meaningless.

        The Cray’s commercial secret was that it provided a very balanced system across CPU, memory *and* I/O. They also delivered compilers which allowed people to write real codes which got “reasonably” (insert your own definition here) close to the claimed peak performances. The other secret was that its scalar performance was pretty damn good, too. Other vendors provided systems which looked good on paper, but had architectural defects which meant you could never get close to their claimed speed running real workloads.

        This is why the graveyard of old supercomputer vendors has so many headstones.

        IMO, the most interesting Cray was the Cray-2, which had all of its PCBs immersed in Fluorinert. The case itself was basically a big glass barrel, which would bubble and gurgle in the datacenter. It was cooled by this amazing-looking decorative “fountain”.

        http://www.visionfutur.com/img/histoire/cray2b.jpg
        http://www.craysupercomputers.com/images/Systems/Cray2/Cray2_019_8Processor_NERSC.jpg

        You can see the amazing point-to-point wiring between the PCBs here:

        http://discovermagazine.com/2004/mar/digital-cache/museum_cray_sm.jpg

        The Cray-1 also had this wiring, and it drove the C-shape of the unit, when viewed from the top. This wiring was all done by hand, and the length was optimized to give predictable propagation delay of the signals.

        1. I mostly agree with your statement: the Cray-1 memory system was awesome (and very expensive: one of the key employees once told me they didn’t need a cache because main memory ran at the speed of everyone else’s cache). But the Cray was often limited by its clock speed, because it could only issue one instruction per clock cycle, and even though that might be a vector instruction, it could only initiate vector operations at the rate of one flop per cycle (per fp unit). So the clock speed mattered a lot; Cray’s genius was in part getting the balance right.

      2. Cronjob, you have a point, but you are missing something. First, multicore systems are multi-processing, so they do make calculations in parallel. Second, vector processing was a trick to get arithmetic calculations that took multiple cycles to complete to be able to pipeline and return results once per cycle per functional unit. It is a trick whose time is past. Today, most high speed processors are using all the tricks at the same time, far more effectively than the Cray systems ever did. I am not familiar with the i7 particularly, but I did look at the AMD processors a couple of years ago, and there were multiple functional units, pipelining, out-of-order instruction retirement with multiple processors. All the same tricks the Cray used, but now we no longer care about them as we did in the day of the Cray. The compilers today take care of all the tricks for us.

        At the end of the day, supercomputers were measured in FLOPS because solving a particular problem required a particular number of floating point calculations. So, all told, a GFLOPS is a GFLOPS. The scalar performance on a Cray is still worse than the scalar performance of today’s systems.

      3. Hey people, OpenCL, CUDA and similar technologies do exactly that. Your average $100 graphics card eats any Cray machine for breakfast. GPUs are very powerful beasts optimized for parallel execution.

  5. similar to Grahams comment, i’ll be honest, this is above my level of knowledge as well. still, will be interesting to see it work and would be great to see more serious suggestions, rather than half-hearted jokes =/ keep at it chris, i hope ya get it working :D

  6. Looking through the disk image looking for strings is interesting. For example, this made me laugh: (edited because it triggered the antispam otherwise)

    &* Leading Edge Technologies *
    &Preventive Maintenance (TISM):
    &Mon 07:15 – 09:15, Tue 17:00 – 21:00, Fri 07:15 – 09:15.
    &Mon 07:15 – 09:15, Tue 17:00 – 21:00, Fri 07:15 – 09:15.
    *****
    &* This is M-M-Max MP, here … PARTY, PARTY, PARTY ! *
    *****
    %* This is M-M-Max MP, here … Hey, go home … NOW *
    ****
    ****
    %* This is M-M-Max MP, here … sorry, no NEWS today *
    ****
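For anyone who wants to repeat this kind of string hunt, here is a minimal sketch of a `strings`-style scan. It assumes the text in the image is plain ASCII (the excerpts quoted in this thread suggest it is, but that is not confirmed for the whole image), and the function name and the synthetic sample are made up for illustration.

```python
import re
from typing import Iterator, Tuple


def find_strings(data: bytes, min_len: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    # Tiny synthetic sample standing in for a disk image: text in a sea of zeroes.
    sample = b"\x00" * 16 + b"IOP-0 Kernel" + b"\x00" * 8 + b"abc" + b"\x00" * 4
    for offset, text in find_strings(sample):
        print(f"{offset:#010x}  {text}")
```

To run it against a real image, read the file in binary mode and pass the bytes to `find_strings`. Printing the offsets helps when you later try to relate a string to a file system structure.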

  7. Do they not have Crays running in one of the computer museums? I did a quick web search but couldn’t find the answer. I’ll re-post your request to a few computer friends working within the IT industry in Ireland. Might be worth contacting the following museum websites to see if they have anyone still alive who worked on one. Bletchley Park in England might be the best spot, due to the fact they are rebuilding Colossus, the code-breaking machine from WWII.

    http://www.old-computers.com/museum/computer.asp?c=991&st=1

    http://www.computerhistory.org/brochures/companies.php?alpha=a-c&company=com-42b9d5d68b216#

    http://www.tnmoc.org/

    Best of luck with the project would be interesting to see it running.

  8. The Cray computer had a front end computer that handled all I/O for the system. The front end system consisted of four processors, each called an IOP. The first processor, called IOP-0, handled the boot process and console I/O. The other 3 IOPs each had an I/O channel to the Cray and to peripheral devices. The external I/O channels typically were connected to the I/O channels of yet another computer, usually either a VAX, IBM or Sun system. The IOP used a completely different instruction set than the Cray, and was integer only, while the Cray tended to focus on floating point operations.

    1. The earliest Cray-1 computers did not have the I/O processors. They were controlled through a Data General Nova computer, which is why the I/O subsystem used DG peripherals.

    1. @xorpunk
      If it’s so “simple”, then why don’t you do it? Oh yeah, that’s right, you just like running your mouth about how you’re a great hacker without actually doing anything.

    1. Not quite. The EL had a very different IOP, which was basically a 6U VME backplane near the top side of the unit. That backplane contained a number of 6U VME boards purchased from OEMs (with the sole exception of one – see below), the primary one of which was a Heurikon 68K based board. That board had a SCSI disk which booted it, then it initialized the Cray “Y1 bus” VME board which connected the IOP to the mainframe itself. The Y1 bus allowed the IOP to boot the mainframe proper, by resetting the system and pushing down the boot image. The IOP then turned into a proper I/O processor, using the SMD controller and disks to provide I/O to the mainframe. The EL92, thankfully, introduced SCSI drives and got rid of the god-awful SMD.

      Remember that the EL was an offshoot of the XMS, which was a Cray “almost clone” produced by a company called Supertek, and acquired in the early 90’s. Also notable about the EL92 was that it was the first Cray which shipped with a binary OS distribution, and did not require the customer to obtain an incredibly expensive source license from AT&T. At the Uni when I purchased one around 1992, the need to do that could have killed the entire deal.

      I had a working EL92 in my garage, but had to dispose of it some years ago (spousal pressure). I still have most of the boards, disks, tapes etc. However I am unsure they’d be especially useful for this project.

      They’re still amazing pieces of engineering. Later I worked for SGI, which acquired Cray, and I still think both companies show how having the most amazing technology is irrelevant, if you have a management who can’t sell them. Look at the sea of mediocrity we live in these days.

      1. Are you the Ian Farquhar who had a Hanimex Pencil II that he was interested in getting preserved in MESS? If so, would you mind popping back into your thread? It would seem you came back, apologized for having abandoned the thread for so long, and then abandoned it again…

  9. The very first Cray-1’s were delivered with no operating system at all. CRI expected customers to roll their own OS, which they did.

    The operating system this will likely run is COS: the Cray Operating System. I have found claims online that the ‘1 also ran UNICOS, but I am unconvinced that’s true.

    In reality, a significant number of Crays were actually closer to embedded systems than they were mainframes. This especially applied when they were used for SIGINT, which is where 40% of them (prior to the end of the cold war) ended up. Even CRI itself didn’t know where some of them were installed (I was told a few were delivered to a supermarket car park late at night in shipping crates, and CRI was told to return two weeks later to retrieve the empty crates).

    Anyway, don’t worry about it, Hackjack. But better hurry, I think you’re missing an episode of the Kardashians. That seems to be at your intellectual and emotional level.

  10. and his laptop probably has just as much power as yours, you’re angry, don’t be jealous now that he has started such a challenging project, his win7 laptop??? oh he just hit the on button…

    1. it works like this:
      you build the hardware to the clock specs and make it speak the same assembly, you get the OS from the original cray-1…but don’t know how to get it to load the OS.
      the problem isn’t that he doesn’t know how to power it up, there is a lot more to starting a computer than just pushing the power button.

      1. I didn’t say that he isn’t able to find the power button… But I don’t see how you can build an entire computer according to some specification without knowing how it starts up. By building this he probably knows how the machine bootstraps.

        So what exactly is missing? If the OS image is known to have worked, is it the ROM contents for a complete boot that are missing?

        1. Having thought about this some more, I realize that following the hardware specs isn’t enough. The piece that is missing is that the IOP-0 processor has access to the main memory of the Cray. The boot process consisted of the IOP-0 writing the OS directly to the Cray main memory.

          As it happens, the Cray Operating System (COS) had a trace ring buffer where OS events were written. In this way when a crash occurred you could go back and look at the event that led up to the crash. The buffer was fairly large but due to the speeds of the Cray it only held a few milliseconds of data. A favorite trick the operators would do would be to use the IOP console to display this memory on the console. Because it took much longer to display the memory than to roll the buffer, as long as the Cray was running the display would be constantly changing. You could see instantly when there was a crash because this screen would suddenly stop updating.
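A trace ring buffer of the kind described here is simple to model. The following is only a sketch of the general technique, not the COS implementation; the class name, the capacity, and the event strings are all hypothetical.

```python
class TraceRing:
    """Fixed-size trace buffer: new entries overwrite the oldest ones."""

    def __init__(self, capacity: int) -> None:
        self.slots = [None] * capacity
        self.next_slot = 0      # index the next event will be written to
        self.total = 0          # events written since start

    def record(self, event: str) -> None:
        """Store an event, overwriting the oldest one once the buffer is full."""
        self.slots[self.next_slot] = event
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        self.total += 1

    def history(self) -> list:
        """Return surviving events, oldest first."""
        if self.total < len(self.slots):
            return self.slots[:self.total]
        return self.slots[self.next_slot:] + self.slots[:self.next_slot]


ring = TraceRing(capacity=4)
for n in range(6):
    ring.record(f"event-{n}")
print(ring.history())
```

After a crash, writes stop and `history()` returns the last few events in order, which is the post-mortem use described above. While the system is running, any dump of `slots` is stale almost immediately, hence the constantly changing console display.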

      2. The main problem is that I don’t have a spec for the Cray X-MP IOP, so I don’t know what the deadstart sequence for an IOP is. The IOP may have even had some sort of ROM or something in it for all I know. At some level, the IOP loaded up my image, executed some code somewhere in it, and then shoveled the COS image into main CPU memory (where the thing I built could execute it). Since the main CPU (running COS) communicated with the IOS using some sort of API, I’m kinda-sorta hoping to replace the IOS with some little board running Linux, and communicate with the main CPU that way.

        tldr – I need to figure out the filesystem so I can try to extract code to run on the main CPU, not code that runs on the IOP (which I haven’t implemented yet).

      3. Chris,

        The I/O subsystem (not IOP) was a separate cabinet all by itself. You can see this in their site planning module:

        http://www.bitsavers.org/pdf/cray/CRAY-1_PrelimSitePlanning.pdf

        It definitely did have its own CPU (p16 – in the diagram).

        I’m not sure, but you might also have to emulate the maintenance control unit, which is a DG Eclipse S-200.

        But you’re right: I can’t see much in the way of documentation of the boot process in their system manual, which is a surprising lack:

        http://www.bitsavers.org/pdf/cray/2240004C_CRAY-1_Hardware_Reference_Nov77.pdf
        http://www.bitsavers.org/pdf/cray/2240004C-1977-Cray1.pdf

        I’d say it’s a good guess that the I/O subsystem boots the mainframe (like the EL’s IOP did), but it’s not actually specified anywhere I can find either. Hmmm….

        What you need is this publication:

        http://www.computinghistory.org.uk/det/2279/Cray-X-MP-Cray-1-I-O-Subsystem-%28IOS%29-Operators-Guide/

      4. No ROM in an IOP. It just reads a short data block from the deadstart device, then starts executing it. The input instruction is just designed into the initialization logic.

      5. No need for the Data General code, by the way. The I/O Subsystem took the place of the DG Maintenance Control Unit. If you have the IOS software, the DG is out of the picture.

      6. The boot process is explained sufficiently for a programmer. The initial code is stored in the memory beginning at address zero. The first few words are the initial exchange package, which is loaded when the input channel which loaded the memory contents disconnects. All that’s missing is some of the detail about the channel controls.
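To make that description concrete, here is a rough sketch of treating a boot block as memory loaded from address zero, with the leading words set aside as the initial exchange package. Several things here are assumptions to verify against the hardware reference manual: the 16-word exchange package length, the big-endian byte order of the captured data, and all of the function names.

```python
import struct

WORD_BYTES = 8            # the Cray-1 used 64-bit words
EXCHANGE_WORDS = 16       # assumed exchange package length; verify against the manual


def load_at_zero(image: bytes) -> list:
    """Unpack a boot block into 64-bit words, as if written to memory from address 0."""
    usable = len(image) - len(image) % WORD_BYTES
    count = usable // WORD_BYTES
    # Big-endian byte order is an assumption about how the image was captured.
    return list(struct.unpack(f">{count}Q", image[:usable]))


def split_boot_block(memory: list) -> tuple:
    """Split memory into (exchange package words, remaining words)."""
    return memory[:EXCHANGE_WORDS], memory[EXCHANGE_WORDS:]


# Synthetic 20-word block: word n simply holds the value n.
block = b"".join(struct.pack(">Q", n) for n in range(20))
package, rest = split_boot_block(load_at_zero(block))
print(len(package), "exchange words;", len(rest), "words follow")
```

Decoding the fields inside the exchange package (program address, registers, and so on) is deliberately left out, since the field layout has to come from the manual rather than from guesswork.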

  11. i think i may be able to really help, i know someone on ps home that actually used to work on those machines. if i see them this week i will ask them if they have a bootloader that you can have. they kinda owe me a favour….

  12. Impressed with all the “insiders” and well-informed people who are offering their help.

    Sorta restores faith in humanity… if only nerds could have some greater say in the way the world was run.

  13. Does anyone know where this disk actually came from?
    I keep seeing these lone words in a sea of zeroes; LUDWIG DANIEL EXCAVATE MATERIALS SONICV PICTURESQUE..
    Codenames? Are there government secrets in this image?
    Maybe it knows who killed Kennedy..

    1. Bear in mind that if the disk image is a raw image of the entire disk, it will have a lot of data in it which are merely remnants of files from past use not necessarily part of the COS system last loaded on the disk.

  14. I just did a little snooping around, and apparently the document you need that describes the IOP is designated HR-0030. I googled it and it looks like the Centre for Computing History has a copy. You might be able to get access to it from them. Worth a shot.

  15. Have you asked on alt.folklore.computers? The Usenet group? Half of the folk there probably KNEW Seymour Cray. If he’s still alive, then his old computer teacher can tell you what Seymour’s homework looked like.

    It’s a great group, populated largely with the 70-year olds who commanded REAL computers! Dennis M. Ritchie himself was a regular there, before he kicked the bit-bucket.

    You’ll quite probably end up with more help and information than you could ever use, all sorts of weird old computers have been discussed in the past. There’s more old computer information mouldering in these people’s garages than will ever be on the web. Get ’em while they’re still warm!

    Also of course a great example of Usenet > Web and Web forums. I love Usenet.

  16. The IOS code on the disk image knows what the file system looks like, so you need an IOS emulator to run that. You also need to emulate the disk so you map the I/O requests to the disk image.
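The disk-emulation half of that suggestion might look something like the sketch below: an emulated drive that turns a (cylinder, head, sector) request into a byte offset within the image file. The geometry values, the cylinder-major layout, and the class names are all hypothetical, since the real drive parameters are not given anywhere in this thread.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass(frozen=True)
class Geometry:
    """Hypothetical disk geometry; the real drive's values are not given in the post."""
    heads: int
    sectors_per_track: int
    sector_bytes: int


class ImageDisk:
    """Serve sector reads from a flat disk image file."""

    def __init__(self, image: BinaryIO, geometry: Geometry) -> None:
        self.image = image
        self.geo = geometry

    def offset(self, cylinder: int, head: int, sector: int) -> int:
        """Byte offset of a sector, assuming cylinder-major, zero-based layout."""
        track = cylinder * self.geo.heads + head
        return (track * self.geo.sectors_per_track + sector) * self.geo.sector_bytes

    def read_sector(self, cylinder: int, head: int, sector: int) -> bytes:
        """Seek to the sector and return its contents."""
        self.image.seek(self.offset(cylinder, head, sector))
        return self.image.read(self.geo.sector_bytes)


geo = Geometry(heads=2, sectors_per_track=4, sector_bytes=16)
fake_image = io.BytesIO(bytes(range(256)))   # 16 sectors of 16 bytes
disk = ImageDisk(fake_image, geo)
print(disk.read_sector(1, 0, 2).hex())
```

An IOS emulator would call something like `read_sector` whenever the emulated code issues a disk request, so the file system logic already on the image does the interpreting.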
