Recovering Data For A Homemade Cray

In our hubris, we pat ourselves on the back when we’re able to pull data off our old SCSI drives. [Chris Fenton]’s attempt to get an OS for a homebrew Cray-1 puts us rightfully to shame.

Last year we saw [Chris]’ fully functional 1/10th scale Cray-1 supercomputer built around an FPGA. While the reproduction was nearly cycle-accurate, [Chris] hasn’t had an opportunity to test out his system because of the lack of available Cray software. A former Cray employee heard of his plight and loaned him an 80 Megabyte CDC 9877 disk pack in the hope of getting some system software off it.

[Chris] acquired a monstrous 100 pound disk drive to read the disk pack, but after 30 years in storage a lot of electrical problems cropped up. Since reading the drive digitally proved to be an exercise in futility, [Chris] hit upon the idea of taking analog data straight from the read head. This left him with a magnetic image of the disk pack that was ready for some data analysis.

After the disk image was put up on the Internet, the very talented [Yngve AAdlandsvik] figured out the data, header, and error correction formats and sent [Chris] a Python script to tease bits from the analog image. While no one is quite sure what is on the disk pack provided by the Cray employee, [Chris] is remarkably close to bringing the Cray-1 OS back from the dead. There’s also a great research report [Chris] wrote as penance for access to the CDC disk drive. Any Hack A Day readers feel like looking over the data and possibly giving [Chris] a hand?
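
For readers who want a feel for what "teasing bits from the analog image" involves, here is a minimal Python sketch. It is not [Yngve]'s script, and it assumes nothing about the real CDC 9877 encoding: it only finds flux-transition peaks in a sampled read-head signal and converts the gaps between them into raw channel bits. The function names, the threshold, and the synthetic test signal are all made up for illustration.

```python
import math

def synthesize(pattern, cell_len):
    """Build a fake read-head signal: one pulse per 1 bit, alternating polarity.
    (Hypothetical test data, not a capture from a real disk pack.)"""
    samples = [0.0] * (len(pattern) * cell_len)
    polarity = 1.0
    for k, bit in enumerate(pattern):
        if bit:
            center = k * cell_len + cell_len // 2
            for i in range(len(samples)):
                samples[i] += polarity * math.exp(-((i - center) / 1.5) ** 2)
            polarity = -polarity
    return samples

def find_flux_peaks(samples, threshold):
    """Return indices of local extrema whose magnitude reaches the threshold."""
    peaks = []
    for i in range(1, len(samples) - 1):
        mag = abs(samples[i])
        if mag >= threshold and mag > abs(samples[i - 1]) and mag >= abs(samples[i + 1]):
            peaks.append(i)
    return peaks

def peaks_to_channel_bits(peaks, cell_len):
    """Emit a 1 per peak, with a 0 for every empty bit cell between peaks."""
    bits = [1] if peaks else []
    for prev, cur in zip(peaks, peaks[1:]):
        cells = max(1, round((cur - prev) / cell_len))
        bits.extend([0] * (cells - 1) + [1])
    return bits

if __name__ == "__main__":
    pattern = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
    signal = synthesize(pattern, cell_len=8)
    peaks = find_flux_peaks(signal, threshold=0.5)
    print(peaks_to_channel_bits(peaks, cell_len=8) == pattern)
```

A real capture would also need clock recovery to cope with speed drift, and the channel bits would still have to be decoded into data, headers, and error correction fields, which is the hard part [Yngve] worked out.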

41 thoughts on “Recovering Data For A Homemade Cray”

  1. OK, this is just AWESOME! Pulling the data off the disk pack by recording and DSPing the analog signal from the head? Aside from being a metric a**load of work, these guys redefine persistence!

    Nicely done.

  2. They may wish to contact the Smithsonian or the National Cryptologic Museum. They have actual Crays. And they may have copies of the software as well. The National Cryptologic Museum is at Ft. Meade, MD and is run by the NSA.

  3. This is really cool. Kudos to those guys for trying to keep computing history alive.

    This is why trusting important data to digital media is really risky. If nobody bothers to port it over to new media then it can be lost forever. That’s one of the things that scares me about ebooks. Fifty years from now my great-great-grand kids will be able to read my books assuming the acidic paper lasts that long. Will they be able to read a nook or kindle book?

    1. Agreed, this is an amazing hack.

      You bring up a great point dbear but it’s far more than ebooks. Think about all the photos of people’s lives uploaded to websites that certainly won’t be around in 50 years. My better half has done an amazing reconstruction of family history based, in large part, on ancient photos that are still usable.

      People should consider how to permanently store the various parts of their digital lives from the software-of-the-day to the documentation of their personal lives. It may be interesting to someone else someday.

    2. I learned this lesson long ago. I was trying to find a way to read CTOS disks. I actually found the guy that was in charge of the CTOS format program. I thought I was home free. So I asked him what the format was. He had no idea. He had a list of register values he plugged into a controller! He had no idea what they actually did. To this day we keep some old machines around our office for just in case jobs.

      1. From what I understand, NASA is a huge player on eBay, buying up old computers, 9-track drives, etc., so they can read back old data. A lot of collected data has never been read by human eyes (i.e. solar flux data from Pioneers 6-9) and is only now becoming of interest to researchers.

  4. I guess one approach is to try to decipher how the data is stored on the media. From working on emulation of Bernoulli drives (read: Apple // disks), one popular technique that Woz used was group encoding (he called it nybblizing) data. The idea behind it was that no more than two 1’s or two 0’s could be next to each other on the media, to preserve data integrity. What was stored was 8-bit data where 8 bits represented 6 bits, and there was a firmware-based lookup table to translate it. Even more interesting was that each byte is xor’d against the previous byte — and the last byte serves as a sort of checksum digit. It’s a funky format, but one which is model-specific. The moral is: expect anything but data stored in the raw.

    If you identify signatures that appear repeatedly, you might be able to locate the start of track/sector boundaries (if the disk is aligned in that manner). This could help you take the physical data dump and translate it back to a logical model that you can decode more easily as a contiguous stream of data (like what Linux dd outputs). Not sure what this canister drive did in terms of other data integrity checks, but if there is any embedded data integrity checking built in it is sure to be a very interesting ride.

    1. A definite candidate for the 7400/discrete logic contest! There are some circuits that are very well met by 74HCxxx implementations. Did they use common chips like those or did they use some blackbox ASICs? You could make a wiring diagram and reverse engineer it into a version using more recent hardware. It would be cool to see a remake of some of the old drive controllers. :)

  5. Wow, killer! I really like to see people really getting into the hardware. Most people would just say “grr, it doesn’t work.” Or they’d try to rebuild the whole drive. But when you get down to the hardware, it’s just a read head getting some analog values, and a bunch of circuitry to interpret it. But that circuitry can be replaced by software, which is easier to tweak. If everything else has failed but the motors, you can still read it this way, so it’s perfect!

    And this kind of thinking is exactly what is needed to troubleshoot just about anything, including things you’re building yourself. Write some code and it’s not working how you expected it to work? Break everything into its basic components and verify they work.

    That is obvious, but it’s still a skill that many people lack. I’m getting better at it, and better than my friends at it, but there are some people out there like this guy that just nail it.

  6. no encryption or compression, once the journal is reversed you script dump the entire thing and rebuild minding endianness..

    I know..I know..if I knew what I was talking about I’d ‘just go do it’ for them..cause it’s like..so open community and stuff

    1. On the Apple II there was no ‘encryption’ in the sense of DES or such, but there was tons of obfuscation that existed solely to make the hardware cheaper. The books like Apple DOS and CopyIIPlus’s manual have tons of information on this. Suffice it to say, that I expect if I was reversing this blindly (no hardware in front of me) and I know what character set they used (not needed but a damn good helper), I would be trying XOR-ing of groups of bits, looking for start/stop bits, and so on. Reading the patents from the 1980s would probably be more than a bit like cheating! ;) Of course, I really don’t even want to start this since I’m working on some other goodies to post here in the future.

  7. I have a similar disk drive, though not from a CRAY1. It was out of an old DIGITAL workstation used for digitization and image processing of old x-ray film.

    I was about to throw it out, but now I think I just might have to keep it around. As far as I can tell, it’s still functional but I have no idea where to get the specs for the bus interface.

  8. Holy Smokes! I worked on those drives years ago. They were part of a Honeywell Level 6 system that I used to maintain and service.

    +5 volts on a test point of the SGV card rings a bell for bad head alignment. Will have to look around for the books.

  9. I spent two summers working as an assembly line tech on hard drives like this at the DEC plant in Westfield, MA. Reading Chris’s report is like a journey back in time.

    I hope Chris’s Cray has switches for the dead start panel:
    http://ed-thelen.org/comp-hist/6600DeadStartPanel-t.jpg

    And, I’m sure he’s already found this, but this manual: http://bitsavers.org/pdf/cray/2240004C-1977-Cray1.pdf
    describes the “dead start” sequence, starting on page 3-44. Somewhat disturbingly, page 2-9 mentions a Data General Eclipse S-200 that “…provides control for system initialization.”
    Hopefully, he won’t have to design another FPGA that simulates the DG Eclipse!
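
Comment 4's suggestion, looking for signatures that repeat in the raw dump to find track or sector boundaries, can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the 10-bit marker, the three payloads, and the function names are hypothetical and have nothing to do with the actual CDC 9877 header format.

```python
from collections import defaultdict

def find_repeated_signatures(bits, width, min_count):
    """Map each width-bit window seen at least min_count times to its start offsets.
    `bits` is a string of '0' and '1' characters."""
    positions = defaultdict(list)
    for start in range(len(bits) - width + 1):
        positions[bits[start:start + width]].append(start)
    return {sig: offs for sig, offs in positions.items() if len(offs) >= min_count}

def spacings(offsets):
    """Distances between consecutive occurrences; a constant value hints at a sector length."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

if __name__ == "__main__":
    marker = "0111111110"  # hypothetical sync pattern
    payloads = ["1011001110", "0010110100", "1101001011"]  # hypothetical sector contents
    stream = "".join(marker + p for p in payloads)

    found = find_repeated_signatures(stream, width=10, min_count=3)
    for sig, offs in found.items():
        print(sig, offs, spacings(offs))
```

In this toy stream the marker turns up at offsets 0, 20, and 40, so the constant spacing of 20 bits is the "sector" length. On a real dump the window width is unknown, so one would try several widths and look for candidates with regular spacing.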
