Recovering Data For A Homemade Cray

September 8, 2011

In our hubris, we pat ourselves on the back when we’re able to pull data off our old SCSI drives. [Chris Fenton]’s attempt to get an OS for a homebrew Cray-1 puts us rightfully to shame.

Last year we saw [Chris]’ fully functional 1/10th scale Cray-1 supercomputer built around an FPGA. While the reproduction was nearly cycle-accurate, [Chris] hasn’t had an opportunity to test out his system because of the lack of available Cray software. A former Cray employee heard of his plight and loaned an 80 Megabyte CDC 9877 disk pack to [Chris] in the hope of recovering some system software.

[Chris] acquired a monstrous 100-pound disk drive to read the disk pack, but after 30 years in storage, a lot of electrical problems cropped up. Since reading the drive digitally proved to be an exercise in futility, [Chris] hit upon the idea of taking analog data straight from the read head. This left him with a magnetic image of the disk pack that was ready for some data analysis.

After the disk image was put up on the Internet, the very talented [Yngve AAdlandsvik] figured out the data, header, and error correction formats and sent [Chris] a Python script to tease bits from the analog image. While no one is quite sure what is on the disk pack provided by the Cray employee, [Chris] is remarkably close to bringing the Cray-1 OS back from the dead. There’s also a great research report [Chris] wrote as penance for access to the CDC disk drive. Any Hack A Day readers feel like looking over the data and possibly giving [Chris] a hand?
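
For readers wondering what teasing bits out of an analog capture involves, here is a minimal Python sketch of the first step: locating flux reversals in the sampled read-head signal. It is not [Yngve]’s script. It assumes each reversal shows up as a pulse of alternating polarity, and the `threshold` and `samples_per_cell` values are made-up parameters for illustration, not the real CDC 9877 timing.

```python
def find_transitions(samples, threshold):
    """Return sample indices where a new pulse of opposite polarity begins.

    Assumption: flux reversals appear as pulses that alternate in sign,
    so a pulse is counted only when the signal crosses the threshold on
    the side opposite to the previous pulse. Small noise is ignored.
    """
    transitions = []
    state = 0  # 0 = nothing seen yet, +1 = last pulse positive, -1 = negative
    for i, s in enumerate(samples):
        if s > threshold and state != 1:
            transitions.append(i)
            state = 1
        elif s < -threshold and state != -1:
            transitions.append(i)
            state = -1
    return transitions


def intervals_in_cells(transitions, samples_per_cell):
    """Express the spacing between pulses as whole numbers of bit cells."""
    return [round((b - a) / samples_per_cell)
            for a, b in zip(transitions, transitions[1:])]


if __name__ == "__main__":
    # Synthetic capture: alternating pulses at samples 5, 15, 20 and 35.
    signal = [0.0] * 40
    for index, level in [(5, 1.0), (15, -1.0), (20, 1.0), (35, -1.0)]:
        signal[index] = level
    found = find_transitions(signal, threshold=0.5)
    print(found)
    print(intervals_in_cells(found, samples_per_cell=5))
```

The list of intervals is what a decoder for a specific line code would consume next. Which code the disk pack actually uses is not something this sketch assumes.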

41 thoughts on “Recovering Data For A Homemade Cray”

drew says:

September 8, 2011 at 4:22 am

props to both guys on doing this, things like the Cray should not be lost to the ravages of time

Eirinn says:

September 8, 2011 at 4:26 am

Wow that’s a lot of work! I really hope everything goes as planned :)

LeJupp says:

September 8, 2011 at 4:36 am

I always wanted a dysfunctional Cray in my living room as couch. But I must admit that a functional Cray (albeit not working as a couch) has some charm too.

fartface says:

September 8, 2011 at 4:54 am

Try pulling EBCDIC data off of a 9-track tape, or better yet a Bernoulli drive.

1. leadacid says:
  
  September 8, 2011 at 11:05 am
  
  Ugh! Bernoulli drives. Man I’m glad I don’t have to deal with those any more.
  
cde says:

September 8, 2011 at 4:56 am

10 bucks says it’s just a glorified hello world program.

Or it asks him if he would like to play a game.

Peter says:

September 8, 2011 at 5:00 am

OK, this is just AWESOME! Pulling the data off the disk pack by recording and DSPing the analog signal from the head? Aside from being a metric a**load of work, these guys redefine persistence!

Nicely done.

1. Joe says:
  
  September 9, 2011 at 6:46 pm
  
  I’m tempted to do this for DVDs. For ‘research and educational’ purposes only, of course. :)
  
BiOzZ says:

September 8, 2011 at 5:27 am

how much data does that big mofo hold?

1. ChrisG says:
  
  September 8, 2011 at 6:00 am
  
  second paragraph has your info: 80MB
  
  “A former Cray employee heard of his plight and loaned an 80 Megabyte CDC 9877 disk pack to in the hope of getting some system software.”
  
Charlie says:

September 8, 2011 at 5:29 am

They may wish to contact the Smithsonian or the National Crypto Museum. They have actual Crays, and they may have copies of the software as well. The National Crypto Museum is at Ft. Meade, MD and is run by the NSA.

kaluce says:

September 8, 2011 at 5:47 am

Those crays were used for some serious number crunching back in the day. $2 says that he finds something pretty interesting on it.

Martin says:

September 8, 2011 at 5:47 am

Have you checked this out? These guys may be helpful. I’ll try and send Chris the info: http://www.cray-cyber.org

rasz says:

September 8, 2011 at 6:13 am

Author could contact some of the biggest Data recovery companies (like Ontrack for example) and pitch this project as a PR opportunity for them.

SuperNuRd says:

September 8, 2011 at 6:23 am

You have redefined hard work! Keep on HACKING!

dbear says:

September 8, 2011 at 8:04 am

This is really cool. Kudos to those guys for trying to keep computing history alive.

This is why trusting important data to digital media is really risky. If nobody bothers to port it over to new media then it can be lost forever. That’s one of the things that scares me about ebooks. Fifty years from now my great-great-grand kids will be able to read my books assuming the acidic paper lasts that long. Will they be able to read a nook or kindle book?

1. Ben H. says:
  
  September 8, 2011 at 8:16 am
  
  They will if the original files are made drm free. The actual formats are open and even if they fall out of style software can be easily written to read them.
  
2. DanJ says:
  
  September 8, 2011 at 8:25 am
  
  Agreed, this is an amazing hack.
  
  You bring up a great point dbear but it’s far more than ebooks. Think about all the photos of people’s lives uploaded to websites that certainly won’t be around in 50 years. My better half has done an amazing reconstruction of family history based, in large part, on ancient photos that are still usable.
  
  People should consider how to permanently store the various parts of their digital lives from the software-of-the-day to the documentation of their personal lives. It may be interesting to someone else someday.
  
  1. asheets says:
    
    September 8, 2011 at 10:41 am
    
    I personally keep my most valued pictures archived multiple ways, as follows:
    
    1) hardcopies
    2) CD
    3) DVD
    4) on an external EXT3 hard drive
    5) on my computer
    6) GMail Drive
    
3. lwatcdr says:
  
  September 8, 2011 at 9:34 am
  
  I learned this lesson long ago. I was trying to find a way to read CTOS disks. I actually found the guy that was in charge of the CTOS format program. I thought I was home free. So I asked him what the format was. He had no idea. He had a list of register values he plugged into a controller! He had no idea what they actually did. To this day we keep some old machines around our office for just in case jobs.
  
  1. asheets says:
    
    September 8, 2011 at 10:38 am
    
    From what I understand, NASA is a huge player on eBay, buying up old computers, 9-track drives, etc., so they can read back old data. A lot of collected data has never been read by human eyes (i.e. solar flux data from Pioneers 6-9) and is only now becoming of interest to researchers.
    
Kevin Keith says:

September 8, 2011 at 8:21 am

this is amazing news! I had thought the original Cray OS was lost to history after it was largely replaced by UNICOS.

philpem says:

September 8, 2011 at 8:22 am

OK, I’m going to feed this to my data analysis software (DiscFerret — see http://www.discferret.com) and see what I can come up with.

Sounds like fun!

blurry says:

September 8, 2011 at 10:26 am

I guess one approach is to try to decipher how the data is stored on the media. From working on emulation of Bernoulli drives (read: Apple // disks), one popular technique that Woz used was group encoding (he called it nybblizing) data. The idea behind it was that no more than two 1’s or two 0’s could be next to each other on the media, to preserve data integrity. What was stored was 8-bit data where 8 bits represented 6 bits, and there was a firmware-based lookup table to translate it. Even more interesting was that each byte is xor’d against the previous byte — and the last byte serves as a sort of checksum digit. It’s a funky format, but one which is model-specific. The moral is: expect anything but data stored in the raw.

If you identify signatures that appear repeatedly, you might be able to locate the start of track/sector boundaries (if the disk is aligned in that manner). This could help you take the physical data dump and translate it back to a logical model that you can decode more easily as a contiguous stream of data (like what Linux dd outputs). Not sure what this canister drive did in terms of other data integrity checks, but if there is any embedded data integrity checking built in, it is sure to be a very interesting ride.
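
The two ideas in this comment, a running XOR with a trailing checksum byte and a search for repeated signatures, can be sketched in Python. This is only an illustration of the scheme as described above, not a byte-exact Apple II or CDC format; the function names and the marker bytes are hypothetical.

```python
def xor_chain_encode(data):
    """Store each byte XORed with the previous one, then append the last
    plain byte so it can act as a checksum on decode."""
    out = []
    prev = 0
    for b in data:
        out.append(b ^ prev)
        prev = b
    out.append(prev)  # trailing checksum byte
    return bytes(out)


def xor_chain_decode(encoded):
    """Undo the XOR chain and verify the trailing checksum byte."""
    *body, checksum = encoded
    data = []
    running = 0
    for b in body:
        running ^= b
        data.append(running)
    if running != checksum:
        raise ValueError("checksum mismatch")
    return bytes(data)


def find_signature(stream, signature):
    """Return every offset at which the marker bytes occur in the dump."""
    positions = []
    start = stream.find(signature)
    while start != -1:
        positions.append(start)
        start = stream.find(signature, start + 1)
    return positions


if __name__ == "__main__":
    encoded = xor_chain_encode(b"CRAY")
    print(xor_chain_decode(encoded))
    # Placeholder marker bytes, chosen only for the demo.
    dump = b"\x00\xd5\xaa\x96\x11\xd5\xaa\x96"
    print(find_signature(dump, b"\xd5\xaa\x96"))
```

Evenly spaced hits from `find_signature` would hint at a fixed sector length, which is the kind of clue the comment suggests looking for.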

1. Joe says:
  
  September 9, 2011 at 6:56 pm
  
  A definite candidate for the 7400/discrete logic contest! There are some circuits that are very well met by 74HCxxx implementations. Did they use common chips like those or did they use some blackbox ASICs? You could make a wiring diagram and reverse engineer it into a version using more recent hardware. It would be cool to see a remake of some of the old drive controllers. :)
  
Matthew says:

September 8, 2011 at 10:35 am

This post is just made of awesome. I love reading about projects like these.

anyone says:

September 8, 2011 at 11:30 am

chris, i and most of hackaday readers would love to give you a hand here…however this is pretty much out of our league to say the least.

cafeine says:

September 8, 2011 at 12:39 pm

instant feel of nostalgia….awesome project!

Taylor Alexander says:

September 8, 2011 at 1:01 pm

Wow, killer! I really like to see people really getting into the hardware. Most people would just say “grr, it doesn’t work.” Or they’d try to rebuild the whole drive. But when you get down to the hardware, it’s just a read head getting some analog values, and a bunch of circuitry to interpret it. But that circuitry can be replaced by software, which is easier to tweak. If everything else has failed but the motors, you can still read it this way, so it’s perfect!

And this kind of thinking is exactly what is needed to troubleshoot just about anything, including things you’re building yourself. Write some code and it’s not working how you expected it to work? Break everything into its basic components and verify they work.

That is obvious, but it’s still a skill that many people lack. I’m getting better at it, and better than my friends at it, but there are some people out there like this guy that just nail it.

xorpunk says:

September 8, 2011 at 5:57 pm

no encryption or compression, once the journal is reversed you script dump the entire thing and rebuild minding endianness..

I know..I know..if I knew what I was talking about I’d ‘just go do it’ for them..cause it’s like..so open community and stuff

1. Joe says:
  
  September 9, 2011 at 7:02 pm
  
  On the Apple II there was no ‘encryption’ in the sense of DES or such, but there was tons of obfuscation that existed solely to make the hardware cheaper. The books like Apple DOS and CopyIIPlus’s manual have tons of information on this. Suffice it to say, that I expect if I was reversing this blindly (no hardware in front of me) and I know what character set they used (not needed but a damn good helper), I would be trying XOR-ing of groups of bits, looking for start/stop bits, and so on. Reading the patents from the 1980s would probably be more than a bit like cheating! ;) Of course, I really don’t even want to start this since I’m working on some other goodies to post here in the future.
  
Philippe says:

September 8, 2011 at 6:02 pm

X.x

medix says:

September 8, 2011 at 6:22 pm

I have a similar disk drive, though not from a CRAY1. It was out of an old DIGITAL workstation used for digitization and image processing of old x-ray film.

I was about to throw it out, but now I think I just might have to keep it around. As far as I can tell, it’s still functional but I have no idea where to get the specs for the bus interface.

Bert says:

September 9, 2011 at 7:06 am

If you just look at the readable ASCII data in the dumps, you will find loads of machine testing code written in APAL, which seems to be a Cray specific assembly-like language. There is a (part of a) APAL description as well amongst the data, which I’ve uploaded to pastebin: http://pastebin.com/aMSh7FLH

1. Bert says:
  
  September 9, 2011 at 7:15 am
  
  Never mind, the programs on disk are more likely to be written in a BASIC-like language. But still, I think the APAL “guide” is pretty nice.
  
2. Peter says:
  
  January 10, 2012 at 9:25 am
  
  ASCII?
  
  How quaint. I’m almost positive CRAY and CDC used their own, unique character set…
  
3. Peter says:
  
  January 10, 2012 at 9:26 am
  
  Check the Computer History Museum? They have all sorts of artifacts and contacts.
  
medix says:

September 9, 2011 at 8:30 am

Just found this (pertaining to the drive I have):

http://www.bitsavers.org/pdf/dec/disc/ra80/AA-M186B-TC_RA80_Maint.pdf

For anyone’s amusement..

Herman Nelson says:

September 18, 2011 at 9:25 pm

Holy Smokes! I worked on those drives years ago. They were part of a Honeywell Level 6 system that I used to maintain and service.

+5 volts on a test point of the SGV card rings a bell for bad head alignment. Will have to look around for the books.

HomelyPoet says:

October 27, 2011 at 4:20 pm

What about writing to the Army Corps of Engineers to get some time on a magnetometer?
Or the Air Force…?
Navy…?

Peter says:

January 10, 2012 at 9:45 am

I spent two summers working as an assembly line tech on hard drives like this at the DEC plant in Westfield, MA. Reading Chris’s report is like a journey back in time.

I hope Chris’s Cray has switches for the dead start panel:
http://ed-thelen.org/comp-hist/6600DeadStartPanel-t.jpg

And, I’m sure he’s already found this, but this manual: http://bitsavers.org/pdf/cray/2240004C-1977-Cray1.pdf
describes the “dead start” sequence, starting on page 3-44. Somewhat disturbingly, page 2-9 mentions a Data General Eclipse S-200 that “…provides control for system initialization.”
Hopefully, he won’t have to design another FPGA that simulates the DG Eclipse!
