Ask Hackaday: Why Did GitHub Ship All Our Software Off To The Arctic?

July 29, 2020

If you’ve logged onto GitHub recently and you’re an active user, you might have noticed a new badge on your profile: “Arctic Code Vault Contributor”. Sounds pretty awesome, right? But whose code got archived in this vault, how is it being stored, and what’s the point?

They Froze My Computer!

On February 2nd, 2020, GitHub took a snapshot of every public repository that met any one of the following criteria:

Activity between Nov 13th, 2019 and February 2nd, 2020
At least one star and new commits between February 2nd, 2019 and February 2nd, 2020
250 or more stars
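
As a rough illustration, the three criteria read as a simple either/or test. The sketch below is a guess at that logic, not GitHub’s actual selection code: the `Repo` fields, the inclusive date boundaries, and the split between “activity” and “commits” are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SNAPSHOT_DAY = date(2020, 2, 2)  # snapshot date given in the article


@dataclass
class Repo:
    """Hypothetical summary of a public repository (field names are assumptions)."""
    stars: int
    last_activity: Optional[date]  # most recent activity of any kind
    last_commit: Optional[date]    # most recent commit


def in_window(day: Optional[date], start: date, end: date) -> bool:
    """True if `day` is known and falls within [start, end], inclusive."""
    return day is not None and start <= day <= end


def qualifies(repo: Repo) -> bool:
    """A repository is archived if it meets ANY one of the three criteria."""
    recent_activity = in_window(repo.last_activity, date(2019, 11, 13), SNAPSHOT_DAY)
    starred_with_commits = repo.stars >= 1 and in_window(
        repo.last_commit, date(2019, 2, 2), SNAPSHOT_DAY
    )
    popular = repo.stars >= 250
    return recent_activity or starred_with_commits or popular
```

Under these assumptions, a long-dormant repository with 250 stars still makes the cut, while an unstarred repository last touched in mid-2019 does not.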

Then they traveled to Svalbard, found a decommissioned coal mine, and archived the code in deep storage underground – but not before they made a very cinematic video about it.

How It Works

For the combination of longevity, price and density, GitHub chose film storage, provided by piql.

There’s nothing too remarkable about the storage medium: the tarball of each repository is encoded on standard silver halide film as a 2D barcode, which is distributed across frames of 8.8 million pixels each (roughly 4K). Whilst officially rated for 500 years, the film should last at least 1000.

You might imagine that all of GitHub’s public repositories would take up a lot of space when stored on film, but the data turns out to only be 21TB when compressed – this means the whole archive fits comfortably in a shipping container.
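
To get a feel for the scale, here is a back-of-envelope sketch of how many frames 21TB would need. It assumes decimal terabytes and one bit of payload per pixel, and it ignores error correction, frame borders and the human-readable slides, none of which the figures above specify. Treat the result as a loose lower bound, not a description of piql’s actual format.

```python
ARCHIVE_BYTES = 21 * 10**12    # 21TB, read as decimal terabytes (assumption)
PIXELS_PER_FRAME = 8_800_000   # frame size quoted in the article
BITS_PER_PIXEL = 1             # assumption: each pixel stores one bit


def frames_needed(archive_bytes: int, pixels_per_frame: int, bits_per_pixel: int = 1) -> int:
    """Minimum number of frames needed to hold the archive, rounded up."""
    total_bits = archive_bytes * 8
    bits_per_frame = pixels_per_frame * bits_per_pixel
    # Integer ceiling division avoids floating-point rounding on large numbers.
    return -(-total_bits // bits_per_frame)


if __name__ == "__main__":
    frames = frames_needed(ARCHIVE_BYTES, PIXELS_PER_FRAME, BITS_PER_PIXEL)
    print(f"about {frames / 1e6:.1f} million frames")
```

With these assumptions the answer comes out at roughly 19 million frames; any real encoding overhead would push that number up.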

Each reel starts with slides containing an unencoded, human-readable text guide in multiple languages, explaining to future humanity how the archive works. If you have five minutes, the guide, and the way GitHub explains the archive to whoever discovers it, is good fun to read. It’s interesting to see the range of future knowledge the guide caters to: it starts by explaining in very basic terms what computers and software are, despite the fact that decompression software would be required to use any of the archive. To bridge this gap, they are also providing a “Tech Tree”, a comprehensive guide to modern software, compilation, encoding, compression etc. Interestingly, whilst the introductory guide is open source, the Tech Tree does not appear to be.

But the bigger question is not how GitHub did it, but why.

Why?

The mission of the GitHub Archive Program is to preserve open source software for future generations.

GitHub talks about two reasons for preserving software like this: historical curiosity and disaster. Let’s talk about historical curiosity first.

There is an argument that preserving software is essential to preserving our cultural heritage. This is an easily bought argument, as even if you’re in the camp that believes there’s nothing artistic about a bunch of ones and zeros, it can’t be denied that software is a platform and medium for an incredibly diverse amount of modern culture.

GitHub also cites past examples of important technical information being lost to history, such as the search for the blueprints of the Saturn V, or the discovery of the Roman mortar which built the Pantheon. But data storage, backup, and networks have evolved significantly since Saturn V’s blueprints were produced. Today people frequently quip, “once it’s on the internet, it’s there forever”. What do you reckon? Do you think the argument that software (or rather, the subset of software which lives in public GitHub repos) could be easily lost in 2020+ is valid?

Whatever your opinion, simply preserving open source software on long timescales is already being done by many other organisations. And it doesn’t require an arctic bunker. For that we have to consider GitHub’s second motive: a large scale disaster.

If Something Goes Boom

We can’t predict what apocalyptic disasters the future may bring – that’s sort of the point. But if humanity gets into a fix, would a code vault be useful?

Firstly, let’s get something straight: in order for us to need to use a code archive buried deep in Svalbard, something needs to have gone really, really wrong. Wrong enough that things like softwareheritage.org, the Wayback Machine, and countless other “conventional” backups aren’t working. So this would be a disaster that has wiped out the majority of our digital infrastructure, including worldwide redundancy backups and networks, requiring us to rebuild things from the ground up.

This raises the question: if we were to rebuild our digital world, would we make a carbon copy of what already exists, or would we rebuild from scratch? There are two sides to this coin: could we rebuild our existing systems, and would we want to?

Tackling the former first: modern software is built upon many, many layers of abstraction. In a post-apocalyptic world, would we even be able to use much of the software with our infrastructure/lower-level services wiped out? To take a random, perhaps tenuous example, say we had to rebuild our networks, DNS, ISPs, etc. from scratch. Inevitably behavior would be different, nodes and information missing, and so software built on layers above this might be unstable or insecure. To take more concrete examples, this problem is greatest where open-source software relies on closed-source infrastructure — AWS, 3rd party APIs, and even low-level chip designs that might not have survived the disaster. Could we reimplement existing software stably on top of re-hashed solutions?

The latter point — would we want to rebuild our software as it is now — is more subjective. I have no doubt every Hackaday reader has one or two things they might change about, well, almost everything but can’t due to existing infrastructure and legacy systems. Would the opportunity to rebuild modern systems be able to win out over the time cost of doing so?

Finally, you may have noticed that software is evolving rather quickly. Being a web developer today who is familiar with all the major technologies in use looks pretty different from the same role 5 years ago. So does archiving a static snapshot of code make sense given how quickly it would be out of date? Some would argue that throwing around numbers like 500 to 1000 years is pretty meaningless for reuse if the software landscape has completely changed within 50. If an apocalypse were to occur today, would we want to rebuild our world using code from the 80s?

Even if we weren’t to directly reuse the archived code to rebuild our world, there are still plenty of reasons it might be handy when doing so, such as referring to the logic implemented within it, or the architecture, data structures and so on. But these are just my thoughts, and I want to hear yours.

Was This a Useful Thing to Do?

The thought that there is a vault in the Arctic directly containing code you wrote is undeniably fun to think about. What’s more, your code will now almost certainly outlive you! But do you, dear Hackaday reader, think this project is a fun exercise in sci-fi, or does it hold real value to humanity?

95 thoughts on “Ask Hackaday: Why Did GitHub Ship All Our Software Off To The Arctic?”

Nicci says:

July 29, 2020 at 7:03 am

Bad idea, because all software sucks. They should have archived computer science books instead.

1. Mr Coolio Guyio says:
  
  July 29, 2020 at 7:23 am
  
  On the contrary, this software is looking to be pretty cool.
  
2. eternityforest says:
  
  July 29, 2020 at 12:15 pm
  
  I fail to see the suck in software these days. Modern FOSS is easy to use, does a lot, and it works reliably.
  
  The only real suck comes from the cloud based stuff and network dependence, and the modern trend of going back to manual setup, configuration, and use, as opposed to automating things like Python’s auto build and cache process for bytecode, or mDNS lookups.
  
  Computer science doesn’t quite address the suck, because all the core algorithms we use today are pretty much fine, when they’re actually designed for practical use as a package rather than modular tech demos.
  
  1. deshipu says:
    
    July 30, 2020 at 1:21 am
    
    One glimpse under the hood is enough to not sleep for a week because of nightmares.
    
    1. eternityforest says:
      
      July 30, 2020 at 12:41 pm
      
      It’s mostly only nightmarish because it’s too big to find where to start without some serious digging. They could do better on making it easier to find the place to make the change you want to make.
      
      But in general, people are able to maintain this stuff. It’s not bad enough to get in the way of development. Usually the most sucktastic part is removed features and breaking API changes with very little notice, but even that’s not common.
      
      Under the hood doesn’t need to be perfect, we can tell it’s getting better by the great UI, and how we hear about browser exploits less and less every year.
      
      1. tetsuoii says:
        
        August 1, 2020 at 3:30 pm
        
        Unnecessary bloat and badly structured code in more languages than spoken on earth. Expecting garbage avalanche in the near future.
        
      2. eternityforest says:
        
        August 1, 2020 at 5:08 pm
        
        “Bloat” is rarely a problem. Most apps aren’t even slow because of bloat, they’re slow because nobody optimized them, or they’re making cloud API calls, or someone thought it would be fun to write something by hand that really should have used a GPU accelerated library, or because they’re running in their own sandboxed container loading a whole OS, or some other resource-intensive security thing.
        
        It takes a *lot* of bloat to make a performance issue, and normally, bugs outside the core of the app just show an error message and don’t let you use that particular feature, if it’s well designed.
        
        The only time it usually matters is extreme security critical stuff. For an average user, if it’s an order of magnitude more secure than the odds of getting their entire phone physically stolen, they’re probably going to say it’s good enough.
        
3. Drone says:
  
  July 31, 2020 at 12:41 am
  
  @Nicci said: “Bad idea, because all software sucks. They should have archived computer science books instead.”
  
  It’s the humans that learned from those computer science books that wrote all that sucky software in the first place!
  
House MD says:

July 29, 2020 at 7:25 am

Fully agree, open source is LOW quality and full of bugs. Remember the Heartbleed bug: 2 years in production!!

1. Bill Gates says:
  
  July 29, 2020 at 8:32 am
  
  Yeah, Linux is awful, Firefox is trash, and we should all use closed-source proprietary software.
  
  1. anszom says:
    
    July 30, 2020 at 6:08 am
    
    I agree with the first part, but who said that closed source is better? Generally, most of all software is trash.
    
    1. Inhibit says:
      
      July 30, 2020 at 6:17 am
      
      … Bill Gates. Obviously.
      
2. Remy Hadley says:
  
  July 29, 2020 at 8:38 am
  
  Yes, they have to aspire to the decades+ bugs that exist in closed source (CVE-2019-1162 for example).
  
3. Andarb says:
  
  July 29, 2020 at 8:55 am
  
  Almost all code is low quality and full of bugs. That’s just how humans do things.
  
4. really? says:
  
  July 29, 2020 at 9:44 am
  
  You must be trolling, but I just have to ask: So in order to get good quality and bug free code we just need to hide the code?
  
5. chandwki says:
  
  July 29, 2020 at 10:41 am
  
  Open source software runs the world.
  
Ethan Waldo says:

July 29, 2020 at 7:33 am

Duh, haven’t you ever heard of cold storge?

1. Ren says:
  
  July 29, 2020 at 12:22 pm
  
  I hadn’t, so I looked it up!
  
  Storge (/ˈstɔːrɡi/,[1] from the Ancient Greek word στοργή storgē[2]) or familial love refers to natural or instinctual affection,[1][3] such as the love of a parent towards offspring and vice versa.
  -Wikipedia
  
  1. Neil says:
    
    July 29, 2020 at 7:59 pm
    
    I vote for you to win the Internet for today for this comment.
    
  2. tekkieneet says:
    
    July 30, 2020 at 2:25 am
    
    In other words, paraphrasing a movie line: They were unloved children
    
Truth says:

July 29, 2020 at 7:50 am

My very first thought was “So in 1000 years, how are you going to convert the code into an executable? And what hardware will run the binary?”

But it is funny, there is so little overlap between the most advanced technology about 1000 years ago and today.

You cannot even ask these kinds of questions about the most advanced technology on the planet about 1000 years ago. I guess that you would be close to the time of the Battle of Hastings (1066) so (just before gunpowder cannons) you would be talking about archiving examples of boiled leather armours, chainmails, javelins, long spears, swords, maces, axes, simple bows and crossbows.

1. RW ver 0.0.3 says:
  
  July 29, 2020 at 8:00 am
  
  They did a fair job of recording what they thought worth recording though. We’ve got a tapestry, the domesday book, various chronicles.
  
2. Pat says:
  
  July 29, 2020 at 8:22 am
  
  “But it is funny, there is so little overlap between the most advanced technology about 1000 years ago and today.”
  
  Uh… yes there is? The examples you gave are all weapons, but believe it or not, people do more than fight each other. Hard to believe, I know, but true.
  
  Roman concrete, for instance, is still being studied by scientists today, and we know how to do it because Charlemagne preserved a ton of classical manuscripts during the Carolingian Renaissance. Even with weapons, however, wootz steel, which apparently developed carbon nanotubes in its matrix, is still the product of a complicated process that is not fully understood.
  
  I don’t really doubt that in 1000 years there will be similar technological archaeologists investigating how mid-20th century technological accomplishments happened within the limitations of the time.
  
  1. Truth says:
    
    July 29, 2020 at 9:32 am
    
    > The examples you gave are all weapons
    I went for the easiest thing that I could think of from that time period. Roman concrete is from about 2000 years ago. An unfortunate reality but the most advanced technology in any time period is typically how to efficiently spy on each other or kill each other.
    
    You could argue that mathematics, music, paintings or poetry of the period were more technologically advanced than weapons – who is to say. The only invention I could find from that time period, that stood out to me in some way, was the invention of the pound lock for transporting large quantities of cargo in China.
    
    1. Pat says:
      
      July 29, 2020 at 12:23 pm
      
      “I went for the easiest thing that I could think of from that time period. Roman concrete is from about 2000 years ago. ”
      
      Roman concrete was *developed* about 2000 years ago, just like a lot of the code GitHub preserved is older than “right now” too.
      
      Right around the time period you were thinking of was the Carolingian renaissance, like I said – around the late 8th to 9th century. So a little earlier, but not much. That’s when Charlemagne’s empire began re-transcribing many old Latin texts so they could be preserved for the future. And this is how we *know* about Roman concrete’s construction methods.
      
      An organization deciding to transcribe and preserve many of the great works of scholarship and literature at the time. Sound familiar?
      
  2. tekkieneet says:
    
    July 30, 2020 at 2:32 am
    
    Weapons are often a snapshot of the technology of civilization (ironic word).
    We look at stone tools, bows and arrows, metallurgy to figure what they know in physics, chemistry etc. and skill levels of manufacturing.
    
3. Steven Gann says:
  
  July 29, 2020 at 8:33 am
  
  > you would be talking about archiving examples of boiled leather armours, chainmails, javelins, long spears, swords, maces, axes, simple bows and crossbows.
  
  I think a LOT of historians would find such an archive to be very valuable.
  
4. rclark says:
  
  July 29, 2020 at 8:35 am
  
  My thoughts would run to (soft) archiving of technology/history/mathematics/etc. of the times. Not the physical as that can always be re-created. Lots of history/knowledge was lost for example in the destruction of the library of Alexandria. If all that knowledge of the time were in ‘cold’ dry storage somewhere….
  
  Not sure how code will benefit the future other than to make for an interesting history lesson 1000 years from now. Sort of how we appreciate the old mechanical calculators for example…
  
  1. Pat says:
    
    July 29, 2020 at 8:37 am
    
    “Lots of history/knowledge was lost for example in the destruction of the library of Alexandria.”
    
    Also, lots of knowledge was also only preserved because some people thought it was important and copied it, and it was later found. Which… is exactly a parallel to this.
    
    1. Ren says:
      
      July 29, 2020 at 9:34 am
      
      It is thanks to the monks of the “Dark Ages”, who wrote fresh copy from rotting papyrus/parchment, that we have non-Christian works from Seneca, Plato, Socrates, Cicero…
      
pelrun says:

July 29, 2020 at 8:02 am

> the Tech Tree does not appear to be [open source]

According to some of the other comments in the issues, it’s not that it’s proprietary, it’s just that it hasn’t been finished yet. It’s not like it’s going to be needed in the short term anyway, so they can take the time to do it right.

1. (((Thomas of Borg))) (@bortels) says:
  
  July 29, 2020 at 12:51 pm
  
  Pity. The tech tree is by far the most important/interesting part of this archive. That and a summary of a few (hundred) fundamentals of computing (xor, rot13, compression mechanisms, PKI, etc) is what a future civilization would really want/need – the rest is of historical value but far less “usefulness”.
  
Mojoe says:

July 29, 2020 at 8:27 am

Look at the extents we go to, to dig things up just to learn about previous civilizations. Not so we can duplicate it, but just so we know more about our history and how things were done. Bones, tools, household belongings, clues to their beliefs, calendars…This isn’t about rebuilding. This is to preserve that facet of our existence. It is perfectly feasible for our civilization to be “reset” by some combination of catastrophic events (pandemic anyone?). Our various infrastructures are complex and in some ways, fragile. What was the state of our technology 1000 years ago? or even 100?

Unfortunately, digital data is not a physical object and is easily lost. It requires some amount of technology to recover. Film appears to be the oldest technology that we can feasibly store that much data on for “longish” periods of time.

Wolf says:

July 29, 2020 at 8:29 am

I think it would be more useful to have a cold storage wikipedia backup.

1. Bill Gates says:
  
  July 29, 2020 at 8:35 am
  
  …which is already done on a regular basis by multiple groups.
  
  The existence of other good ideas doesn’t diminish the value of this. Why are many HaD commenters so ignorant and close-minded?
  
2. RW ver 0.0.3 says:
  
  July 29, 2020 at 8:39 am
  
  I think I’ve got Encarta 97 well buried somewhere.
  
  1. Paaaaper! says:
    
    July 31, 2020 at 7:11 am
    
    A lot has changed in the world since ’97.
    
3. some guy says:
  
  July 29, 2020 at 9:08 am
  
  At least a few years ago you could download snapshots of Wikipedia. No idea about the size, but the text only should not be too big once compressed. If you have some disk space left, why not?
  
  1. MoTLD says:
    
    July 30, 2020 at 2:55 pm
    
    You still can, and it’s not very big by today’s standards. I think the raw archive including talk and user pages and old revisions is a few terabytes, but indexed and compressed versions for offline reading come in at 40GB for the full text of English Wikipedia or 90GB with large thumbnails of all the images.
    
    Check out Kiwix, it’s an offline browser for Wikipedia and anything else someone wants to package up as a ZIM file; available archives include the above mentioned Wikipedia (and most Wikimedia sites), all of the various Stack Exchange sites, many TED talks, Crashcourse, Project Gutenberg, and many others.
    
tomás zerolo says:

July 29, 2020 at 8:42 am

Github? What is Github, anyway?

1. Traumflug says:
  
  August 2, 2020 at 3:53 am
  
  That’s a web GUI for those fools who feel overwhelmed by typing ‘git --bare init’ to set up their own public repository.
  
DKE says:

July 29, 2020 at 8:48 am

I’ve read about this from several sources, what I haven’t been able to figure out is what it cost.
At some (lowish) cost, it’s a valuable exercise as PR stunt or just conversation starter to get people like us thinking about “what happens when.”

As far as functioning as a “useful” backup – no.
Something bad enough has to happen to wipe out all the other live copies of this data. Then we have to recover from that event to the point that we have the _ability_ to recover that archive, read, and translate that data. As well as having both hardware and software infrastructure in place to make that software worth recovering. All within the lifespan of the media.

Seems more likely we’ll be using the film strips as lashings to hold logs together to build a raft to get off the island.

tekkieneet says:

July 29, 2020 at 8:55 am

Oh great. Now my junk files have a place in the Ark in case more things get out of whack in the next little while. 2020 isn’t over yet…

Even in the movie world, 2020 and 2021 are end of the world:
– In Jade’s World, Skynet became self-aware in 2021. It launched an attack on humanity on 18 June, resulting in Judgment Day.
– In the Dark Fate timeline, after the termination of Skynet and the destruction of a Cyberdyne building, a new timeline was created, in which Judgment Day happened in the 2020s.

1. andrewjhull says:
  
  July 29, 2020 at 10:19 am
  
  Only 21tb?! In a few years time, that should fit on a pen drive.
  
  Come to think of it, I should be able to archive the whole of github on the NAS in my office.
  
  Time to rattle up a quick bash script.
  
  Actually, on second thought rural broadband in the UK is so bad it will take the best part of 1000 years to download (and the best part of 1000 years for the current government to get round to improving it).
  
  I might be quicker rowing a boat to Svalbard and taking pictures of their film on my phone.
  
  1. RW ver 0.0.1 says:
    
    July 29, 2020 at 10:49 am
    
    Terabyte MicroSDs were out last year, so that’s only 3 Pigeon’s worth, and you can send them all at once.
    
    1. Inhibit says:
      
      July 30, 2020 at 6:30 am
      
      Will they be on a string?
      
      1. RW ver 0.0.1 says:
        
        July 30, 2020 at 6:41 am
        
        That’s an aesthetic choice, you can make the pigeon a little ammo belt thing, or like a photographers jacket with pockets, just put them all in a natty little courier satchel, or make a necklace out of them if you must, don’t drill them through the middle.
        
  2. tekkieneet says:
    
    July 29, 2020 at 1:53 pm
    
    FLASH based storage isn’t good for archives as the memory cells can lose electric charges due to high temperature, high energy radiations, material defects due to write/erase cycles.
    
    The 21TB does not include anything of sell-able commercial values (i.e. IP). There are no p0rn, music, movies, games etc.
    
abjq says:

July 29, 2020 at 9:24 am

Out of interest I took a look at softwareheritage.org, and typed in “Hello World” as a search string, expecting to find the first C program in K&R. All I got was a big pile of spam.

That just looks like a big archive of crap. Github is probably not far off!

1. tekkieneet says:
  
  July 29, 2020 at 1:57 pm
  
  There are projects that have no indication of their status i.e. if all the code works or it is just a storage of non-working code hoping someone would pick up and finish the project. Quite often I thought I found something useful until I look at the less than 10 lines of code and/or lack of status and documentation.
  
  I make a habit of not committing to my github until the code/hardware is at least a workable state.
  
Ren says:

July 29, 2020 at 9:25 am

“Why Did GitHub Ship All Our Software Off To The Arctic?”

Because they can???

1. RW ver 0.0.1 says:
  
  July 29, 2020 at 9:38 am
  
  Because they watched the Mad Max series and figure in an apocalypse the Aussies would just use the cans of film as handy “BBQ in a box” fires, if they sent them there.
  
2. tekkieneet says:
  
  July 29, 2020 at 1:59 pm
  
  They want to see if these projects can sink or swim… The arctic is melting. I hope they have something that survives flooding or water world.
  
Ren says:

July 29, 2020 at 9:31 am

With all the code, I hope they are including some means of explaining the hardware it runs on…
Binary would be meaningless, unless it is mapped to some opcode, and the opcode is mapped to the construction of a processor (NAND gates and such) and I/O.

1. Ren says:
  
  July 29, 2020 at 9:37 am
  
  I forgot another intermediary step.
  Converting the 2D barcode to binary, and decompression algorithms…
  
  Otherwise, it will all be Voynich Manuscript to future people.
  
2. tekkieneet says:
  
  July 30, 2020 at 1:57 am
  
  Github deals primarily with source code, not binaries.
  
  As for opcodes, I was very impressed with the folks that reverse engineered the instruction set of programmable calculators by statistical analysis of the raw binary ROM dump and a lot of trial and error. The full official instruction set was later published and it was almost dead on. It is similar to the way cryptos are cracked, e.g. jumps are most frequent, etc. So in the right hands of a determined mind with lots of time, even a binary dump carries some information.
  
ChipMaster says:

July 29, 2020 at 9:45 am

Boy! That software will be out of date! … next year. ;-)

It’s a PR stunt plain and simple. M$ does nothing benevolent without a vampiric monetary upside for themselves. Heck nobody will probably even be allowed to prove whether or not the data was ever stored in the vault. Unless, of course, it shows up in a court battle with M$ vs. you.

Personally I feel the whole thing to be completely irrelevant. But who knows what someone will do 1,000 years from now for amusement or curiosity.

1. Ren says:
  
  July 29, 2020 at 9:49 am
  
  “But who knows what someone will do 1,000 years from now for amusement or curiosity.”
  
  After they’ve returned from the day’s hunting and gathering, and the fire in their cave still has a bit more light/heat before it goes out for the night.
  
Feinfinger says:

July 29, 2020 at 10:34 am

Google bad, MS good?

Historical programming-language groups disappearing from Google
–> https://lwn.net/Articles/827233/

…better let us find a grass-roots way to preserve our collective memory.

And maybe Pigor should update his hymn:

https://www.youtube.com/watch?v=BKfTlJ06Eu0

Clara says:

July 29, 2020 at 11:13 am

Not thrilled to learn that my deadname’s been preserved for future generations to discover, but I guess the licensing made it impossible for me to stop it. *shrug*

Drew says:

July 29, 2020 at 11:24 am

A bit of a tangent…

I’d actually like to find a way to archive blueprints on tiny glass slides with a laser. I’d like to build some things, then hide in a little compartment on the object itself glass etched micro copies of all the blueprints needed to make it- so someone with a microscope could find them, pull them out, and see the dimensions needed to rebuild it in a few hundred years.

1. rclark says:
  
  July 29, 2020 at 12:50 pm
  
  Isn’t glass a liquid? All run together ‘over time’ ;) . Might be better to store on a hard substance…. Kind of funny actually … going back to writing on tablets of stone … so to speak!
  
  1. Ren says:
    
    July 29, 2020 at 2:02 pm
    
    Glass as a slow liquid was proposed when old (e.g. 100 year) glass bottles were measured and found to have thicker bottoms.
    It was later realized that bottle making technique of that era resulted in bottles with thick bottoms!
    
    1. BillSF9c says:
      
      July 29, 2020 at 5:17 pm
      
      Stained glass in v old churches is falling apart due to the flow of the glass it is said.
      
      1. pelrun says:
        
        July 30, 2020 at 7:10 am
        
        Oh, let’s not bother with facts because people *say* stuff!
        
        “Glass is a liquid because the windows in old churches are thicker at the bottom!”
        
        No, it’s thicker at the bottom because old glassmakers couldn’t MAKE perfectly flat glass in the first place and so they installed the windows thick side down! It never changed at all!
        
eternityforest says:

July 29, 2020 at 12:21 pm

I’ve been saying it a lot lately, but we really need an archival-grade way of writing software. A compiled or fast interpreted language that can be done in under 100k lines of C, with modern OOP and everything, and no out there experimental stuff that limits anyone’s interest, that we can all agree to just not change anything about, aside from adding platform ports and such.

Programs should be distributed as source code, and build and run should be automatic, no thinking about makefiles or anything.

Python is almost perfect, but it’s not designed to be a fixed archival format, nobody will be maintaining 3.8 in 30 years.

1. rclark says:
  
  July 29, 2020 at 1:08 pm
  
  Doesn’t need to run persay as it can always be ‘re-written’ to the language in the language of that future day if necessary. More important is the algorithm or ‘function’ of the code, so the wheel won’t have to be re-invented so to speak.
  
  1. rclark says:
    
    July 29, 2020 at 1:11 pm
    
    I mangled that sentence! Can’t edit. Should be something like “rewritten in the computer language of the future if necessary”.
    
    1. X says:
      
      July 29, 2020 at 1:33 pm
      
      Faithfully reproduce all those buffer overruns and use-after-free bugs and their consequences, what are you smoking?
      
  2. X says:
    
    July 29, 2020 at 1:31 pm
    
    Yeah future languages will come with all of the old stupid stuff built in so they can run old code, what a great idea
    
    1. Ren says:
      
      July 29, 2020 at 2:04 pm
      
      1000 years from now they will still #INCLUDE current libraries for software that will need them to run!
      B^)
      
  3. eternityforest says:
    
    July 29, 2020 at 11:42 pm
    
    Rewriting code is subject to economic forces, and aside from algorithms that deal with “soft” stuff like computer vision and media processing, for a lot of software it is already pretty obvious how one would rewrite it using the currently available black-box algorithms people take for granted, like hash tables and compression.
    
    Data formats and protocols are probably the hard part though. Preserving those is important, but companies don’t seem to want to reveal those…
    
2. X says:
  
  July 29, 2020 at 1:28 pm
  
  How is that supposed to work for embedded software? The target cannot compile the source. Are we supposed to put all of the required build tools in the tarball? Do we really have to put whole copies of lex and yacc and swig and latex in our tarball? How idiotic is that?
  
  1. eternityforest says:
    
    July 29, 2020 at 11:37 pm
    
    A lot of embedded software is usually pretty simple and already is archival friendly to some degree. I suspect the very first Arduino programs would be very easy to compile today, and even things like MPLAB haven’t changed much. GUIless C/C++ with no operating system versions to worry about, no shared libraries to share with newer programs, etc, aren’t really that big of an issue.
    
    But making an “archival” build environment that did in fact have lex/yacc/swig/whatever (as opposed to bundling build tools with every single embedded firmware source tarball) seems like a pretty good idea.
    
    Even with no disaster, I can imagine someone using software like this for the config GUI of some $2000 piece of gear, to make sure it never becomes unusable with new computers.
    
    1. pelrun says:
      
      July 30, 2020 at 7:13 am
      
      As a professional embedded systems engineer, I laugh at your entire assertion.
      
      Also, let me know when you’ve made the environment that is all things to all people. I wanna see it.
      
    2. X says:
      
      July 30, 2020 at 10:28 am
      
      You are aware that all of these build tools have forked and gone their own separate ways so there is no way to have a standard lex or yacc because there are too many different forks and versions?
      
      1. eternityforest says:
        
        July 30, 2020 at 12:45 pm
        
        There’s a way to achieve *a* standard version. As in, the company making crappy 8051s meant to survive the apocalypse can say “Here’s exactly the build tools to work with this, it will always run on this archival VM, it includes the proper lex and yacc, so you’ll always be able to compile old programs for this chip on any host machine”
        
        Not that practical for daily use, but this is about having a backup plan if things really fall apart.
        
GDPR says:

July 29, 2020 at 12:57 pm

I haven’t seen a clear presentation of exactly what information they archived. Was it only the files stored in the repositories or was it the related user profiles as well?
Did they anonymize all the data first by removing names and email addresses?
What if I had personal information in my repository and I want to be forgotten? Will they send their archiving guy over to Svalbard to scratch my details off the film?

1. thannattt says:
  
  July 29, 2020 at 2:19 pm
  
  Good point. Also…very apt moment to conserve only the most popular code haha
  
2. X says:
  
  July 29, 2020 at 3:57 pm
  
  “What if I had personal information in my repository and I want to be forgotten?”
  
  How does it feel to want? You dumped your stinky turds on GitHub and now you want to kick some cat litter over them. Too bad, you own that crap and it’s right there for your future job interviewers to Google, tough on you
  
3. JM says:
  
  July 30, 2020 at 1:58 am
  
  “GitHub took a snapshot of every public repository”
  Never used GitHub myself, but why put sensitive data onto a “public repository”?
  
4. pelrun says:
  
  July 30, 2020 at 7:18 am
  
  It has been said *at length* never to put sensitive information into a public git repository.
  
  Why do you demand other people take responsibility for actions you took? It’s not even ambiguous – if you don’t want the information to be public, DON’T MAKE IT PUBLIC.
  
Hirudinea says:

July 29, 2020 at 2:25 pm

“WHY DID GITHUB SHIP ALL OUR SOFTWARE OFF TO THE ARCTIC?” Because they didn’t think the software was cool enough.

1. BillSF9c says:
  
  July 29, 2020 at 5:21 pm
  
  So, “cold storage” is cold feelings toward the families of software. Uh… sort of a software Unabomber feeling…?
  
Saabman says:

July 29, 2020 at 6:13 pm

Archiving it in such a format has the added benefit of reducing the amount of material that can be modified to suit the political narrative of the time. Already here in Australia changes are being made to rewrite our history to make it look better in the eyes of the disruptive minority.

Peter Knoppers says:

July 30, 2020 at 2:32 am

The “un-encoded human readable text guide in multiple languages” could become the Rosetta stone for some post apocalyptic society that wants to understand all the written things that we bequeathed them.

JM says:

July 30, 2020 at 3:17 am

I guess in 1000 years scientists will announce:
“Before computers served some useful purpose they were mostly greeting the world or blinking a light. We don’t know why people needed more and more complex machines to do that, but analysis of The Code from The Arc leaves no doubt about that.”

hexXamphetamine says:

July 30, 2020 at 5:01 am

I’m not sure if the point is to provide something that will be used to rebuild a society or even used at all. I think it would be kinda like when we find artifacts from long vanished civilizations—it’s a glimpse into a time that is forever gone, a way to see what our ancestors found meaningful and spent their time doing.
Someday in the fairly distant future WE will be someone’s long-forgotten ancestors and I think it would be a pretty cool thing to find! I don’t think the particulars of what software make a difference, I would imagine it would be studied more on a cultural level so the need to re-compile everything into working programs would be unnecessary.

RetepV says:

July 30, 2020 at 6:34 am

If society collapses, how long will it take for the survivors to reinvent enough technology to go to the Arctic again? And how long will it take them to, by utter chance, find that vault?

I think a better thing to do would be to store it in some kind of monument which will still be there 1000 years from now. Things that qualify as that kind of monument are huge pyramids, huge temple complexes, or simply any huge man-made monument that can be seen from afar by people who have not even reinvented the lens yet.

1. pelrun says:
  
  July 30, 2020 at 7:20 am
  
  People have been living in the Arctic for how many thousands of years now? Technology’s not the problem.
  
Ryan Timothy Vasquez says:

July 30, 2020 at 8:08 am

This is great and all… but what about compilers and/or interpreters for the code?

1. X says:
  
  July 30, 2020 at 10:30 am
  
  Yes because we want future generations to also suffer from buffer overruns and use-after-free bugs, that’s the ticket.
  
Mike says:

July 30, 2020 at 8:35 am

My suggestion: keep all the memes instead of real code. That will explain a lot to future generations.
Nobody will be interested in rebuilding the exact same code (with all the tools needed to interpret it). This archive will be the same as old Sumerian accounting records: interesting as a historical view, not as real tools.

Chris says:

July 31, 2020 at 6:28 am

What about my right to be forgotten? I can’t stand to read code that I wrote a few years ago, and now it will be there forever.

Robb says:

July 31, 2020 at 7:31 pm

Ah, if only there had been such an archive after the Black Death. European leaders would have been able to re-establish key theories like the earth-centered universe, demonic possession as the cause of all illness, and the divine right of kings.
