Simple Storage Service

You no doubt heard about the Amazon S3 outage that happened earlier this week. It was reported far and wide by media outlets that normally don’t delve into the details of the technology supporting our connected world. It is interesting to think that most people have heard of The Cloud, but never of AWS and certainly not of S3.

We didn’t report on the outage, but we ate up the details of the aftermath. It’s an excellent look under the hood. We say kudos to Amazon for adding to the growing trend of companies sharing the gory details surrounding events like this, so that we can all understand what caused the outage and how they plan to avoid a repeat in the future.

It turns out the S3 team was working on a problem with some part of the billing system and, to do so, needed to take a few servers down. An incorrect command used when taking those machines down ended up affecting a larger block of servers than expected. So they went out like a light, but turning that switch back on wasn’t nearly as easy.

The servers that went down handle various operations in the S3 API. This kind of “reboot” hadn’t been tried in several years, and given the explosive growth of the Simple Storage Service since then, it took far longer than expected. Compounding this was a backlog of tasks that built up while the API servers were being brought back online, and working through that backlog took time as well. The process was like waiting for a bathtub to fill up with water. It must have been agonizing for those involved, but certainly not as bad as it was for the folks who had to restore GitLab service a few weeks back.

[via /r/programming]

Hackaday

Amazon S3: Out Like A Light; On Like A Bathtub
