Amazon S3: Out Like a Light; On Like a Bathtub

You no doubt heard about the Amazon S3 outage that happened earlier this week. It was reported far and wide by media outlets that normally don’t delve into details of the technology supporting our connected world. It is interesting to think that most people have heard of The Cloud but never of AWS, and certainly not of S3.

We didn’t report on the outage, but we ate up the details of the aftermath. It’s an excellent look under the hood. We say kudos to Amazon for adding to the growing trend of companies sharing the gory details surrounding events like this so that we can all understand what caused this and how they plan to avoid it in the future.

It turns out the S3 team was debugging a problem with part of the billing system and, to do so, needed to take a few servers down. An incorrect command used when taking those machines down ended up affecting a larger block of servers than expected. So they went out like a light, but flipping the switch back on wasn’t nearly as easy.

The servers that went down run various commands in the S3 API. This kind of “reboot” hadn’t been tried in several years, and with the explosive growth of the Simple Storage Service in that time, it took far longer than expected. Compounding this was a backlog of tasks that built up while the API servers were being brought back online. Working through that backlog took time as well. The process was like waiting for a bathtub to fill up with water. It must have been agonizing for those involved, but certainly not as bad as it was for the folks who had to restore GitLab service a few weeks back.

[via /r/programming]

Fanboys want to take AT&T down

A post about Operation Chokehold popped up on (fake) Steve Jobs’ blog this morning. It seems some folks are just plain tired of AT&T making excuses about its network. The straw that broke the camel’s back came when AT&T floated the idea of instituting bandwidth limitations for data accounts. Now someone has hatched the idea of organizing enough users to bring the whole network down by maxing out their bandwidth at the same time.

We’re not quite sure what to think about this. Our friend Google told us that there’s plenty of press already out there regarding Operation Chokehold so it’s not beyond comprehension that this could have an effect on the network. On the other hand, AT&T already knows about it and we’d wager they’re working on a plan to mitigate any outages that might occur.

As for the effectiveness of the message? We’d have more sympathy for AT&T if they didn’t have exclusivity contracts for their smartphones (most notably the iPhone). And if you’re selling an “Unlimited Plan,” it should be just that. What do you think?

[Thanks Bobbers]

[Headlock photo]

Gmail without the cloud: tips for next time


Yesterday’s Gmail service outage is a hot topic on just about every news site right now. For so many of us who have always taken the reliability of Gmail for granted, it was a real shock to lose all of the functionality of the web-based system. Now that we’ve learned our lesson, here are a couple of tips to help you out the next time there’s an outage.

Continue reading “Gmail without the cloud: tips for next time”