On Cloud Computing And Learning To Say No

Do you really need that cloud hosting package? If you’re just running a website — no matter whether large or very large — you probably don’t and should settle for basic hosting. This is the point that [Thomas Millar] argues, taking the reader through an example of a big site like Business Insider, and their realistic bandwidth needs.

From a few stories on Business Insider the HTML itself comes down to about 75 kB compressed, so for their approximately 200 million visitors a month they’d churn through 30 TB of bandwidth for the HTML assuming two articles read per visitor.

This comes down to 11 MB/s of HTML, which can be generated dynamically even with slow interpreted languages, or as [Thomas] says would allow for the world’s websites to be hosted on a system featuring a single 192-core AMD Zen 5-based server CPU. So what’s the added value here? The reduction in latency and of course increased redundancy from having the site served from 2-3 locations around the globe. Rather than falling into the trap of ‘edge cloud hosting’ and the latency of inter-datacenter calls, databases should ideally be located on the same physical hardware and synchronized between datacenters.
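The bandwidth arithmetic above is easy to check. What follows is a minimal Python sketch using the figures quoted in the article. The 30-day month and the decimal units (1 kB = 1000 bytes) are assumptions made here, and the function names are purely illustrative.

```python
# Figures quoted in the article; everything else is an assumption for illustration.
VISITORS_PER_MONTH = 200_000_000
ARTICLES_PER_VISITOR = 2
HTML_KB_PER_ARTICLE = 75            # compressed HTML per article, in kB
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def monthly_html_bytes(visitors: int, articles: int, kb_per_article: int) -> int:
    """Total HTML bytes served per month (decimal units, 1 kB = 1000 B)."""
    return visitors * articles * kb_per_article * 1000


def average_mb_per_second(total_bytes: int, seconds: int) -> float:
    """Average transfer rate in MB/s over the given period."""
    return total_bytes / seconds / 1_000_000


total_bytes = monthly_html_bytes(
    VISITORS_PER_MONTH, ARTICLES_PER_VISITOR, HTML_KB_PER_ARTICLE
)
rate = average_mb_per_second(total_bytes, SECONDS_PER_MONTH)

print(f"{total_bytes / 1e12:.0f} TB per month")  # 30 TB per month
print(f"{rate:.1f} MB/s on average")             # 11.6 MB/s on average
```

Note that this is an average over the whole month. Peak traffic will sit well above it, so the number is a lower bound for sizing rather than a capacity target.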

In this scenario [Thomas] also sees no need for Docker, scaling solutions and virtualization, massively cutting down on costs and complexity. For those among us who run large websites (in the cloud or not), do you agree or disagree with this notion? Feel free to sound off in the comments.

45 thoughts on “On Cloud Computing And Learning To Say No”

  1. Well, every technology has to have advantages over its competitors, otherwise why even bother? So I think both use cases are worthy and valid. Spending a bit more money to be on the safe (in this case, distributed) side is never too much.

    1. I agree both have their uses. However, you don’t really need a true cloud to get some distribution for safety. It would be very natural to have two servers at different locations that you fail over between, for instance, perhaps with a cold or hot spare at both locations. Still way simpler than the cloud, no external trust required, and with uptime good enough for most use cases that the expected downtime in a transition from A to B won’t matter.

      Where the cloud really wins is the degree of shared hardware between many different clients, which has a few advantages to go with the complexity – your one server can be DDoSed to death quite easily, but a cloud provider usually has enough capacity to handle it without needing enough hardware to handle it for every single one of their clients at once.

      1. The bill after being DDOSed on a cloud can be the national debt.

        Which is worse?
        Being offline for a day.
        Being in bankruptcy court facing a signed contract you never read…Hoping you paid good attention to incorporation paperwork. But knowing you filled out a form you downloaded and signed (never read).

        1. Pick your poison either way. Either you can’t ply your trade, which will cost you, potentially absurd amounts depending on what you do, or you make sure you can always do your job and accept it’s possible that will cost you a heap too. As always, one size does not fit all.

        2. Sometimes you can go get DDOS protection, or maybe a straight-up insurance policy of some kind, to avoid unaffordable bills. If that doesn’t apply to you, surely you at least are going to set limits on usage that have to be manually overridden, right?

          1. Devil is always in the details.

            The cloud is supposed to be turnkey.

            You’d think cloud providers would have a credit limit, per client. Cut them off automagically. Bet at least some of them don’t. Rather run up a receivable.

        3. WRONG. I work at AWS. Every major cloud vendor has tunable cost monitoring with the ability to set a daily, weekly, or monthly bill limit. You can even set spend limits per day, hour, minute or second on specific services, like certain storage, database, analytics or compute. You don’t have to sue us, just read the user manual (service docs) and follow the guidance and you won’t overspend. It is embarrassingly SIMPLE.

          People who overspend are the ones who don’t know what they are doing. Easier to blame the vendor.

  2. The maintenance overhead should be enough to dissuade anyone from doing that – especially when it comes to Kubernetes. I’m averse to Docker as well because it’s just added complexity without any advantage for my use case.

    1. I strongly agree on the complexity part, but disagree even more strongly on containerization. I run everything in a container nowadays. Even a simple daily curl script. All via docker(compose) with just 5 lines or so.

      Sure, I could run it all on the host, and for years I did. But the clean delegation. Isolation. Jail-like setup. But most of all no more dependency breakage, broken updates and cross contamination.

      No, cgroups (containers) for everything from hosting, all the way down to development!

  3. Can I host all of Wikipedia from a current gen Epyc in a closet in my office? Probably! And will it be cheaper than AWS? Yes!

    Except… the article’s author willfully ignores all the real reasons a top 1000 website opts for a cloud hosted model vice colo / self hosted. Scaling and geolatency are really only small parts of the equation. Resiliency and redundancy matter a great deal to this top 1000 site, particularly when revenue is driven by users not just viewing but continuing to engage with content. If you blow it on latency, your bounce rate will be high. But if users can’t access it at all, your bounce rate is 100%.

    Servers crash, DDoS attacks exist, code deployments get increasingly complex (especially with distributed teams), and security breaches are real. Yes, you can roll your own and spend far less money on *hardware*, but the TCO can be dramatically higher.

    It’s never wise to just default to “put it in the cloud”, there are other (sometimes better) options. But this article reads like “Y’all are suckers and the cloud is a grift”, which is just greybeard* conceit.

    *Proudly a metaphorical and literal greybeard myself

    1. Did you mention SLA?

      Because cloud offerings come with a contractual obligation for uptime that just isn’t available to anyone running a single server in their closet.

    2. Cloud is usually accompanied by, “someone else’s server” said in a derisive tone, like people are oblivious to that. There needs to be a YT nostalgia channel for those people featuring nothing but racks of old computers with people running around when one critical server goes down.

      1. If you design your network with a single point of failure, that’s on you not the lack of “cloud”. Hardware isn’t even slightly expensive relative to nearly every other cost of running a business. There is rarely an excuse to not build something as close to completely fault tolerant as it’s possible to achieve.

        1. Sure, the box is cheap. The redundant power, redundant networking, redundant cooling, etc are much, much more expensive. And you’re duplicating effort on something that isn’t core to your business. It’s a commodity.

          If reliable hosting is so cheap and easy, offer it as a service. Call it cloud++.

    3. These are my thoughts as well, a lot of the time raw performance is my last concern.

      However, if your site is mostly static, you can build it statically and push the content to a CDN (or S3), and then scale out far more easily than with containers and app servers. It’s also simpler to understand and maintain.

      You can also do Incremental Static Regeneration (ISR), which is a relatively automated way of rebuilding a static site from dynamic data.

  4. Agree, I have run large systems in the past. A lot of these new tools are neat and cool, and new people in the industry learn them and feel like they have to implement them. Case in point: I met a software engineer who was telling me how they were setting up a simple site with MySQL, JSON calls left and right, and templates, and I was like, why is the site so slow? I did a quick remake in straight PHP/HTML/vanilla JS and it outperformed hers. I had to explain that using all the latest stuff in one project is overkill. Learn the basics and use what is needed to solve a problem/issue. Now expand this out to Docker, RPC, databases when not needed, and serverless hosting of some items, and it becomes chaos.

    1. This is one of my perennial struggles with our newer folks. I love the latest and greatest whatsits, and make a point to take on projects where I can learn through application and implementation. But you also have to know the moldy oldies to understand what the right tool/approach is for a given task.

      Could we create a CRD with hooks to a serverless function to do some “thing” in our k8s cluster? Yes!
      Would a cron’d bash script work just as well, be faster to implement and infinitely more understandable/maintainable, albeit less sexy? Probably.

  5. I won’t argue with the fundamental point that there are usually cheaper ways of doing everything than doing it in one of the big cloud providers, but this article is severely under-representing the flaws of self-hosting and ignoring where the costs creep in, and I’ve got a lot of experience in these areas. I’ve spent years rolling my own and also working with cloud solutions. Here are some of the things this article fails to address:

    1) In any business a significant if not the most expensive cost is personnel and, in technology, dev time. If you roll your own, you’re not only not going to be able to hire devs who can immediately jump into your stack, you’re going to have to spend significant money on training and also on the initial setup of each stack technology.

    2) Deployment is expensive. Using a “standardized” cloud stack means you can invest in a deployment interface like Terraform, and have robust and repeatable deployments. Yes, you can probably accomplish something similar with SaltStack, Kubernetes, or some other solution, but it is going to be more painful to set up and maintain.

    3) Security. With something like AWS or another provider, you get automatic, built-in ACLs, firewalls, KMS key management, transparent encryption/decryption and key rotation. This is extremely difficult to do quickly and securely in your own stack. Related, you’re still going to have to work with third parties for a lot of resources: domains, SSL certs, IP blocks, etc. These take a lot more time and effort to set up and maintain (managing payments and credit card expirations), whereas with a cloud provider it’s easy to add/remove them in seconds.

    4) If you own the entire stack, you have to constantly patch and securely configure the whole stack. This is a huge amount of labor to stay on top of vulnerabilities and “secure configuration” for the entire stack down to metal. One of the huge advantages of a cloud solution is that most of the stack is managed by someone else and you only have to worry about your specific technology. More importantly, you need to have domain experts for all of the technology stacks down to bare metal to manage and patch all that stuff and handle failures. This is cost prohibitive often, and you often need more than one person because you need fast response times when things break, and they will break. A big advantage of cloud providers is that they have whole teams of people devoted to their products to ensure that new versions don’t break existing stacks. If they do, they fix them very quickly and it doesn’t cost you anything (you will likely even get reimbursed for your losses).

    5) Compartmentalization. It’s one thing if you want to run a simple webserver or LAMP stack, but as soon as you want to do anything more complicated, you are suddenly going to need to do virtualization and/or buy and pay for multiple servers. Both options get spendy fast, especially when you don’t necessarily know what you need up front. Similarly, one advantage of a cloud provider is that if you design well, you can ramp up, as needed, quickly for arbitrary levels of traffic. This is simply not the case when you’re renting or buying bare-metal servers. If you go with virtualization, you have to deal with virtualization escapes. The big cloud providers spend a lot of time and effort hardening their virtualization solutions and maintaining them. You’re gonna have to do all of that now.

    6) Co-Lo is stupidly expensive. If you actually want to own the hardware, co-location is expensive and slow to replace in the event of hardware failure. VPN across locations is a pain to set up and manage, and difficult to get working at speed. With a cloud provider, you can just share a VPC across environments and regions and set up peering and have secure communication that is very fast. The dirty details are all managed for you under the hood.

    7) Hardware failure is a huge potential issue and recovery can be slow. With a cloud platform and proper design, you won’t have to lift a finger to automatically fail over to a new “server”. This is a huge advantage and if you think failure won’t happen to you, then you’ve not worked in the tech industry for any length of time.

    8) The guy kind of talks about it, but CDNs and edge caching are a big advantage of cloud computing. You also often don’t pay for intra-cloud bandwidth, which is often not the case when renting bare metal.

    1. One other one I forgot to mention is back-ups. When you use a cloud solution, you usually get robust back-ups nearly for free (often just a switch you flip). If you are doing it yourself, you not only have to implement a back-up solution for each technology, you need to test it regularly to make sure it is working as advertised, monitor that it isn’t broken, and also handle where the back-ups live (they need to be stored in multiple places in case of corruption or failure). In the cloud, you don’t have to do any of that. You flip the switch and have a very high confidence that it is working from both a practical point of view and an “industry-accepted” level of risk point of view.

    2. 1 and 2: You can have your own standardized self hosted cloud. OpenStack and OpenShift run on Linux. So devs can just jump in. And I agree dev time is costly and you pay the same licensing on cloud or on a dedicated server.

      3) You can have a secure cloud BUT you have to properly manage the infinite security knobs. OWASP was caught leaking internal files because of a misconfiguration, and it was fricking OWASP! Misclick an AWS setting for a bucket and you have internal data leaking, or you have a public read-write bucket hosting porn.

      4) Yep! Nobody can have the amount of expertise of Amazon without paying the same amount Amazon spends on their security. You will have to pay for doing all security and maintenance locally and it will cost you, and it won’t be as good as the security of a provider.

      5) You can install KVM on Linux and have all the compartments you want. Flexible scale-up and scale-down are great when you have no idea if you will have 10k users or 1M users. If your service is more or less consolidated, you are paying for something you aren’t using. Either in cash or in performance.

      6) Co-Lo isn’t stupidly expensive. One of my clients migrated from AWS to a dedicated co-located managed server and now spends 20% of the price. The contract has pretty good SLA times, and in 8 years we had two HDD failures and one network card failure, each fixed in a few hours. VPN isn’t a pain and isn’t difficult; they were using the integrated VPN service from Windows Server and it was way above acceptable.

      7) Check the SLA of your provider, and check the time to deploy. My provider can deploy a new server in 3 minutes, and you can pay for as little as 2 weeks, IIRC. Hardware failure is covered by the SLA.

      I believe the main point of the article is that sometimes you don’t need all the bells and whistles that a cloud provider gives you, and sometimes you don’t want that. It’s easy to start slow and small and grow, and then get a surprisingly large cloud bill at the end of the month because you used up your bandwidth quota and now you pay $90 per TB, or your weekly backups exceeded the allocated space and now you have to either pay a lot or delete all your backups. Or you have an issue with your code and someone decides to host game ISO files and your bandwidth usage explodes overnight.

      Cloud is a very useful tool. I recommend it for a lot of cases, and I don’t recommend it for other cases. My point is that cloud isn’t the only solution: sometimes it is the best solution, but sometimes it isn’t.

      1. I don’t disagree with most of your points. I think there’s a sweet spot for cloud. It lives somewhere between small start-up / simple apps and a medium to large corporation that can amass their own in-house specialists and hardware. There are a lot of companies that have successfully shifted to their own “cloud” from AWS/Azure/GCP/Oracle once they were big enough and they saved tons of money. My main point is that especially for start-ups when your stack isn’t clearly defined, and you need to move quickly, experiment, and build out something with unknown demand, it is really hard to make a cost-effective argument for rolling your own, especially with most cloud providers giving you a lot of credits early on. There’s just too much pain in the initial setup process for something you might end up tearing down and starting all over with 10 times while you figure out the right approach. Of course the cloud vendors are betting on vendor lock-in and the more you build, the harder it is to lift-and-shift. It’s just not a clear, cut-and-dried, “cloud vendors are screwing you” argument like TFA makes it sound.

    3. All your points except #1 are, more or less, specific examples of #1.

      And they are mostly fair points. The counter is if you don’t spend the money to have the talent in house, you are locked in. Good luck with your cloud bill. Cloud works financially when companies are too small to justify a fulltime ‘butt in chair’ for all those roles. Then pure inertia, tech debt and lock-in.

      We all know there is nothing more net negative productive than a ‘network security specialist’ with time on their hands. Do they set up a silent network intrusion detector? Hell no, they create policies for everyone to enjoy. Claim veto on updates and installs. I’ve had luck setting them against each other. I digress.

    1. buy a cheap box, stuff it in a datacenter. could be a VM too, but you only get the box, with an internet connection.

      flat rate, no dynamic provisioning, no host-provided services (eg. no host provided object store). basically “just” the hosting with all the value-add shit stripped away that differentiates amazon or another cloud provider. keep stripping and stripping until there’s nothing left to take away.

      turns out you really don’t need any of it.

      1. haha thanks! so that’s called “self-hosting” and it’s not simple.

        and it’s alright so long as you don’t need scalability or reliability or low cost and can tolerate the significant bandwidth/access restrictions of whatever local network you happen to be a part of (iow, it needs a big city). it can be made to work, and ‘cloud’ doesn’t automatically solve every downside of it but it’s probably gonna be a part of most well-engineered solutions.

  6. While I don’t disagree with the core argument of the post, I would question the wisdom of putting your database on the same machine that serves the content. Databases, especially SQL databases, have a large security cross-section, and compromising a database generally compromises the entire machine.

    Early in my career, I worked for a company in San Diego that had a sister site owned by someone else in Ontario. The Ontario company fired their network engineer, whom my boss poached and hid from his business partner. IT for both companies was handled out of the Ontario site. The network engineer would routinely have his admin domain account deleted. Every time it happened, he would shell into the MSSQL server using the local admin and make himself a new account. This was the moment I realized that database servers should never be reachable over the internet, and also that you shouldn’t work for dudes that lie to their business partners.

    1. You shouldn’t even use the same OS on your DB server as your web server.
      Then again ‘Connection string’. Your web server should use very low-priv creds.

      Idiots will do the thing (separate machines.), then have the server connecting to the db as admin. With the same password as the network admin, plain text in connection string hard coded into web server, source in /src.
      I’m sure that idiots can hand their data to the world on the cloud too. I have faith in idiots.

  7. This article is all based on the assumption that traffic is constant. 11 MB/s is an average… how about peak rates? That’s when your website would require some extra punch.

    1. 11 MB/s is achievable on a home connection. Your hosting provider probably won’t supply you with a 10BASE-T cable; you will have at least a Fast Ethernet (100 Mbit/s) connection. And your provider does not charge for MB/s, but for TB/month.

      Peak rates are when your cloud provider makes a lot of money. As soon as you go above the “contracted bandwidth” you have to pay extra, and that extra usually is larger than the monthly amount. The article says that Vercel charges $200/TB when you go over the included bandwidth. You go 20TB over what you contracted and you owe 4k USD… Or a misconfiguration lets people host HD video files on your account, and you are left with a hundred thousand dollar bill… plus the storage quota you surely exceeded.

      On the other hand, co-location providers give you an x MB/s network connection and you don’t pay for the bandwidth, so whether you transfer several dozen GB or TB makes no difference on your bill. And if someone fills your hard drive with porn, you won’t pay anything extra either.
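The numbers in this exchange can be sketched in a few lines of Python. This is only an illustration: the $200/TB overage rate and the 20 TB overshoot come from the comment above, while the 1 TB included quota and the function names are hypothetical placeholders, not any provider's actual pricing model.

```python
def overage_cost_usd(used_tb: float, included_tb: float, usd_per_extra_tb: float) -> float:
    """Cost of traffic beyond the included quota; zero if usage stays under it."""
    extra_tb = max(0.0, used_tb - included_tb)
    return extra_tb * usd_per_extra_tb


def link_utilisation(avg_mb_per_s: float, link_mbit_per_s: float) -> float:
    """Fraction of a link consumed by an average rate given in megabytes/s."""
    return avg_mb_per_s * 8 / link_mbit_per_s


INCLUDED_TB = 1.0  # hypothetical included quota, for illustration only

# 20 TB over the quota at $200 per extra TB, as in the comment above.
print(overage_cost_usd(INCLUDED_TB + 20, INCLUDED_TB, 200.0))  # 4000.0

# The article's 11 MB/s average on a 100 Mbit/s Fast Ethernet port.
print(link_utilisation(11, 100))  # 0.88
```

The second figure shows why the average alone is not enough: 11 MB/s already fills 88% of a 100 Mbit/s port, so any real peak would need a faster link even on a flat-rate plan.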

  8. It’s one thing being on the business side of cloud services; I can see how the prospect of having your data decentralized across many servers inherently makes it safer from a data loss standpoint.

    But from a consumer IoT standpoint I for one am sick of being treated like this is the first time I’ve ever touched a computer by all these device manufacturers…not to mention the frankly insulting level of condescension in the instructions and support channels, yet they are locked down tighter than Fort Knox (has anyone fully jailbroken the old Ring doorbells and cut out the cloud entirely yet?!?) like they know power users are going to try breaking them out of their false walled gardens. We all know IoT runs on MQTT and SIP, or similar protocols. Protocols that are known and open source most of the time. Protocols that have existed in some form as long ago as back when call waiting would boot you out of Yahoo chat. Protocols that can easily be brokered by a decently modern router with OpenWRT firmware, and use Google’s OAuth to enable remote access and messaging. THINGS DAMN NEAR ANYONE ON THIS SITE CAN DO OR AT LEAST FIGURE OUT!!!

    This idea that there are people who have never used a computer before needs to stop. Those people are starting to move into geriatric care facilities. And while yes, there is a market for people who just want to plug it in and have it work, the first company that figures out how to make a decent product that the customer can use however the hell they want, without the manufacturer’s fingers in their network on the back end, is going to make an absolute killing from all of the nerds who finally have a solution for their IoT home automation dreams without having to sign away their data to the cloud (which is just a fancy term for someone else’s computer that owns your information). But of course if you say no to the cloud with the current offerings of products, you’re not allowed to use the product; it flat out will be non-functional. A product you have legally purchased, you are not allowed to use without signing away rights to your data and allowing some random company most likely based in China (that’s not paranoia, it’s true) to have complete unfettered access to the inside of your home network. That’s what I say no to. I have no interest in IoT or home automation simply because there are no devices that are A) affordable, and B) of decent quality, support, and integration, which are 100% free of cloud reliance. Of all of the companies that have come and gone in that market, not a single one of them has produced a product worth seriously considering allowing into my home network. They might do cool things on the surface, but they do some very not cool things on the back end.

    We are grown up enough and have enough hardware readily available to “Say no to the cloud”. It’s up to them to decide to profit by giving us what we want, or continue to insult us, treating us like walking wallets who are easily placated by pretty sounding but ultimately condescending PC double speak.

    1. Yep. Tend to agree.

      Have you looked at Home Assistant with ZigBee? Most ZigBee sensors/switches/lights work out of the box – no cloud – well, an opt-in option is available, but not required (or needed).

  9. Does that include the cost in human time to patch the machine? The software licenses? SLAs?

    I don’t doubt that you can do the math to make it cheaper than cloud hosting (they have to make some profit). But you get a lot of bang for your buck with cloud systems that is hard to quantify the cost of when running your own system.

    Are lots of sites/companies over buying? Yes, but that’s not limited to cloud systems.

  10. I want to get back to the idea that everyone’s devices are nodes on the internet capable of sharing with one another.

    Death to [anti]social media! And a rise in self-hosting!

    Just forward a port from your cable modem or ONT through your router to a power sipping Pi or similar SBC and leave it on.

    Worried about security? Well… WTH does the HTTP spec have to be a whole doctoral thesis anyway? It can’t be too hard to do 90% with a much simpler protocol spoken by a simple server with a much smaller attack surface. And that could be run in a sandbox easily enough. Maybe sort of a Gemini plus SSL plus simple forms submission support?

    But that’s only for the tinfoil hat types. After nearly 30 years of running an Apache server locally, with the ports forwarded I have experienced exactly ZERO PROBLEMS. I mean.. I’m sure if I was hosting a business with a million hits a minute or the website of a political party things would be different. But most of us are not.

    Now.. SSH OTOH… I used to check my logs and see soooo many failed attempts! Definitely don’t allow root login nor use really common usernames. Nonstandard ports didn’t even help much for me! Although… the particular numbers I chose probably weren’t that uncommon/unguessable. Hey… I have to remember this myself! A firewall rule blocking foreign requests… that dropped the strange ssh attempts down to nothing!

    1. For securing SSH, I have a non standard port, a honeypot, PortSentry, Fail2ban, and TOTP. And geoblocking.

      First the geoblocking blocks entire countries. I only allow where I live, and places that I someday may travel to. All the rest of the world is blocked.

      Then, the honeypot accepts connections on port 22 and says the default invalid password message for any username/password. It can distract almost all attackers.

      If they realize it’s a honeypot (usually they don’t), the non-standard port forces them to portscan. PortSentry detects a portscan and bans it. If they somehow find the correct port in 4 tries or less, they have to bruteforce the user and password in less than 3 tries, or Fail2ban bans them. If they manage to do so, they have 3 chances to guess the 6 digit OTP or Fail2ban bans them.

      In more than 10 years having the same server online, with a public address, nobody but me accessed it as far as I know…

  11. So having done both, as well as running massive data centers hosted in colocations, I 100% prefer self hosting and not using a cloud, even for complex things that are more than a website. Why?

    1. Without using a CDN, even for a basic website I’ve had it where with AWS and Azure people are unable to use my app/websites during peak hours. This was caused by peering between their ISPs and the bandwidth available to the cloud provider. When many people are using shared resources and those pipes get filled between peering partners you get screwed. You wind up needing to spend more money deploying a CDN and adding more complexity.

    2. To continue on that, many clouds upsell using their network for edge connectivity to speed up connections to your visitors. When this works it’s great. However, if you live somewhere where the peering sucks, you find that having it just on the internet and letting the internet do the routing gets you lower latency and more stable connections. For example, I’m near an Azure datacenter, use hot potato routing with Azure, pay extra, and yet I get routed two states away when I should be going to Azure a state away. I wind up with 60-70ms of latency when the data is hosted 20ms away. The workaround for this is to use a CDN.

    3. The networks are not stable. They mention this and it’s typically noisy neighbor issues. The fix is to actively host in another region and that is costly and complex.

    4. Visibility and control. I’ve never opened so many tickets in my life because I am unable to see the issues going on in the black box that are breaking my products. I end up in a case with a contractor for the cloud provider, who has no clue either. We spend months and the problem usually works itself out.

    I have more. But I think I made my point. Self hosting all the way.
