They left AWS to save money. Coming back cost even more

Not long ago, a partner I work with told me about a company that decided it had finally had enough of AWS.

The monthly bill had become the sort of document people opened with the facial expression usually reserved for dental estimates. Consultants were invited in. Spreadsheets were produced. Serious people said serious things about control, efficiency, and the wisdom of getting off the cloud treadmill.

The conclusion sounded almost virtuous. Leave AWS, move the workloads to a colocation facility, buy the hardware, and stop renting what could surely be owned more cheaply.

It was neat. It was rational. It was, for a while, deeply satisfying.

And then reality arrived, carrying invoices.

The company spent a substantial sum getting out of AWS. Servers were bought. Contracts were signed. Staff had to be hired to manage all the things cloud providers manage quietly in the background while everyone else gets on with their jobs. Not long after, the economics began to fray. Reversing course cost even more than leaving had in the first place.

That is the part worth paying attention to.

Not because it makes for a dramatic story, though it does. Not because it is especially rare, but because it is not. It matters because it exposes one of the oldest tricks in infrastructure decision-making. Companies compare a visible bill with an invisible burden, decide the bill is the scandal, and only later discover that the burden was doing quite a lot of useful work.

The spreadsheet seduction

On paper, the move away from AWS looked wonderfully sensible.

The cloud bill was obvious, monthly, and impolite enough to keep turning up. On-premises looked calmer. Hardware could be amortized. Rack space, power, and bandwidth could be priced. With a bit of care, the whole thing could be made to resemble prudence.

This is where many repatriation plans become dangerously persuasive. The cloud is cast as an extravagant landlord. On-premises is presented as the mature decision to stop renting and finally buy the house.

Unfortunately, a data center is not a house. It is closer to owning a very large hotel whose plumbing, wiring, keys, security, fire precautions, laundry, and unexpected midnight incidents are all your responsibility, except the guests are servers and none of them leave a tip.

The spreadsheet had done a decent job of pricing the obvious things. Hardware. Colocation space. Power. Connectivity.

What was priced badly were all the dull, expensive capabilities that public cloud tends to bundle into the bill. Managed failover. Backup automation. Key rotation. Elastic capacity. Security controls. Compliance support. Monitoring that does not depend on a specific engineer being awake, available, and emotionally prepared.

What looked like cloud excess turned out to include a great deal of cloud competence.

That distinction matters.

A large cloud bill is easy to resent because it is visible. Operational competence is harder to resent because it tends to be hidden in the walls.

What the cloud had been doing all along

One of the costliest mistakes in infrastructure is confusing convenience with fluff.

A managed database can look expensive right up to the moment you have to build and test failover yourself, define recovery objectives, handle maintenance windows, rotate credentials, validate backups, and explain to auditors why one awkward part of the process still depends on a human remembering to do something after lunch.

A content delivery network may seem like a luxury until you try to reproduce low-latency delivery, edge caching, certificate handling, resilience, and attack mitigation with a mixture of hardware, internal effort, procurement delays, and hope.

The company, in this case, had not really been paying AWS only for compute and storage. It had been paying AWS to absorb a long list of repetitive operational chores, specialized platform decisions, and uncomfortable edge cases.

Once those chores came back in-house, they did not return politely.

Redundancy stopped being a feature and became a budget line, followed by an implementation plan, followed by a maintenance burden. Security controls that had once been inherited now had to be selected, deployed, documented, checked, and defended. Compliance work that had once been partly automated became a steady stream of evidence gathering, procedural discipline, and administrative repetition.

Cloud bills can look high. So can plumbing. You only discover its emotional value when it stops working.

The talent tax

The easiest part of moving on premises is buying equipment.

The harder part is finding enough people who know how to run the surrounding world properly.

Cloud expertise is now common enough that many companies can hire engineers comfortable with infrastructure as code, IAM, managed services, container platforms, observability, autoscaling, and cost controls. Strong cloud engineers are not cheap, but they are at least visible in the market.

Deep on-premises expertise is another matter. People who are strong in storage, backup infrastructure, virtualization, physical networking, hardware lifecycle, and operational recovery still exist, but they are not standing about in large numbers waiting to be discovered. They are experienced, expensive, and often well aware of their market value.

There is also a cultural issue that rarely appears in repatriation slide decks. A great many engineers would rather write Terraform than troubleshoot a hardware issue under unflattering lighting at two in the morning. This is not a moral failure. It is simple market gravity. The industry has spent years abstracting away routine infrastructure pain because abstraction is usually a better use of skilled human attention.

The partner who told me this story was particularly clear on this point. The staffing line looked manageable in planning. In practice, it turned into one of the most stubborn and underestimated parts of the whole effort.

Cloud is not cheap because expertise is cheap. Cloud is often cheaper because rebuilding enough expertise inside one company is very expensive.

Why utilization lies so beautifully

Projected utilization is one of those numbers that becomes more charming the less time it spends near reality.

Many repatriation models assume that servers will be well used, capacity will be planned sensibly, and waste will be modest. It sounds disciplined. Responsible, even.

Real workloads behave less like equations and more like kitchens during a family gathering. There are quiet periods, sudden rushes, abandoned experiments, quarter-end panics, new projects that arrive with urgency and no warning, and services no one remembers until they break.

Elasticity is not a decorative feature added by cloud providers to justify themselves. It is one of the main ways organizations avoid buying for peak demand and then spending the rest of the year paying for machinery to sit about waiting.

Without elasticity, you provision for the busiest day and fund the silence in between.

Silence, in infrastructure, is expensive.

A half-used on-premises platform still consumes power, occupies space, demands maintenance, requires patching, and waits patiently for a workload spike that visits only now and then. Spare capacity has excellent manners. It makes no fuss. It simply eats money quietly and on schedule.

This was one of the turning points in the story I heard. Forecast utilization turned out to be far more flattering than actual utilization. Once that happened, the economics began to sag under their own good intentions.
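
The arithmetic behind that sag is easy to sketch. Here is a deliberately crude back-of-the-envelope comparison in Python; every price and utilization figure in it is an illustrative assumption, not a quote from anyone's price list:

# Back-of-the-envelope: provisioning for peak vs. paying for elasticity.
# Every price and utilization figure below is an illustrative assumption.

peak_servers = 100           # boxes needed on the busiest day
avg_utilization = 0.30       # fraction of capacity actually used over the year
cost_per_server_month = 400  # amortized hardware, colo, and power, assumed

onprem_monthly = peak_servers * cost_per_server_month

# Elastic model: pay only for consumed capacity, at a higher unit
# price (the rent premium), assumed here to be 80% over ownership.
elastic_premium = 1.8
elastic_monthly = peak_servers * avg_utilization * elastic_premium * cost_per_server_month

print(f"Provisioned for peak: ${onprem_monthly:,}/month")     # $40,000/month
print(f"Paying for what runs: ${elastic_monthly:,.0f}/month")  # $21,600/month
# At 30% real utilization, even an 80% unit-price premium leaves the
# elastic model cheaper; the crossover only arrives once utilization
# climbs past roughly 55%, which is exactly where flattering forecasts live.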

The cost of becoming slower

Traditional total-cost comparisons handle direct spending reasonably well. They are much worse at pricing lost momentum.

When a company runs on a large cloud platform, it does not merely rent infrastructure. It also gains access to a constant flow of improvements and options. Better analytics tools. New security integrations. Managed AI services. Identity features. Database capabilities. Deployment patterns. Networking enhancements. Observability tooling.

No single addition changes everything overnight. The effect is cumulative. It is a thousand small conveniences arriving over time and sparing teams from having to rebuild ordinary civilization every quarter.

An on-premises platform can be stable and well run. For the right workloads, that may be perfectly acceptable. But it does not evolve at the pace of a hyperscaler. Upgrades become projects. New capabilities require procurement, testing, staffing, and patience. The platform becomes more careful and, usually, slower.

That slower pace does not always show up neatly in a spreadsheet, but engineers feel it almost immediately.

While competitors are experimenting with new managed services or shipping new capabilities faster, the repatriated organization may be spending its time improving backup procedures, standardizing tools, negotiating maintenance arrangements, or replacing hardware that has chosen an inconvenient moment to become philosophical.

There is nothing glamorous about that. There is also nothing free about it.

Who should actually consider on-premises

None of this means on-premises is foolish.

That would be a lazy conclusion, and lazy conclusions are where expensive architecture plans begin.

For some organizations, on-premises remains entirely reasonable. It makes sense for highly predictable workloads with very little variability. It can make sense in tightly regulated environments where legal, sovereignty, or operational constraints sharply limit the use of public cloud. And at a very large scale, some organizations genuinely can justify building substantial parts of their own platform.

But most companies tempted by repatriation are not in that category.

They are not hyperscalers. They are not all running flat, perfectly predictable workloads. They are not all boxed in by constraints that make public cloud impossible. More often, they are reacting to a painful cloud bill caused by weak cost governance, poor workload fit, loose architecture discipline, or a lack of serious FinOps.

That is a very different problem.

Leaving AWS because you are using AWS badly is a bit like selling your refrigerator because the groceries keep going off while the door is open. The appliance may not be the heart of the matter.

The middle ground companies skip past

One of the stranger features of cloud debates is how quickly they become binary.

Either remain in public cloud forever, or march solemnly back to racks and cages as if returning to a lost ancestral craft.

There is, of course, a middle ground.

Some workloads do benefit from local placement because of latency, residency, plant integration, or operational constraints. But needing hardware closer to the ground does not automatically mean rebuilding the entire service model from scratch. The more useful question is often not whether the hardware should be local, but whether the control plane, automation model, and day-to-day operations should still feel cloud-like.

That is a much more practical conversation.

A company may need some infrastructure nearby while still gaining enormous value from managed identity, familiar APIs, consistent automation, and operational patterns learned in the cloud. This tends to sound less heroic than a full repatriation story, but heroism is not a particularly reliable basis for infrastructure strategy.

The partner who described this case said as much. If they had explored the middle road earlier, they might have kept the local advantages they wanted without assuming quite so much of the surrounding operational burden.

What a real repatriation audit should include

Any company seriously considering a move off AWS should pause long enough to perform an audit that is a little less enchanted by ownership.

Start with the full cloud picture, not just the line items everyone enjoys complaining about. Include engineering effort, compliance automation, security services, platform speed, operational overhead, and the cost of scaling quickly when demand changes.

Then build the on-premises model with uncommon honesty. Price round-the-clock operations. Price redundancy properly. Price backup and recovery as if they matter, because they do. Price refresh cycles, maintenance contracts, spare capacity, patching, testing, physical security, audit evidence, and the awkward certainty that hardware fails when it is least convenient.
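
One way to keep that honesty honest is to make the model embarrassingly explicit. A minimal sketch follows, with placeholder figures standing in for your own quotes and salaries; the point is the shape of the list, not the numbers:

# A deliberately naive on-prem TCO sketch. Every figure below is a
# placeholder assumption; the value is in forcing each line item to
# appear explicitly instead of hiding inside "miscellaneous".

monthly_costs = {
    "hardware_amortized":    12_000,  # purchase price spread over the refresh cycle
    "colocation_space":       4_000,
    "power_and_cooling":      3_000,
    "connectivity":           2_000,
    "maintenance_contracts":  2_500,
    "spare_capacity":         3_500,  # idle headroom for failures and spikes
    "backup_and_recovery":    2_000,  # storage, software, and regular restore tests
    "security_tooling":       2_500,
    "compliance_evidence":    1_500,  # audit prep that used to be partly inherited
    "oncall_staffing":       25_000,  # round-the-clock coverage is people, not metal
}

total = sum(monthly_costs.values())
for item, cost in sorted(monthly_costs.items(), key=lambda kv: -kv[1]):
    print(f"{item:<22} ${cost:>7,}  ({cost / total:5.1%})")
print(f"{'TOTAL':<22} ${total:>7,}")
# Note which line dominates. In most honest versions of this model,
# it is not the hardware.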

Then ask a cultural question, not just a financial one. How many of your engineers actually want to spend more of their time dealing with the physical stack and the operational plumbing that comes with it?

That answer matters more than many executives would like.

A strategy that looks cheaper on paper but nudges your best engineers toward the door is not, in any meaningful sense, cheaper.

Finally, compare repatriation not only against your current cloud bill, but against what a disciplined cloud optimization program could achieve. Rightsizing, storage improvements, better instance strategy, autoscaling discipline, reserved capacity planning, architecture cleanup, and proper FinOps can all change the economics without requiring anyone to rediscover the intimate emotional texture of broken hardware.
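
Even the first item on that list is often just a filter over monitoring data you already have. A sketch of rightsizing as code, assuming per-instance utilization figures exported from whatever monitoring you run, with arbitrary thresholds:

# Rightsizing as a filter: flag instances whose observed utilization
# never justified their size. The fleet data and thresholds below are
# assumptions; plug in whatever your monitoring actually exports.

instances = [
    # (name, vCPUs, average CPU %, peak CPU % over 30 days)
    ("api-prod-1",     16, 12.0, 38.0),
    ("batch-worker-3",  8, 71.0, 96.0),
    ("staging-db",     16,  4.0,  9.0),
    ("report-gen",      4, 22.0, 85.0),
]

def rightsizing_candidates(fleet, avg_limit=20.0, peak_limit=50.0):
    """Instances that are quiet on average and never spike hard."""
    return [
        name for name, _, avg, peak in fleet
        if avg < avg_limit and peak < peak_limit
    ]

print(rightsizing_candidates(instances))
# ['api-prod-1', 'staging-db'] -- shrink or consolidate these before
# anyone prices a forklift migration.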

The bill behind the bill

What has stayed with me about this story is that it was never really a story about AWS.

It was a story about accounting for the wrong thing.

The visible bill was treated as the entire problem. The hidden work behind the bill was treated as background scenery. Once the company moved off AWS, the scenery walked to the front of the stage and began sending invoices.

That is the trap.

Cloud can absolutely be expensive. Plenty of organizations run it badly and pay for the privilege. But on-premises is not automatically the sober adult in the room. Quite often, it is simply a different payment model, one that hides more of the cost in staffing, slower delivery, operational fragility, maintenance overhead, and all the unlovely little chores that cloud platforms had been taking care of out of sight.

The lesson from this case was not that every workload belongs in AWS forever. It was that infrastructure decisions become dangerous when they are made in reaction to irritation rather than in response to a full economic picture.

Leaving the cloud may still be the right answer for some organizations. For many others, the more useful answer is much less theatrical. Use the cloud better. Govern it better. Design it properly. Understand what you are paying for before deciding you would prefer to rebuild it yourself.

A large monthly cloud bill can be offensive to look at.

The bill that arrives after a bad attempt to escape it is usually less offensive than heartbreaking.

And heartbreak, unlike EC2, rarely comes with autoscaling.

That awkward moment when On-Prem is cheaper

Let’s be honest. For the better part of a decade, the public cloud has been the charismatic, free-spending friend who gets you out of any jam. Need to throw a last-minute party for a million users? They’ve got the hardware. Need to scale an app overnight? They’re already warming up the car. It was fast, it was elastic, and it saved you from the tedious, greasy work of racking your own servers. The only price was a casual, “You can pay me back later.”

Well, it’s later. The bill has arrived, and its line items are more cryptic than a forgotten ancient language. The finance department is calling, and they don’t sound happy.

This isn’t an angry stampede for the exits. Nobody is burning their AWS credits in protest. It’s more of a pragmatic reshuffle, a collective moment of clarity. Teams are looking at their sprawling digital estates and asking a simple question: Does everything really need to live in this expensive, all-inclusive resort? The result is a new normal where the cloud is still essential, just not universal.

The financial hangover

The cloud is wonderfully elastic. Elastic things, by their nature, bounce. So do monthly statements. Teams that scaled at lightning speed are now waking up to a familiar financial hangover with four distinct symptoms. First, there’s the billing complexity. Your monthly invoice isn’t a bill; it’s a mystery novel written by a sadist. Thousands of line items, tiered pricing, and egress charges transform the simple act of “moving data” into a budget-devouring monster.

-- A query that looks innocent but costs a fortune in data egress
SELECT
    event_id,
    user_id,
    payload
FROM
    user_events_production.events_archive
WHERE
    event_date BETWEEN '2025-07-01' AND '2025-07-31'
    AND region != 'eu-central-1'; -- Oh, you wanted to move 5TB across continents? That'll be extra.

Second is the unpredictable demand. A few busy weeks, a successful marketing campaign, or a minor viral event can undo months of careful savings plans. You budget for a quiet month, and suddenly you’re hosting the Super Bowl.

Then come the hidden multipliers. These are the gremlins of your infrastructure. Tiny, seemingly insignificant charges for cross-AZ traffic, managed service premiums, and per-request pricing that quietly multiply in the dark, feasting on your budget.
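
The gremlins are easy to underestimate precisely because each one looks like a rounding error. A quick sketch with an assumed per-gigabyte rate (real prices vary by provider and region) shows how the multiplication works:

# How a "tiny" per-gigabyte charge becomes a budget line. The rate is
# an illustrative assumption, not a published price.

cross_az_rate = 0.01   # dollars per GB, assumed, charged in each direction
gb_per_day = 2_000     # chatty microservices replicating across zones

monthly = gb_per_day * 30 * cross_az_rate * 2  # both directions billed
print(f"${monthly:,.0f}/month")  # $1,200/month, from traffic nobody designed,
                                 # on a line item nobody reads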

Finally, there’s the convenience tax. You paid a premium to turn the pain of operations into someone else’s problem. But for workloads that are steady, predictable, and bandwidth-heavy, that convenience starts to look suspiciously like setting money on fire. Those workloads are starting to look much cheaper on hardware you own or lease, where capital expenditure and depreciation replace the tyranny of per-hour-everything.

The gilded cage of convenience

Cloud providers don’t lock you in with malice. They seduce you with helpfulness. They offer a proprietary database so powerful, an event bus so seamless, an identity layer so integrated that before you know it, your application is woven into the very fabric of their ecosystem.

Leaving isn’t a migration; it’s a full-scale renovation project. It’s like living in a luxury hotel. They don’t forbid you from leaving, but once you’re used to the 24/7 room service, are you really going to go back to cooking for yourself?

Faced with this gilded cage, smart teams are now insisting on a kind of technological prenuptial agreement. It’s not about a lack of trust; it’s about preserving future freedom. Where practical, they prefer:

  • Open databases or engines with compatible wire protocols.
  • Kubernetes with portable controllers over platform-specific orchestration.
  • OpenTelemetry for metrics and traces that can travel.
  • Terraform or Crossplane to describe infrastructure in a way that isn’t tied to one vendor.

This isn’t purity theater. It simply reduces the penalty for changing your mind later.

# A portable infrastructure module: nothing below is tied to one vendor.
# Point the kubernetes and helm providers at any cluster (EKS, GKE, or
# an on-prem vSphere-backed cluster) and these resources apply unchanged.

resource "kubernetes_namespace" "app_namespace" {
  metadata {
    name = "my-awesome-app"
  }
}

resource "helm_release" "app_database" {
  name       = "app-postgres"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "postgresql"
  namespace  = kubernetes_namespace.app_namespace.metadata[0].name

  values = [
    file("values/postgres-prod.yaml")
  ]
}

A new menu of choices

The choice is no longer just between a hyperscaler and a dusty server cupboard under the stairs. The menu has expanded:

  • Private cloud: Using platforms like OpenStack or Kubernetes on bare metal in a modern colocation facility.
  • Alternative clouds: A growing number of providers are offering simpler pricing and less lock-in.
  • Hybrid models: Keeping sensitive data close to home while bursting to public regions for peak demand.
  • Edge locations: For workloads that need to be physically close to users and hate round-trip latency.

The point isn’t to flee the public cloud. The point is workload fitness. You wouldn’t wear hiking boots to a wedding, so why run a predictable, data-heavy analytics pipeline on a platform optimized for spiky, uncertain web traffic?

A personality test for your workload

So, how do you decide what stays and what goes? You don’t need a crystal ball. You just need to give each workload a quick personality test. Ask these six questions:

  1. Is its demand mostly steady or mostly spiky? Is it a predictable workhorse or a temperamental rock star?
  2. Is its data large and chatty or small and quiet?
  3. Is latency critical? Does it need instant responses or is a few dozen milliseconds acceptable?
  4. Are there strict data residency or compliance rules?
  5. Does it rely on a proprietary managed service that would be a nightmare to replace?
  6. Can we measure its unit economics? Do we know the cost per request, per user, or per gigabyte processed?

Steady and heavy often wins on owned or leased hardware. Spiky and uncertain still loves the elasticity of the hyperscalers. Regulated and locality-bound prefers the control of a private or hybrid setup. And if a workload gets its superpowers from a proprietary managed service, you either keep it where its powers live or make peace with a less super version of your app.
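
That decision logic is simple enough to write down. Here is the personality test as a Python sketch; the categories and rules are assumptions meant to be argued with, which is rather the point of the exercise:

# The six-question personality test as code. The rules below are
# illustrative assumptions; the exercise is making them explicit.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    spiky_demand: bool           # Q1: workhorse or temperamental rock star?
    data_heavy: bool             # Q2: large and chatty?
    latency_critical: bool       # Q3: needs instant responses?
    residency_constrained: bool  # Q4: strict residency or compliance rules?
    proprietary_dependency: bool # Q5: relies on a proprietary managed service?
    cost_per_unit_known: bool    # Q6: unit economics measured?

def suggest_placement(w: Workload) -> str:
    if w.proprietary_dependency:
        return "stay where its superpowers live"
    if w.residency_constrained:
        return "private or hybrid"
    if w.latency_critical:
        return "edge, close to the users"
    if w.spiky_demand:
        return "public cloud: elasticity earns its premium"
    if w.data_heavy and w.cost_per_unit_known:
        return "owned or leased hardware"
    return "measure unit economics first, then decide"

analytics = Workload("nightly-analytics", spiky_demand=False, data_heavy=True,
                     latency_critical=False, residency_constrained=False,
                     proprietary_dependency=False, cost_per_unit_known=True)
print(analytics.name, "->", suggest_placement(analytics))
# nightly-analytics -> owned or leased hardware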

What does this mean for you, Architect?

If you’re a DevOps engineer or a Cloud Architect, congratulations. Your job description just grew a new wing. You are no longer just a builder of digital infrastructure; you are now part financial planner, part supply chain expert, and part treaty negotiator.

Your playbook now includes:

  • FinOps literacy: The ability to connect design choices to money in a way the business understands and trusts.
  • Portability patterns: Designing services that can move without a complete rewrite.
  • Hybrid networking: Weaving together different environments without creating a haunted house of routing tables and DNS entries.
  • Observability without borders: Using vendor-neutral signals to see what’s happening from end to end.
  • Procurement fluency: The skill to make apples-to-apples comparisons between amortized hardware, managed services, and colocation contracts.

Yes, it’s time to carry a pocket calculator again, at least metaphorically.
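
Here is roughly what the pocket calculator is for: an apples-to-apples sketch of amortized ownership against per-hour rental, with every figure a placeholder awaiting your real quotes:

# Apples to apples: amortized ownership vs. per-hour rental.
# All figures are placeholder assumptions awaiting your real quotes.

server_capex = 15_000   # purchase price
lifespan_months = 48    # depreciation period
monthly_opex = 250      # colo share, power, support contract

owned_monthly = server_capex / lifespan_months + monthly_opex

cloud_hourly = 1.10     # comparable instance, on demand, assumed
utilization = 0.65      # fraction of hours it actually needs to run
cloud_monthly = cloud_hourly * 730 * utilization

print(f"Owned:  ${owned_monthly:,.0f}/month")   # about $562
print(f"Rented: ${cloud_monthly:,.0f}/month")   # about $522
# Close enough that staffing and flexibility, not hardware,
# should decide the argument.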

The unsexy path to freedom

The journey back from the cloud is paved with unglamorous but essential work. It’s not a heroic epic; it’s a series of small, carefully planned steps. The risks are real. You have to account for the people cost of patching and maintaining private platforms, the lead times for hardware, and the shadow dependencies on convenient features you forgot you were using.

The antidote is small steps, honest metrics, and boringly detailed runbooks. Start with a proof-of-concept, create a migration plan that moves slices, not the whole cake, and have rollback criteria that a non-engineer can understand.
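
Rollback criteria, in particular, work better as data than as judgment calls made at three in the morning. A minimal sketch, with purely illustrative thresholds:

# Rollback criteria as explicit data rather than 3 a.m. judgment calls.
# Thresholds are purely illustrative; agree on yours before migrating.

ROLLBACK_CRITERIA = {
    "p99_latency_ms":        {"max": 400},  # vs. an assumed 250 ms baseline
    "error_rate_percent":    {"max": 0.5},
    "failed_restores":       {"max": 0},    # backup restores must keep passing
    "oncall_pages_per_week": {"max": 10},
}

def should_roll_back(observed: dict) -> list:
    """Return the criteria a migrated slice is violating, if any."""
    return [
        metric for metric, rule in ROLLBACK_CRITERIA.items()
        if observed.get(metric, 0) > rule["max"]
    ]

week_one = {"p99_latency_ms": 310, "error_rate_percent": 0.2,
            "failed_restores": 1, "oncall_pages_per_week": 14}
violations = should_roll_back(week_one)
print(violations or "proceed to the next slice")
# ['failed_restores', 'oncall_pages_per_week'] -- roll back, no debate.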

This is just a course correction

The Great Cloud Exit is less a rebellion and more a rationalization. Think of it as finally cleaning out your closet after a decade-long shopping spree. The public cloud gave us a phenomenal decade of speed, and we bought one of everything. Now, we’re sorting through the pile. That spiky, unpredictable web service? It still looks great in the elastic fabric of a hyperscaler. That massive, steady-state analytics database? It’s like a heavy wool coat that was never meant for the tropics; it’s time to move it to a more suitable climate, like your own data center. And that experimental service you spun up in 2019 and forgot about? That’s the impulse buy sequin jacket you’re never going to wear. Time to donate it.

Treating workload placement as a design problem instead of a loyalty test is liberating. It’s admitting you don’t need a Swiss Army knife when all you have to do is turn a single screw. Choosing the right environment for the job results in a system that costs less and complains less. It performs better because it’s not being forced to do something it was never designed for.

This leads to the most important outcome: options. In a landscape that changes faster than we can update our résumés, flexibility is the only superpower that truly matters. The ability to move, to adapt, and to choose without facing a punishing exit fee or a six-month rewrite is the real prize. The cloud isn’t the destination anymore; it’s just one very useful stop on the map.