FinOps

Why your AWS bill secretly hates Graviton

The party always ends when the bill arrives.

Your team ships a brilliant release. The dashboards glow a satisfying, healthy green. The celebratory GIFs echo through the Slack channels. For a few glorious days, you are a master of the universe, a conductor of digital symphonies.

And then it shows up. The AWS invoice doesn’t knock. It just appears in your inbox with the silent, judgmental stare of a Victorian governess who caught you eating dessert before dinner. You shipped performance, yes. You also shipped a small fleet of x86 instances that are now burning actual, tangible money while you sleep.

Engineers live in a constant tug-of-war between making things faster and making them cheaper. We’re told the solution is another coupon code or just turning off a few replicas over the weekend. But real, lasting savings don’t come from tinkering at the edges. They show up when you change the underlying math. In the world of AWS, that often means changing the very silicon running the show.

Enter a family of servers that look unassuming on the console but quietly punch far above their weight. Migrate the right workloads, and they do the same work for less money. Welcome to AWS Graviton.

What is this Graviton thing anyway?

Let’s be honest. The first time someone says “ARM-based processor,” your brain conjures images of your phone, or maybe a high-end Raspberry Pi. The immediate, skeptical thought is, “Are we really going to run our production fleet on that?”

Well, yes. And it turns out that when you own the entire datacenter, you can design a chip that’s ridiculously good at cloud workloads, without the decades of baggage x86 has been carrying around. Switching to Graviton is like swapping that gas-guzzling ’70s muscle car for a sleek, silent electric skateboard that somehow still manages to tow your boat. It feels wrong… until you see your fuel bill. You’re swapping raw, hot, expensive grunt for cool, cheap efficiency.

Amazon designed these chips to optimize the whole stack, from the physical hardware to the hypervisor to the services you click on. This control means better performance-per-watt and, more importantly, a better price for every bit of work you do.

The lineup is simple:

  • Graviton2: The reliable workhorse. Great for general-purpose and memory-hungry tasks.
  • Graviton3: The souped-up model. Faster cores, better at cryptography, and sips memory bandwidth through a wider straw.
  • Graviton3E: The specialist. Tuned for high-performance computing (HPC) and anything that loves vector math.

This isn’t some lab experiment. Graviton is already powering massive production fleets. If your stack includes common tools like NGINX, Redis, Java, Go, Node.js, Python, or containers on ECS or EKS, you’re already walking on paved roads.
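If you want a quick sense of how much of your fleet could even move, here’s a minimal sketch using boto3. The family mapping is an assumption for illustration, not an official list; adjust it to the instance types you actually run and the Graviton siblings available in your region.

# Count running instances whose families have a Graviton sibling.
# GRAVITON_SIBLINGS is an illustrative mapping, not an exhaustive one.
import boto3
from collections import Counter

GRAVITON_SIBLINGS = {"t3": "t4g", "m6i": "m7g", "c6i": "c7g", "r6i": "r7g"}

ec2 = boto3.client("ec2")
families = Counter()

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            families[instance["InstanceType"].split(".")[0]] += 1

for family, count in families.items():
    if family in GRAVITON_SIBLINGS:
        print(f"{count:>4} x {family} -> candidate for {GRAVITON_SIBLINGS[family]}")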

The real numbers behind the hype

The headline from AWS is tantalizing. “Up to 40 percent better price-performance.” “Up to,” of course, are marketing’s two favorite words. It’s the engineering equivalent of a dating profile saying they enjoy “adventures.” It could mean anything.

But even with a healthy dose of cynicism, the trend is hard to ignore. Your mileage will vary depending on your code and where your bottlenecks are, but the gains are real.

Here’s where teams often find the gold:

  • Web and API services: Handling the same requests per second at a lower instance cost.
  • CI/CD Pipelines: Faster compile times for languages like Go and Rust on cheaper build runners.
  • Data and Streaming: Popular engines like NGINX, Envoy, Redis, Memcached, and Kafka clients run beautifully on ARM.
  • Batch and HPC: Heavy computational jobs get a serious boost from the Graviton3E chips.

There’s also a footprint bonus. Better performance-per-watt means you can hit your ESG (Environmental, Social, and Governance) goals without ever having to create a single sustainability slide deck. A win for engineering, a win for the planet, and a win for dodging boring meetings.

But will my stuff actually run on it?

This is the moment every engineer flinches. The suggestion of “recompiling for ARM” triggers flashbacks to obscure linker errors and a trip down dependency hell.

The good news? The water’s fine. For most modern workloads, the transition is surprisingly anticlimactic. Here’s a quick compatibility scan:

  • You compile from source or use open-source software? Very likely portable.
  • Using closed-source agents or vendor libraries? Time to do some testing and maybe send a polite-but-firm support ticket.
  • Running containers? Fantastic. Multi-architecture images are your new best friend.
  • What about languages? Java, Go, Node.js, .NET 6+, Python, Ruby, and PHP are all happy on ARM on Linux.
  • C and C++? Just recompile and link against ARM64 libraries.

The easiest first wins are usually stateless services sitting behind a load balancer, sidecars like log forwarders, or any kind of queue worker where raw throughput is king.

A calm path to migration

Heroic, caffeine-fueled weekend migrations are for rookies. A calm, boring checklist is how professionals do it.

Phase 1: Test in a safe place

Launch a Graviton sibling of your current instance family (e.g., a c7g.large instead of a c6i.large). Replay production traffic to it or run your standard benchmarks. Compare CPU utilization, latency, and error rates. No surprises allowed.
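To make “no surprises” measurable, a small boto3 sketch can pull the same CloudWatch numbers for both machines. The instance IDs below are placeholders; point them at your own test instances, and pull latency and error rates from your load balancer or service metrics the same way.

# Compare average CPU utilization for an x86 instance and its Graviton sibling
# over the last 24 hours.
import boto3
from datetime import datetime, timedelta, timezone

INSTANCE_IDS = {"x86": "i-0aaaaaaaaaaaaaaaa", "graviton": "i-0bbbbbbbbbbbbbbbb"}  # placeholders

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

for label, instance_id in INSTANCE_IDS.items():
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    points = [p["Average"] for p in stats["Datapoints"]]
    average = sum(points) / len(points) if points else 0.0
    print(f"{label}: average CPU over 24h = {average:.1f}%")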

Phase 2: Build for both worlds

It’s time to create multi-arch container images. docker buildx is the tool for the job. This command builds an image for both chip architectures and pushes them to your registry under a single tag.

# Build and push an image for both amd64 and arm64 from one command
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag $YOUR_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/my-web-app:v1.2.3 \
  --push .

Phase 3: Canary and verify

Slowly introduce the new instances. Route just 5% of traffic to the Graviton pool using weighted target groups. Stare intently at your dashboards. Your “golden signals” (latency, traffic, errors, and saturation) should look identical across both pools.

Here’s a conceptual Terraform snippet of what that weighting looks like:

resource "aws_lb_target_group" "x86_pool" {
  name     = "my-app-x86-pool"
  # ... other config
}

resource "aws_lb_target_group" "arm_pool" {
  name     = "my-app-arm-pool"
  # ... other config
}

resource "aws_lb_listener_rule" "weighted_routing" {
  listener_arn = aws_lb_listener.frontend.arn
  priority     = 100

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.x86_pool.arn
        weight = 95
      }
      target_group {
        arn    = aws_lb_target_group.arm_pool.arn
        weight = 5
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}

Phase 4: Full rollout with a parachute

If the canary looks healthy, gradually increase traffic: 25%, 50%, then 100%. Keep the old x86 pool warm for a day or two, just in case. It’s your escape hatch. Once it’s done, go show the finance team the new, smaller bill. They love that.
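If the weights live in Terraform, the calm way to step up traffic is to change the weight values and re-apply at each stage. If you’d rather nudge them from a script during the rollout window, here’s a hedged boto3 sketch; the ARNs are placeholders, and Terraform will see the change as drift until you fold the final weights back into your config.

# Step the Graviton pool from 25% to 100% by rewriting the listener rule's
# forward weights. ARNs below are placeholders for your own resources.
import time
import boto3

RULE_ARN = "arn:aws:elasticloadbalancing:eu-central-1:123456789012:listener-rule/app/..."
X86_TG_ARN = "arn:aws:elasticloadbalancing:eu-central-1:123456789012:targetgroup/my-app-x86-pool/..."
ARM_TG_ARN = "arn:aws:elasticloadbalancing:eu-central-1:123456789012:targetgroup/my-app-arm-pool/..."

elbv2 = boto3.client("elbv2")

for arm_weight in (25, 50, 100):
    elbv2.modify_rule(
        RuleArn=RULE_ARN,
        Actions=[{
            "Type": "forward",
            "ForwardConfig": {
                "TargetGroups": [
                    {"TargetGroupArn": X86_TG_ARN, "Weight": 100 - arm_weight},
                    {"TargetGroupArn": ARM_TG_ARN, "Weight": arm_weight},
                ]
            },
        }],
    )
    print(f"Graviton pool now at {arm_weight}% of traffic")
    time.sleep(3600)  # in reality: wait, check the golden signals, then continue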

Common gotchas and easy fixes

Here are a few fun ways to ruin your Friday afternoon, and how to avoid them.

  • The sneaky base image: You built your beautiful ARM application… on an x86 foundation. Your FROM amazonlinux:2023 defaulted to the amd64 architecture. Your container dies instantly. The fix: Explicitly pin your base images to an ARM64 version, like FROM --platform=linux/arm64 public.ecr.aws/amazonlinux/amazonlinux:2023, and add a cheap sanity check at startup (sketched after this list).
  • The native extension puzzle: Your Python, Ruby, or Node.js app fails because a native dependency couldn’t be built. The fix: Ensure you’re building on an ARM machine or using pre-compiled manylinux wheels that support aarch64.
  • The lagging agent: Your favorite observability tool’s agent doesn’t have an official ARM64 build yet. The fix: Check if they have a containerized version or gently nudge their support team. Most major vendors are on board now.
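For the base-image gotcha in particular, a cheap guard is to have the service refuse to start on the wrong architecture. A minimal Python sketch, assuming you can call it from your entrypoint:

# Fail fast if the image was built for the wrong CPU architecture.
import platform
import sys

EXPECTED = "aarch64"  # what a Graviton instance reports

machine = platform.machine()
if machine != EXPECTED:
    sys.exit(f"Built for {machine}, expected {EXPECTED}; check the base image and buildx platforms.")
print(f"Architecture check passed: {machine}")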

A shift in mindset

For decades, we’ve treated the processor as a given, an unchangeable law of physics in our digital world. The x86 architecture was simply the landscape on which we built everything. Graviton isn’t just a new hill on that landscape; it’s a sign the tectonic plates are shifting beneath our feet. This is more than a cost-saving trick; it’s an invitation to question the expensive assumptions we’ve been living with for years.

You don’t need a degree in electrical engineering to benefit from this, though it might help you win arguments on Hacker News. All you really need is a healthy dose of professional curiosity and a good benchmark script.

So here’s the experiment. Pick one of your workhorse stateless services, the ones that do the boring, repetitive work without complaining. The digital equivalent of a dishwasher. Build a multi-arch image for it. Cordon off a tiny, five-percent slice of your traffic and send it to a Graviton pool. Then, watch. Treat your service like a lab specimen. Don’t just glance at the CPU percentage; analyze the cost-per-million-requests. Scrutinize the p99 latency.

If the numbers tell a happy story, you haven’t just tweaked a deployment. You’ve fundamentally changed the economics of that service. You’ve found a powerful new lever to pull. If they don’t, you’ve lost a few hours and gained something more valuable: hard data. You’ve replaced a vague “what if” with a definitive “we tried that.”

Either way, you’ve sent a clear message to that smug monthly invoice. You’re paying attention. And you’re getting smarter. Doing the same work for less money isn’t a stunt. It’s just good engineering.

The silent bill killers lurking in your Terraform state

The first time I heard the term “sustainability smell,” I rolled my eyes. It sounded like a fluffy marketing phrase dreamed up to make cloud infrastructure sound as wholesome as a farmers’ market. Eco-friendly Terraform? Right. Next, you’ll tell me my data center is powered by happy thoughts and unicorn tears.

But then it clicked. The term wasn’t about planting trees with every terraform apply. It was about that weird feeling you get when you open a legacy repository. It’s the code equivalent of opening a Tupperware container you found in the back of the fridge. You don’t know what’s inside, but you’re pretty sure it’s going to be unpleasant.

Turns out, I’d been smelling these things for years without knowing what to call them. According to HashiCorp’s 2024 survey, a staggering 70% of infrastructure teams admit to over-provisioning resources. It seems we’re all building mansions for guests who never arrive. That, my friend, is the smell. It’s the scent of money quietly burning in the background.

What exactly is that funny smell in my code

A “sustainability smell” isn’t a bug. It won’t trigger a PagerDuty alert at 3 AM. It’s far more insidious. It’s a bad habit baked into your Terraform configuration that silently drains your budget and makes future maintenance a soul-crushing exercise in digital archaeology.

The most common offender is the legendary main.tf file that looks more like an epic novel. You know the one. It’s a sprawling, thousand-line behemoth where VPCs, subnets, ECS clusters, IAM roles, and that one S3 bucket from a forgotten 2021 proof-of-concept all live together in chaotic harmony. Trying to change one small thing in that file is like playing Jenga with a live grenade. You pull out one block, and suddenly three unrelated services start weeping.

I’ve stumbled through enough of these digital haunted houses to recognize the usual ghosts:

  • The over-provisioned powerhouse: An RDS instance with enough horsepower to manage the entire New York Stock Exchange, currently tasked with serving a blog that gets about ten visits a month. Most of them are from the author’s mom.
  • The zombie load balancer: Left behind after a one-off traffic spike, it now spends its days blissfully idle, forwarding zero traffic but diligently charging your account for the privilege of existing.
  • Hardcoded horrors: Instance sizes and IP addresses sprinkled directly into the code like cheap confetti. Need to scale? Good luck. You’ll be hunting down those values for the rest of the week.
  • The phantom snapshot: That old EBS snapshot you swore you deleted. It’s still there, lurking in the dark corners of your AWS account, accumulating charges with the quiet persistence of a glacier.

The silent killers that sink your budget

Let’s be honest, no one’s idea of a perfect Friday afternoon involves becoming a private investigator whose only client is a rogue t3.2xlarge instance that went on a very expensive vacation without permission. It’s tempting to just ignore it. It’s just one instance, right?

Wrong. These smells are the termites of your cloud budget. You don’t notice them individually, but they are silently chewing through your financial foundations. That “tiny” overcharge joins forces with its zombie friends, and suddenly your bill isn’t just creeping up; it’s sprinting.

But the real horror is for the next person who inherits your repo. They were promised the Terraform dream: a predictable, elegant blueprint. Instead, they get a haunted house. Every terraform apply becomes a jump scare, a game of Russian roulette where they pray they don’t awaken some ancient, costly beast.

Becoming a cloud cost detective

So, how do you hunt these ghosts? Tools like Checkov, tfsec, and terrascan are your trusty guard dogs: they’ll bark if you leave the front door wide open, but they won’t notice that you’re paying the mortgage on a ten-bedroom mansion when you only live in the garage. For that, you need to do some old-fashioned detective work.

My ghost-hunting toolkit is simple:

  1. Cross-reference with reality: Check your declared instance sizes against their actual usage in CloudWatch. If your CPU utilization has been sitting at a Zen-like 2% for the past six months, you have a prime suspect.
  2. Befriend the terraform plan command: Run it often. Run it before you even think about changing code. Treat it like a paranoid glance over your shoulder. It’s your best defense against unintended consequences.
  3. Dig for treasure in AWS Cost Explorer: This is where the bodies are buried. Filter by service, by tag (you are tagging everything, right?), and look for the quiet, consistent charges. That weird $30 “other” charge that shows up every month? I’ve been ambushed by forgotten Route 53 hosted zones more times than I care to admit.
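If clicking through Cost Explorer gets tedious, the same per-service totals are available from the API. A minimal boto3 sketch, assuming Cost Explorer is enabled on the account:

# Print last month's unblended cost per service, largest first. The quiet,
# consistent charges near the bottom are the ones worth investigating.
import boto3
from datetime import date, timedelta

first_of_this_month = date.today().replace(day=1)
first_of_last_month = (first_of_this_month - timedelta(days=1)).replace(day=1)

ce = boto3.client("ce")
response = ce.get_cost_and_usage(
    TimePeriod={
        "Start": first_of_last_month.isoformat(),
        "End": first_of_this_month.isoformat(),
    },
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

groups = response["ResultsByTime"][0]["Groups"]
for group in sorted(groups, key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]), reverse=True):
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{amount:10.2f} USD  {group['Keys'][0]}")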

Your detective gadgets

Putting your budget directly into your code is a power move. It’s like putting a security guard inside the bank vault.

Here’s an aws_budgets_budget resource that will scream at you via SNS if you start spending too frivolously on your EC2 instances.

resource "aws_budgets_budget" "ec2_spending_cap" {
  name         = "budget-ec2-monthly-limit"
  budget_type  = "COST"
  limit_amount = "250.0"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "Service"
    values = ["Amazon Elastic Compute Cloud - Compute"]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
  }
}

resource "aws_sns_topic" "budget_alerts" {
  name = "budget-alert-topic"
}

And for those phantom snapshots? Perform an exorcism with lifecycle rules. This little block of code tells S3 to act like a self-cleaning oven.

resource "aws_s3_bucket" "log_archive" {
  bucket = "my-app-log-archive-bucket"

  lifecycle_rule {
    id      = "log-retention-policy"
    enabled = true

    # Move older logs to a cheaper storage class
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    # And then get rid of them entirely after a year
    expiration {
      days = 365
    }
  }
}
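The lifecycle rule above handles S3, but those phantom EBS snapshots need their own sweep. A small boto3 sketch; the 180-day cutoff is just an example, and it only prints candidates rather than deleting anything:

# List EBS snapshots owned by this account that are older than the cutoff.
import boto3
from datetime import datetime, timedelta, timezone

CUTOFF = datetime.now(timezone.utc) - timedelta(days=180)  # example retention window

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_snapshots")

for page in paginator.paginate(OwnerIds=["self"]):
    for snapshot in page["Snapshots"]:
        if snapshot["StartTime"] < CUTOFF:
            print(f"{snapshot['SnapshotId']}  {snapshot['VolumeSize']} GiB  created {snapshot['StartTime']:%Y-%m-%d}")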

An exorcist’s guide to cleaner code

You can’t eliminate smells forever, but you can definitely keep them from taking over your house. There’s no magic spell, just a few simple rituals:

  1. Embrace modularity: Stop building monoliths. Break your infrastructure into smaller, logical modules. It’s the difference between remodeling one room and having to rebuild the entire house just to change a light fixture.
  2. Variables are your friends: Hardcoding an instance size is a crime against your future self. Use variables. It’s a tiny effort now that saves you a world of pain later.
  3. Tag everything. No, really: Tagging feels like a chore, but it’s a lifesaver. When you’re hunting for the source of a mysterious charge, a good tagging strategy is your map and compass. Tag by project, by team, by owner, heck, tag it with your favorite sandwich. Just tag it.
  4. Schedule a cleanup day: If it’s not on the calendar, it doesn’t exist. Dedicate a few hours every quarter to go ghost-hunting. Review idle resources, question oversized instances, and delete anything that looks dusty.
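When cleanup day does arrive, a good warm-up is finding everything that slipped past your tagging strategy. A minimal sketch using the Resource Groups Tagging API; the required “owner” tag is an assumption, so swap in whatever your convention demands.

# Flag resources that are missing a required tag key. Note: the tagging API
# only returns resources it knows about, so never-tagged resources may not appear.
import boto3

REQUIRED_TAG = "owner"  # assumed convention, not an AWS default

tagging = boto3.client("resourcegroupstaggingapi")
paginator = tagging.get_paginator("get_resources")

for page in paginator.paginate():
    for resource in page["ResourceTagMappingList"]:
        tag_keys = {tag["Key"] for tag in resource["Tags"]}
        if REQUIRED_TAG not in tag_keys:
            print(f"Missing '{REQUIRED_TAG}': {resource['ResourceARN']}")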

Your Terraform code is the blueprint for your infrastructure. And just like a real blueprint, any coffee stains, scribbled-out notes, or vague “we’ll figure this out later” sections get built directly into the final structure. If the plan calls for gold-plated plumbing in a closet that will never be used, that’s exactly what you’ll get. And you’ll pay for it. Every single month.

These smells aren’t the spectacular, three-alarm fires that get everyone’s attention. They’re the slow, silent drips from a faucet in the basement. It’s just a dollar here for a phantom snapshot, five dollars there for an oversized instance. It’s nothing, right? But leave those drips unchecked long enough, and you don’t just get a high water bill. You come back to find you’ve cultivated a thriving mold colony and the floorboards are suspiciously soft.

Ultimately, a clean repository isn’t just about being tidy. It’s about financial hygiene. So go on, open up that old repo. Be brave. The initial smell might be unpleasant, but it’s far better than the stench of a budget that has mysteriously evaporated into thin air.

That awkward moment when On-Prem is cheaper

Let’s be honest. For the better part of a decade, the public cloud has been the charismatic, free-spending friend who gets you out of any jam. Need to throw a last-minute party for a million users? They’ve got the hardware. Need to scale an app overnight? They’re already warming up the car. It was fast, it was elastic, and it saved you from the tedious, greasy work of racking your own servers. The only price was a casual, “You can pay me back later.”

Well, it’s later. The bill has arrived, and it has more cryptic line items than a forgotten ancient language. The finance department is calling, and they don’t sound happy.

This isn’t an angry stampede for the exits. Nobody is burning their AWS credits in protest. It’s more of a pragmatic reshuffle, a collective moment of clarity. Teams are looking at their sprawling digital estates and asking a simple question: Does everything really need to live in this expensive, all-inclusive resort? The result is a new normal where the cloud is still essential, just not universal.

The financial hangover

The cloud is wonderfully elastic. Elastic things, by their nature, bounce. So do monthly statements. Teams that scaled at lightning speed are now waking up to a familiar financial hangover with four distinct symptoms. First, there’s the billing complexity. Your monthly invoice isn’t a bill; it’s a mystery novel written by a sadist. Thousands of line items, tiered pricing, and egress charges transform the simple act of “moving data” into a budget-devouring monster.

-- A query that looks innocent but costs a fortune in data egress
SELECT
    event_id,
    user_id,
    payload
FROM
    user_events_production.events_archive
WHERE
    event_date BETWEEN '2025-07-01' AND '2025-07-31'
    AND region != 'eu-central-1'; -- Oh, you wanted to move 5TB across continents? That'll be extra.

Second is the unpredictable demand. A few busy weeks, a successful marketing campaign, or a minor viral event can undo months of careful savings plans. You budget for a quiet month, and suddenly you’re hosting the Super Bowl.

Then come the hidden multipliers. These are the gremlins of your infrastructure. Tiny, seemingly insignificant charges for cross-AZ traffic, managed service premiums, and per-request pricing that quietly multiply in the dark, feasting on your budget.

Finally, there’s the convenience tax. You paid a premium to turn the pain of operations into someone else’s problem. But for workloads that are steady, predictable, and bandwidth-heavy, that convenience starts to look suspiciously like setting money on fire. Those workloads are starting to look much cheaper on hardware you own or lease, where capital expenditure and depreciation replace the tyranny of per-hour-everything.

The gilded cage of convenience

Cloud providers don’t lock you in with malice. They seduce you with helpfulness. They offer a proprietary database so powerful, an event bus so seamless, an identity layer so integrated that before you know it, your application is woven into the very fabric of their ecosystem.

Leaving isn’t a migration; it’s a full-scale renovation project. It’s like living in a luxury hotel. They don’t forbid you from leaving, but once you’re used to the 24/7 room service, are you really going to go back to cooking for yourself?

Faced with this gilded cage, smart teams are now insisting on a kind of technological prenuptial agreement. It’s not about a lack of trust; it’s about preserving future freedom. Where practical, they prefer:

  • Open databases or engines with compatible wire protocols.
  • Kubernetes with portable controllers over platform-specific orchestration.
  • OpenTelemetry for metrics and traces that can travel.
  • Terraform or Crossplane to describe infrastructure in a way that isn’t tied to one vendor.

This isn’t purity theater. It simply reduces the penalty for changing your mind later.

# A portable infrastructure module
# It can be pointed at AWS, GCP, or even an on-prem vSphere cluster
# with the right provider.

resource "kubernetes_namespace" "app_namespace" {
  metadata {
    name = "my-awesome-app"
  }
}

resource "helm_release" "app_database" {
  name       = "app-postgres"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "postgresql"
  namespace  = kubernetes_namespace.app_namespace.metadata[0].name

  values = [
    "${file("values/postgres-prod.yaml")}"
  ]
}

A new menu of choices

The choice is no longer just between a hyperscaler and a dusty server cupboard under the stairs. The menu has expanded:

  • Private cloud: Using platforms like OpenStack or Kubernetes on bare metal in a modern colocation facility.
  • Alternative clouds: A growing number of providers are offering simpler pricing and less lock-in.
  • Hybrid models: Keeping sensitive data close to home while bursting to public regions for peak demand.
  • Edge locations: For workloads that need to be physically close to users and hate round-trip latency.

The point isn’t to flee the public cloud. The point is workload fitness. You wouldn’t wear hiking boots to a wedding, so why run a predictable, data-heavy analytics pipeline on a platform optimized for spiky, uncertain web traffic?

A personality test for your workload

So, how do you decide what stays and what goes? You don’t need a crystal ball. You just need to give each workload a quick personality test. Ask these six questions:

  1. Is its demand mostly steady or mostly spiky? Is it a predictable workhorse or a temperamental rock star?
  2. Is its data large and chatty or small and quiet?
  3. Is latency critical? Does it need instant responses or is a few dozen milliseconds acceptable?
  4. Are there strict data residency or compliance rules?
  5. Does it rely on a proprietary managed service that would be a nightmare to replace?
  6. Can we measure its unit economics? Do we know the cost per request, per user, or per gigabyte processed?
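Question six is the one most teams can’t answer without opening a spreadsheet, and yet it’s only division. A sketch with placeholder numbers; pull the real inputs from Cost Explorer and your own traffic metrics.

# Back-of-the-envelope unit economics. All figures below are placeholders.
monthly_cost_usd = 8_400.00        # everything tagged to this workload last month
monthly_requests = 1_900_000_000   # from your load balancer or gateway metrics
monthly_active_users = 120_000

cost_per_million_requests = monthly_cost_usd / (monthly_requests / 1_000_000)
cost_per_user = monthly_cost_usd / monthly_active_users

print(f"Cost per million requests: ${cost_per_million_requests:.2f}")
print(f"Cost per active user:      ${cost_per_user:.4f}")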

Steady and heavy often wins on owned or leased hardware. Spiky and uncertain still loves the elasticity of the hyperscalers. Regulated and locality-bound prefer the control of a private or hybrid setup. And if a workload gets its superpowers from a proprietary managed service, you either keep it where its powers live or make peace with a less super version of your app.

What does this mean for you, Architect

If you’re a DevOps engineer or a Cloud Architect, congratulations. Your job description just grew a new wing. You are no longer just a builder of digital infrastructure; you are now part financial planner, part supply chain expert, and part treaty negotiator.

Your playbook now includes:

  • FinOps literacy: The ability to connect design choices to money in a way the business understands and trusts.
  • Portability patterns: Designing services that can move without a complete rewrite.
  • Hybrid networking: Weaving together different environments without creating a haunted house of routing tables and DNS entries.
  • Observability without borders: Using vendor-neutral signals to see what’s happening from end to end.
  • Procurement fluency: The skill to make apples-to-apples comparisons between amortized hardware, managed services, and colocation contracts.

Yes, it’s time to carry a pocket calculator again, at least metaphorically.

The unsexy path to freedom

The journey back from the cloud is paved with unglamorous but essential work. It’s not a heroic epic; it’s a series of small, carefully planned steps. The risks are real. You have to account for the people cost of patching and maintaining private platforms, the lead times for hardware, and the shadow dependencies on convenient features you forgot you were using.

The antidote is small steps, honest metrics, and boringly detailed runbooks. Start with a proof-of-concept, create a migration plan that moves slices, not the whole cake, and have rollback criteria that a non-engineer can understand.

This is just a course correction

The Great Cloud Exit is less a rebellion and more a rationalization. Think of it as finally cleaning out your closet after a decade-long shopping spree. The public cloud gave us a phenomenal decade of speed, and we bought one of everything. Now, we’re sorting through the pile. That spiky, unpredictable web service? It still looks great in the elastic fabric of a hyperscaler. That massive, steady-state analytics database? It’s like a heavy wool coat that was never meant for the tropics; it’s time to move it to a more suitable climate, like your own data center. And that experimental service you spun up in 2019 and forgot about? That’s the impulse buy sequin jacket you’re never going to wear. Time to donate it.

Treating workload placement as a design problem instead of a loyalty test is liberating. It’s admitting you don’t need a Swiss Army knife when all you have to do is turn a single screw. Choosing the right environment for the job results in a system that costs less and complains less. It performs better because it’s not being forced to do something it was never designed for.

This leads to the most important outcome: options. In a landscape that changes faster than we can update our résumés, flexibility is the only superpower that truly matters. The ability to move, to adapt, and to choose without facing a punishing exit fee or a six-month rewrite, that’s the real prize. The cloud isn’t the destination anymore; it’s just one very useful stop on the map.

The AWS bill shrank by 70%, but so did my DevOps dignity

Fourteen thousand dollars a month.

That number wasn’t on a spreadsheet for a new car or a down payment on a house. It was the line item from our AWS bill simply labeled “API Gateway.” It stared back at me from the screen, judging my life choices. For $14,000, you could hire a small team of actual gateway guards, probably with very cool sunglasses and earpieces. All we had was a service. An expensive, silent, and, as we would soon learn, incredibly competent service.

This is the story of how we, in a fit of brilliant financial engineering, decided to fire our expensive digital bouncer and replace it with what we thought was a cheaper, more efficient swinging saloon door. It’s a tale of triumph, disaster, sleepless nights, and the hard-won lesson that sometimes, the most expensive feature is the one you don’t realize you’re using until it’s gone.

Our grand cost-saving gamble

Every DevOps engineer has had this moment. You’re scrolling through the AWS Cost Explorer, and you spot it: a service that’s drinking your budget like it’s happy hour. For us, API Gateway was that service. Its pricing model felt… punitive. You pay per request, and you pay for the data that flows through it. It’s like paying a toll for every car and then an extra fee based on the weight of the passengers.

Then, we saw the alternative: the Application Load Balancer (ALB). The ALB was the cool, laid-back cousin. Its pricing was simple, based on capacity units. It was like paying a flat fee for an all-you-can-eat buffet instead of by the gram.

The math was so seductive it felt like we were cheating the system.
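For the curious, the seduction looks roughly like this. The prices are illustrative list prices for a single region and will drift, and estimating capacity units from request volume alone is a simplification, so treat it as a sketch rather than a quote.

# Rough monthly comparison of API Gateway (REST, pay-per-request) vs. an ALB
# (hourly fee plus capacity units) for the same request volume.
HOURS_PER_MONTH = 730
monthly_requests = 500_000_000

apigw_per_million = 3.50   # illustrative REST API price per million requests
apigw_cost = (monthly_requests / 1_000_000) * apigw_per_million

alb_hourly = 0.0225        # illustrative ALB-hour price
lcu_hourly = 0.008         # illustrative LCU-hour price
estimated_avg_lcus = 6     # placeholder; read ConsumedLCUs from CloudWatch instead
alb_cost = HOURS_PER_MONTH * (alb_hourly + lcu_hourly * estimated_avg_lcus)

print(f"API Gateway: ~${apigw_cost:,.0f} per month")
print(f"ALB:         ~${alb_cost:,.0f} per month")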

We celebrated. We patted ourselves on the back. We had slain the beast of cloud waste! The migration was smooth. Response times even improved slightly, as an ALB adds less overhead than the feature-rich API Gateway. Our architecture was simpler. We had won.

Or so we thought.

When reality crashes the party

API Gateway isn’t just a toll booth. It’s a fortress gate with a built-in, military-grade security detail you get for free. It inspects IDs, pats down suspicious characters, and keeps a strict count of who comes and goes. When we swapped it for an ALB, we essentially replaced that fortress gate with a flimsy screen door. And then we were surprised when the bugs got in.

The trouble didn’t start with a bang. It started with a whisper.

Week 1: The first unwanted visitor. A script kiddie, probably bored in their parents’ basement, decided to poke our new, undefended endpoint with a classic SQL injection probe. It was the digital equivalent of someone jiggling the handle on your front door. With API Gateway, its built-in Web Application Firewall (WAF) would have spotted the malicious pattern and drop-kicked the request into the void before it ever reached our application. With the ALB, the request sailed right through. Our application code was robust enough to reject it, but the fact that it got to knock on the application’s door at all was a bad sign. We dismissed it. “An amateur,” we scoffed.

Week 3: The client who drank too much. A bug in a third-party client integration caused it to get stuck in a loop. A very, very fast loop. It began hammering one of our endpoints with 10,000 requests per second. The servers, bless their hearts, tried to keep up. Alarms started screaming. The on-call engineer, who was attempting to have a peaceful dinner, saw his phone light up like a Christmas tree. API Gateway’s built-in rate limiting would have calmly told the misbehaving client, “You’ve had enough, buddy. Come back later.” It would have throttled the requests automatically. We had to scramble, manually blocking the IP while the system groaned under the strain.

Week 6: The full-blown apocalypse. This was it. The big one. A proper Distributed Denial-of-Service (DDoS) attack. It wasn’t sophisticated, just a tidal wave of garbage traffic from thousands of hijacked machines. Our ALB, true to its name, diligently balanced this flood of garbage across all our servers, ensuring that every single one of them was completely overwhelmed in a perfectly distributed fashion. The site went down. Hard.

The next 72 hours were a blur of caffeine, panicked Slack messages, and emergency calls with our new best friends at Cloudflare. We had saved $9,800 on our monthly bill, but we were now paying for it with our sanity and our customers’ trust.

Waking up to the real tradeoff

The hidden cost wasn’t just the emergency WAF setup or the premium support plan. It was the engineering hours. The all-nighters. The “I’m so sorry, it won’t happen again” emails to customers.

We had made a classic mistake. We looked at API Gateway’s price tag and saw a cost. We should have seen an itemized receipt for services rendered:

  • Built-in WAF: Free bouncer that stops common attacks.
  • Rate Limiting: A free bartender who cuts off rowdy clients.
  • Request Validation: Free doorman that checks for malformed IDs (JSON).
  • Authorization: Free security guard that validates credentials (JWTs/OAuth).

An ALB does none of this. It just balances the load. That’s its only job. Expecting it to handle security is like expecting the mailman to guard your house. It’s not what he’s paid for.

Building a smarter hybrid fortress

After the fires were out and we’d all had a chance to sleep, we didn’t just switch back. That would have been too easy (and an admission of complete defeat). We came back wiser, like battle-hardened veterans of the cloud wars. We realized that not all traffic is created equal.

We landed on a hybrid solution, the architectural equivalent of having both a friendly greeter and a heavily-armed guard.

1. Let the ALB handle the easy stuff. For high-volume, low-risk traffic like serving images or tracking pixels, the ALB is perfect. It’s cheap, fast, and the security risk is minimal. Here’s how we route the “dumb” traffic in serverless.yml:

# serverless.yml
functions:
  handleMedia:
    handler: handler.alb  # A simple handler for media
    events:
      - alb:
          listenerArn: arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-main-alb/a1b2c3d4e5f6a7b8
          priority: 100
          conditions:
            path: /media/*  # Any request for images, videos, etc.

2. Protect the crown jewels with API Gateway. For anything critical, user authentication, data processing, and payment endpoints, we put it back behind the fortress of API Gateway. The extra cost is a small price to pay for peace of mind.

# serverless.yml
functions:
  handleSecureData:
    handler: handler.auth  # A handler with business logic
    events:
      - httpApi:
          path: /secure/v2/{proxy+}
          method: any

3. Add the security we should have had all along. This was non-negotiable. We layered on security like a paranoid onion.

  • AWS WAF on the ALB: We now pay for a WAF to protect our “dumb” endpoints. It’s an extra cost, but cheaper than an outage.
  • Cloudflare: Sits in front of everything, providing world-class DDoS protection.
  • Custom Rate Limiting: For nuanced cases, we built our own rate limiter. A Lambda function checks a Redis cache to track request counts from specific IPs or API keys.

Here’s a conceptual Python snippet for what that Lambda might look like:

# A simplified Lambda function for rate limiting with Redis
import redis
import os

# Connect to Redis
redis_client = redis.Redis(
    host=os.environ['REDIS_HOST'], 
    port=6379, 
    db=0
)

RATE_LIMIT_THRESHOLD = 100  # requests
TIME_WINDOW_SECONDS = 60    # per minute

def lambda_handler(event, context):
    # Get an identifier, e.g., from the source IP or an API key
    client_key = event['requestContext']['identity']['sourceIp']
    
    # Increment the count for this key
    current_count = redis_client.incr(client_key)
    
    # If it's a new key, set an expiration time
    if current_count == 1:
        redis_client.expire(client_key, TIME_WINDOW_SECONDS)
    
    # Check if the limit is exceeded
    if current_count > RATE_LIMIT_THRESHOLD:
        # Return a "429 Too Many Requests" error
        return {
            "statusCode": 429,
            "body": "Rate limit exceeded. Please try again later."
        }
    
    # Otherwise, allow the request to proceed to the application
    return {
        "statusCode": 200, 
        "body": "Request allowed"
    }

The final scorecard

So, after all that drama, was it worth it? Let’s look at the final numbers.

Verdict: For us, yes, it was ultimately worth it. Our traffic profile justified the complexity. But we traded money for time and complexity. We saved cash on the AWS bill, but the initial investment in engineering hours was massive. My DevOps dignity took a hit, but it has since recovered, scarred but wiser.

Your personal sanity check

Before you follow in our footsteps, do yourself a favor and perform an audit. Don’t let a shiny, low price tag lure you onto the rocks.

1. Calculate your break-even point. First, figure out what API Gateway is costing you.

# Get your usage data from API Gateway 
aws apigateway get-usage \
    --usage-plan-id your_plan_id_here \
    --start-date 2024-01-01 \
    --end-date 2024-01-31 \
    --query 'items'

Then, estimate your ALB costs based on request volume.

# Get request count to estimate ALB costs
aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name RequestCount \
    --dimensions Name=LoadBalancer,Value=app/my-main-alb/a1b2c3d4e5f6a7b8 \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-31T23:59:59Z \
    --period 86400 \
    --statistics Sum

2. Test your defenses (or lack thereof). Don’t wait for a real attack. Simulate one. Tools like Burp Suite or sqlmap can show you just how exposed you are.

# A simple sqlmap test against a vulnerable parameter 
sqlmap -u "https://api.yoursite.com/endpoint?id=1" --batch

If you switch to an ALB without a WAF, you might be horrified by what you find.

In the end, choosing between API Gateway and an ALB isn’t just a technical decision. It’s a business decision about risk, and one that goes far beyond the monthly bill. It’s about calculating the true Total Cost of Ownership (TCO), where the “O” includes your engineers’ time, your customers’ trust, and your sleep schedule. Are you paying with dollars on an invoice, or are you paying with 3 AM incident calls and the slow erosion of your team’s morale?

Peace of mind has a price tag. For us, we discovered its value was around $7,700 a month. We just had to get DDoS’d to learn how to read the receipt. Don’t make the same mistake. Look at your architecture not just for what it costs, but for what it protects. Because the cheapest option on paper can quickly become the most expensive one in reality.