SRE stuff

Random comments from a SRE

Git branching strategies Merge or Rebase

Picture this: You’re building a magnificent LEGO castle, not alone but with a team. Each of you is crafting different sections, a tower, a wall, maybe a dungeon for the mischievous minifigures. The grand question arises: How do you unite these masterpieces into one glorious fortress?

This is where Git, our trusty version control system, steps in, offering two distinct approaches: Merge and Rebase. Both achieve the same goal, bringing your team’s work together, but they do so with different philosophies and, consequently, different outcomes in your project’s history. So, which path should you choose? Let’s unravel this mystery together!

Merging: The Storyteller

Imagine git merge as a meticulous historian, carefully documenting every step of your castle-building journey. When you merge two branches, Git creates a special “merge commit,” a snapshot that says, “Here’s where we brought these two storylines together.” It’s like adding a chapter to a book that acknowledges the contributions of multiple authors.

# You are on the 'feature' branch 
git checkout main 
git merge feature 

# Result: A new merge commit is created on 'main'

What’s the beauty of this approach?

  • Preserves History: You get a complete, chronological record of every commit, every twist and turn in your development process. It’s like having a detailed blueprint of how your LEGO castle was built, brick by brick.
  • Transparency: Everyone on the team can easily see how the project evolved, who made what changes, and when. This is crucial for collaboration and debugging.
  • Safety Net: If something goes wrong, you can easily trace back the changes and revert to an earlier state. It’s like having a time machine to undo any construction mishaps.

But, there’s a catch (isn’t there always?):

  • Messy History: Your project’s history can become quite complex, especially with frequent merges. Imagine a book with too many footnotes, it can be a bit overwhelming to follow.

Rebasing: The Time Traveler

Now, git rebase takes a different approach. Think of it as a time traveler who neatly rewrites history. Instead of creating a merge commit, rebase takes your branch’s commits and replants them on top of the target branch, making it appear as if you’d been working directly on that branch all along.

# You are on the 'feature' branch 
git checkout feature 
git rebase main 

# Result: The 'feature' branch's commits are now on top of 'main'

Why would you want to rewrite history?

  • Clean History: You end up with a linear, streamlined project history, like a well-organized story with a clear narrative flow. It’s easier to read and understand the overall progression of the project.
  • Simplified View: It can be easier to visualize the project’s development as a single, continuous line, especially in projects with many contributors.

However, there’s a word of caution:

  • History Alteration: Rebasing rewrites the commit history. This can be problematic if you’re working on a shared branch, as it can lead to confusion and conflicts for other team members. Imagine someone changing the blueprints while you’re still building… chaos.
  • Potential for Errors: If not done carefully, rebasing can introduce subtle bugs that are hard to track down.

So, Merge or Rebase? The Golden Rule

Here’s the gist, the key takeaway, the rule of thumb you should tattoo on your programmer’s brain (metaphorically, of course):

  • Use merge for shared or public branches (like main or master). It preserves the true history and keeps everyone on the same page.
  • Use rebase for your local feature branches before merging them into a shared branch. This keeps your feature branch’s history clean and easy to understand, making the final merge smoother.

Think of it this way: you do your messy experiments and drafts in your private notebook (local branch with rebase), and then you neatly transcribe your final work into the official logbook (shared branch with merge).

Analogy Time!

Let’s say you and your friend are writing a song.

  • Merge: You each write verses separately. Then, you combine them, creating a new verse that says, “Here’s where Verse 1 and Verse 2 meet.” It’s clear that it was a collaborative effort, and you can still see the individual verses.
  • Rebase: You write your verse. Then, you take your friend’s verse and rewrite yours as if you had written it after theirs. The song flows seamlessly, but it’s not immediately obvious that two people wrote it.

The Bottom Line

Both merge and rebase are powerful tools. The best choice depends on your specific workflow and your team’s preferences. The most important thing is to understand how each method works and to use them consistently. But always remember the golden rule: merge for shared, rebase for local.

S3 Access Points explained

Don’t you feel like your data in the cloud is a bit too… exposed? Like you’ve got a treasure chest full of valuable information (your S3 bucket), but it’s just sitting there, practically begging for unwanted attention? You wouldn’t leave your valuables out in the open in the real world, would you? Well, the same logic applies to your data in the cloud.

This is where AWS S3 Access Points come in. They act like bouncers for your data, ensuring only the right people get in. And for those of you with data scattered across the globe, we’ve got something even fancier: Multi-Region Access Points (MRAPs). They’re like the global positioning system for your data, ensuring fast access no matter where you are.

So buckle up, and let’s explore the fascinating world of S3 Access Points and MRAPs. Let’s try to make it fun.

The problem is that your S3 Bucket is wide open (By Default)

Think of an S3 bucket as a giant storage locker in the cloud. When you first create one, it’s like leaving the locker door wide open. Anyone who knows the lockers there can just waltz in and take a peek, or worse, start messing with your stuff.

This might be fine if you’re just storing cat memes, but what if you have sensitive customer data, financial records, or top-secret project files? You need a way to control who gets in and what they can do.

The solution is the Access Points, your data’s bouncers

Imagine Access Points as the bouncers standing guard at the entrance of your storage locker. They check IDs, make sure everyone’s on the guest list, and only let in the people you’ve authorized.

In more technical terms, an Access Point is a unique hostname that you create to enforce distinct permissions and network controls for any request made through it. You can configure each Access Point with its own IAM policy, tailored to specific use cases.

Why you need Access Points. It’s all about control

Here’s the deal:

  • Granular Access Control: You can create different Access Points for different applications or teams, each with its own set of permissions. Maybe your marketing team only needs read access to product images, while your developers need full read and write access to application logs. Access Points make this a breeze.
  • Simplified Policy Management: Instead of one giant, complicated bucket policy, you can have smaller, more manageable policies for each Access Point. It’s like having a separate rule book for each group that needs access.
  • Enhanced Security: By restricting access through specific Access Points, you reduce the risk of accidental data exposure or unauthorized modification. It’s like having multiple layers of security for your precious data.
  • Compliance Made Easier: Many industries have strict regulations about data access and security (think GDPR, HIPAA). Access Points help you meet these requirements by providing a clear and auditable way to control who can access what.

Let’s get practical with an Access Point policy example

Okay, let’s see how this works in practice. Here’s an example of an Access Point policy that only allows access to a bucket named “pending-documentation” and only permits read and write actions (no deleting!):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/Alice"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:us-west-2:123456789012:accesspoint/my-access-point/object/pending-documentation/*"
    }
  ]
}

Explanation:

  • Version: Specifies the policy language version.
  • Statement: An array of permission statements.
  • Effect: “Allow” means this statement grants permission.
  • Principal: This specifies who is granted access. In this case, it’s the IAM user “Alice” (you’d replace this with the actual ARN of your user or role).
  • Action: The S3 actions allowed. Here, it’s s3:GetObject (read) and s3:PutObject (write).
  • Resource: This is the crucial part. It specifies the resource the policy applies to. Here, it’s the “pending-documentation” bucket accessed through the “my-access-point” Access Point. The /* at the end means all objects within that bucket path.

Delegating access control to the Access Point (Bucket Policy)

You also need to configure your S3 bucket policy to delegate access control to the Access Point. Here’s an example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:DataAccessPointArn": "arn:aws:s3:us-west-2:123456789012:accesspoint/my-access-point"
        }
      }
    }
  ]
}
  • This policy allows any principal (“AWS”: “*”) to perform any S3 action (“s3:*”), but only if the request goes through the specified Access Point ARN.

Taking it global, Multi-Region Access Points (MRAPs)

Now, let’s say your data is spread across multiple AWS regions. Maybe you have users all over the world, and you want them to have fast access to your data, no matter where they are. This is where Multi-Region Access Points (MRAPs) come to the rescue!

Think of an MRAP as a smart global router for your data. It’s a single endpoint that automatically routes requests to the closest copy of your data in one of your S3 buckets across multiple regions.

Why Use MRAPs? Think speed and resilience

  • Reduced Latency: MRAPs ensure that users are always accessing the data from the nearest region, minimizing latency and improving application performance. It is like having a fast-food in each country, so clients can have their orders faster.
  • High Availability: If one region becomes unavailable, MRAPs automatically route traffic to another region, ensuring your application stays up and running. It’s like having a backup generator for your data.
  • Simplified Management: Instead of managing multiple endpoints for different regions, you have one MRAP to rule them all.

MRAPs vs. Regular Access Points, what’s the difference?

While both are about controlling access, MRAPs take it to the next level:

  • Scope: Regular Access Points are regional; MRAPs are multi-regional.
  • Focus: Regular Access Points primarily focus on security and access control; MRAPs add performance and availability to the mix.
  • Complexity: MRAPs are a bit more complex to set up because you’re dealing with multiple regions.

When to unleash the power of Access Points and MRAPs

  • Data Lakes: Use Access Points to create secure “zones” within your data lake, granting different teams access to only the data they need.
  • Content Delivery: MRAPs can accelerate content delivery to users around the world by serving data from the nearest region.
  • Hybrid Cloud: Access Points can help integrate your on-premises applications with your S3 data in a secure and controlled manner.
  • Compliance: Meeting regulations like GDPR or HIPAA becomes easier with the fine-grained access control provided by Access Points.
  • Global Applications: If you have a globally distributed application, MRAPs are essential for delivering a seamless user experience.

Lock down your data and speed up access

AWS S3 Access Points and Multi-Region Access Points are powerful tools for managing access to your data in the cloud. They provide the security, control, and performance that modern applications demand.

Managing SSL certificates with SNI on AWS ALB and NLB

The challenge of hosting multiple SSL-Secured sites

Let’s talk about security on the web. You want your website to be secure. Of course, you do! That’s where HTTPS and those little SSL/TLS certificates come in. They’re like the secret handshakes of the internet, ensuring that the information flowing between your site and visitors is safe from prying eyes. But here’s the thing: back in the day, if you wanted a bunch of websites, each with its secure certificate, you needed a separate IP address. Imagine having to get a new phone number for every person you wanted to call! It was a real headache and cost a pretty penny, too, especially if you were running a whole bunch of websites.

Defining SNI as a modern SSL/TLS extension

Now, what if I told you there was a clever way around this whole IP address mess? That’s where this little gem called Server Name Indication (SNI) comes in. It’s like a smart little addition to the way websites and browsers talk to each other securely. Think of it this way, your server’s IP address is like a big apartment building, and each website is a different apartment. Without SNI, it’s like visitors can only shout the building’s address (the IP address). The doorman (the server) wouldn’t know which apartment to send them to. SNI fixes that. It lets the visitor whisper both the building address and the apartment number (the website’s name) right at the start. Pretty neat.

Understanding the SNI handshake process

So, how does this SNI thing work? Let’s lift the hood and take a peek at the engine, shall we? It all happens during this little dance called the SSL/TLS handshake, the very beginning of a secure connection.

  • Client Hello: First, the client (like your web browser) says “Hello!” to the server. But now, thanks to SNI, it also whispers the name of the website it wants to talk to. It is like saying “Hey, I want to connect, and by the way, I’m looking for ‘www.example.com‘”.
  • Server Selection: The server gets this message and, because it’s a smart cookie, it checks the SNI part. It uses that website name to pick out the right secret handshake (the SSL certificate) from its big box of handshakes.
  • Server Hello: The server then says “Hello!” back, showing off the certificate it picked.
  • Secure Connection: The client checks if the handshake looks legit, and if it does, boom! You’ve got yourself a secure connection. It’s like a secret club where everyone knows the password, and they’re all speaking in code so no one else can understand.

AWS load balancers and SNI as a perfect match

Now, let’s bring this into the world of Amazon Web Services (AWS). They’ve got these things called load balancers, which are like traffic cops for websites, directing visitors to the right place. The newer ones, Application Load Balancers (ALB) and Network Load Balancers (NLB) are big fans of SNI. It means you can have a whole bunch of websites, each with its certificate, all hiding behind one of these load balancers. Each of those websites could be running on different computers (EC2 instances, as they call them), but the load balancer, thanks to SNI, knows exactly where to send the visitors.

CloudFront’s adoption of SNI for secure content delivery at scale

And it’s not just load balancers, AWS has this other thing called CloudFront, which is like a super-fast delivery service for websites. It makes sure your website loads quickly for people all over the world. And guess what? CloudFront loves SNI, too. It lets you have different secret handshakes (certificates) for different websites, even if they’re all being delivered through the same CloudFront setup. Just remember, the old-timer, Classic Load Balancer (CLB), doesn’t know this SNI trick. It’s a bit behind the times, so keep that in mind.

Cost savings through optimized resource utilization

Why should you care about all this? Well, for starters, it saves you money! Instead of needing a whole bunch of IP addresses (which cost money), you can use just one with SNI. It is like sharing an office space instead of everyone renting their building.

Simplified management by streamlining certificate handling

And it makes your life a whole lot easier, too. Managing those secret handshakes (certificates) can be a real pain. But with SNI, you can manage them all in one place on your load balancer. It is way simpler than running around to a dozen different offices to update everyone’s secret handshake.

Enhanced scalability for efficient infrastructure growth

And if your website gets popular, no problem, SNI lets you add new websites to your load balancer without breaking a sweat. You don’t have to worry about getting new IP addresses every time you want to launch a new site. It’s like adding new apartments to your building without having to change the building’s address.

Client compatibility to ensure broad support

Now, I have to be honest with you. There might be some really, really old web browsers out there that haven’t heard of SNI. But, honestly, they’re becoming rarer than a dodo bird. Most browsers these days are smart enough to handle SNI, so you don’t have to worry about it.

SNI as a cornerstone of modern Web hosting on AWS

So, there you have it. SNI is like a secret weapon for running websites securely and efficiently on AWS. It’s a clever little trick that saves you money, simplifies your life, and lets your website grow without any headaches. It is proof that even small changes to the way things work on the internet can make a huge difference. When you’re building things on AWS, remember SNI. It’s like having a master key that unlocks a whole bunch of possibilities for a secure and scalable future. It’s a neat piece of engineering if you ask me.

Practical guide to DNS Records in AWS Route 53

Your browser instantly connects you to your desired website when you type in its address and hit enter. It’s a seamless experience we often take for granted. But behind this seemingly simple action lies a complex system that makes it all possible: the Domain Name System (DNS). Think of DNS as the internet’s global directory, translating human-readable domain names into the numerical IP addresses that computers use to communicate. And when managing DNS with reliability and scalability, AWS Route 53 takes center stage. Route 53 is Amazon’s highly available and scalable DNS service, designed to route traffic to your application’s resources with remarkable precision and minimal latency. In this guide, we’ll demystify the most common DNS record types and show you how to use them effectively with Route 53, using practical examples.

Let’s jump into DNS records by breaking them down into simple, relatable examples and exploring real-world use cases. We’ll see how they work together, like a well-orchestrated symphony, to make the internet navigable.

The basics of DNS Records

DNS records are like traffic signs for the internet, directing users to the right destinations. But instead of physical signs, they’re digital entries that guide web browsers and other services. Route 53 makes managing these records straightforward. Here are the most common types:

A Record (Address Record)

Think of an A Record as the street address for your website. It maps a domain name (e.g., example.com) to an IPv4 address (e.g., 192.0.2.1). It’s the most basic thing. It just tells the internet where your website lives.

  • Purpose: Directs traffic to web servers or other IPv4 resources.
  • Analogy: Imagine telling a friend to visit you at your home address, that’s what an A Record does for websites. It’s like saying, “Hey, if you’re looking for example.com, it’s over at this IP address.”
  • Use Case: Hosting a website like example.com on an EC2 instance or an on-premises server.

CNAME Record (Canonical Name)

A CNAME Record is like a nickname for your domain. It maps an alias domain name (e.g., www.example.com) to another “canonical” domain name (e.g., example.com).

  • Purpose: Simplifies management by allowing multiple domains to point to the same resource. It’s like having various roads leading to the same destination.
  • Analogy: It’s like calling your friend “Bob” instead of “Robert.” Both names point to the same person.
  • Use case: Scaling applications by mapping api.example.com to an Application Load Balancer’s DNS name, such as app-load-balancer-456.amazonaws.com. You point your CNAME to the load balancer, and the load balancer handles distributing traffic to your servers.

AAAA Record (Quad A Record)

For the modern internet, AAAA Records map domain names to IPv6 addresses (e.g., 2001:db8::1).

  • Purpose: Ensures compatibility with IPv6 resources, which is becoming increasingly important as the internet grows.
  • Analogy: Think of this as an upgrade to a new address system for the internet, ready for the future. It’s like moving from a local phone system to a global one.
  • Use case: Enabling access to your website via IPv6. This ensures your site is reachable by devices using the newer IPv6 standard.

MX Record (Mail Exchange)

MX Records ensure emails sent to your domain arrive at the correct mail server.

  • Purpose: Routes emails to the appropriate mail server.
  • Analogy: Like sorting mail at a post office to send it to the right address. Each piece of mail (email) needs to be directed to the correct recipient (mail server).
  • Use case: Configuring email for domains with Google Workspace or Microsoft 365. This ensures your emails are handled by the right service.

NS Record (Name Server)

NS Records delegate a domain or subdomain to specific name servers.

  • Purpose: Specifies which servers are authoritative for answering DNS queries for a domain. In other words, they know all the A records, CNAME records, etc., for that domain.
  • Analogy: It’s like asking a specific guide for directions within a city. That guide knows the specific area inside and out.
  • Use case: Delegating subdomains like dev.example.com to a different DNS provider, perhaps for testing purposes.

TXT Record (Text Record)

TXT Records store arbitrary text data, often used for domain verification or email security configurations (e.g., SPF, DKIM).

  • Purpose: Provides information to external systems.
  • Analogy: Think of it as posting a sign with instructions outside your door. This sign might say, “To verify you own this house, please show this specific code.”
  • Use case: Adding SPF, DKIM, and DMARC records to prevent email spoofing and improve email deliverability. This helps ensure your emails don’t end up in spam folders.

Alias Record

Exclusive to AWS, Alias Records map domain names to AWS resources like S3 buckets or CloudFront distributions without needing an IP address.

  • Purpose: Reduces costs and simplifies DNS management, especially within the AWS ecosystem.
  • Analogy: A direct shortcut to AWS resources without the extra steps. Think of it as a secret tunnel directly to your destination, bypassing traffic.
  • Use case: Mapping example.com to a CloudFront distribution for CDN integration. This allows for faster content delivery to users around the world. Or, say you have a static website hosted on S3. An Alias record can point your domain directly to the S3 bucket, without needing a separate web server.

Putting it all together

Let’s look at how these records work in harmony to power your website. See? It’s not so complicated when you break it down. Each record has its job, and they all work together like a well-oiled machine.

Hosting a scalable website

  1. Register your domain: Let’s say you register example.com using Route 53.
  2. Create an A Record: You map example.com to an EC2 instance’s IP address where your website is hosted.
  3. Add a CNAME Record: For www.example.com, you create a CNAME pointing to example.com. This way, both addresses lead to your site.
  4. Utilize Alias Records: To speed up content delivery, you create an Alias record connecting example.com to a CloudFront distribution. This caches your website content at edge locations closer to your users. And shall we use another Alias Record to connect static.example.com to an S3 bucket, to serve your images faster? Why not.
  5. Implement TXT Records: You add TXT records for email authentication (SPF, DKIM) to ensure your emails are trusted and delivered reliably.
  6. Enable health checks: Route 53 can automatically monitor the health of your EC2 instances and route traffic away from unhealthy ones, ensuring your site stays up even if a server has issues. Route 53 can even automatically remove unhealthy instances from your DNS records.

This setup ensures high availability, scalability, and secure communication. But what makes Route 53 special? It’s not just about creating these records; it’s about doing it reliably and efficiently. Route 53 is designed for high availability and low latency. It uses a global network of DNS servers to ensure your website is always reachable, even if one server or region has problems. That means faster loading times for your users, no matter where they are.

Closing thoughts

AWS Route 53 isn’t just about creating DNS records, it’s about building robust, scalable, and secure internet infrastructure. It’s about making sure your website is always available to your users, no matter what. It’s like having a team of incredibly efficient digital postal workers who know exactly how to deliver each data packet to its correct destination. And what’s fascinating is that, like a well-designed metro system, Route 53 operates on multiple levels: it can direct traffic based on latency, geolocation, or even the health status of your services. Consider for a moment the massive scale at which services like Netflix or Amazon operate, keeping their platforms running smoothly with millions of simultaneous users. Part of that magic happens thanks to services like Route 53.
The beauty of it all lies in its apparent simplicity for the end user, everything works seamlessly, but behind the scenes, there’s a complex orchestration of systems working in perfect harmony. It’s like a symphony where each DNS record is a different instrument, and Route 53 is the conductor ensuring everything sounds exactly as it should.

How to check if a folder is used by services on Linux

You know that feeling when you’re spring cleaning your Linux system and spot that mysterious folder lurking around forever? Your finger hovers over the delete key, but something makes you pause. Smart move! Before removing any folder, wouldn’t it be nice to know if any services are actively using it? It’s like checking if someone’s sitting in a chair before moving it. Today, I’ll show you how to do that, and I promise to keep it simple and fun.

Why should you care?

You see, in the world of DevOps and SysOps, understanding which services are using your folders is becoming increasingly important. It’s like being a detective in your own system – you need to know what’s happening behind the scenes to avoid accidentally breaking things. Think of it as checking if the room is empty before turning off the lights!

Meet your two best friends lsof and fuser

Let me introduce you to two powerful tools that will help you become this system detective: lsof and fuser. They’re like X-ray glasses for your Linux system, letting you see invisible connections between processes and files.

The lsof command as your first tool

lsof stands for “list open files” (pretty straightforward, right?). Here’s how you can use it:

lsof +D /path/to/your/folder

This command is like asking, “Hey, who’s using stuff in this folder?” The system will then show you a list of all processes that are accessing files in that directory. It’s that simple!

Let’s break down what you’ll see:

  • COMMAND: The name of the program using the folder
  • PID: A unique number identifying the process (like its ID card)
  • USER: Who’s running the process
  • FD: File descriptor (don’t worry too much about this one)
  • TYPE: Type of file
  • DEVICE: Device numbers
  • SIZE/OFF: Size of the file
  • NODE: Inode number (system’s way of tracking files)
  • NAME: Path to the file

The fuser command as your second tool

Now, let’s meet fuser. It’s like lsof’s cousin, but with a different approach:

fuser -v /path/to/your/folder

This command shows you which processes are using the folder but in a more concise way. It’s perfect when you want a quick overview without too many details.

Examples

Let’s say you have a folder called /var/www/html and you want to check if your web server is using it:

lsof +D /var/www/html

You might see something like:

COMMAND  PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
apache2  1234    www-data  3r  REG  252,0   12345 67890 /var/www/html/index.html

This tells you that Apache is reading files from that folder, good to know before making any changes!

Pro tips and best practices

  • Always check before deleting When in doubt, it’s better to check twice than to break something once. It’s like looking both ways before crossing the street!
  • Watch out for performance The lsof +D command checks all subfolders too, which can be slow for large directories. For quicker checks of just the folder itself, you can use:
lsof +d /path/to/folder
  • Combine commands for better insights You can pipe these commands with grep for more specific searches:
lsof +D /path/to/folder | grep service_name

Troubleshooting common scenarios

Sometimes you might run these commands and get no output. Don’t panic! This usually means no processes are currently using the folder. However, remember that:

  • Some processes might open and close files quickly
  • You might need sudo privileges to see everything
  • System processes might be using files in ways that aren’t immediately visible

Conclusion

Understanding which services are using your folders is crucial in modern DevOps and SysOps environments. With lsof and fuser, you have powerful tools at your disposal to make informed decisions about your system’s folders.

Remember, the key is to always check before making changes. It’s better to spend a minute checking than an hour fixing it! These tools are your friends in maintaining a healthy and stable Linux system.

Quick reference

# Check folder usage with lsof
lsof +D /path/to/folder

# Quick check with fuser
fuser -v /path/to/folder

# Check specific service
lsof +D /path/to/folder | grep service_name

# Check folder without recursion
lsof +d /path/to/folder

The commands we’ve explored today are just the beginning of your journey into better Linux system management. As you become more comfortable with these tools, you’ll find yourself naturally integrating them into your daily DevOps and SysOps routines. They’ll become an essential part of your system maintenance toolkit, helping you make informed decisions and prevent those dreaded “Oops, I shouldn’t have deleted that” moments.

Being cautious with system modifications isn’t about being afraid to make changes,  it’s about making changes confidently because you understand what you’re working with. Whether you’re managing a single server or orchestrating a complex cloud infrastructure, these simple yet powerful commands will help you maintain system stability and peace of mind.

Keep exploring, keep learning, and most importantly, keep your Linux systems running smoothly. The more you practice these techniques, the more natural they’ll become. And remember, in the world of system administration, a minute of checking can save hours of troubleshooting!

How to ensure high availability for pods in Kubernetes

I was thinking the other day about these Kubernetes pods, and how they’re like little spaceships floating around in the cluster. But what happens if one of those spaceships suddenly vanishes? Poof! Gone! That’s a real problem. So I started wondering, how can we ensure our pods are always there, ready to do their job, even if things go wrong? It’s like trying to keep a juggling act going while someone’s moving the floor around you…

Let me tell you about this tool called Karpenter. It’s like a super-efficient hotel manager for our Kubernetes worker nodes, always trying to arrange the “guests” (our applications) most cost-effectively. Sometimes, this means moving guests from one room to another to save on operating costs. In Kubernetes terminology, we call this “consolidation.”

The dancing pods challenge

Here’s the thing: We have this wonderful hotel manager (Karpenter) who’s doing a fantastic job, keeping costs down by constantly optimizing room assignments. But what about our guests (the applications)? They might get a bit dizzy with all this moving around, and sometimes, their important work gets disrupted.

So, the question is: How do we keep our applications running smoothly while still allowing Karpenter to do its magic? It’s like trying to keep a circus performance going while the stage crew rearranges the set in the middle of the act.

Understanding the moving parts

Before we explore the solutions, let’s take a peek behind the scenes and see what happens when Karpenter decides to relocate our applications. It’s quite a fascinating process:

First, Karpenter puts up a “Do Not Disturb” sign (technically called a taint) on the node it wants to clear. Then, it finds new accommodations for all the applications. Finally, it carefully moves each application to its new location.

Think of it as a well-choreographed dance where each step must be perfectly timed to avoid any missteps.

The art of high availability

Now, for the exciting part, we have some clever tricks up our sleeves to ensure our applications keep running smoothly:

  1. The buddy system: The first rule of high availability is simple: never go it alone! Instead of running a single instance of your application, run at least two. It’s like having a backup singer, if one voice falters, the show goes on. In Kubernetes, we do this by setting replicas: 2 in our deployment configuration.
  2. Strategic placement: Here’s a neat trick: we can tell Kubernetes to spread our application copies across different physical machines. It’s like not putting all your eggs in one basket. We use something called “Pod Topology Spread Constraints” for this. Here’s how it looks in practice:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: your-app
  1. Setting boundaries: Remember when your parents set rules about how many cookies you could eat? We do something similar in Kubernetes with PodDisruptionBudgets (PDB). We tell Kubernetes, “Hey, you must always keep at least 50% of my application instances running.” This prevents our hotel manager from getting too enthusiastic about rearranging things.
  2. The “Do Not Disturb” sign: For those special cases where we absolutely don’t want an application to be moved, we can put up a permanent “Do Not Disturb” sign using the karpenter.sh/do-not-disrupt: “true” annotation. It’s like having a VIP guest who gets to keep their room no matter what.

The complete picture

The beauty of this system lies in how all the pieces work together. Think of it as a safety net with multiple layers:

  • Multiple instances ensure basic redundancy.
  • Strategic placement keeps instances separated.
  • PodDisruptionBudgets prevent too many moves at once.
  • And when necessary, we can completely prevent disruption.

A real example

Let me paint you a picture. Imagine you’re running a critical web service. Here’s how you might set it up:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-web-service
spec:
  replicas: 2
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "false"  # We allow movement, but with controls
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-web-service-pdb
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: critical-web-service

The result

With these patterns in place, our applications become incredibly resilient. They can handle node failures, scale smoothly, and even survive Karpenter’s optimization efforts without any downtime. It’s like having a self-healing system that keeps your services running no matter what happens behind the scenes.

High availability isn’t just about having multiple copies of our application, it’s about thoughtfully designing how those copies are managed and maintained. By understanding and implementing these patterns, we are not just running applications in Kubernetes; we are crafting reliable, resilient services that can weather any storm.

The next time you deploy an application to Kubernetes, think about these patterns. They might just save you from that dreaded 3 AM wake-up call about your service being down!

How to mount AWS EFS on EKS for scalable storage solutions

Suppose you need multiple applications to share files seamlessly, without worrying about running out of storage space or struggling with complex configurations. That’s where AWS Elastic File System (EFS) comes in. EFS is a fully managed, scalable file system that multiple AWS services or containers can access. In this guide, we’ll take a simple yet comprehensive journey through the process of mounting AWS EFS to an Amazon Elastic Kubernetes Service (EKS) cluster. I’ll make sure to keep it straightforward, so you can follow along regardless of your Kubernetes experience.

Why use EFS with EKS?

Before we go into the details, let’s consider why using EFS in a Kubernetes environment is beneficial. Imagine you have multiple applications (pods) that all need to access the same data—like a shared directory of documents. Instead of replicating data for each application, EFS provides a centralized storage solution that can be accessed by all pods, regardless of which node they’re running on.

Here’s what makes EFS a great choice for EKS:

  • Shared Storage: Multiple pods across different nodes can access the same files at the same time, making it perfect for workloads that require shared access.
  • Scalability: EFS automatically scales up or down as your data needs change, so you never have to worry about manually managing storage limits.
  • Durability and Availability: AWS ensures that your data is highly durable and accessible across multiple Availability Zones (AZs), which means your applications stay resilient even if there are hardware failures.

Typical use cases for using EFS with EKS include machine learning workloads, content management systems, or shared file storage for collaborative environments like JupyterHub.

Prerequisites

Before we start, make sure you have the following:

  1. EKS Cluster: You need a running EKS cluster, and kubectl should be configured to access it.
  2. EFS File System: An existing EFS file system in the same AWS region as your EKS cluster.
  3. IAM Roles: Correct IAM roles and policies for your EKS nodes to interact with EFS.
  4. Amazon EFS CSI Driver: This driver must be installed in your EKS cluster.

How to mount AWS EFS on EKS

Let’s take it step by step, so by the end, you’ll have a working setup where your Kubernetes pods can use EFS for shared, scalable storage.

Create an EFS file system

To begin, navigate to the EFS Management Console:

  1. Create a New File System: Select the appropriate VPC and subnets—they should be in the same region as your EKS cluster.
  2. File System ID: Note the File System ID; you’ll use it later.
  3. Networking: Ensure that your security group allows inbound traffic from the EKS worker nodes. Think of this as permitting EKS to access your storage safely.

Set up IAM role for the EFS CSI driver

The Amazon EFS CSI driver manages the integration between EFS and Kubernetes. For this driver to work, you need to create an IAM role. It’s a bit like giving the CSI driver its set of keys to interact with EFS securely.

To create the role:

  1. Log in to the AWS Management Console and navigate to IAM.
  2. Create a new role and set up a custom trust policy:
{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Effect": "Allow",
           "Principal": {
               "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-provider-id>"
           },
           "Action": "sts:AssumeRoleWithWebIdentity",
           "Condition": {
               "StringLike": {
                   "oidc.eks.<region>.amazonaws.com/id/<oidc-provider-id>:sub": "system:serviceaccount:kube-system:efs-csi-*"
               }
           }
       }
   ]
}

Make sure to attach the AmazonEFSCSIDriverPolicy to this role. This step ensures that the CSI driver has the necessary permissions to manage EFS volumes.

Install the Amazon EFS CSI driver

You can install the EFS CSI driver using either the EKS Add-ons feature or via Helm charts. I recommend the EKS Add-on method because it’s easier to manage and stays updated automatically.

Attach the IAM role you created to the EFS CSI add-on in your cluster.

(Optional) Create an EFS access point

Access points provide a way to manage and segregate access within an EFS file system. It’s like having different doors to different parts of the same warehouse, each with its key and permissions.

  • Go to the EFS Console and select your file system.
  • Create a new Access Point and note its ID for use in upcoming steps.

Configure an IAM Policy for worker nodes

To make sure your EKS worker nodes can access EFS, attach an IAM policy to their role. Here’s an example policy:

{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Effect": "Allow",
           "Action": [
               "elasticfilesystem:DescribeAccessPoints",
               "elasticfilesystem:DescribeFileSystems",
               "elasticfilesystem:ClientMount",
               "elasticfilesystem:ClientWrite"
           ],
           "Resource": "*"
       }
   ]
}

This ensures your nodes can create and interact with the necessary resources.

Create a storage class for EFS

Next, create a Kubernetes StorageClass to provision Persistent Volumes (PVs) dynamically. Here’s an example YAML file:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  fileSystemId: <file-system-id>
  directoryPerms: "700"
  basePath: "/dynamic_provisioning"
  ensureUniqueDirectory: "true"

Replace <file-system-id> with your EFS File System ID.

Apply the file:

kubectl apply -f efs-storage-class.yaml

Create a persistent volume claim (PVC)

Now, let’s request some storage by creating a PersistentVolumeClaim (PVC):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: efs-sc

Apply the PVC:

kubectl apply -f efs-pvc.yaml

Use the EFS PVC in a pod

With the PVC created, you can now mount the EFS storage into a pod. Here’s a sample pod configuration:

apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: "/data"
      name: efs-volume
  volumes:
  - name: efs-volume
    persistentVolumeClaim:
      claimName: efs-pvc

Apply the configuration:

kubectl apply -f efs-pod.yaml

You can verify the setup by checking if the pod can access the mounted storage:

kubectl exec -it efs-app -- ls /data

A note on direct EFS mounting

You can mount EFS directly into pods without using a Persistent Volume (PV) or Persistent Volume Claim (PVC) by referencing the EFS file system directly in the pod’s configuration. This approach simplifies the setup but offers less flexibility compared to using dynamic provisioning with a StorageClass. Here’s how you can do it:

apiVersion: v1
kind: Pod
metadata:
  name: efs-mounted-app
  labels:
    app: efs-example
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    volumeMounts:
    - name: efs-storage
      mountPath: "/shared-data"
  volumes:
  - name: efs-storage
    csi:
      driver: efs.csi.aws.com
      volumeHandle: <file-system-id>
      readOnly: false

Replace <file-system-id> with your EFS File System ID. This method works well for simpler scenarios where direct access is all you need.

Final remarks

Mounting EFS to an EKS cluster gives you a powerful, shared storage solution for Kubernetes workloads. By following these steps, you can ensure that your applications have access to scalable, durable, and highly available storage without needing to worry about complex management or capacity issues.

As you can see, EFS acts like a giant, shared repository that all your applications can tap into. Whether you’re working on machine learning projects, collaborative tools, or any workload needing shared data, EFS and EKS together simplify the whole process.

Now that you’ve walked through mounting EFS on EKS, think about what other applications could benefit from this setup. It’s always fascinating to see how managed services can help reduce the time you spend on the nitty-gritty details, letting you focus on building great solutions.

How many pods fit on an AWS EKS node?

Managing Kubernetes workloads on AWS EKS (Elastic Kubernetes Service) is much like managing a city, you need to know how many “tenants” (Pods) you can fit into your “buildings” (EC2 instances). This might sound straightforward, but a bit more is happening behind the scenes. Each type of instance has its characteristics, and understanding the limits is key to optimizing your deployments and avoiding resource headaches.

Why Is there a pod limit per node in AWS EKS?

Imagine you want to deploy several applications as Pods across several instances in AWS EKS. You might think, “Why not cram as many as possible onto each node?” Well, there’s a catch. Every EC2 instance in AWS has a limit on networking resources, which ultimately determines how many Pods it can support.

Each EC2 instance has a certain number of Elastic Network Interfaces (ENIs), and each ENI can hold a certain number of IPv4 addresses. But not all these IP addresses are available for Pods, AWS reserves some for essential services like the AWS CNI (Container Network Interface) and kube-proxy, which helps maintain connectivity and communication across your cluster.

Think of each ENI like an apartment building, and the IPv4 addresses as individual apartments. Not every apartment is available to your “tenants” (Pods), because AWS keeps some for maintenance. So, when calculating the maximum number of Pods for a specific instance type, you need to take this into account.

For example, a t3.medium instance has a maximum capacity of 17 Pods. A slightly bigger t3.large can handle up to 35 Pods. The difference depends on the number of ENIs and how many apartments (IPv4 addresses) each ENI can hold.

Formula to calculate Max pods per EC2 instance

To determine the maximum number of Pods that an instance type can support, you can use the following formula:

Max Pods = (Number of ENIs × IPv4 addresses per ENI) – Reserved IPs

Let’s apply this to a t2.medium instance:

  • Number of ENIs: 3
  • IPv4 addresses per ENI: 6

Using these values, we get:

Max Pods = (3 × 6) – 1

Max Pods = 18 – 1

Max Pods = 17

So, a t2.medium instance in EKS can support up to 17 Pods. It’s important to understand that this number isn’t arbitrary, it reflects the way AWS manages networking to keep your cluster running smoothly.

Why does this matter?

Knowing the limits of your EC2 instances can be crucial when planning your Kubernetes workloads. If you exceed the maximum number of Pods, some of your applications might fail to deploy, leading to errors and downtime. On the other hand, choosing an instance that’s too large might waste resources, costing you more than necessary.

Suppose you’re running a city, and you need to decide how many tenants each building can support comfortably. You don’t want buildings overcrowded with tenants, nor do you want them half-empty. Similarly, you need to find the sweet spot in AWS EKS, enough Pods to maximize efficiency, but not so many that your node runs out of resources.

The apartment analogy

Consider an m5.large instance. Let’s say it has 4 ENIs, and each ENI can support 10 IP addresses. But, AWS reserves a few apartments (IPv4 addresses) in each building (ENI) for maintenance staff (essential services). Using our formula, we can estimate how many Pods (tenants) we can fit.

  • Number of ENIs: 4
  • IPv4 addresses per ENI: 10

Max Pods = (4 × 10) – 1

Max Pods = 40 – 1

Max Pods = 39

So, an m5.large can support 39 Pods. This limit helps ensure that the building (instance) doesn’t get overwhelmed and that the essential services can function without issues.

Automating the Calculation

Manually calculating these limits can be tedious, especially if you’re managing multiple instance types or scaling dynamically. Thankfully, AWS provides tools and scripts to help automate these calculations. You can use the kubectl describe node command to get insights into your node’s capacity or refer to AWS documentation for Pod limits by instance type. Automating this step saves time and helps you avoid deployment issues.

Best practices for scaling

When planning the architecture of your EKS cluster, consider these best practices:

  • Match instance type to workload needs: If your application requires many Pods, opt for an instance type with more ENIs and IPv4 capacity.
  • Consider cost efficiency: Sometimes, using fewer large instances can be more cost-effective than using many smaller ones, depending on your workload.
  • Leverage autoscaling: AWS allows you to set up autoscaling for both your Pods and your nodes. This can help ensure that you have the right amount of capacity during peak and off-peak times without manual intervention.

Key takeaways

Understanding the Pod limits per EC2 instance in AWS EKS is more than just a calculation, it’s about ensuring your Kubernetes workloads run smoothly and efficiently. By thinking of ENIs as buildings and IP addresses as apartments, you can simplify the complexity of AWS networking and better plan your deployments.

Like any good city planner, you want to make sure there’s enough room for everyone, but not so much that you’re wasting space. AWS gives you the tools, you just need to know how to use them.

Helm or Kustomize for deploying to Kubernetes?

Choosing the right tool for continuous deployments is a big decision. It’s like picking the right vehicle for a road trip. Do you go for the thrill of a sports car or the reliability of a sturdy truck? In our world, the “cargo” is your application, and we want to ensure it reaches its destination smoothly and efficiently.

Two popular tools for this task are Helm and Kustomize. Both help you manage and deploy applications on Kubernetes, but they take different approaches. Let’s dive in, explore how they work, and help you decide which one might be your ideal travel buddy.

What is Helm?

Imagine Helm as a Kubernetes package manager, similar to apt or yum if you’ve worked with Linux before. It bundles all your application’s Kubernetes resources (like deployments, services, etc.) into a neat Helm chart package. This makes installing, upgrading, and even rolling back your application straightforward.

Think of a Helm chart as a blueprint for your application’s desired state in Kubernetes. Instead of manually configuring each element, you have a pre-built plan that tells Kubernetes exactly how to construct your environment. Helm provides a command-line tool, helm, to create these charts. You can start with a basic template and customize it to suit your needs, like a pre-fabricated house that you can modify to match your style. Here’s what a typical Helm chart looks like:

mychart/
  Chart.yaml        # Describes the chart
  templates/        # Contains template files
    deployment.yaml # Template for a Deployment
    service.yaml    # Template for a Service
  values.yaml       # Default configuration values

Helm makes it easy to reuse configurations across different projects and share your charts with others, providing a practical way to manage the complexity of Kubernetes applications.

What is Kustomize?

Now, let’s talk about Kustomize. Imagine Kustomize as a powerful customization tool for Kubernetes, a versatile toolkit designed to modify and adapt existing Kubernetes configurations. It provides a way to create variations of your deployment without having to rewrite or duplicate configurations. Think of it as having a set of advanced tools to tweak, fine-tune, and adapt everything you already have. Kustomize allows you to take a base configuration and apply overlays to create different variations for various environments, making it highly flexible for scenarios like development, staging, and production.

Kustomize works by applying patches and transformations to your base Kubernetes YAML files. Instead of duplicating the entire configuration for each environment, you define a base once, and then Kustomize helps you apply environment-specific changes on top. Imagine you have a basic configuration, and Kustomize is your stencil and spray paint set, letting you add layers of detail to suit different environments while keeping the base consistent. Here’s what a typical Kustomize project might look like:

base/
  deployment.yaml
  service.yaml

overlays/
  dev/
    kustomization.yaml
    patches/
      deployment.yaml
  prod/
    kustomization.yaml
    patches/
      deployment.yaml

The structure is straightforward: you have a base directory that contains your core configurations, and an overlays directory that includes different environment-specific customizations. This makes Kustomize particularly powerful when you need to maintain multiple versions of an application across different environments, like development, staging, and production, without duplicating configurations.

Kustomize shines when you need to maintain variations of the same application for multiple environments, such as development, staging, and production. This helps keep your configurations DRY (Don’t Repeat Yourself), reducing errors and simplifying maintenance. By keeping base definitions consistent and only modifying what’s necessary for each environment, you can ensure greater consistency and reliability in your deployments.

Helm vs Kustomize, different approaches

Helm uses templating to generate Kubernetes manifests. It takes your chart’s templates and values, combines them, and produces the final YAML files that Kubernetes needs. This templating mechanism allows for a high level of flexibility, but it also adds a level of complexity, especially when managing different environments or configurations. With Helm, the user must define various parameters in values.yaml files, which are then injected into templates, offering a powerful but sometimes intricate method of managing deployments.

Kustomize, by contrast, uses a patching approach, starting from a base configuration and applying layers of customizations. Instead of generating new YAML files from scratch, Kustomize allows you to define a consistent base once, and then apply overlays for different environments, such as development, staging, or production. This means you do not need to maintain separate full configurations for each environment, making it easier to ensure consistency and reduce duplication. Kustomize’s patching mechanism is particularly powerful for teams looking to maintain a DRY (Don’t Repeat Yourself) approach, where you only change what’s necessary for each environment without affecting the shared base configuration. This also helps minimize configuration drift, keeping environments aligned and easier to manage over time.

Ease of use

Helm can be a bit intimidating at first due to its templating language and chart structure. It’s like jumping straight onto a motorcycle, whereas Kustomize might feel more like learning to ride a bike with training wheels. Kustomize is generally easier to pick up if you are already familiar with standard Kubernetes YAML files.

Packaging and reusability

Helm excels when it comes to packaging and distributing applications. Helm charts can be shared, reused, and maintained, making them perfect for complex applications with many dependencies. Kustomize, on the other hand, is focused on customizing existing configurations rather than packaging them for distribution.

Integration with kubectl

Both tools integrate well with Kubernetes’ command-line tool, kubectl. Helm has its own CLI, helm, which extends kubectl capabilities, while Kustomize can be directly used with kubectl via the -k flag.

Declarative vs. Imperative

Kustomize follows a declarative mode, you describe what you want, and it figures out how to get there. Helm can be used both declaratively and imperatively, offering more flexibility but also more complexity if you want to take a hands-on approach.

Release history management

Helm provides built-in release management, keeping track of the history of your deployments so you can easily roll back to a previous version if needed. Kustomize lacks this feature, which means you need to handle versioning and rollback strategies separately.

CI/CD integration

Both Helm and Kustomize can be integrated into your CI/CD pipelines, but their roles and strengths differ slightly. Helm is frequently chosen for its ability to package and deploy entire applications. Its charts encapsulate all necessary components, making it a great fit for automated, repeatable deployments where consistency and simplicity are key. Helm also provides versioning, which allows you to manage releases effectively and roll back if something goes wrong, which is extremely useful for CI/CD scenarios.

Kustomize, on the other hand, excels at adapting deployments to fit different environments without altering the original base configurations. It allows you to easily apply changes based on the environment, such as development, staging, or production, by layering customizations on top of the base YAML files. This makes Kustomize a valuable tool for teams that need flexibility across multiple environments, ensuring that you maintain a consistent base while making targeted adjustments as needed.

In practice, many DevOps teams find that combining both tools provides the best of both worlds: Helm for packaging and managing releases, and Kustomize for environment-specific customizations. By leveraging their unique capabilities, you can build a more robust, flexible CI/CD pipeline that meets the diverse needs of your application deployment processes.

Helm and Kustomize together

Here’s an interesting twist: you can use Helm and Kustomize together! For instance, you can use Helm to package your base application, and then apply Kustomize overlays for environment-specific customizations. This combo allows for the best of both worlds, standardized base configurations from Helm and flexible customizations from Kustomize.

Use cases for combining Helm and Kustomize

  • Environment-Specific customizations: Use Kustomize to apply environment-specific configurations to a Helm chart. This allows you to maintain a single base chart while still customizing for development, staging, and production environments.
  • Third-Party Helm charts: Instead of forking a third-party Helm chart to make changes, Kustomize lets you apply those changes directly on top, making it a cleaner and more maintainable solution.
  • Secrets and ConfigMaps management: Kustomize allows you to manage sensitive data, such as secrets and ConfigMaps, separately from Helm charts, which can help improve both security and maintainability.

Final thoughts

So, which tool should you choose? The answer depends on your needs and preferences. If you’re looking for a comprehensive solution to package and manage complex Kubernetes applications, Helm might be the way to go. On the other hand, if you want a simpler way to tweak configurations for different environments without diving into templating languages, Kustomize may be your best bet.

My advice? If the application is for internal use within your organization, use Kustomize. If the application is to be distributed to third parties, use Helm.

Traffic Control in AWS VPC with Security Groups and NACLs

In AWS, Security Groups and Network ACLs (NACLs) are the core tools for controlling inbound and outbound traffic within Virtual Private Clouds (VPCs). Think of them as layers of security that, together, help keep your resources safe by blocking unwanted traffic. While they serve a similar purpose, each works at a different level and has distinct features that make them effective when combined.

1. Security Groups as room-level locks

Imagine each instance or resource within your VPC is like a room in a house. A Security Group acts as the lock on each of those doors. It controls who can get in and who can leave and remembers who it lets through so it doesn’t need to keep asking. Security Groups are stateful, meaning they keep track of allowed traffic, both inbound and outbound.

Key Features

  • Stateful behavior: If traffic is allowed in one direction (e.g., HTTP on port 80), it automatically allows the response in the other direction, without extra rules.
  • Instance-Level application: Security Groups apply directly to individual instances, load balancers, or specific AWS services (like RDS).
  • Allow-Only rules: Security Groups only have “allow” rules. If a rule doesn’t permit traffic, it’s blocked by default.

Example

For a database instance on RDS, you might configure a Security Group that allows incoming traffic only on port 3306 (the default port for MySQL) and only from instances within your backend Security Group. This setup keeps the database shielded from any other traffic.

2. Network ACLs as property-level gates

If Security Groups are like room locks, NACLs are more like the gates around a property. They filter traffic at the subnet level, screening everything that tries to get in or out of that part of the network. NACLs are stateless, so they don’t keep track of traffic. If you allow inbound traffic, you’ll need a separate rule to permit outbound responses.

Key Features

  • Stateless behavior: Traffic allowed in one direction doesn’t mean it’s automatically allowed in the other. Each direction needs explicit permission.
  • Subnet-Level application: NACLs apply to entire subnets, meaning they cover all resources within that network layer.
  • Allow and Deny rules: Unlike Security Groups, NACLs allow both “allow” and “deny” rules, giving you more granular control over what traffic is permitted or blocked.

Example

For a public-facing web application, you might configure a NACL to block any IPs outside a specific range or region, adding a layer of protection before traffic even reaches individual instances.

Best practices for using security groups and NACLs together

Combining Security Groups and NACLs creates a multi-layered security setup known as defense in depth. This way, if one layer misconfigures, the other provides a safety net.

Use security groups as your first line of defense

Since Security Groups are stateful and work at the instance level, they should define specific rules tailored to each resource. For example, allow only HTTP/HTTPS traffic for frontend instances, while backend instances only accept requests from the frontend Security Group.

Reinforce with NACLs for subnet-level control

NACLs are stateless and ideal for high-level filtering, such as blocking unwanted IP ranges. For example, you might use a NACL to block all traffic from certain geographic locations, enhancing protection before traffic even reaches your Security Groups.

Apply NACLs for public traffic control

If your application receives public traffic, use NACLs at the subnet level to segment untrusted traffic, keeping unwanted visitors at bay. For example, you could configure NACLs to block all ports except those explicitly needed for public access.

Manage NACL rule order carefully

Remember that NACLs evaluate traffic based on rule order. Rules with lower numbers are prioritized, so keep your most restrictive rules first to ensure they’re applied before others.

Applying layered security in a Three-Tier architecture

Imagine a three-tier application with frontend, backend, and database layers, each in its subnet within a VPC. Here’s how you could use Security Groups and NACLs:

Security Groups

  • Frontend: Security Group allows inbound traffic on ports 80 and 443 from any IP.
  • Backend: Security Group allows traffic only from the frontend Security Group, for example, on port 8080.
  • Database: Security Group allows traffic only from the backend Security Group, on port 3306 (for MySQL).

NACLs

  • Frontend Subnet: NACL allows inbound traffic only on ports 80 and 443, blocking everything else.
  • Backend Subnet: NACL allows inbound traffic only from the frontend subnet and blocks all other traffic.
  • Database Subnet: NACL allows inbound traffic only from the backend subnet and blocks all other traffic.

In a few words

  • Security Groups: Act at the instance level, are stateful, and only permit “allow” rules.
  • NACLs: Act at the subnet level, are stateless, and allow both “allow” and “deny” rules.
  • Combining Security Groups and NACLs: This approach gives you a layered “defense in depth” strategy, securing traffic control across every layer of your VPC.