Docker has become a cornerstone technology for building and deploying applications in modern software development. At the heart of Docker lies the Dockerfile, a configuration file that defines how a container image should be built. This guide explores the essential commands that every DevOps engineer must master to create efficient and secure Dockerfiles.
Essential commands
1. RUN vs CMD: Understanding the fundamentals
The RUN command executes instructions during image build, while CMD defines the default command to run when the container starts.
# RUN example
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
# CMD example
CMD ["python3", "app.py"]
2. Multi-Stage builds: Optimizing image size
Multi-stage builds allow you to create lightweight images by separating the build and runtime environments.
# Build stage
FROM node:16 AS builder
WORKDIR /build
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine
COPY --from=builder /build/dist /usr/share/nginx/html
3. EXPOSE: Documenting ports
EXPOSE documents which ports the container is expected to listen on at runtime. It does not publish them to the host by itself.
EXPOSE 3000
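As a small sketch of the syntax, EXPOSE also accepts an explicit protocol. The UDP port below is a hypothetical metrics port, not something from the application above.

```dockerfile
# TCP is assumed when no protocol is given
EXPOSE 3000
# The protocol can be stated explicitly; 8125/udp is a hypothetical metrics port
EXPOSE 8125/udp
```

To actually reach the port from the host, it still has to be published when the container starts, for example with docker run -p 8080:3000 followed by the image name.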
4. Variables with ARG and ENV
ARG defines build-time variables, while ENV sets environment variables for the running container.
ARG NODE_VERSION=16
FROM node:${NODE_VERSION}
ENV APP_PORT=3000
ENV APP_ENV=production
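One scoping detail is easy to miss: an ARG declared before FROM is visible only to FROM lines. The sketch below assumes a hypothetical NODE_TAG argument and a hypothetical base-tag label to show how a value can be redeclared inside the stage, and how a build-time value can be carried into the runtime environment.

```dockerfile
# Hypothetical build argument for the base image tag; in scope only for FROM
ARG NODE_TAG=16
FROM node:${NODE_TAG}

# An ARG declared before FROM is not visible inside the stage unless redeclared
ARG NODE_TAG
LABEL base-tag="${NODE_TAG}"

# Build-time default that can be overridden at build time
ARG APP_ENV=production
# Copy the build-time value into the runtime environment
ENV APP_ENV=${APP_ENV}
```

With this layout, docker build --build-arg APP_ENV=staging . would produce an image whose containers see APP_ENV=staging, while a plain docker build . keeps the production default.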
5. LABEL: Image metadata
Add useful metadata to your image to improve documentation and maintainability.
LABEL version="2.0" \
      maintainer="dev@example.com" \
      description="Example web application" \
      org.opencontainers.image.source="https://github.com/user/repo"
6. HEALTHCHECK: Container health monitoring
Define how Docker should check if your container is healthy.
HEALTHCHECK --interval=45s --timeout=10s --start-period=30s --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost:3000/health || exit 1
7. VOLUME: Data persistence
Declare mount points for persistent data.
VOLUME ["/app/data", "/app/logs"]
8. WORKDIR: Container organization
Set the working directory for subsequent instructions.
WORKDIR /app
COPY . .
RUN npm install
9. ENTRYPOINT vs CMD: Execution control
ENTRYPOINT defines the main executable, while CMD provides default arguments.
ENTRYPOINT ["nginx"]
CMD ["-g", "daemon off;"]
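To make the interaction concrete, here is a minimal sketch that assumes the nginx:alpine base image. The comments at the end describe what Docker runs for two different invocations.

```dockerfile
FROM nginx:alpine

# Fixed executable: the container always runs nginx
ENTRYPOINT ["nginx"]
# Default arguments, replaced by anything passed after the image name
CMD ["-g", "daemon off;"]

# docker run <image>       runs: nginx -g "daemon off;"
# docker run <image> -t    runs: nginx -t (CMD is replaced, ENTRYPOINT is kept)
```

Replacing the executable itself requires the --entrypoint flag of docker run, which is why ENTRYPOINT suits images that wrap a single program.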
10. COPY vs ADD: File transfer
COPY is more explicit and preferred for local files, while ADD has additional features like auto-extraction of archives.
# COPY examples - preferred for simple file copying
# Copy package.json and package-lock.json
COPY package*.json ./
# Copy entire directory
COPY src/ /app/src/
# ADD examples - useful for archive extraction
# Automatically extracts the local archive
ADD project.tar.gz /app/
# Downloads the remote file (remote archives are not extracted)
ADD https://example.com/file.zip /tmp/
Key differences:
- Use COPY for straightforward file/directory copying
- Use ADD when you need automatic archive extraction or remote URL handling
- COPY is preferred for better transparency and predictability
11. USER: Container security
Specify which user should run the container.
RUN adduser --system --group appuser
USER appuser
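In context, switching users usually goes together with fixing file ownership. The following is a sketch only, assuming a Debian-based node:16 image, an /app directory, and a hypothetical server.js entry file.

```dockerfile
FROM node:16
WORKDIR /app

# Create an unprivileged system user and group (Debian/Ubuntu adduser syntax)
RUN adduser --system --group appuser

# Copy files with the right owner so the non-root user can read and write them
COPY --chown=appuser:appuser . .

# Everything from here on, including the container's main process, runs as appuser
USER appuser
CMD ["node", "server.js"]
```

The adduser flags differ between distributions. Alpine-based images use the BusyBox variant, where the equivalent is typically addgroup -S appuser followed by adduser -S -G appuser appuser.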
12. SHELL: Interpreter customization
Define the default shell used for the shell form of RUN, CMD, and ENTRYPOINT.
SHELL ["/bin/bash", "-c"]
Best practices and optimizations
- Minimize layers:
  - Combine related RUN commands using &&
  - Clean up caches and temporary files in the same layer
- Cache optimization:
  - Place less frequently changing instructions first
  - Separate dependency installation from code copying
- Security:
  - Use official and updated base images
  - Avoid exposing secrets in the image
  - Run containers as non-root users
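The practices above can be sketched in a single Dockerfile. This is an illustration under assumptions rather than a reference setup: the node:16-slim tag, the curl package, the presence of a package-lock.json (required by npm ci), and the server.js entry file are all hypothetical choices.

```dockerfile
# Official base image with an explicit tag (the tag is an assumption)
FROM node:16-slim

WORKDIR /app

# Install and clean up in one layer so the apt cache never lands in the image
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# Dependencies first: this layer is rebuilt only when the manifests change
COPY package*.json ./
RUN npm ci --omit=dev

# Application code last, since it changes most often
COPY . .

# Drop root privileges; the official node images ship a "node" user
USER node

CMD ["node", "server.js"]
```

Because the manifests are copied before the rest of the source, editing application code invalidates only the final COPY layer, and the dependency layer is reused from cache.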
Putting it all together
Mastering these Dockerfile commands is essential for any modern DevOps or SRE engineer. Each instruction is crucial in creating efficient, secure, and maintainable Docker images. By following these best practices and understanding when to use each command, you can create containers that not only work correctly but are also optimized for production environments.
A good Dockerfile is like a well-written recipe: it should be clear, reproducible, and efficient. The key is finding the right balance between functionality, performance, and security.