Dockerfile tips for production – [latest]


This article collects tips for writing clean, reliable Dockerfiles for production. It covers recommended practices for building efficient images that can be consumed in any environment, cloud or on-premise. For more comprehensive guidelines, see the Docker documentation: Best practices for writing Dockerfiles.

What is a Dockerfile?

A Dockerfile is a script containing an ordered set of instructions that Docker executes from top to bottom to produce an image.

How do I create a basic Dockerfile?

A Dockerfile normally starts with a base image, declared with the FROM instruction. The following basic instructions are used to create a Dockerfile.

FROM: Define the base image of the container.

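MAINTAINER: Optional and deprecated; it holds the name of the maintainer of the image. Prefer a LABEL instruction for this.

COPY: Copies files from the build context (by default, the directory you pass to docker build) into the image.

RUN: Executes the command(s) at build time and commits the result to the image.

EXPOSE <port>: Documents the ports on which the container accepts connections. It does not publish them; use -p with docker run for that.

ENV <key>=<value>: Defines the environment variables.

CMD: Specifies the default command, or the default arguments that get passed to the ENTRYPOINT.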

ENTRYPOINT: Define the default command that will be executed when the container is started.
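
Putting these instructions together, a minimal Dockerfile might look like the sketch below. The file name server.js, the port 3000 and the label value are assumptions chosen purely for illustration; they do not come from a real project.

```dockerfile
# Base image (tag taken from the size comparison later in this article)
FROM node:10.19.0-alpine

# Image metadata; LABEL is the replacement for the deprecated MAINTAINER
LABEL maintainer="team@example.com"

# Environment variable, available at build time and at run time
ENV NODE_ENV=production

# Copy a hypothetical application file from the build context
COPY server.js /usr/app/server.js

# Executed at build time; the result is committed as a new layer
RUN chmod 644 /usr/app/server.js

# Documents the port the application is assumed to listen on
EXPOSE 3000

# Executable started with the container; CMD supplies its default argument
ENTRYPOINT ["node"]
CMD ["/usr/app/server.js"]
```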


Best Practices for writing Dockerfiles in production

A. Go with a small base image

A Docker image should contain only what is strictly necessary to run the application. The simplest way to reduce image size is to use Alpine or Distroless base images, which can shrink the image considerably (by a factor of roughly ten in the node example below). A small base image also shortens the image pull time whenever a new node is spun up, which matters for high-traffic applications.

For example, for the specific node version node:10.19.0, the results below show that the Alpine and Distroless images are much smaller than the default distribution. For more details, refer to the Alpine and Distroless documentation.

REPOSITORY                  SIZE
node:10.19.0                911MB
node:10.19.0-alpine         83.5MB
gcr.io/distroless/nodejs    81.2MB

Alpine vs Distroless

B. Minimize the number of image layers

Keep in mind that each instruction is run independently. Only RUN, COPY, and ADD create layers that add to the image size; other instructions create temporary intermediate images and do not increase the size of the build. Chaining related commands in a single RUN instruction therefore keeps the layer count down. Refer to the example below.

Not Recommended

RUN command1
RUN command2

Recommended

RUN command1 && \
    command2
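
As a more concrete sketch of the recommended form, the snippet below installs packages on a Debian-based image. The base image and the package names are assumptions chosen for illustration.

```dockerfile
FROM debian:bullseye-slim

# One RUN instruction, one layer: update, install and clean up together,
# so the package index is never stored in an image layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

Running the cleanup in a separate RUN instruction would not shrink the image, because the files would already be saved in the earlier layer.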

C. Docker shell or exec mode?

There is often some confusion about when to use RUN, CMD and ENTRYPOINT. Docker supports all three instructions in two forms: shell form and exec form.

What is shell form: Docker wraps the instructions below in /bin/sh -c, so normal shell processing (variable substitution, pipes and so on) takes place.

1. RUN <command> 
       Ex: RUN apt-get update && apt-get install -y nodejs
2. CMD <command> 
       Ex: CMD echo "Test 1"
3. ENTRYPOINT <command> 
       Ex: ENTRYPOINT echo "Test 2"

What is exec form: The exec form calls the executable directly, without a shell, so shell processing such as variable substitution does not happen. With CMD or ENTRYPOINT, the executable becomes the container's primary process and receives signals such as SIGTERM directly.

1. RUN ["executable", "param1",...]  
    Ex: RUN ["apt-get", "install", "-y", "nodejs"]
2. CMD ["executable", "param1",...]  
    Ex: CMD ["/bin/echo", "Test 1"]
3. ENTRYPOINT ["executable", "param1",...]  
    Ex: ENTRYPOINT ["/bin/echo", "Test 2"]

When to use RUN / CMD / ENTRYPOINT ?

A RUN instruction executes a command in a new layer on top of the current image and commits the result. Use RUN for building the image.

Use the CMD instruction to set the default command that runs when the container starts. If a Dockerfile contains multiple CMD instructions, only the last one takes effect.

The CMD defined in the Dockerfile can be overridden at launch time by passing a command to docker run after the image name.

ENTRYPOINT is similar to CMD. Use ENTRYPOINT when you want to build an executable image; an ideal use case is a startup script that must always run before the arguments are read.

If an ENTRYPOINT has been defined, the CMD instruction is interpreted as the default arguments to the ENTRYPOINT (in this case, make sure both use the exec form). Arguments passed to docker run replace the CMD but not the ENTRYPOINT; the ENTRYPOINT itself can only be replaced with the --entrypoint flag.

Bottom line

  1. RUN is similar to CMD and ENTRYPOINT; the difference is that RUN commands are executed when the image is built, while CMD and ENTRYPOINT commands are executed when the container is launched.
  2. ENTRYPOINT specifies the executable invoked when the container is started.
  3. CMD specifies the default arguments that get passed to the ENTRYPOINT. Finally: a Dockerfile should specify at least one of CMD or ENTRYPOINT.
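
The interplay between ENTRYPOINT and CMD can be sketched with a tiny image. The image name greeter used in the comments is hypothetical, and the base image tag is an assumption.

```dockerfile
FROM alpine:3.19

# Fixed executable and first argument: always run when the container starts
ENTRYPOINT ["/bin/echo", "Hello"]

# Default argument: replaced by anything passed after the image name
CMD ["world"]

# Assuming the image was built with: docker build -t greeter .
#   docker run greeter                          prints "Hello world"
#   docker run greeter Docker                   prints "Hello Docker"
#   docker run --entrypoint /bin/date greeter   runs date instead of echo
```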

D. FROM: Use only verified and approved docker images

The FROM instruction names the base image and, implicitly, the registry it is pulled from. If you don't specify a tag, FROM uses latest, and by default Docker pulls images from the public registry, where images may contain security vulnerabilities and their authenticity is not validated. Another nasty issue: if a new commit is pushed to a public base image that you already use in your environment, your next build picks it up immediately, which can break your build process or even lead to a production outage.

Recommendation

Build a private registry internal to your organization with a set of base images, and scan those base images (Docker security scanning) to make sure they are free from known security vulnerabilities. Bottom line: every image should be pulled from the organization's private registry.

Example

FROM docker-registry.default.svc:5000/raghuproject/rpnodejs:v1.0

E. Sign Docker images using Docker content trust

Docker Content Trust lets you sign Docker images. It prevents a user from using a container image from an unknown source, and also from building a container image on top of a base layer from an unknown source. Make it a best practice to always verify images before pushing or pulling them. Enable Docker Content Trust with the command below to make sure you are not running arbitrary code.

Example

export DOCKER_CONTENT_TRUST=1

F. Use a specific tag to pull an image

Use a specific tag for your base images. If you use the latest tag, breaking changes may be committed over time, resulting in failing builds and outages in your environment.

Not recommended

FROM openjdk:latest

Recommended

FROM openjdk:13

Many more tags are available; check the image's page on Docker Hub for more information.

G. Prefer COPY over ADD

Why should we always prefer COPY over ADD? Please go through my recommendation below.

COPY: Copies local files from source to destination recursively when building an image.

COPY <src>... <dest>

ADD: Both ADD and COPY add local files, but ADD does some additional magic, such as fetching remote files and automatically extracting local tar archives (including gzip, bzip2 and xz compressed ones).

ADD <src>... <dest>

Recommendation

In terms of security, the best use case for ADD is extracting a local tar file into the image. Don't use ADD to download data from remote URLs directly, because we don't know what kind of vulnerabilities we would introduce into our environment. Instead, use wget or curl to download the archive inside a RUN instruction, and delete the tar file in the same instruction once it has been extracted. This keeps the archive out of the image layers, which saves one layer and reduces the image size.

Example: Instead of using ADD, I use curl to fetch the Go tar file and remove it once it has been extracted.

# GOLANG_VERSION must be defined earlier in the Dockerfile (with ARG or ENV)
RUN mkdir -p /usr/opt/ \
    && curl -fsSL https://golang.org/dl/go${GOLANG_VERSION}.linux-amd64.tar.gz -o golang.tar.gz \
    && tar -C /usr/opt -xzf golang.tar.gz \
    && rm golang.tar.gz

H. Understand the difference: WORKDIR vs ENV vs ARG

The WORKDIR instruction sets the working directory for the RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it, both at build time and when the container is launched from the image. Note: the working directory can also be overridden at runtime with the -w flag.

ENV sets environment variables. An environment variable set in the Dockerfile is available at container runtime. Note: an ENV variable can also be overridden at runtime with --env <key>=<value>.

The ARG instruction defines a variable that is used only during the image build process. Once a container is launched, it does not have access to the ARG value. ARG values remain visible in docker history, however, so do not use them to pass secrets.

Use WORKDIR to set your working directory across multiple commands, ENV to set environment variables for the container runtime, and ARG to set variables for the image build process.
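
The three instructions can be seen side by side in the sketch below. The variable name APP_VERSION, its values and the file name build-info.txt are assumptions made up for illustration.

```dockerfile
FROM alpine:3.19

# Build-time only; override with: docker build --build-arg APP_VERSION=2.0.0 .
ARG APP_VERSION=1.0.0

# Persist the build-time value into the runtime environment
ENV APP_VERSION=${APP_VERSION}

# Created if missing; applies to the RUN and CMD instructions below
WORKDIR /usr/app

# Runs inside /usr/app at build time
RUN echo "built version ${APP_VERSION}" > build-info.txt

# Runs inside /usr/app when the container starts
CMD ["cat", "build-info.txt"]
```

Without the ENV line, APP_VERSION would exist during the build but would be unset inside the running container.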

I. Does the Docker container need to run as root?

As you may be aware, Docker runs the container as root unless you specify a USER in the Dockerfile. Why is running the container as root a problem, and what is the impact? Once the container is launched with root privileges, an attacker who compromises it can by default run any process (install packages, edit configs, bind privileged ports, possibly access the host, etc.), which makes your application vulnerable.

To avoid these issues, run containers as a non-root user with a dedicated user and group; this is one of the most popular best practices for security.

Add a Non-Root User to Dockerfile

Create a user with only the minimal permissions required by the container, and use root access only for specific tasks that require higher privileges. This means the USER instruction can be used more than once in a Dockerfile. The example below adds a user to the image and starts the container as that non-root user.

Example 1

FROM node:10-alpine

# Create the application directory
RUN mkdir -p /usr/app/src

# Copy source code
COPY src /usr/app/src
COPY package.json /usr/app
COPY package-lock.json /usr/app

WORKDIR /usr/app

# Create a group 'testgroup' and a user 'appuser' that belongs to it
RUN addgroup -S testgroup \
    && adduser -S -D -h /usr/app/src -G testgroup appuser

# Chown all the files to the app user.
RUN chown -R appuser:testgroup /usr/app

# Switch to 'appuser'
USER appuser

# Start the process
CMD ["npm", "start"]

The example below shows how to use multiple USER instructions in a Dockerfile, so that privileged tasks run as root and everything else runs as a non-root user.

Example 2

# Switch to 'root'
USER root

# Copy the script, make it executable and hand it over to 'appuser'
COPY folder1/testscript.sh /usr/app/src/testscript.sh
RUN chmod +x /usr/app/src/testscript.sh
RUN chown appuser /usr/app/src/testscript.sh

# Switch to 'appuser'
USER appuser

CMD ["/usr/app/src/testscript.sh"]
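
As a variation on the two examples above, ownership can also be set at copy time with the --chown flag of COPY, which avoids a separate chown layer. This is only a sketch; it reuses the user, group and script names from the examples and assumes the script exists in the build context.

```dockerfile
FROM node:10-alpine

# Create the group and the non-root user (BusyBox syntax, as used on Alpine)
RUN addgroup -S testgroup \
    && adduser -S -D -h /usr/app -G testgroup appuser

# --chown sets the owner while copying, so no extra chown instruction is needed
COPY --chown=appuser:testgroup folder1/testscript.sh /usr/app/src/testscript.sh
RUN chmod +x /usr/app/src/testscript.sh

# Everything from here on, including the container process, runs as 'appuser'
USER appuser
CMD ["/usr/app/src/testscript.sh"]
```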

J. Multistage build for slimmed-down images for production?

When building your application with a Dockerfile, many layers are added to the image (dev tools, libraries, dependencies), and you have to remember to clean up the artifacts you don't need before moving on to the next layer.

An alternative is to maintain two Dockerfiles, one for development and one for production. This has been referred to as the “builder pattern”, but maintaining two Dockerfiles is not ideal.

This is where multistage builds come in: they let us create multiple temporary images in a single build process. You can bundle all the build dependencies in the first stage (as explained below) and copy only the required files to the next stage. This results in a slimmed-down image for production.

In general, the final image should contain exactly what is needed to run in production.

Stage 1

FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html  
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

Stage 2

FROM alpine:latest  
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]

How does it work? 

  1. The first stage builds the application with the golang:1.7.3 image as its base.
  2. The second stage starts from alpine:latest and copies just the built artifact from the first stage.
  3. Result: The Go SDK and any intermediate artifacts are left behind and are not saved in the final image.
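
The same build can also be written with a named stage, which tends to be easier to maintain than referring to stages by index. The stage name builder is an arbitrary choice; everything else mirrors the two stages above.

```dockerfile
# Stage 1: named "builder", contains the full Go toolchain
FROM golang:1.7.3 AS builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# Stage 2: only the compiled binary is copied into the final image
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
```

With a named stage, reordering or inserting stages does not break the COPY --from reference, and docker build --target builder can be used to stop at the first stage when debugging.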

Conclusion

There are more best practices to learn when it comes to Dockerfiles; below are my closing points.

  1. Make your image size as small as possible (multistage builds and Alpine/Distroless images).
  2. Add labels to your image to help organize images by project and version, and for better automation.
  3. Split long and complex RUN statements into multiple lines separated with a backslash to make your Dockerfile more readable.
  4. Run containers with non-root user privileges.
  5. Sign Docker images using Docker Content Trust.
  6. Implement a workflow that enforces frequent image scanning and removes stale images or images that haven't been used.
  7. Use a secrets management tool instead of saving secrets in your images or Dockerfiles.
  8. Use a specific tag to pull an image.
  9. Use COPY instead of ADD.
  10. Keep learning.
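
For the point about labels, a small sketch is shown below. The keys follow the OCI image annotation naming scheme; the values are placeholders invented for illustration.

```dockerfile
FROM alpine:3.19

# Several labels in a single instruction; all values are hypothetical
LABEL org.opencontainers.image.title="href-counter" \
      org.opencontainers.image.version="1.0.0" \
      org.opencontainers.image.vendor="Example Org"

# Images can then be filtered by label, for example:
#   docker images --filter "label=org.opencontainers.image.vendor=Example Org"
```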
