Dockerfile best practices - stories from the field
This is a collection of things I have come across using Docker over the years and working with kubernetes across thousands of projects. The hope is that there is a nugget of wisdom somewhere in this post, and that it can help on your next project.
I framed the post as a series of best practices. Each one has a much larger story behind it, but I decided to keep this post positive 😁. I will likely continue to update this post as new information comes to light.
1. Create tiny images
Your ideal image should be as small as possible to do its job. No unnecessary binaries, temporary files, development tools, uncompiled code, documentation, etc. should be in the final image.
Your ideal image should be as small as possible to do its job.
- me, just now
Why is this important? Because size matters when it comes to docker images. The time to deploy your application into kubernetes will be largely determined by the image size. Smaller images push to and pull from registries faster. Deployments are faster, and applications scale horizontally faster in kubernetes. Life is generally better.
You can find the sizes of your local docker images with this handy command (source):
In kubernetes, this is a little trickier, as the kubernetes pod API does not return image size. The node API does, however, and with some awk magic you can get a list of images sorted by size descending in MiB (inspired by this post):
In general, I consider images in the low hundreds of MB to be OK, anything over 1 GB to be large, and anything over 2 GB likely needs optimisation 😱. This obviously depends a lot on the language you are using (e.g. Go can build native executable binaries with no runtime), the size of the application (e.g. lines of code) and the operating system required to run that code (e.g. Alpine Linux vs Ubuntu).
Avoid adding development and build related resources to the final image
The main issue with adding tools like gcc (which is needed to compile code) into your images is that you don't need this binary in production.
There are several techniques to deal with this ranging from virtual dependencies to builder images (covered later in this post).
Most package managers/languages also have flags you can use to toggle a production build. Production builds omit development dependencies, and will overall lead to smaller images.
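As a rough sketch of what such a flag looks like in practice, the Dockerfile below installs only production dependencies for a Node.js app. The node:20-alpine tag, the presence of a package-lock.json and the server.js entry point are all assumptions for illustration, and the --omit=dev flag needs a reasonably recent npm.

```dockerfile
# Assumption: a Node.js app with a committed package-lock.json
FROM node:20-alpine
WORKDIR /app

# Tell Node.js tooling that this is a production environment
ENV NODE_ENV=production

# Install exactly what the lockfile pins, skipping devDependencies
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# server.js is a hypothetical entry point
COPY . .
CMD ["node", "server.js"]
```

Other ecosystems have equivalents, for example composer install --no-dev for PHP.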
Go builds can also be optimised a lot, for example by removing cross platform bits and debugging information from the binary. See this blog post for some tips and tricks to reduce Go binary size.
Virtual packages
In Alpine Linux you can also make use of virtual packages, which allow you to install build related packages under a single name, and then clean them all up in one go. This is extremely useful for compilation.
N.B. you should run all commands in the same RUN instruction to ensure the layer stays lightweight.
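A minimal sketch of the virtual package pattern, assuming a hypothetical src/ directory whose Makefile has an install target. The .build-deps name is arbitrary, and the package list is only an example of a typical C toolchain.

```dockerfile
FROM alpine:3.19

# Hypothetical source tree that needs compiling
COPY src/ /usr/src/app/

# Install the toolchain under the virtual name .build-deps, compile,
# then remove the whole group again. Everything happens in one RUN,
# so the toolchain never ends up in a committed layer.
RUN apk add --no-cache --virtual .build-deps \
        gcc \
        make \
        musl-dev \
    && make -C /usr/src/app install \
    && apk del .build-deps
```

If the apk del ran in a separate RUN instruction, the earlier layer would still contain the compiler and the image would not get any smaller.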
Builder images (multi-stage builds)
Probably the most popular way to keep your resulting images small where compilation is involved is to make use of a builder image (AKA multi-stage builds).
Say you have a Node.js application that requires a compilation step, but you don't want Node.js installed in your production container (just Nginx serving the static output).
This is sometimes called multi-stage dockerfiles, and has been a feature of docker since 2017.
For a real life example of this (based on actual code), first you need to create your 'builder' intermediate image. This image has the full development tools installed in it, and is used to compile the application:
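FROM uselagoon/node-14-builder:latest as builder
# Copy the dependency manifests first so the install layer caches well
COPY package.json yarn.lock /app/
# --pure-lockfile is a yarn flag; assumes yarn ships with the builder image
RUN yarn install --pure-lockfile
# Copy the source and compile the static output into /app/dist
COPY . /app/
RUN yarn build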
After you have compiled the application (e.g. with yarn, as above), you copy the resulting (static) artifacts into a clean final image:
FROM uselagoon/nginx:latest
COPY --from=builder /app/dist /app
The resulting Nginx image is fairly tiny; this particular one is around 133 MB. It is also completely static, with no dynamic language runtime left in the final image. A nice side effect of tiny images is a tiny attack surface, which will also make your security people happy.
For more information see https://codefresh.io/docker-tutorial/node_docker_multistage/ and https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds
Select the best base image
A rookie docker mistake is to start from a fairly generic base image, and then customise it to the nth degree. There are vast libraries of images out there already that you can elect to start from to give yourself a head start.
Using another organisation's images does come with certain points to ponder:
- Who maintains the images?
- How fast do they release new versions when a security issue is identified?
- Do you trust them?
- Is the code open source?
At amazee.io, the Lagoon team looks after a suite of upstream docker images in the uselagoon namespace. These images in turn (for the most part) inherit from specialised builds of Alpine Linux. e.g. this is a line from the PHP 8.0 FPM dockerfile:
Selecting the best starting point for your images means less code you need to maintain, fewer layers in your dockerfiles, and faster builds.
Most applications also function perfectly well on Alpine Linux (a lightweight Linux distribution), which can run from an 8 MB image. If you have not checked this out, do it; your docker host thanks you in advance.
If you are using Rust or Go, you should consider using scratch (blog post on this topic) or distroless static (blog post on this topic) as your base image (both are extremely basic and lightweight images). Distroless static is the same as scratch but with a few niceties, such as CA certificates installed, a functioning /tmp directory, etc. All of this for 1 additional MB sounds pretty good.
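As an illustration, here is a hedged sketch of a Go service built in a builder stage and shipped on distroless static. The image tags, the /out/app path and the existence of both go.mod and go.sum in the project root are assumptions; swap the final FROM for scratch if you don't need the extras.

```dockerfile
# Build stage: full Go toolchain
FROM golang:1.22 AS builder
WORKDIR /src

# Download modules first so this layer is cached between code changes
COPY go.mod go.sum ./
RUN go mod download

# CGO_ENABLED=0 produces a static binary with no libc dependency.
# -s -w drops the symbol table and DWARF debugging information.
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/app .

# Final stage: only the binary, on top of distroless static
FROM gcr.io/distroless/static-debian12
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
```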
2. Docker Layer Caching (DLC)
During a build, docker steps through each instruction one at a time. As each instruction is read, docker attempts to match it against its cache to see if the step can be re-used from cache.
Making effective use of layer caching can speed up your build times a lot. It is important to note that only certain commands create layers, namely ADD, COPY and RUN.
See https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
Ordering layers to increase cache hits
The order in which you create layers matters. In general, you want to have the most static things at the beginning of the dockerfile (e.g. environment variables) and more volatile things towards the end (e.g. code changes by a developer).
As soon as you have a cache miss on a layer, then all subsequent layers will need to be rebuilt.
This is especially useful for COPY commands. You want to structure the dockerfile such that the files that change most frequently are copied in as late as possible.
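The ordering described above might look something like this for a small Python app. The python:3.12-slim tag, requirements.txt and main.py are assumptions for illustration only.

```dockerfile
FROM python:3.12-slim

# Most static: environment and working directory rarely change
ENV PYTHONUNBUFFERED=1
WORKDIR /app

# Changes occasionally: the dependency list. This layer and the install
# below are re-used from cache until requirements.txt itself changes.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Most volatile: application code, copied in as late as possible
COPY . .
CMD ["python", "main.py"]
```

With this layout a code-only change invalidates just the final COPY, so the dependency install is not repeated on every build.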
Inlining commands to reduce layers
Rather than creating a layer for each similar command, you can combine them into one instruction, using a backslash \ to continue across lines, to ensure only a single layer is created.
RUN apk --no-cache add \
    bash \
    bind-tools \
    ca-certificates \
    curl \
    git \
    unzip
A pro tip is to ensure only 1 package per line, and keep them in alphabetical order. This helps to ensure the next PR to update the list is a tiny bit easier.
Breaking layer cache
You will run into situations where Operating System packages, NPM packages or a Git repo are updated to newer versions, but as your Dockerfile or package.json hasn't changed, docker will continue using the cache. This may be less than ideal for your particular circumstance.
A quick way to 'bust the cache' is to define the use of a build time variable, and run a simple command to use it. Lagoon injects the SHA of the commit as such a variable, so if you do want to ensure the layer is built fresh, you can quite easily:
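A minimal sketch of that pattern, using a hypothetical build argument called GIT_SHA rather than any Lagoon-specific name. Docker only registers a cache miss at the first instruction that uses a changed ARG, so everything above it is still served from cache.

```dockerfile
FROM alpine:3.19

# Still cached between commits: nothing here depends on GIT_SHA
RUN apk add --no-cache curl

# Hypothetical build argument carrying the commit SHA
ARG GIT_SHA=unknown

# Using the variable ties this layer to the commit, so it (and every
# layer after it) is rebuilt whenever the SHA changes
RUN echo "Building commit ${GIT_SHA}" \
    && apk upgrade --no-cache
```

Outside of a build system that injects the value for you, it can be passed by hand with something like docker build --build-arg GIT_SHA="$(git rev-parse HEAD)" .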
Other build systems will likely have a similar way to obtain the SHA of the commit.
3. .dockerignore files
A .dockerignore file prevents certain local files and debug logs from being copied into your Docker image, where they could overwrite files installed within your image during the build.
It is also a good idea to not copy your dockerfiles themselves into the docker image.
My best advice here is to open a shell in a running container (e.g. via SSH or docker exec) and inspect the files you have left in there. If you see anything that is not essential to production runtime, then consider not COPY'ing it, or adding it to .dockerignore.
See https://docs.docker.com/engine/reference/builder/#dockerignore-file
Comments
If you have any neat tips or tricks, please let me know!