I’ve been writing software for quite some time. At the beginning of my career, progress on projects frequently came to a screeching halt when I had to answer the question, “How do I build and deploy my application now?”
I’ve worked at companies of all sizes in half a dozen different industries and every one of those companies has a different process for building and deploying applications. However, over the past five years, there’s been one phrase that’s been commonplace: “Just build a Docker container.” Of course there are a lot of steps that happen before and after building the Docker container, but Docker unifies build processes and makes application deployment significantly easier than it used to be.
On the surface, building a Docker container for your application is quite simple. I won’t go into the details here on how it’s done as I assume that you either already know how to do it or are willing to read one of the 54.3 million options on Google that are returned when you search “How do I build a Docker image?”
When you build a Docker image, Docker creates a series of layers for you, each one representing a cached step in the build process. When you push your Docker image, only the layers that don’t already exist on the remote server are pushed to that server. When someone pulls down that Docker image, they retrieve only the layers of the image that don’t already exist on their machine. This saves a lot of time and space for many builds. However, it falls short in a few areas.
Secret values present a potential Docker security vulnerability
Part of your build process might require use of a secret value (such as a password). This is common when pulling artifacts from a private, remote server such as Nexus, Artifactory, AWS S3, or anywhere that you might keep Python packages, JARs, or really anything that only folks within your organization should have access to download. If you run a download step with some secret value using RUN as part of your build process, then that secret will be available in the cached layer to anyone who downloads your image. This is a security vulnerability.
For years now, folks have circumvented this problem by using multi-stage builds, essentially building two (or more) images within one Dockerfile. Suppose you need to use pipenv to download and install Python packages from a private, secure PyPI server. The first image is built with something like this:
FROM python:3.7 AS builder
followed by a:
RUN PIPENV_PYPI_MIRROR=somethingwithyoursecret@server pipenv install
Next, we specify our base image again with FROM and we build our image as we usually would, but instead of downloading the packages this time, we’ll just COPY them out of the previous image:
FROM python:3.7
COPY --from=builder /tmp/python_packages /opt/python_packages
This is great because the layers created by the first stage are not part of the final image, so the secret isn’t stored in it and isn’t pushed with it. However, this approach falls short for the same reason: those first-stage layers aren’t pushed or shared either, so any build that starts without them (on a fresh CI agent, for example) has to rebuild that stage from scratch.
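Assembled into one file, the multi-stage approach might look like the sketch below. It is an illustration under assumptions rather than the exact setup described above: it assumes a Pipfile and Pipfile.lock at the root of the build context, it uses pipenv's PIPENV_VENV_IN_PROJECT setting so the virtual environment lands at a predictable path (/app/.venv, a path chosen here for the example instead of the /tmp/python_packages path above), and it keeps the placeholder secret string from the text.

```dockerfile
# Stage 1: throwaway builder image. The secret is visible in this stage's layers.
FROM python:3.7 AS builder

# Assumption: pipenv is not in the base image, so install it first.
RUN pip install pipenv

# Assumption: /app is an arbitrary working directory chosen for this sketch.
WORKDIR /app
COPY Pipfile Pipfile.lock ./

# Ask pipenv to create the virtual environment at /app/.venv.
ENV PIPENV_VENV_IN_PROJECT=1

# Placeholder secret from the text; replace with your real mirror URL.
RUN PIPENV_PYPI_MIRROR=somethingwithyoursecret@server pipenv install

# Stage 2: final image. Only the installed packages are copied over,
# so the RUN line containing the secret is not part of this image.
FROM python:3.7
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv

# Put the copied virtual environment first on PATH.
ENV PATH="/app/.venv/bin:$PATH"
```

Copying the environment to the same path on the same base image keeps the interpreter links inside .venv valid in the final stage.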
Luckily, Docker released version 18.09 (circa late 2018) with support for BuildKit which solves this problem. BuildKit makes a number of improvements to Docker, but most notably in this case, it allows us to mount the secret into the image, avoid multi-stage builds, and maximize Docker’s ability to cache layers.
Given our hypothetical build process that requires connecting to our private PyPI server to install Python packages, you could update your build to use Docker BuildKit like so:
- Add:
# syntax = docker/dockerfile:1.0-experimental
to the top of your Dockerfile
- Put the full secret (in this case username:password@yourpypiserver) into a file on your machine. Call it something like:
pypi_secret.txt
- Update your build command to something like this:
DOCKER_BUILDKIT=1 docker build --secret id=pipenv_pypi_mirror,src=pypi_secret.txt --progress=plain .
(Note: BuildKit will try to collapse output in the terminal. The --progress=plain bit will stop this. The trailing . is the build context.)
- And then in your build, you’d just update your pipenv install command to this instead:
RUN --mount=type=secret,id=pipenv_pypi_mirror PIPENV_PYPI_MIRROR=`cat /run/secrets/pipenv_pypi_mirror` pipenv install
The syntax is weird and I’ll admit that I have to look it up anytime I plan to use it, but the results are phenomenal. With this, you’ll get the standard caching that Docker provides without publishing your secrets to anyone who downloads your image.
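Putting the Dockerfile-side steps together, a complete file might look like the following sketch. The pipenv installation step, the /app working directory, and the location of the Pipfile and Pipfile.lock are assumptions made for the example; the syntax directive, the secret id, and the mount come from the steps above.

```dockerfile
# syntax = docker/dockerfile:1.0-experimental
FROM python:3.7

# Assumption: pipenv is not in the base image, so install it first.
RUN pip install pipenv

# Assumption: Pipfile and Pipfile.lock sit at the root of the build context.
WORKDIR /app
COPY Pipfile Pipfile.lock ./

# The secret is mounted at /run/secrets/<id> for this RUN step only.
# It is read into an environment variable for the one command and is
# not written into the resulting layer.
RUN --mount=type=secret,id=pipenv_pypi_mirror \
    PIPENV_PYPI_MIRROR="$(cat /run/secrets/pipenv_pypi_mirror)" \
    pipenv install
```

The id in the mount has to match the id passed to docker build with --secret, and the build has to run with DOCKER_BUILDKIT=1 for the mount to be recognized.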
Consider this work-around for a standard caching challenge
However, we still have a problem with regard to caching: if we make changes to our list of required Python packages, or if a build step prior to our pip install changes and invalidates layers in the cache, we have to re-download and re-install the Python packages again on the next build.
To get around this, we can specify a directory in the container that we would like to mount as a cache. This way, if the pip install step ever needs to run again, packages can be pulled from the cache instead of being pulled from a remote server. To do this, simply mount the directory with:
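RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
(Note: The target should be an absolute path, since ~ is not expanded there; /root/.cache/pip assumes the build runs as root. There’s a full list of other options available on GitHub.)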
The first time you run your docker build command, you’ll see things like this:
Downloading numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
If you change your `requirements.txt` file by adding or removing a package (such that the required version of numpy does not change) and re-run the build, the layer will technically be invalidated, but you’ll see output like this instead:
Using cached numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
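For context, a minimal Dockerfile built around the cache mount could look like the sketch below. It assumes the build runs as root (so pip's cache directory is /root/.cache/pip), that requirements.txt sits at the root of the build context, and that /app is an acceptable working directory; none of these details come from the original build.

```dockerfile
# syntax = docker/dockerfile:1.0-experimental
FROM python:3.7

# Assumption: /app is an arbitrary working directory chosen for this sketch.
WORKDIR /app

# Copy only the requirements file before installing, so that changes to
# other source files do not invalidate the install layer.
COPY requirements.txt .

# The cache mount persists between builds on the same machine, so when this
# step reruns, pip can reuse previously downloaded wheels.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Copy the rest of the application after the dependencies are installed.
COPY . .
```

The contents of the cache mount are not part of the image layers, so the cached wheels do not increase the size of the image you push.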
Docker BuildKit addresses security, caching and build performance concerns that avid Docker users have had with Docker since its inception. The technology is currently marked as experimental and isn’t supported yet by most build framework plugins. However, it provides a large number of benefits that make it worth using as part of your build process today, even if you have to roll your own plugin.