Single Dockerfile across development, CI and Production

Introduction

This article summarizes how we came up with a solution for using a single Docker image across all of our environments (development, staging and production) for our Python services. We were inspired by other projects, such as the official Postgres image, where an entrypoint script is used to customize containers at startup depending on environment variables.

Working with Dockerfiles

Let's revisit some Docker concepts before getting to the entrypoint definition. You can safely skip this section if you've already written Dockerfiles that use RUN, COPY and CMD, and have changed the default user.

A Dockerfile is a text file containing the recipe for packaging a software project as a Docker image. It consists of a series of commands that are executed in order when you run docker build -t app . (the trailing dot being the build context).
Each command produces a layer of changes that are stacked up and provided as a unified file system to our application at runtime.

A simple Dockerfile for a Python project may look like this:

FROM python:3
ENV ENV=prod UID=1000 USER=user PORT=8000

# Add User
RUN groupadd -g ${UID} -r ${USER} \
    && useradd -u ${UID} -r -g ${USER} ${USER}

# Install third-party dependencies
COPY ./requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

# Copy the code
COPY ./code/ /app/

EXPOSE ${PORT}

USER ${USER}
WORKDIR /app

# This will run inside WORKDIR
CMD ["python", "main.py"]

The line containing FROM refers to the base image. In this case we use python:3 for simplicity, but it's quite common to use lightweight images based on Alpine Linux. Then we set a handful of environment variables with default values.
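
For instance, here's a rough sketch of an Alpine-based variant (note that Alpine ships BusyBox's addgroup/adduser instead of groupadd/useradd, and that some Python packages may need to be compiled from source on Alpine):

FROM python:3-alpine

# Alpine uses BusyBox's addgroup/adduser instead of groupadd/useradd
RUN addgroup -g 1000 user \
    && adduser -D -u 1000 -G user user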

The next command is a bit cryptic: it creates a group and a user for later use. All of the RUN and COPY instructions at this stage are executed as root, until the USER clause near the end of the file.

The next lines handle the third-party requirements (since we're using Python, this is the input for pip install) and copy the code itself to /app/. The reason for not keeping requirements.txt inside the code directory is that cache invalidation would occur with every code change, forcing us to go through the installation on every build even though none of the requirements had changed.

Finally, our file defines the command to run with the CMD clause. Given the EXPOSE clause, it's very likely that main.py will be listening on that port.

Use of volumes

Let's talk about the development process for a bit. It's very likely that you're using a framework to build your software; in Python, Django and Flask are very popular for web applications. These frameworks offer a code reload feature, or development server, that lets you try your changes without restarting your process: you only need to refresh your web browser.
Since Docker layers are immutable, at first glance we wouldn't be able to benefit from any code-reloading server.
Docker containers can have volumes, which let us share directories between the Docker host and the container. They're also used to preserve state, but most web applications keep state in a database rather than in the container itself, so we won't be discussing that use case.

In the Dockerfile we introduced in the previous section, the folder that holds the code is called ./code/ and is copied to the /app/ location. Using volumes, we can instruct Docker to mount the host folder inside the container, enabling hot code reload. It's important to note that having a volume for your code won't affect the final image that docker build generates, since volumes are not shipped with your images.

Let's try this concept of volumes using the docker command line interface. We need to give the full path to the host directory, so we use a call to the pwd command.

# In the directory containing the Dockerfile
docker build -t app .
# Then, the code is mounted like this
docker run --name myapp -v $(pwd)/code:/app/ -p 8000:8000 app

This will run the container in the foreground, so we'll need a separate console to verify that the host folder has been mounted inside the container (note that this was run from OSX):

docker exec myapp mount | grep /app
osxfs on /app type fuse.osxfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,max_read=1048576)

You can also run docker exec in interactive mode and observe changes in /app as you change files on the host.
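
For instance, the following should drop you into a shell inside the running container (bash is available in the python:3 image):

docker exec -it myapp bash
# Inside the container, list the mounted code
ls -l /app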

There's also another kind of volume, managed by the Docker daemon, which can be created with the Docker CLI (docker volume create my-vol), declared in your Dockerfile with the VOLUME instruction, or defined in the volumes section of a Compose file. This kind of volume will be used in the following section.
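
As a minimal sketch (the volume name and mount path below are illustrative), such a volume can be declared or mounted like this:

# In the Dockerfile: mark a path as backed by a daemon-managed volume
VOLUME /home/user/.local

# Or from the CLI: create a named volume and mount it at run time
docker volume create my-vol
docker run -v my-vol:/home/user/.local app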

Environments and dependencies

We have learnt how to take advantage of volumes for development, but one important thing we've skipped is environments. We call an environment a set of settings and, in many cases, dependencies. The recommended way to provide settings to containers is through environment variables, such as the ENV USER in our example Dockerfile, but dependencies are a little trickier: we don't want to ship development dependencies, such as testing and debugging tools, for both security and space-saving reasons.

Regarding dependencies, or requirements, it is very common in Python to have separate files for production and development (and in some cases CI), but this does not play nicely with Docker without building different images for each environment.
An example of this division can be seen in the popular Django cookiecutter template, which separates the production requirements and settings from the development ones.
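
A typical layout (the file and package names here are illustrative) keeps a shared base file that the environment-specific files include with pip's -r directive:

# requirements/base.txt (illustrative, shared by every environment)
django

# requirements/prod.txt
-r base.txt
gunicorn

# requirements/dev.txt (testing and debugging tools only)
-r base.txt
pytest
ipdb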

Writing a custom entrypoint

The entrypoint is an executable script or binary that is in charge of running your command. It's typically just sh or bash, but it can be customized. For example, a non-trivial one can be found in the official Postgres image: it ensures that you have a properly set up database persisted in a volume, while also allowing the user to define extra setup, such as running custom SQL.

In a similar fashion to Postgres, we decided to use an entrypoint script, but instead of setting up a database we install development requirements when the container is told it's in development mode. This led to a problem: since our site-packages are written by root, we needed a place where this installation could happen without giving the user extra privileges. Fortunately, Python comes with such a feature: pip install --user, which installs packages into ~/.local/. It's not commonly used because these packages are hard to uninstall, but that's not a problem for us, since we decided to use a volume for that path.

To generalize our approach, we based our solution on the official Postgres Docker image: we adapted part of its initialization process and made it specific to a Python project, as in the following code sample:

#!/bin/bash
set -e
cmd="$@"

if [ "${ENV:dev}" != "prod" ]; then
    # Custom environment setup
    if [ -z "${SKIP_SETUP}" -a -d "/shared/${ENV}" ]; then
        for f in /shared/${ENV}/*; do
            case "$f" in
            *.sh)     echo "$0: running $f"; . "$f" ;;
            *.py)     echo "$0: running $f"; python "$f"; echo ;;
            *)        echo "$0: ignoring $f" ;;
            esac
            echo "Finished ${ENV} setup. Running $cmd"
        done
    fi
fi

# Django specific
if [ -z "${SKIP_COLLECTSTATIC}" ]; then
    echo "Running collect static"
    python manage.py collectstatic --noinput
fi

exec $cmd

Bash is not great for legibility, so here are the key elements of the script:

  1. It stores the command the user wants to run.
  2. If the environment variable ENV is not prod and a volume is mounted at /shared/<NAME OF ENV>, its .sh and .py files will be run.
  3. We also needed to make sure our containers ran Django's collectstatic, so we added it to the entrypoint script.
  4. It's important to use exec to call the user-supplied command, so that it keeps PID 1.
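
For completeness, this is roughly how such a script can be wired into the Dockerfile from the first section (a sketch; the entrypoint.sh file name is an assumption):

# The script must be executable (chmod +x entrypoint.sh before building)
COPY ./entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

# The CMD is handed to the entrypoint as its arguments ("$@")
CMD ["python", "main.py"]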

Then we wrote some scripts that were placed in volumes. For example, this one was mounted inside the container at /shared/dev/01_setup.sh (note that dev here denotes the environment):

#!/bin/bash
# This script runs on top of the production Docker image
# and installs development requirements as a normal user.
set -e
pip install --user -r /app/requirements/dev.txt

Whenever you create a container from the image with ENV set to dev, all the development requirements will be installed into the volume mounted at /home/user/.local/lib. It's important to tell Python to look at that directory, since it does not by default, by adding the PYTHONPATH environment variable pointing to /home/user/.local/lib/python3.6/site-packages.
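
Putting the pieces together, a development setup could look roughly like the following Compose file (a sketch; the service name, volume name and host paths are assumptions, not the article's actual files):

# docker-compose.yml for development
version: "3"
services:
  app:
    build: .
    environment:
      ENV: dev
      PYTHONPATH: /home/user/.local/lib/python3.6/site-packages
    volumes:
      - ./code:/app/                  # hot code reload
      - ./shared/dev:/shared/dev      # setup scripts run by the entrypoint
      - user-local:/home/user/.local  # development packages from pip install --user
    ports:
      - "8000:8000"

volumes:
  user-local: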

Another useful script that can be added here is one that waits for the databases to come up. We stored this file in /shared/dev/02_wait_db.py (a .py extension, so the entrypoint runs it with Python):

import os
import socket
from time import sleep

DATABASES_TO_WAIT = {
    # Populate settings from environment variables, e.g. (illustrative):
    # 'default': {'host': os.environ.get('DB_HOST', 'db'),
    #             'port': int(os.environ.get('DB_PORT', '5432'))},
}

for name, opts in DATABASES_TO_WAIT.items():
    while True:
        addr = (opts['host'], opts['port'])
        try:
            sock = socket.socket()
            sock.connect(addr)
        except Exception as e:
            print("Waiting for {}: {}".format(name, e))
            sleep(1)
            continue
        sock.close()
        break

Finally, most of the initialization scripts placed in /shared/<ENV>/ are shared by everything that is not production. We don't have rules for production using this approach, since we want to guarantee that the entrypoint itself takes care of that environment. We only modify or augment the production image to fit development or CI needs.

Conclusion

We have covered how to use a custom entrypoint with volumes and environment variables to dynamically customize a production image that is always shippable. Continuous Integration images can be tagged once tests pass and pushed right away. We hope you can take advantage of this approach for your Python images!
