Docker inside Airflow when running via Docker Compose
Apache Airflow is a workflow management system that is remarkably easy to use and get started with.
A very simple way of getting started with Airflow is to run it through Docker. The official Airflow docs include a tutorial and even provide a docker-compose file as a starting point.
Using Airflow in a container
Most tasks work out of the box inside a container; the exception is DockerOperator, which needs access to Docker from inside the container. You may also be using Docker inside other tasks.
Installing a full Docker engine inside the Airflow container is possible, but running docker-in-docker adds unnecessary complexity. A better solution is to use the host machine’s Docker daemon and control it from inside the Airflow container.
Setting up Docker in Airflow
It’s fairly simple to use the host machine’s Docker inside Airflow. There are two main changes we have to make:
- Install the docker-cli client in the Airflow image
- Mount the Docker socket to the container
Installing the Docker client in the Airflow image
In order to install the Docker client, we need to create our own Dockerfile. I’ll be using the apache/airflow:2.2.3 image, but you are free to use any other image or to integrate the change into an existing Dockerfile. The installation method is inspired from here.
Mount the Docker socket to the container
When running on a Linux machine, the default location for the Docker socket is /var/run/docker.sock.
We can mount this socket into the container directly. That way the Docker client in the container talks to the host’s Docker daemon instead of needing a daemon of its own.
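The compose file this section refers to is not reproduced here, so the following is only a minimal sketch of what the relevant part might look like. It assumes the x-airflow-common block from the official Airflow docker-compose file and a Dockerfile in the same directory; the image tag my-airflow-docker:2.2.3 is a hypothetical name. Other keys from the official file (environment, the dags/logs/plugins volumes, depends_on) are left out for brevity and should be kept in a real file. The layout is chosen so that the line numbers in the notes that follow line up with it.

```yaml
version: '3'
x-airflow-common:
  &airflow-common
  build: .                                        # build the image from the local Dockerfile
  # image: my-airflow-docker:2.2.3                # alternative: a pre-built image (hypothetical tag)
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock   # host socket, mounted at the same path in the container
  user: root                                      # root can always access the socket
```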
A couple of things to note about the above docker-compose file:
- Here, the image is being built in line 4. You can pre-build the image and use it in line 5 instead.
- The volume on line 7 mounts the host’s Docker socket into the container. Now, any docker commands run inside the container will run as if they were run on the host.
- Docker requires the user to be root or be present in the docker group. Here, the default user is defined as root in line 8. You can also replace this with a different non-root user as long as they are in the docker group, and the group is specified. Ex: 1001:998 (the user and group IDs will be different on your machine).
- This volume, image, and user can also be set directly in the airflow-worker service in the docker-compose.yml. This prevents the changes from affecting other Airflow containers.
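As a sketch of the worker-only variant described in the last note, the fragment below applies the same three settings to the airflow-worker service alone. It is a fragment, not a complete file: it assumes the &airflow-common anchor and the celery worker command from the official Airflow docker-compose file, and 1001:998 is a placeholder for your own user and docker group IDs. Because a YAML merge key (<<) is shallow, a volumes list set on the service replaces the shared list rather than extending it, so the default dags/logs/plugins mounts are repeated here.

```yaml
services:
  airflow-worker:
    <<: *airflow-common            # anchor defined earlier in the official compose file
    command: celery worker
    build: .                       # build the image that contains the Docker client
    volumes:
      - ./dags:/opt/airflow/dags   # default mounts, repeated because the list is replaced
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
      - /var/run/docker.sock:/var/run/docker.sock
    # Placeholder UID:GID. On the host, `id -u` prints your user ID and
    # `getent group docker` shows the docker group's ID.
    user: "1001:998"
    # healthcheck, environment, restart and depends_on are omitted in this sketch
```

Quoting the user value keeps YAML from trying to interpret the colon-separated numbers as anything other than a string.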
Conclusion
Mounting the host’s Docker socket into the Airflow container is a simple and straightforward way of running Airflow in containers while keeping Docker-based tasks such as DockerOperator fully functional.