Docker is great. In my opinion, Docker has the power to redefine the way we develop, test, deploy, maintain and even view applications. It might have redefined them already. For a lot of people, the first time they interacted with docker was when they started using GitLab pipelines. With more experience, and after I saw how cool docker can be, I wanted to dive deeper and develop applications that are shipped as docker images. However, this immediately leads to problems.
Following best practices, one would like to have a CI setup which runs unit tests every time someone pushes new code. But how do you run native docker applications inside GitLab CI? You could, of course, drop the separation between the host system and the CI job by using a GitLab Runner with a shell executor (see GitLab’s help), but this opens up a huge security hole. After all, isn’t isolating CI jobs from the host exactly one of the use cases in which docker excels?
It’s almost as if we are too greedy. We asked for a way to separate CI jobs from the host system in order to have reproducible CI results and to prevent privilege escalation. The answer was docker. But then we started using docker for the application itself, and now we need a new way to provide appropriate isolation.
Running tests in GitLab’s CI for native docker applications is of course not the only use case in which one faces this issue. One might want to use a dockerized application to orchestrate other containers. For this, one also needs a docker daemon inside the orchestrating container. (Today, there are better ways to do this: have a look at Kubernetes.) The issue occurs not only if you develop dockerized applications; you could also be working on a docker tool, such as docker-compose. During testing, you want to run commands against a real docker daemon. If the CI tests run inside a docker container, we are back to the initial problem. Most fundamentally, if you are working on docker itself, you need a docker daemon inside the CI job.
So this situation calls for an appropriate solution. I see three possible ways to overcome the issue, each with its own advantages and disadvantages, and I discuss them in the following. The three solutions are docker-in-docker, mounting the host’s docker socket, and docker-machine in combination with mounting the docker socket.
Docker-in-docker sounds like exactly what we wanted. It is readily available with a specialized image. The command
host$ docker run -d --privileged --name mydind docker:dind
launches the docker-in-docker container. To connect to the docker-in-docker daemon, run
host$ docker run --rm -it --link mydind docker
/ # export DOCKER_HOST=tcp://mydind/
/ # docker ps
In this example, I use the docker image because it ships with the docker client; however, you can choose or build an image that suits your situation best.
In his blog, Jérôme Petazzoni states that docker-in-docker has serious downsides, including issues with the file systems. According to Petazzoni, the use case of docker-in-docker is very limited; one viable use case is developing docker itself.
Obviously, I don’t have nearly as much experience with docker as Petazzoni; however, from my point of view, docker-in-docker is fine in a lot of cases. Still, if you plan to use docker-in-docker, there are issues I’d like to draw your attention to, which in my opinion matter more in practice than those stated in Petazzoni’s blog.
The docker-in-docker container has to be started with the --privileged flag. This effectively bypasses the security measures in place to isolate the code inside the container from the host system. Among other mechanisms, this isolation is enforced by the kernel concept of capabilities; privileged containers, however, are granted all capabilities as well as access to the host’s devices. In summary, there is no effective isolation between the code inside the container and the host system. Malicious code could easily interfere with the host system.
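To check which running containers were started this way, one option is to read the HostConfig.Privileged field reported by docker inspect. The following Python sketch assumes a docker CLI on the PATH; is_privileged and privileged_containers are hypothetical helper names, not part of any docker tooling.

```python
import json
import subprocess


def is_privileged(inspect_entry: dict) -> bool:
    """Return True if a `docker inspect` entry describes a privileged container."""
    # docker inspect records the --privileged flag under HostConfig.Privileged.
    return bool(inspect_entry.get("HostConfig", {}).get("Privileged", False))


def privileged_containers() -> list[str]:
    """List names of running containers started with --privileged (needs the docker CLI)."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return []
    entries = json.loads(
        subprocess.run(
            ["docker", "inspect", *ids], capture_output=True, text=True, check=True
        ).stdout
    )
    # Names in docker inspect output carry a leading slash, e.g. "/mydind".
    return [entry["Name"].lstrip("/") for entry in entries if is_privileged(entry)]
```

On the host from the example above, privileged_containers() would be expected to include mydind while that container is running. This only audits containers after the fact; it does not prevent anyone from starting one.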
If you use a dind container and link two other containers to it, these two containers are not isolated from each other: they share one docker daemon, so each can list and even stop the containers started by the other. If you run several jobs in parallel, make sure that they do not interfere with each other.
Another disadvantage is the lack of an image cache. If you start a container via docker-in-docker, the image has to be downloaded from a registry, because the docker-in-docker daemon has no access to the images of the host system. Depending on your use case, this could affect the performance of your system.
In summary, this solution is not perfect because of security issues (privileged containers and lack of inter-container isolation) and performance issues (lack of cache). However, if you trust the code, I think this is a viable solution. This blog entry by DigitalOcean advocates this solution if you want to build docker images in GitLab’s CI. It is also mentioned in GitLab’s help.
Mount docker socket
The second solution is to mount the host system’s docker socket inside the container, which gives the container direct access to the host’s docker daemon. You can achieve this with
host$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock docker
/ # docker ps
Again, I’m using the docker image here because it ships with the docker client, but you can use whatever image you want. The solution is really easy: you need only one container. It is also performant because an image cache is in place: we can directly access all the images of the host system.
Please “think twice” before you opt for this solution. If you want to use this method for your CI setup, you should be aware that any code that is checked into the repository and potentially runs inside your container can run with root privileges on your host system. Consider the following example.
host$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock docker
/ # docker run --rm -it -v /:/host centos
[root@1c621348d039 /]# touch /host/i_was_here
[root@1c621348d039 /]# exit
/ # exit
host$ ls /i_was_here
/i_was_here
If you execute the above commands, the first line starts a container which has the host’s docker socket mounted inside it. The second line launches a new container from within the first one. This second container is not nested; it is a sibling of the first container on the host system. The command docker ps on the host system shows both containers (and any other container on the system), and running docker ps inside the container with access to the docker socket produces exactly the same output. Since we have full control over the socket, we can mount the host’s root file system at /host. Inside the container running CentOS, we therefore have root access to the host’s root file system, which is how the file /i_was_here ends up on the host.
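Both containers in this example leave traces in the Mounts section of docker inspect output, which a CI host could audit. A minimal Python sketch, assuming the entries were already parsed from docker inspect JSON; the helper names and the set of socket paths are assumptions, and the literal path comparison will miss a socket mounted from an unusual location.

```python
# Host paths where the docker daemon socket commonly lives (assumption);
# /var/run is often a symlink to /run, so both spellings are checked.
DOCKER_SOCKETS = {"/var/run/docker.sock", "/run/docker.sock"}


def mounts_docker_socket(inspect_entry: dict) -> bool:
    """Return True if a docker inspect entry bind-mounts the host's docker socket."""
    # The top-level "Mounts" list records each mount's host path under "Source".
    return any(
        mount.get("Source") in DOCKER_SOCKETS
        for mount in inspect_entry.get("Mounts", [])
    )


def mounts_host_root(inspect_entry: dict) -> bool:
    """Return True if the host's root file system is bind-mounted, as with -v /:/host."""
    return any(
        mount.get("Type") == "bind" and mount.get("Source") == "/"
        for mount in inspect_entry.get("Mounts", [])
    )
```

In the example above, the docker container would trip mounts_docker_socket and the centos container would trip mounts_host_root. Like any audit, this only reports the problem; it does not close the hole.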
Obviously, this method opens a huge security hole. The solution seems tempting at first glance because of its ease of use and because it is advertised on various blogs. I even discovered that a prestigious research institution uses this exact method to provide access to a docker daemon in its CI jobs, thereby exposing itself to potential attacks.
Even if you have full control over the code that is run inside the container (which means you can never accept contributions from others), one can easily make a mistake and severely compromise the host system. After all, you don’t log in as root to run unit tests, do you?
Therefore, please, don’t mount the host’s docker socket!
The last solution is based on docker-machine. Docker-machine is an easy-to-use command line tool which lets you provision virtual machines and connect to remote docker daemons. It is intended to set up VMs with a docker daemon running inside.
In order to work with an independent daemon inside the VM, create a new VM called test by executing
host$ docker-machine create test
The above command starts a new virtual machine running a lightweight operating system. Inside the VM you have access to a docker daemon. You can either use docker-machine ssh to get shell access to the VM, or use docker-machine env to get direct access to its docker daemon.
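For scripted use, such as in a CI job, the output of docker-machine env can be turned into environment variables for child processes instead of being eval’d in a shell. A Python sketch, assuming the bash-style export lines that docker-machine env prints (requested explicitly with its --shell option); parse_machine_env and docker_in_machine are hypothetical helper names.

```python
import os
import subprocess


def parse_machine_env(output: str) -> dict[str, str]:
    """Turn bash-style `docker-machine env` output into a dict of variables."""
    env = {}
    for line in output.splitlines():
        line = line.strip()
        # Only `export NAME="value"` lines matter; the trailing comments are skipped.
        if not line.startswith("export "):
            continue
        name, _, value = line[len("export "):].partition("=")
        env[name] = value.strip('"')
    return env


def docker_in_machine(machine: str, *args: str) -> subprocess.CompletedProcess:
    """Run a docker CLI command against the daemon inside the named VM."""
    output = subprocess.run(
        ["docker-machine", "env", "--shell", "bash", machine],
        capture_output=True, text=True, check=True,
    ).stdout
    # Overlay DOCKER_HOST, DOCKER_CERT_PATH etc. on the current environment.
    env = {**os.environ, **parse_machine_env(output)}
    return subprocess.run(["docker", *args], env=env, check=True)
```

With the VM created above, docker_in_machine("test", "ps") should behave like running eval $(docker-machine env test) followed by docker ps, without touching the host’s daemon.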
Since the VM provides isolation between the host system and the docker daemon, we can safely execute code on the shell inside the VM, start a privileged container inside the VM, or mount the VM’s docker socket inside another container.
Please note that this is different from the previous methods because we do not use the docker daemon of the host system.
To guarantee reproducible results, we must spin up a new VM for every CI job. The VMs are, in a sense, ephemeral like docker containers. However, the virtualization is realized at the CPU level and not at the kernel level.
This solution provides effective isolation between CI jobs and the host system. If you follow the advice that every VM should be used for only one CI job, there is also effective isolation between different CI jobs.
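The one-VM-per-job rule can be sketched as a small wrapper that creates a VM, runs the job’s commands inside it via docker-machine ssh, and removes the VM even if a command fails. This Python sketch assumes docker-machine’s default driver works on the CI host; the ci- naming scheme and the injectable run parameter (handy for testing without real VMs) are hypothetical choices.

```python
import subprocess
import uuid
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(argv), check=True)


def run_job_in_fresh_vm(commands: Sequence[str], run: Runner = run_command) -> str:
    """Run shell commands in a throwaway docker-machine VM and return its name."""
    name = f"ci-{uuid.uuid4().hex[:8]}"  # hypothetical unique name per CI job
    run(["docker-machine", "create", name])
    try:
        for command in commands:
            # docker-machine ssh NAME COMMAND executes COMMAND inside the VM,
            # so any docker call in it talks to the VM's daemon, not the host's.
            run(["docker-machine", "ssh", name, command])
    finally:
        # Remove the VM even when a command failed; -y skips the confirmation prompt.
        run(["docker-machine", "rm", "-y", name])
    return name
```

A job would then be started with something like run_job_in_fresh_vm(["docker run --rm hello-world"]). Note that if docker-machine create itself fails halfway, this sketch does not clean up the partially created VM.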
The disadvantages of this method are the overhead of the VM and the lack of a cache (besides the fact that this is not a pure docker-based solution). Since I advocate disposable VMs, one has to download the required docker images every time. This could be overcome with registry caches and mirrors. In my experience, the overhead of the VM is minimal. Whether you need registry mirrors is for you to decide.
In summary, I strongly advise you to use this approach. It has some disadvantages on the performance side, but it ensures a safe and secure environment. The next article shows how to set up a GitLab runner with docker access.