What is a Container?

Did you know that Docker containers are simply processes in which applications such as Nginx or Apache run and serve requests? A typical OS process on a system (Apache, for example) has a PID and an associated file system, and it usually exposes its service on a port (such as 80). The same is true for an Apache server process running inside a Docker container. So where does the difference lie? Let’s compare the two to find out.

A typical OS process

When you run a process, the OS kernel loads the program into main memory and allocates CPU time to it so that its instructions can be executed. But what happens if more than one process needs to run and (suppose) you only have a single-core processor? In that case the kernel uses context switching, temporary storage and virtual memory management to share the CPU among the processes. As a result, all the processes seem to execute simultaneously, even though only one of them is actually running at any instant.
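On Linux you can watch this switching happen. The following is a minimal sketch, assuming a Linux system with /proc mounted; ctxt_switches is a hypothetical helper name, not a standard command. It prints the counters the kernel keeps of how often a process has been switched off the CPU:

```shell
# ctxt_switches is a hypothetical helper name, not a standard command.
# Prints the context-switch counters the kernel keeps for a process.
# Usage: ctxt_switches [PID]   (defaults to the current shell)
ctxt_switches() {
    grep -E '^(non)?voluntary_ctxt_switches' "/proc/${1:-$$}/status"
}
```

Calling ctxt_switches twice a few seconds apart should show the counters growing, which is the kernel repeatedly pausing and resuming the process to make room for others.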

[Image: os-kernel-busy]

This approach keeps the kernel busy, because it now has to manage more than one process at a time within the constraints of scarce computing resources.

A Process inside a Container

As said above, containers are processes. To be more specific, containers are isolated processes. Let’s look at what that means.

As we have seen, with ordinary OS-level processes on a single core, only one process can have the CPU executing its instructions at a given time. A process also uses other computing resources, such as RAM, the file system and mount points, which are scarce as well.

“Now think of a container as a process whose computing resources (i.e. CPU, RAM, network interfaces etc.) are isolated from other processes and are sufficient to run the process itself.”

[Image: os-kernel-easy]

Consequently, a container makes the process feel as if it is running on a completely different system. This approach keeps things tidy for the kernel: computing resources such as the file system and CPU utilization are isolated and allocated to the process, which even gets a separate IP address to communicate with the world. The kernel itself enforces this isolation, using the features described next.

Linux Control Groups and Namespaces

Now for the slightly trickier part: how does a container isolate itself from the rest of the processes on a system? This is possible thanks to two major Linux features: control groups (or cgroups) and namespaces.

Namespaces

The Linux kernel has a feature known as namespaces, whose primary purpose is to isolate or virtualize the global set of system resources for a process. For example, it isolates/virtualizes the file system for a process, along with other attributes such as process IDs, user IDs and network access. The namespaces(7) manual page describes namespaces and their different types in more detail.
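To see which namespaces a process lives in, you can read the links the kernel exposes under /proc. This is a minimal sketch, assuming a Linux system with /proc mounted; list_namespaces is a hypothetical helper name, not a standard command:

```shell
# list_namespaces is a hypothetical helper name, not a standard command.
# Prints each namespace a process belongs to, with the kernel's identifier.
# Usage: list_namespaces [PID]   (defaults to the current shell)
list_namespaces() {
    for ns in /proc/"${1:-$$}"/ns/*; do
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}
```

Two processes that print the same identifier for an entry (pid or net, for example) share that namespace. Inspecting a process owned by another user may require root.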

Control Groups or cgroups

With the release of Linux kernel 2.6.24, a new feature known as cgroups (short for control groups) was merged into the kernel and made available for use. Using cgroups, you can monitor a process and limit how much of a given computing resource it may use. Cgroups also organize processes into a hierarchical structure like:

[Image: namespaces]

Take a close look at the image above, which shows the anatomy of different processes running inside a container. For example, suppose you run a process such as Apache httpd inside a Docker container with the command:

docker container run -d --name httpd httpd

If you are using a Docker version prior to 1.13, omit the container and image keywords from the CLI commands (for example, docker run instead of docker container run).

Open a terminal session inside the container and run the ps ax command there. You will see that the httpd process is running as PID 1 inside the container (on the right side in the image below), whereas if you run the same ps ax command in your OS terminal while the container is running (on the left side), you can still see the httpd process in the list, but with a different PID (25426 on my system). The process is the same; it has different PIDs because it is seen from different namespaces.

[Image: namespaces]
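The host-side PID can also be looked up through the Docker CLI instead of searching the ps ax output by eye. The sketch below assumes the httpd container from the command above is still running; both function names are hypothetical helpers:

```shell
# Both helpers are hypothetical names. "httpd" in the usage lines is the
# container started earlier with: docker container run -d --name httpd httpd

# Host-side PID of the container's main process (PID 1 inside it).
# Usage: container_host_pid httpd
container_host_pid() {
    docker container inspect --format '{{.State.Pid}}' "$1"
}

# Processes of the container, listed from the host's point of view.
# Usage: container_top httpd
container_top() {
    docker container top "$1"
}
```

The number printed by container_host_pid httpd should match the PID that ps ax reports for httpd on the host, while the same process remains PID 1 inside the container.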

Docker Containers hit the ground

Docker is the most popular software containerization platform. Using Docker, you don’t have to worry about managing kernel-level features such as cgroups and namespaces yourself; Docker automates that for you. You don’t have to use commands such as nice or unshare to create container-based components. You just spin up a container from a Docker image using the Docker CLI, and Docker creates the container with everything set up, such as isolated PIDs, network access, and file-system and mount namespaces. You can even limit how much of the computing resources the container may use (if you want to).
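As an illustration of such limits, the sketch below starts a container with a memory and CPU cap and then reads the settings back. The helper names, the container name you pass in, and the 256m and 0.5 values are arbitrary assumptions chosen for the example; --memory and --cpus are standard docker container run options (the latter needs Docker 1.13 or newer):

```shell
# run_limited and show_limits are hypothetical helper names; the
# 256m and 0.5 values are arbitrary choices for illustration.

# Starts a detached container capped at 256 MB of RAM and half a CPU.
# Usage: run_limited NAME IMAGE
run_limited() {
    docker container run -d --name "$1" --memory 256m --cpus 0.5 "$2"
}

# Reads back the limits Docker recorded for the container
# (memory in bytes, CPU in billionths of a CPU).
# Usage: show_limits NAME
show_limits() {
    docker container inspect \
        --format 'memory={{.HostConfig.Memory}} nano_cpus={{.HostConfig.NanoCpus}}' "$1"
}
```

For example, run_limited httpd-limited httpd followed by show_limits httpd-limited. Behind the scenes Docker translates these options into cgroup settings, so you never have to touch the cgroup hierarchy directly.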

If you run a container such as:

docker container run -d --name httpd httpd

And see its properties with the command:

docker container inspect httpd

You will get a JSON document with all the attributes of the container, such as its IP address, gateway, working directory, drivers etc.
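The full JSON is long, so it is often handier to pull out a single attribute with the --format option, which takes a Go template. This is a minimal sketch; container_field is a hypothetical helper, and the templates in the comments assume the field names used by current Docker releases:

```shell
# container_field is a hypothetical helper that prints one attribute of a
# container using a Go template passed to --format.
# Usage: container_field NAME TEMPLATE
container_field() {
    docker container inspect --format "$2" "$1"
}

# Example templates to try against the httpd container:
#   container_field httpd '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
#   container_field httpd '{{range .NetworkSettings.Networks}}{{.Gateway}}{{end}}'
#   container_field httpd '{{.Config.WorkingDir}}'
#   container_field httpd '{{.Driver}}'
```

The range loop is needed because a container can be attached to more than one network, each with its own IP address and gateway.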

Alongside Linux namespaces and cgroups, Docker also relies on a third component, Union FS (UFS). UFS is a file system built up from Docker image layers (i.e. via a Dockerfile), which makes it pretty lightweight. Docker packages Linux namespaces, cgroups and UFS into a container format called libcontainer, which the Docker Engine uses to create and run containers. See the Docker Engine overview in the Docker documentation to learn more.
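You can inspect the layers an image is made of from the CLI. The sketch below assumes the httpd image has already been pulled (running the container earlier does that); both function names are hypothetical helpers:

```shell
# image_layers and image_history are hypothetical helper names.

# Prints the content digests of the read-only layers that make up an image,
# one per line.
# Usage: image_layers httpd
image_layers() {
    docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1"
}

# Shows the build steps recorded for the image and the size each one added.
# Usage: image_history httpd
image_history() {
    docker image history "$1"
}
```

Images built from the same base share their lower layers, which is a large part of why layered images stay lightweight on disk.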

Conclusion

Docker is the most widely used software containerization platform. It also makes software distribution easier through Docker images: much open-source software is available as Docker images on a Docker registry such as Docker Hub. Docker containers are not limited to containerizing software; we can also use them for setting up consistent environments, easier software installation, production-grade orchestration and more.