This is an article I've been meaning to write for a while. Its purpose is to cover the methodology and reasoning behind "purpose-built containers". This is a concept you've likely seen expressed elsewhere under other names; good examples are the Twelve-Factor App and OpenShift's S2I. To be clear, these are not new concepts when it comes to containers, but this is a methodology applied to GitLab, and one that stands on the shoulders of giants.
A "purpose-built container" is, if you've ever seen Rick and Morty, essentially the Meeseeks Box of the IT world. It is a container that exists only to serve a specific purpose and then quickly and quietly exits or is shut down. It is not designed to be long-standing or to host an application. It starts, it does its job, then it stops and exits.
Note: When we refer to an image, we mean the compiled, runnable artifact built from a Dockerfile definition. When we refer to a container, we mean a running instance of that image.
Every CI/CD job inside a GitLab pipeline observes this process. Each job spins up, pulls a Docker image, starts a container from it, and then conducts a build job. Finally, the container is stopped and shut down. Every job starts a new container from a Docker image and repeats the same process. No container is ever run twice; even if a job is retried, a new container is spun up.
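The lifecycle above can be sketched with a minimal job definition; the job name and base image here are illustrative assumptions, not from any real project:

```yaml
# .gitlab-ci.yml (sketch): every run of this job gets a brand-new container
demo-job:
  image: alpine:3.19   # illustrative image choice
  script:
    - echo "This runs in a fresh container; a retry gets a new one too"
```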
GitLab CI/CD can utilize any Docker image, but not every Docker image is built with this process in mind. For example, some Docker images spin up an entire virtual machine's worth of services; others spin up a database alongside your application in the same container. These are not purpose-built containers: they serve multiple purposes, and their purposes change over time, whereas a purpose-built container serves one purpose and then dies.
Anatomy of a Purpose-Built Container
A purpose-built container has the following characteristics:
They're small: A purpose-built container should contain only the minimum components needed to run; a Node.js runtime, for example. Avoid including any application-specific dependencies: include only the Node.js runtime, and inject your dependencies after the container is up and running. There are exceptions to this rule; for example, internal certificates may be included in the image. The rule of thumb: if the image doesn't need it to run, it shouldn't be included.
They're multiuse: "Purpose-built" applies to how a container runs, not how it's built. You can build a container for multiple purposes and uses as long as it doesn't violate rule #1. You want your purpose-built containers to be built with multiple consumers and usages in mind. Not only does this reduce the amount of consumer-specific configuration you bake in, it keeps the image small, which follows rule #1 and keeps it fast. For example, Maven and Node.js should be separate containers, but each can include global requirements like internal certificates.
Their entrypoint is empty: A purpose-built container is a zombie; it has no purpose other than what is given to it. Because of this, it should have a dummy or empty entrypoint. A perfect example is an entrypoint that runs a shell script echoing instructions on how to use the container, rather than starting any workload. Purpose-built containers are not designed to be long-standing or run in a customer-facing/production environment.
Avoid needless layers: Every command inside a Dockerfile produces a new "layer" in the Docker image, which makes the image larger and more difficult to pull down. Both of these things result in an image that is slower to pull, start, run, and destroy. Reduce layers as much as possible, opting for `RUN command && command` as opposed to `RUN command` followed by a second `RUN command`; the first produces one layer, the second two.
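As a quick sketch of the difference (the package names here are just illustrative):

```dockerfile
# Two layers: each RUN instruction creates its own layer
RUN apk update
RUN apk add --no-cache nodejs

# One layer: the same commands chained in a single RUN instruction
RUN apk update && apk add --no-cache nodejs
```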
Run with the least amount of privilege: Most purpose-built containers will not need any form of elevated permissions; they need the bare minimum of user permissions to function. Set `USER 1001` so that they run as a non-root, arbitrary user. On OpenShift this is a requirement; on Docker/Kubernetes it is simply good practice. On OpenShift, all arbitrary users run under the root group, but without root permissions, so set any files or folders the container needs to be owned by the root group and they'll be usable by that user.
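A minimal sketch of that pattern in a Dockerfile (the `/app` path is an illustrative assumption):

```dockerfile
# Give the root *group* the same permissions as the owner, so an
# arbitrary UID (which OpenShift assigns, always in the root group)
# can still read and write the directory
RUN mkdir /app && chgrp -R 0 /app && chmod -R g=u /app

# Run as a non-root user; OpenShift may substitute a random UID
USER 1001
```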
How do we build the image?
To build the image, we're going to start with a basic folder structure. Go ahead and make a new GitLab project, initialize it with a README, and clone it down to your local machine. The layout is simple: a Dockerfile and a .gitlab-ci.yml at the root of the repository.
1. For our Dockerfile, we're going to do the following.
Remember, we want our Dockerfile to be as light and generic as possible. Notice how the RUN line merges several commands into one to reduce layers. You will also note that `USER 1001` is added at the bottom of the Dockerfile; this ensures the container runs as an arbitrary, unprivileged user. While some workloads may require more, for this purpose we don't. At the end of the RUN line there is a chmod command, which gives the `/.npm` folder root-group permissions: any arbitrary user the container spins up with belongs to the root group, but has no root user privileges, so it can still write npm's cache there.
```dockerfile
RUN apk update && apk add --no-cache nodejs npm && mkdir /.npm && chmod -R g=u /.npm
CMD ["echo", "This is a 'Purpose-Built Container'. It is not meant to be run this way. Please visit www.lackastack.net to see how to use it!"]
```
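Stitched together, the complete Dockerfile might look like the sketch below; the FROM line is an assumption implied by the apk commands above, and everything else comes from the snippet:

```dockerfile
# Alpine base is an assumption, implied by the apk package manager
FROM alpine:latest

# One RUN instruction = one layer: install the Node.js runtime and
# prepare /.npm so any root-group user can write npm's cache there
RUN apk update && apk add --no-cache nodejs npm && mkdir /.npm && chmod -R g=u /.npm

# Dummy entrypoint: explain usage instead of running a workload
CMD ["echo", "This is a 'Purpose-Built Container'. It is not meant to be run this way. Please visit www.lackastack.net to see how to use it!"]

# Run as an arbitrary non-root user
USER 1001
```

Building this locally with `docker build` and running it with `docker run --rm` should simply print the usage message and exit, which is exactly the behavior a purpose-built container wants.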
2. Our GitLab CI file to automate creating and maintaining it.
This GitLab CI file will build our image and save it in this project's GitLab Container Registry. It will save it as latest when built. When a Git tag is pushed, it will release an image with that tag as the Docker image tag. This allows you to properly version your Docker images for release and maintenance.
The script for the latest build:

```yaml
- docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
- docker build --pull -t "$CI_REGISTRY_IMAGE" .
- docker push "$CI_REGISTRY_IMAGE"
```

And the script for the tagged release:

```yaml
- docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
- docker build --pull -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG" .
- docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG"
```
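Put together, the whole .gitlab-ci.yml might look like the sketch below. Only the script lines come from the snippets above; the job names, the docker image/service, and the rules: clauses are assumptions about a typical docker-in-docker setup:

```yaml
# Sketch of a full .gitlab-ci.yml; job names and rules are assumptions
build-latest:
  image: docker:24
  services:
    - docker:24-dind
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
    - docker build --pull -t "$CI_REGISTRY_IMAGE" .
    - docker push "$CI_REGISTRY_IMAGE"

release-tag:
  image: docker:24
  services:
    - docker:24-dind
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
    - docker build --pull -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG"
```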
How do we use the image?
If you've gotten here, we have an image. It's lightweight, it follows the rules, and it has a Node.js runtime. Below is a GitLab CI pipeline that utilizes our image. You'll notice it has a `before_script` block with an array of commands; we're using only one, to install Express. Now, why are we installing it here?
Above, we spoke about the fact that our image needs to be multiuse. This means we can't bake things like Express into our Node.js image, so we use `before_script` to run the commands that install it. This is a careful dance: if you have a couple of dependencies, this won't take long. But if you have a large number of dependencies that many people use, you should bake them into the Docker image so they can be reused.
The goal here is speed and less waste. If everyone makes their own unique image, you have a bunch of waste. But if you have a small number of images with everything but the kitchen sink, you end up with very slow pipelines. There's a delicate balance to strike.
```yaml
before_script:
  - npm install express
script:
  - echo "Installed"
```
I hope you got some new insight out of this. The goal, again, is to build a container for use inside GitLab CI that can also be used inside OpenShift. Many public containers are not usable inside OpenShift due to its security rules, so you may be forced to make your own. If you follow this process, your images should be quick and effective on GitLab and also run properly on OpenShift.
May your containers be small, and your CI Pipelines be fast.