Understanding Docker Layers and Caching

Docker overview

Docker is an open platform for developing, shipping, and running applications. Read more about Docker here.

Basic Docker Terminology 🐳

Here are some basic Docker terms:

Docker Image

A Docker image is the blueprint for a Docker container. To ship an application, you first build an image of it. An image is a convenient way to package an application together with its preconfigured environment, which makes development much more streamlined.

Docker Container

A Docker container is a running instance of a Docker image. Simply put, an image is pulled from a registry (or built locally) and run as a container.

Docker Layers

When building an image, Docker creates layers to make successive builds and deployments efficient. Each layer stores only the diff (delta) from the layer built before it.

Let us try to understand it with the help of an example.

For this article we use this Docker Sample Application along with its Dockerfile.

FROM node:12-alpine
RUN apk add --no-cache python2 g++ make
WORKDIR /app
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]
EXPOSE 3000

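Each instruction in this Dockerfile is a build step. Instructions that change the filesystem (such as RUN and COPY) create a new layer, while instructions such as CMD and EXPOSE only record metadata. A step whose instruction and inputs have not changed is reused from the cache in later builds.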

Running the build for the 1st time:

docker build -t getting-started .

For the first time, every layer will be built from scratch so the entire build process will take a relatively long time.

[Screenshots: output of the first build]

We can see here that the base image is downloaded from the registry, the commands are run on top of it to create the image, and the whole build takes 175 seconds.

Now, let us try to rebuild it:

docker build -t getting-started .

[Screenshot: output of the second build]

The build time now goes down to 4s 🤯🤯

This is what layering and caching in Docker do. Subsequent builds reuse the cached layers created by previous builds, and since neither the Dockerfile nor the copied files changed, every layer was taken from the cache.

Now, let us make a change in the Dockerfile and see how the cache behaves.

We simply change the WORKDIR instruction in the Dockerfile.

FROM node:12-alpine
RUN apk add --no-cache python2 g++ make
WORKDIR /app_temp
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]
EXPOSE 3000

Now, building it gives a different result:

[Screenshot: build output after changing WORKDIR]

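Steps [1/5] and [2/5] are taken from the cache, whereas [3/5], [4/5], and [5/5] are built again. Once a step changes, the cache is invalidated for that step and for every step after it. This is still better than building everything from scratch.

Cached layers can also be reused by other images that start with the same steps.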
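To see why changing WORKDIR forces the later steps to rebuild, here is a small Python toy model. It is only a sketch, not Docker's actual implementation: it assumes each step's cache key is a hash of its parent's key plus the instruction text, and it ignores the file checksums that Docker also considers for COPY. The function names layer_keys and cache_hits are made up for this illustration.

```python
import hashlib

def layer_keys(instructions):
    """Return one cache key per instruction; each key also depends on the parent key."""
    keys, parent = [], ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys

def cache_hits(previous, current):
    """Compare two builds step by step: True means the step could reuse the cache."""
    old, new = layer_keys(previous), layer_keys(current)
    return [i < len(old) and old[i] == new[i] for i in range(len(new))]

# The five build steps from the Dockerfile above (CMD and EXPOSE only add metadata).
first_build = [
    "FROM node:12-alpine",
    "RUN apk add --no-cache python2 g++ make",
    "WORKDIR /app",
    "COPY . .",
    "RUN yarn install --production",
]
second_build = list(first_build)
second_build[2] = "WORKDIR /app_temp"  # the only change we made

hits = cache_hits(first_build, second_build)
for step, (instruction, hit) in enumerate(zip(second_build, hits), start=1):
    print(f"[{step}/5] {'CACHED ' if hit else 'REBUILT'} {instruction}")
```

Because every key includes its parent's key, a change at step 3 changes the keys of steps 4 and 5 as well, even though their instruction text is identical.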

Note that both adding and removing files will result in a new layer.
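One way to read the note above: deleting a file in a later step adds a new layer that hides the file, but the bytes still live in the earlier layer. The Python sketch below models that under simplifying assumptions. The file sizes are made up, and the helper names visible_files and stored_size are hypothetical; Docker itself records removals differently.

```python
# Each layer is modeled as (files added with sizes in MB, set of paths removed).
layers = [
    ({"/usr/bin/g++": 120, "/usr/bin/make": 5}, set()),  # hypothetical sizes
    ({"/app/index.js": 1}, set()),
    ({}, {"/usr/bin/g++", "/usr/bin/make"}),             # cleanup in a later layer
]

def visible_files(layers):
    """Apply the layers bottom to top and return what a container would see."""
    fs = {}
    for added, removed in layers:
        for path in removed:
            fs.pop(path, None)
        fs.update(added)
    return fs

def stored_size(layers):
    """Every layer is kept, so the stored size is the sum of all added files."""
    return sum(sum(added.values()) for added, _ in layers)

visible = visible_files(layers)
print("visible files:", sorted(visible))
print("visible size :", sum(visible.values()), "MB")
print("stored size  :", stored_size(layers), "MB")
```

In this model the container sees only the application file, yet the image still stores the build tools from the first layer.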

Using Multi-Stage Builds

One of the most challenging things about building images is keeping the image size down. Each instruction in the Dockerfile adds a layer to the image, and you need to remember to clean up any artifacts that you do not need before moving on to the next layer. This is where multi-stage builds help.

Updated Dockerfile:

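# syntax=docker/dockerfile:1

# Build stage: includes the compilers needed to install dependencies
FROM node:12-alpine AS initial_builder
RUN apk add --no-cache python2 g++ make
WORKDIR /app
COPY . .
RUN yarn install --production

# Final build stage: plain alpine ships without Node.js, so start from the Node image again
FROM node:12-alpine
WORKDIR /app
COPY --from=initial_builder /app /app

CMD ["node", "src/index.js"]
EXPOSE 3000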

In the final build stage, only the built artifacts are copied over from the previous stage; the build tools installed there are left behind.

docker build -t multi-stage .

Now, let us compare the size between the 1st image and the final image.

docker image ls

[Screenshot: docker image ls output comparing the two images]

The image size is drastically reduced here. 😎😎

Avoid Caching

Passing --no-cache to docker build rebuilds every layer from scratch, even if cached layers are available.

Understanding R/W Layer

An image consists of many read-only layers. When a container starts, a single read-write (R/W) layer is added on top of all the image layers.

All the changes a container makes are made to the editable R/W layer and not to the underlying image layers. Therefore, a number of containers can use the same image with each having its own R/W layer.

Docker uses a Copy-on-Write (CoW) mechanism in its storage drivers. This mechanism lets different containers share the same image. When a single container modifies a file that belongs to the image, the file is first copied up into that container's read-write layer and the change is made to the copy, so the image itself is never altered.
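The sharing behaviour described above can be imitated with Python's collections.ChainMap, which reads through a stack of mappings but writes only to the first one. This is just an analogy under simplifying assumptions, not how a storage driver works internally; the file paths, contents, and the start_container helper are hypothetical.

```python
from collections import ChainMap

# Read-only image layers, listed from top to bottom (hypothetical file contents).
image_layers = [
    {"/app/src/index.js": "console.log('v1')"},
    {"/etc/os-release": "Alpine"},
]

def start_container(layers):
    """Give a container its own empty R/W layer on top of the shared image layers."""
    return ChainMap({}, *layers)

container_a = start_container(image_layers)
container_b = start_container(image_layers)

# A write lands only in container_a's first mapping, which plays the R/W layer.
container_a["/etc/os-release"] = "patched"

print("container_a sees:", container_a["/etc/os-release"])
print("container_b sees:", container_b["/etc/os-release"])
print("image layer has :", image_layers[1]["/etc/os-release"])
print("a's R/W layer   :", container_a.maps[0])
```

Both containers read from the same image layers, but the change made by container_a stays in its own top layer, so container_b and the image are unaffected.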

Advantages of using Docker Layers

Good storage management
Faster builds
Faster deployments
Sharing across multiple containers
Enhanced scalability

Conclusion:

Docker layers and the build cache are important concepts for adopting good practices in any Docker-based infrastructure. Small tweaks here and there can make builds and deployments noticeably more efficient.

I have tried to explain the concepts in simple, easy-to-understand language so that readers become interested in using them in their own Docker practices.

Hope you enjoyed the article, have a great day !!✌🏻✌🏻

This is a part of a series of articles to help understand Docker better. Find the other articles as follows:

Understanding Docker Networking

Understanding Docker Volumes
