Gaining Docker Image Size Efficiencies By Separating Application Layers

Luke Patterson
Docker, Java, Problem Solving, Spring Boot, Technology Snapshot

Problem

I was pushing a new Docker image tag for each application code commit, and the admins of the private registry were getting annoyed at how much space I was using.

Solution Summary

Yes, I know there are strategies for cleaning up old tags, but first I wanted to reduce the impact of each tag I pushed. With the right layering strategy, I knew I could reduce the net increase in registry size from consecutive tag pushes.

I wanted to push only what had actually changed in the application. In addition to reducing the impact on the registry, smaller tag deltas could also speed up rolling deployments, since nodes would have less to download.

Qualifiers

I don’t know how helpful or applicable my strategies will be to your specific application, so let’s get the specifics of my application out of the way:

  • Docker (obviously)
  • Spring Boot
  • Maven
  • Java
  • Most of the day-to-day changes in the application occur in classes directly found in the top-most project, the project containing the @SpringBootApplication class. If this isn’t the case in your application, then a different “separation strategy” might apply.

Solution Details

FYI: there is executable example code, so if the level of detail in the following steps is lacking, checking out the code and playing around with it is a good way to fill in the gaps. The repository’s commit history includes commits that show how changes in different parts of the codebase affect layer caching behavior. The details that follow revolve around those commits. The commit diffs also include the relevant command-line execution logs.

A link labeled “Line Link” points to a specific line in the code, so when following the link look for the highlighted line.

Step 0 – Example Codebase and Layering Pattern

[commit link] Pro-Tip: you can press t to see a different view of the overall file structure

  • A Maven multi-module project so it’s easy to show the effects of changing different layers of the codebase. (Line Link)
  • Reproducible-build-maven-plugin to ensure JARs only change when there is an actual code change. (Line Link)

From the execution log, the layout of the exploded archive:

 /unzipped/
  ├── BOOT-INF
  │   ├── classes
  │   │   ├── application.properties
  │   │   └── org
  │   │       └── lukewpatterson
  │   │           └── layers
  │   │               └── blog
  │   │                   └── app
  │   │                       └── OurSpringBootApplication.class
  │   └── lib
  │       ...
  │       ├── jul-to-slf4j-1.7.25.jar
  │       ├── layers-blog-snapshot-dependency-1-SNAPSHOT.jar
  │       ├── log4j-api-2.10.0.jar
  │       ...
  │       ├── spring-web-5.0.9.RELEASE.jar
  │       ├── spring-webmvc-5.0.9.RELEASE.jar
  │       └── validation-api-2.0.1.Final.jar
  ├── META-INF
  │   ├── MANIFEST.MF
  │   └── maven
  │       └── org.lukewpatterson
  │           └── layers-blog-app
  │               ├── pom.properties
  │               └── pom.xml
  └── org
      └── springframework
          └── boot
              └── loader
                  ├── ExecutableArchiveLauncher.class
                  ├── JarLauncher.class
                  ├── ...
                  └── util
                      └── SystemPropertyUtils.class
  • Separation of the exploded archive into layers in the build image. Separation progresses from “changes most often” to “changes least often”. (Line Link)
    • The other steps will refer to these layers.

From the execution log, the layout of layers once separated:

  /layers/
  ├── 1-dependencies-release
  │   ├── BOOT-INF
  │   │   └── lib
  │   │       ...
  │   │       ├── spring-beans-5.0.9.RELEASE.jar
  │   │       ├── spring-boot-2.0.5.RELEASE.jar
  │   │       ├── spring-boot-autoconfigure-2.0.5.RELEASE.jar
  │   │       ...
  │   │       └── validation-api-2.0.1.Final.jar
  │   └── org
  │       └── springframework
  │           └── boot
  │               └── loader
  │                   ├── ExecutableArchiveLauncher.class
  │                   ├── JarLauncher.class
  │                   ...
  │                   └── util
  │                       └── SystemPropertyUtils.class
  ├── 2-dependencies-snapshot
  │   └── BOOT-INF
  │       └── lib
  │           └── layers-blog-snapshot-dependency-1-SNAPSHOT.jar
  └── 3-app
      ├── BOOT-INF
      │   └── classes
      │       ├── application.properties
      │       └── org
      │           └── lukewpatterson
      │               └── layers
      │                   └── blog
      │                       └── app
      │                           └── OurSpringBootApplication.class
      └── META-INF
          ├── MANIFEST.MF
          └── maven
              └── org.lukewpatterson
                  └── layers-blog-app
                      ├── pom.properties
                      └── pom.xml
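The separation can be approximated in plain shell, which makes it easy to try outside of Docker. This is only a sketch under assumptions: the directory names are taken from the log above, but the function name `separate_layers` and the `*-SNAPSHOT.jar` matching rule are hypothetical, and the example repository may do this differently.

```shell
#!/bin/sh
# Sketch: split an exploded Spring Boot jar into three directories ordered by
# how often their contents change. Layer names mirror the post's log; the
# function name and the "*-SNAPSHOT.jar" rule are assumptions.
set -eu

separate_layers() {
  src=$1   # exploded archive, e.g. /unzipped
  dst=$2   # output root, e.g. /layers

  mkdir -p "$dst/1-dependencies-release/BOOT-INF" \
           "$dst/2-dependencies-snapshot/BOOT-INF/lib" \
           "$dst/3-app/BOOT-INF"

  # Changes most often: application classes and build metadata.
  mv "$src/BOOT-INF/classes" "$dst/3-app/BOOT-INF/classes"
  mv "$src/META-INF" "$dst/3-app/META-INF"

  # Changes sometimes: snapshot dependencies.
  find "$src/BOOT-INF/lib" -name '*-SNAPSHOT.jar' \
    -exec mv {} "$dst/2-dependencies-snapshot/BOOT-INF/lib/" \;

  # Changes least often: whatever is left (release jars, Spring Boot loader).
  mv "$src/BOOT-INF/lib" "$dst/1-dependencies-release/BOOT-INF/lib"
  mv "$src/org" "$dst/1-dependencies-release/org"
}

# Demo against a throwaway directory filled with empty placeholder files.
demo=$(mktemp -d)
mkdir -p "$demo/unzipped/BOOT-INF/classes" "$demo/unzipped/BOOT-INF/lib" \
         "$demo/unzipped/META-INF" \
         "$demo/unzipped/org/springframework/boot/loader"
touch "$demo/unzipped/BOOT-INF/classes/application.properties" \
      "$demo/unzipped/BOOT-INF/lib/spring-web-5.0.9.RELEASE.jar" \
      "$demo/unzipped/BOOT-INF/lib/layers-blog-snapshot-dependency-1-SNAPSHOT.jar" \
      "$demo/unzipped/META-INF/MANIFEST.MF" \
      "$demo/unzipped/org/springframework/boot/loader/JarLauncher.class"
separate_layers "$demo/unzipped" "$demo/layers"
find "$demo/layers" -type f | sort
```

In a Dockerfile, commands like these would typically sit in a RUN step of the build stage, after the jar has been unzipped.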
  • Reconstitution of layers in the runtime image. Reconstitution progresses from “changes least often” to “changes most often”. (Line Link)

From the execution log, the layout once reconstituted: (notice it’s the same as the ‘unzipped’ form)

  ├── BOOT-INF
  │   ├── classes
  │   │   ├── application.properties
  │   │   └── org
  │   │       └── lukewpatterson
  │   │           └── layers
  │   │               └── blog
  │   │                   └── app
  │   │                       └── OurSpringBootApplication.class
  │   └── lib
  │       ...
  │       ├── jul-to-slf4j-1.7.25.jar
  │       ├── layers-blog-snapshot-dependency-1-SNAPSHOT.jar
  │       ├── log4j-api-2.10.0.jar
  │       ...
  │       ├── spring-web-5.0.9.RELEASE.jar
  │       ├── spring-webmvc-5.0.9.RELEASE.jar
  │       └── validation-api-2.0.1.Final.jar
  ├── META-INF
  │   ├── MANIFEST.MF
  │   └── maven
  │       └── org.lukewpatterson
  │           └── layers-blog-app
  │               ├── pom.properties
  │               └── pom.xml
  └── org
      └── springframework
          └── boot
              └── loader
                  ├── ExecutableArchiveLauncher.class
                  ├── JarLauncher.class
                  ├── ...
                  └── util
                      └── SystemPropertyUtils.class
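To see why the reconstituted layout matches the unzipped one, the overlay can be simulated with `cp`. This is a sketch only: the function name `reconstitute` is hypothetical, and `cp` merely demonstrates how the directories merge. In the real image each COPY step produces its own image layer, which `cp` does not model.

```shell
#!/bin/sh
# Sketch: overlay the layer directories into one application directory,
# least-frequently-changed first, mirroring the order of the COPY steps.
# The function name is an assumption for illustration.
set -eu

reconstitute() {
  layers=$1   # e.g. /layers
  app=$2      # e.g. the runtime image's working directory
  mkdir -p "$app"
  for layer in 1-dependencies-release 2-dependencies-snapshot 3-app; do
    # "dir/." copies the directory's contents and merges into existing dirs.
    cp -R "$layers/$layer/." "$app/"
  done
}

# Demo with empty placeholder files standing in for the real layer contents.
demo=$(mktemp -d)
mkdir -p "$demo/layers/1-dependencies-release/BOOT-INF/lib" \
         "$demo/layers/2-dependencies-snapshot/BOOT-INF/lib" \
         "$demo/layers/3-app/BOOT-INF/classes"
touch "$demo/layers/1-dependencies-release/BOOT-INF/lib/spring-web-5.0.9.RELEASE.jar" \
      "$demo/layers/2-dependencies-snapshot/BOOT-INF/lib/layers-blog-snapshot-dependency-1-SNAPSHOT.jar" \
      "$demo/layers/3-app/BOOT-INF/classes/application.properties"
reconstitute "$demo/layers" "$demo/app"
find "$demo/app" -type f | sort
```

Both jars end up side by side in BOOT-INF/lib, because each copy merges into the directories created by the previous one.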

From the execution log, you can see that none of the application codebase layers were cached, since this is the first build. The absence of ---> Using cache after a COPY step means Docker built a new layer for that step instead of reusing a cached one. This is the key to the whole thing. Watch what happens in this area in the steps that follow.

Step 20/24 : COPY --from=build /layers/1-dependencies-release/ ./
 ---> d158e97b92fe
Step 21/24 : COPY --from=build /layers/2-dependencies-snapshot/ ./
 ---> bb5e6342aa80
Step 22/24 : COPY --from=build /layers/3-app/ ./
 ---> e0395252c3b1

Step 1 – Change A Release Dependency

[commit link]

  • Added a release dependency. (Line Link)
  • This is the layer least likely to change, and the strategy isn’t optimized for it, so this type of change doesn’t illustrate any size savings for registry pushes or consumer pulls.
  • From the execution log, you can see that the ---> Using cache situation isn’t any better. The release dependency layer is copied first, and a cache miss on that step forces every later COPY step to be rebuilt as well.

Step 2 – Change A Snapshot Dependency

[commit link]

  • Changed a snapshot dependency. (Line Link)
  • During development, it’s quite common for snapshot dependencies to get updated.

From the execution log, you can see that the ---> Using cache situation is getting better:

  Step 20/24 : COPY --from=build /layers/1-dependencies-release/ ./
   ---> Using cache
   ---> a5f34ee04615
  Step 21/24 : COPY --from=build /layers/2-dependencies-snapshot/ ./
   ---> 7446b0325602
  Step 22/24 : COPY --from=build /layers/3-app/ ./
   ---> 6b0cd535260f
  • A push to the private registry at this point, assuming previous tags had been pushed, would not require the duplication of the layer containing the release dependencies. In most applications, the release dependency layer is by far the largest of the three types identified in these steps.

Step 3 – Change the Application’s Top-Level Code

[commit link]

  • Changed some code in the top-level application. (Line Link)
  • Of the three layers identified in these steps, this is the one that changes most frequently.

From the execution log, you can see that the ---> Using cache situation has improved further:

  Step 20/24 : COPY --from=build /layers/1-dependencies-release/ ./
   ---> Using cache
   ---> a5f34ee04615
  Step 21/24 : COPY --from=build /layers/2-dependencies-snapshot/ ./
   ---> Using cache
   ---> 7446b0325602
  Step 22/24 : COPY --from=build /layers/3-app/ ./
   ---> 70d1d9f84c14
  • A push to the private registry at this point, assuming previous tags had been pushed, would not require the duplication of the layers containing the release dependencies or the snapshot dependencies.
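One way to check which layers two tags share is to compare their layer lists. The sketch below assumes each list has been saved to a text file with one digest per line; the function name `shared_layers` and the file format are hypothetical, and the digests in the demo are placeholders, not real values.

```shell
#!/bin/sh
# Sketch: given two files listing image layer digests (one per line), print
# the digests from the second file that also appear in the first.
# Function name and file format are assumptions for illustration.
set -eu

shared_layers() {
  # -F: fixed strings, -x: whole-line match, -f: read patterns from a file.
  # grep exits non-zero when nothing matches, so tolerate that case.
  grep -Fx -f "$1" "$2" || true
}

# Demo with placeholder digests for two consecutive tags.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' sha256:base sha256:release sha256:snapshot sha256:app-v1 > "$old"
printf '%s\n' sha256:base sha256:release sha256:snapshot sha256:app-v2 > "$new"
shared_layers "$old" "$new"   # prints the three digests common to both lists
```

For locally available images, the lists could be produced with something like `docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' <image>:<tag>`. Those are local layer identifiers rather than registry blob digests, but layers that match locally are the ones a push would not need to duplicate.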

Final Thoughts

There can be size and deployment advantages to separating your application into layers with differing change frequencies. The pattern can be a little fragile if the application’s internal layout changes, but a “failure” is simply a loss of the efficiencies gained by layering, not a failure to include pieces of the application. I know there are many areas in this walkthrough that would be improved by more detail. I’d like to have written a book, but didn’t have time. I just wrote what I could, and hopefully you find something helpful in the code and patterns outlined here.
