Problem
I was pushing a new Docker image tag for each application code commit, and the admins of the private registry were getting annoyed at how much space I was using.
Solution Summary
Yes, I know there are strategies to clean up old tags, but first I wanted to reduce the impact of the tags I was pushing. With the right layering strategy, I knew I could reduce how much each consecutive tag push adds to the registry.
I wanted to push only what had actually changed in the application. Besides reducing the impact on the registry, smaller tag deltas could also speed up rolling deployments, since nodes would potentially have less to download.
Qualifiers
I don’t know how helpful or applicable my strategies will be to your specific application, so let’s get the specifics of my application out of the way:
- Docker (obviously)
- Spring Boot
- Maven
- Java
- Most of the day-to-day changes in the application occur in classes directly found in the top-most project, the project containing the @SpringBootApplication class. If this isn’t the case in your application, then a different “separation strategy” might apply.
Solution Details
FYI – There is executable example code. So, if the level of detail in the following steps is lacking, checking out the code and playing around with it is one good way to fill in the gaps. The repository’s commit history contains commits that illustrate how changes in different parts of the codebase affect layer caching behavior. The details that follow revolve around those commits. You’ll see that the commit diffs also include the relevant command line execution logs.
A link labeled “Line Link” points to a specific line in the code, so when following the link look for the highlighted line.
Step 0 – Example Codebase and Layering Pattern
[commit link] Pro-Tip: you can press t to see a different view of the overall file structure
- A Maven multi-module project so it’s easy to show the effects of changing different layers of the codebase. (Line Link)
- Reproducible-build-maven-plugin to ensure JARs only change when there is an actual code change. (Line Link)
- Spring Boot’s “exploded archive”-style of executable since a “non-exploded” archive (a JAR file) is opaque to Docker’s layer caching mechanism. (Line Link)
From the execution log, the layout of the exploded archive:
/unzipped/
├── BOOT-INF
│   ├── classes
│   │   ├── application.properties
│   │   └── org
│   │       └── lukewpatterson
│   │           └── layers
│   │               └── blog
│   │                   └── app
│   │                       └── OurSpringBootApplication.class
│   └── lib
│       ...
│       ├── jul-to-slf4j-1.7.25.jar
│       ├── layers-blog-snapshot-dependency-1-SNAPSHOT.jar
│       ├── log4j-api-2.10.0.jar
│       ...
│       ├── spring-web-5.0.9.RELEASE.jar
│       ├── spring-webmvc-5.0.9.RELEASE.jar
│       └── validation-api-2.0.1.Final.jar
├── META-INF
│   ├── MANIFEST.MF
│   └── maven
│       └── org.lukewpatterson
│           └── layers-blog-app
│               ├── pom.properties
│               └── pom.xml
└── org
    └── springframework
        └── boot
            └── loader
                ├── ExecutableArchiveLauncher.class
                ├── JarLauncher.class
                ├── ...
                └── util
                    └── SystemPropertyUtils.class
- Separation of the exploded archive into layers in the build image. Separation progresses from “changes most often” to “changes least often”. (Line Link)
- The other steps will refer to these layers.
From the execution log, the layout of layers once separated:
/layers/
├── 1-dependencies-release
│   ├── BOOT-INF
│   │   └── lib
│   │       ...
│   │       ├── spring-beans-5.0.9.RELEASE.jar
│   │       ├── spring-boot-2.0.5.RELEASE.jar
│   │       ├── spring-boot-autoconfigure-2.0.5.RELEASE.jar
│   │       ...
│   │       └── validation-api-2.0.1.Final.jar
│   └── org
│       └── springframework
│           └── boot
│               └── loader
│                   ├── ExecutableArchiveLauncher.class
│                   ├── JarLauncher.class
│                   ...
│                   └── util
│                       └── SystemPropertyUtils.class
├── 2-dependencies-snapshot
│   └── BOOT-INF
│       └── lib
│           └── layers-blog-snapshot-dependency-1-SNAPSHOT.jar
└── 3-app
    ├── BOOT-INF
    │   └── classes
    │       ├── application.properties
    │       └── org
    │           └── lukewpatterson
    │               └── layers
    │                   └── blog
    │                       └── app
    │                           └── OurSpringBootApplication.class
    └── META-INF
        ├── MANIFEST.MF
        └── maven
            └── org.lukewpatterson
                └── layers-blog-app
                    ├── pom.properties
                    └── pom.xml
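To make the separation concrete, here is a minimal shell sketch of the idea. It is not the actual set of commands from the example repository. It builds a tiny stand-in for the exploded archive in a temporary directory (the empty placeholder files and the `WORK`, `SRC`, and `LAYERS` variables are assumptions for illustration), then peels off layers from “changes most often” to “changes least often”, so that whatever remains becomes the release-dependency layer.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; in a real build this would live in the build image.
WORK=$(mktemp -d)
SRC="$WORK/unzipped"

# Stand-in for the exploded archive (empty placeholder files, not real artifacts).
mkdir -p "$SRC/BOOT-INF/classes" "$SRC/BOOT-INF/lib" "$SRC/META-INF" \
         "$SRC/org/springframework/boot/loader"
touch "$SRC/BOOT-INF/classes/application.properties" \
      "$SRC/BOOT-INF/lib/spring-web-5.0.9.RELEASE.jar" \
      "$SRC/BOOT-INF/lib/layers-blog-snapshot-dependency-1-SNAPSHOT.jar" \
      "$SRC/META-INF/MANIFEST.MF" \
      "$SRC/org/springframework/boot/loader/JarLauncher.class"

LAYERS="$WORK/layers"
mkdir -p "$LAYERS/3-app/BOOT-INF" "$LAYERS/2-dependencies-snapshot/BOOT-INF/lib"

# 1) Changes most often: application classes and build metadata.
mv "$SRC/BOOT-INF/classes" "$LAYERS/3-app/BOOT-INF/classes"
mv "$SRC/META-INF" "$LAYERS/3-app/META-INF"

# 2) Changes sometimes: snapshot dependencies.
#    (Assumes at least one *-SNAPSHOT.jar exists; otherwise mv fails.)
mv "$SRC"/BOOT-INF/lib/*-SNAPSHOT.jar "$LAYERS/2-dependencies-snapshot/BOOT-INF/lib/"

# 3) Changes least often: everything left over
#    (release JARs and the Spring Boot loader classes).
mv "$SRC" "$LAYERS/1-dependencies-release"

# Show the result, one file per line.
(cd "$LAYERS" && find . -type f | sort)
```

In this sketch, treating the release layer as “everything left over” means a file that matches none of the rules still ends up in the image; it just lands in the least optimal layer.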
- Reconstitution of layers in the runtime image. Reconstitution progresses from “changes least often” to “changes most often”. (Line Link)
From the execution log, the layout once reconstituted: (notice it’s the same as the ‘unzipped’ form)
├── BOOT-INF
│   ├── classes
│   │   ├── application.properties
│   │   └── org
│   │       └── lukewpatterson
│   │           └── layers
│   │               └── blog
│   │                   └── app
│   │                       └── OurSpringBootApplication.class
│   └── lib
│       ...
│       ├── jul-to-slf4j-1.7.25.jar
│       ├── layers-blog-snapshot-dependency-1-SNAPSHOT.jar
│       ├── log4j-api-2.10.0.jar
│       ...
│       ├── spring-web-5.0.9.RELEASE.jar
│       ├── spring-webmvc-5.0.9.RELEASE.jar
│       └── validation-api-2.0.1.Final.jar
├── META-INF
│   ├── MANIFEST.MF
│   └── maven
│       └── org.lukewpatterson
│           └── layers-blog-app
│               ├── pom.properties
│               └── pom.xml
└── org
    └── springframework
        └── boot
            └── loader
                ├── ExecutableArchiveLauncher.class
                ├── JarLauncher.class
                ├── ...
                └── util
                    └── SystemPropertyUtils.class
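The reconstitution can be approximated outside Docker with plain `cp`. This is only a sketch: the directory names mirror the three layers, while the placeholder files and the `WORK`, `LAYERS`, and `APP` variables are assumptions for illustration. Each `cp -R dir/. target/` stands in for one `COPY --from=build /layers/<layer>/ ./` step in the runtime image.

```shell
#!/bin/sh
set -eu

WORK=$(mktemp -d)      # hypothetical scratch directory
LAYERS="$WORK/layers"
APP="$WORK/app"        # stands in for the runtime image's working directory

# Placeholder layers (empty files, for illustration only).
mkdir -p "$LAYERS/1-dependencies-release/BOOT-INF/lib" \
         "$LAYERS/2-dependencies-snapshot/BOOT-INF/lib" \
         "$LAYERS/3-app/BOOT-INF/classes" "$APP"
touch "$LAYERS/1-dependencies-release/BOOT-INF/lib/spring-web-5.0.9.RELEASE.jar" \
      "$LAYERS/2-dependencies-snapshot/BOOT-INF/lib/layers-blog-snapshot-dependency-1-SNAPSHOT.jar" \
      "$LAYERS/3-app/BOOT-INF/classes/application.properties"

# Copy from "changes least often" to "changes most often".
# Directories with the same name (BOOT-INF, BOOT-INF/lib) are merged.
for layer in 1-dependencies-release 2-dependencies-snapshot 3-app; do
  cp -R "$LAYERS/$layer/." "$APP/"
done

# List the merged result, one file per line.
(cd "$APP" && find . -type f | sort)
```

Both JARs end up side by side in `BOOT-INF/lib`, even though they came from different layers. That merge is what restores the original exploded-archive layout.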
From the execution log, you can see that none of the application codebase layers were cached, since this is the first build. The absence of a ---> Using cache line following a COPY command here means Docker had no existing layer to reuse and had to build a new one. This is the key to the whole thing. Watch what happens in this area in the steps that follow.
Step 20/24 : COPY --from=build /layers/1-dependencies-release/ ./
 ---> d158e97b92fe
Step 21/24 : COPY --from=build /layers/2-dependencies-snapshot/ ./
 ---> bb5e6342aa80
Step 22/24 : COPY --from=build /layers/3-app/ ./
 ---> e0395252c3b1
Step 1 – Change A Release Dependency
- Added a release dependency. (Line Link)
- This is the least likely layer to change, and this strategy isn’t optimized for it, so this type of change doesn’t illustrate any size savings for registry pushes or consumer pulls.
- From the execution log, you can see that the ---> Using cache situation isn’t any better.
Step 2 – Change A Snapshot Dependency
- Changed a snapshot dependency. (Line Link)
- During development, it’s quite common for snapshot dependencies to get updated.
From the execution log, you can see that the ---> Using cache situation is getting better:
Step 20/24 : COPY --from=build /layers/1-dependencies-release/ ./
 ---> Using cache
 ---> a5f34ee04615
Step 21/24 : COPY --from=build /layers/2-dependencies-snapshot/ ./
 ---> 7446b0325602
Step 22/24 : COPY --from=build /layers/3-app/ ./
 ---> 6b0cd535260f
- A push to the private registry at this point, assuming previous tags had been pushed, would not require the duplication of the layer containing the release dependencies. In most applications, the release dependency layer is by far the largest of the three types identified in these steps.
Step 3 – Change the Application’s Top-Level Code
- Change some code in the top-level application. (Line Link)
- This is the most frequently changed of the three layers identified in these steps.
From the execution log, you can see that the ---> Using cache situation is getting even better:
Step 20/24 : COPY --from=build /layers/1-dependencies-release/ ./
 ---> Using cache
 ---> a5f34ee04615
Step 21/24 : COPY --from=build /layers/2-dependencies-snapshot/ ./
 ---> Using cache
 ---> 7446b0325602
Step 22/24 : COPY --from=build /layers/3-app/ ./
 ---> 70d1d9f84c14
- A push to the private registry at this point, assuming previous tags had been pushed, would not require the duplication of the layers containing the release dependencies or the snapshot dependencies.
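Why does only the last COPY miss the cache here? As far as I understand it, Docker decides whether a COPY layer can be reused by checksumming the files being copied. The sketch below imitates that idea with `cksum` on placeholder directories. It is an analogy, not Docker’s actual algorithm, and the `layer_sum` and `report` helpers, the `WORK` variable, and the file contents are all made up for illustration.

```shell
#!/bin/sh
set -eu

WORK=$(mktemp -d)   # hypothetical scratch directory
LAYERS="$WORK/layers"

# Placeholder layers with made-up contents.
for layer in 1-dependencies-release 2-dependencies-snapshot 3-app; do
  mkdir -p "$LAYERS/$layer"
done
echo "release jar bytes"  > "$LAYERS/1-dependencies-release/spring-web-5.0.9.RELEASE.jar"
echo "snapshot jar bytes" > "$LAYERS/2-dependencies-snapshot/layers-blog-snapshot-dependency-1-SNAPSHOT.jar"
echo "app v1"             > "$LAYERS/3-app/OurSpringBootApplication.class"

# Hypothetical helper: one checksum over the names and contents
# of every file in a layer directory.
layer_sum() {
  (cd "$LAYERS/$1" && find . -type f | sort | xargs cksum | cksum)
}

release_before=$(layer_sum 1-dependencies-release)
snapshot_before=$(layer_sum 2-dependencies-snapshot)
app_before=$(layer_sum 3-app)

# Simulate Step 3: only the top-level application code changes.
echo "app v2" > "$LAYERS/3-app/OurSpringBootApplication.class"

# Hypothetical helper: report <layer> <checksum-before>
report() {
  if [ "$2" = "$(layer_sum "$1")" ]; then
    echo "$1: unchanged (cached layer could be reused)"
  else
    echo "$1: changed (layer must be rebuilt and pushed)"
  fi
}
report 1-dependencies-release "$release_before"
report 2-dependencies-snapshot "$snapshot_before"
report 3-app "$app_before"
```

Only `3-app` should be reported as changed. The same reasoning suggests why reproducible JARs matter in Step 0: if a JAR’s bytes changed on every build, its layer’s checksum would change too, even without a code change.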
Final Thoughts
There can be size and deployment advantages to separating your application into layers with differing change frequencies. The pattern can be a little fragile if the application’s internal layout changes, but a “failure” is simply a loss of the efficiencies gained by layering, not a failure to include pieces of the application. I know many areas of this walkthrough would be improved by more detail. I’d like to have written a book, but I didn’t have time. I wrote what I could, and hopefully you find something helpful in the code and patterns outlined here.