In this post, I’ll briefly go over how installing dependencies in your Docker image with yarn install can cause unnecessary bloat, why it happens, and the simple steps to fix it.
The other day, I was containerizing a simple Node.js project. The MVP Dockerfile was as simple as it gets:
FROM node:16.18.0-alpine
COPY ./package.json ./yarn.lock ./
RUN yarn install --immutable --production
Yet, after adding a development dependency, the resulting image grew from 57.5 MB to 70.36 MB (v1.0.0 vs. v1.0.1).
So, even though a development dependency shouldn’t add anything to a production image, we still ended up with a 22% increase in image size 🤔. Not nice.
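As a quick sanity check on that figure, the relative growth can be computed from the two sizes quoted above. This is only a minimal sketch; the helper name pct_increase is made up for illustration.

```shell
# Hypothetical helper: percentage growth from an old size to a new size.
pct_increase() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# Sizes in MB for v1.0.0 and v1.0.1, as quoted in the text.
pct_increase 57.5 70.36   # prints 22.4
```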
There’s a handy tool, dive, which makes it easy to figure out how each layer changed the file system of the resulting image. Let’s open up that new version and see what’s going on:
dive addono/bull-monitor:1.0.1
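If dive isn’t installed, docker history gives a rougher view of the same thing: one row per layer, with its size and the instruction that created it. A minimal sketch, where layer_sizes is a made-up wrapper name and the image tag is the one used above:

```shell
# Hypothetical wrapper: print the size and creating instruction of each layer.
layer_sizes() {
  docker history --no-trunc --format '{{.Size}}  {{.CreatedBy}}' "$1"
}

# Usage (needs a Docker daemon and the image available locally):
#   layer_sizes addono/bull-monitor:1.0.1
```

Unlike dive, this only shows how large each layer is, not which files it added, so it’s best used to decide which layer deserves a closer look.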
In this case, the only action in the Dockerfile that can cause this substantial change is the yarn install step. So that’s the first place we’ll look.
Unsurprisingly, the dependency installation step added a lot of files to the node_modules directory, as that’s where Node.js dependencies are usually stored.
But what’s interesting is the .cache directory Yarn created for itself. We don’t need this cache, since we don’t intend to install more dependencies in this image. Thus, it’s useless to ship it as part of our image.
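To confirm where that cache lives and how much space it takes, Yarn can report its own cache folder via yarn cache dir. The sketch below assumes the image has sh, du and yarn available (which should hold for images based on node:16.18.0-alpine) and that the container runs as the same user that ran the install; yarn_cache_size is a made-up helper name.

```shell
# Hypothetical helper: report the size of Yarn's cache folder inside an image.
# --entrypoint sh bypasses whatever entrypoint the image defines.
yarn_cache_size() {
  docker run --rm --entrypoint sh "$1" -c 'du -sh "$(yarn cache dir)"'
}

# Usage (needs a Docker daemon and the image available locally):
#   yarn_cache_size addono/bull-monitor:1.0.1
```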
Simply deleting the files in a subsequent step in the Dockerfile won’t work. The earlier layer already contains the cache files, so a later layer that deletes them only hides them; the image still ships the earlier layer in full.
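The effect can be illustrated without Docker at all. The following is a toy model, not Docker itself: it assumes each layer is a tarball, that the image size is the sum of its layers, and that a deletion in a later layer is recorded as an empty whiteout marker (a file prefixed with .wh., as in the OCI image format). File names and sizes are made up for illustration.

```shell
# Toy model of image layers: one tarball per layer, image size = sum of layers.
work=$(mktemp -d)
mkdir -p "$work/layer1/.cache" "$work/layer2"

# Layer 1: "yarn install" leaves 1 MiB of cache files behind.
dd if=/dev/zero of="$work/layer1/.cache/blob" bs=1024 count=1024 2>/dev/null

# Layer 2: a later "RUN rm -rf .cache" only records an empty whiteout marker.
: > "$work/layer2/.wh..cache"

tar -cf "$work/layer1.tar" -C "$work/layer1" .
tar -cf "$work/layer2.tar" -C "$work/layer2" .

layer1_bytes=$(wc -c < "$work/layer1.tar")
layer2_bytes=$(wc -c < "$work/layer2.tar")
total_bytes=$((layer1_bytes + layer2_bytes))
echo "layer 1: $layer1_bytes bytes, layer 2: $layer2_bytes bytes, total: $total_bytes bytes"
```

The second layer is tiny, yet the total never drops below the size of the first layer: the deleted cache is still being shipped.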
Luckily, we have the power of multi-stage builds. With them, we can first create an image that pulls in all dependencies, then copy only the files we really need over to the production image.
It’s only a small change, but now we’re sure that only the dependency files and the original package.json will land in our final image. This last bit is application specific, since you need to know exactly which files your application needs in the final image.
###################################################
# Builder stage which pulls in all dependencies #
###################################################
FROM node:16.18.0-alpine AS builder
COPY ./package.json ./yarn.lock ./
RUN yarn install --immutable --production
###################################################
# Create the stage which will run the application #
###################################################
FROM node:16.18.0-alpine AS runner
# Copy in all the dependencies we need. By not
# installing them in this stage, we keep Yarn's
# cache files out of the final image, which
# yields a slimmer image.
COPY ./package.json ./
COPY --from=builder ./node_modules/ ./node_modules/
Immediately we see that our Docker image shrank by over 35% 😮‍💨. Pure profit.
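To check the saving on your own build, the total size of a local image can be read with docker image inspect. A minimal sketch: image_size_bytes is a made-up helper name, and the tag for the multi-stage build is a placeholder, since it depends on how you tag your image.

```shell
# Hypothetical helper: total size of a local image in bytes.
image_size_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# Usage (needs a Docker daemon and both images available locally):
#   image_size_bytes addono/bull-monitor:1.0.1
#   image_size_bytes <tag-of-your-multi-stage-build>
```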