I'd like to incrementally create a docker image without keeping the intermediate layers.

I understand that Docker keeps the intermediate layers so that it can roll back to a previous version of the image, but I'm not interested in this feature.

The docker-squash project seems to provide a solution, but it has been archived and has not been maintained for 5 years.

Here is a simple example of what I mean.

I have an image of 72.7MB

image1                       latest            9873176a8ff5   4 weeks ago    72.7MB

I add a 10MB file to a container and commit it as image2:

docker run -it image1 /bin/bash
truncate -s 10MB 10MB_FILE
docker cp 10MB_FILE container_id:/
docker commit -m "add 10MB" container_id image2

image2 is 10MB larger:

image2                       latest            b7d1ea4043fb   58 seconds ago   82.7MB

I remove the 10MB file in a container and commit it as image3:

docker run -it image2 /bin/bash
rm 10MB_FILE
exit
docker commit -m "remove file" container_id image3

image3 is the same size as image2:

image3                       latest            11ff12ee0626   4 seconds ago    82.7MB

I would like to have an image3 weighing 72.7MB.

I don't want to simply revert a commit, because there might be other commits between adding the file and deleting it that I want to keep.

I know that I can change the Dockerfile and rebuild from scratch (as answered here), but some installs take a lot of time to do. This is why I want to incrementally create the docker image with docker commit.

lblenner

2 Answers

Suggestion: Create a base image of your version "image1". Tag it and name it. Use it as base for creating "image2" and "image3" (FROM xyz in your Dockerfile). Following this you should be able to achieve your goal. I bet a pint on that ;-)
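
A minimal sketch of that suggestion, assuming image1 has been given a hypothetical tag such as team/base:1.0 (for example with docker tag image1 team/base:1.0). The tag name is only an illustration and does not come from the question:

```dockerfile
# Dockerfile for image2, built on the tagged base image.
# team/base:1.0 is a hypothetical tag for image1; substitute your own.
FROM team/base:1.0

# Only this step is rebuilt day to day; the slow installs live in the base image.
COPY 10MB_FILE /
```

Building this with docker build -t image2 . reuses the layers of the base image, so the lengthy installs already baked into it are not repeated.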

Javali

You should pretty much never use docker commit. Write a Dockerfile and check it into source control.

Note that your image2 in particular is very easy to describe with a Dockerfile, probably easier than creating a container, copying the file in manually, and committing:

# Dockerfile
FROM image1
COPY 10MB_FILE /

# on the host system
truncate -s 10MB 10MB_FILE
docker build -t image2 .

Now let's say you changed your mind on the contents of the file. You don't need to change the Dockerfile at all; just re-run the build steps.

# on the host system
# create a different 10 MB file
dd if=/dev/zero of=10MB_FILE bs=1M count=10
# build a new image
docker build -t image3 .

When you do this you don't have to remember what images get built from which other images and with what steps; it's written down for you in the Dockerfile. If you're still trying to figure out what goes into the image there's nothing wrong with editing the Dockerfile and iterating on docker build until it's right.

The other important detail is that docker commit always makes an image larger. The new image is the original unmodified image, plus a new layer that records what changed. Even if you delete files (as in the last case), your image3 is image2 plus "here's a new layer with the detail that the file got deleted".
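
The same rule applies to Dockerfile layers, which the following sketch tries to illustrate. It assumes image1 ships the truncate utility; the file name is taken from the question, and the rest is an illustration rather than a recommended build:

```dockerfile
FROM image1

# Two separate RUN steps would produce two layers. The file would stay in
# the first layer and the second would only record its deletion, so the
# image would not shrink:
#   RUN truncate -s 10MB /10MB_FILE
#   RUN rm /10MB_FILE

# A single RUN step produces one layer. The file is created and removed
# before the layer is written, so it never adds to the image size.
RUN truncate -s 10MB /10MB_FILE \
 && rm /10MB_FILE
```

This is why temporary files are usually cleaned up in the same RUN instruction that creates them.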

Finally, you're almost certainly going to have to rebuild this image at some point in the future, if nothing else because your stack of images is based on a Linux distribution that inevitably will have a security issue. If you can check out your source code and re-run docker build --pull -t image2 . you don't need to remember this very manual commit sequence.

David Maze
  • I get that I will have to rebuild in the future *at some point*, but the image is used as a common environment for a team and there are changes to the image every day (and installing some libraries takes many hours). docker commit was used to incrementally change the image without having to rebuild the libraries, but the Dockerfile is being updated as well. Too bad it's not possible. – lblenner Jul 16 '21 at 12:03
  • Because of [layer caching](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache) a well-written Dockerfile should be pretty quick to re-run; in particular, make sure the "install some library" step happens before any `COPY` or `ARG` statements, and it won't be repeated on rebuild. – David Maze Jul 16 '21 at 13:11
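
The ordering described in the comment above might look like the following sketch. It assumes image1 is Debian- or Ubuntu-based, and build-essential merely stands in for the slow library install; both are assumptions, not details from the question:

```dockerfile
FROM image1

# Slow step first. Docker reuses this layer from the build cache as long as
# this instruction and everything above it are unchanged.
# Assumption: a Debian/Ubuntu base; build-essential is a stand-in package.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && rm -rf /var/lib/apt/lists/*

# Frequently changing content last. A changed file invalidates the cache
# only from this instruction onward.
COPY 10MB_FILE /
```

With this layout, replacing 10MB_FILE and re-running docker build should repeat only the COPY step, while the install layer comes from the cache.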