5

Is there any library to show progress when adding files to a tar archive in Python, or alternatively would it be possible to extend the functionality of the tarfile module to do this?

In an ideal world I would like to show the overall progress of the tar creation as well as an ETA as to when it will be complete.

Any help on this would be really appreciated.

chakara
  • 1
    percent complete = bytes so far / total bytes * 100%. time = size / rate. This is grade school math. – Ignacio Vazquez-Abrams Jan 17 '11 at 22:28
  • 1
    The problem is how can I get the "bytes so far" from the tarfile module. When I call add and it starts packing the files how do I know how many bytes of the file have been added? – chakara Jan 17 '11 at 22:48

5 Answers

2

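It seems you can use the filter parameter of TarFile.add():

import tarfile

# running totals, updated by the filter
added = {"files": 0, "bytes": 0}

def my_filter(tarinfo):
    # called once for each member before it is added
    added["files"] += 1
    added["bytes"] += tarinfo.size
    return tarinfo

# tarball_path and some_file are placeholders for your own paths
with tarfile.open(tarball_path, 'w') as tarball:
    tarball.add(some_file, filter=my_filter)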

All information you can get from a TarInfo object is available to you.
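To turn the filter idea into a percentage, the total size has to be known up front. Below is a minimal sketch of one way to do that, assuming the source is a file or directory tree on disk. The names total_size, add_with_progress and report are made up for this illustration and are not part of tarfile. Note that the filter is called before each member's data is written, so the figure runs one file ahead, and hard-linked files may make the total slightly too large.

```python
import os
import tarfile


def total_size(path):
    """Sum the sizes of all regular files under path (or of path itself)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # symlinks are stored without data, so do not count them
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def add_with_progress(tar_path, source, report=print):
    """Create tar_path from source, calling report(percent) once per member."""
    total = total_size(source)
    done = 0

    def counting_filter(tarinfo):
        nonlocal done
        if tarinfo.isreg():
            done += tarinfo.size
        report(100.0 * done / total if total else 100.0)
        return tarinfo  # returning the TarInfo keeps the member in the archive

    with tarfile.open(tar_path, "w") as tarball:
        tarball.add(source, filter=counting_filter)
    return done, total
```

Calling add_with_progress("backup.tar", "some_directory") would print one percentage per archived member; pass your own report callable to drive a progress bar instead.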

mingxiao
2

Unfortunately it doesn't look like there is an easy way to get byte by byte numbers.

Are you adding really large files to this tar file? If not, I would update progress on a file-by-file basis so that as files are added to the tar, the progress is updated based on the size of each file.

Suppose all your filenames are in the list toadd and tar is an open TarFile object (named tar here so that it does not shadow the tarfile module). How about:

import sys

# build the TarInfo objects once, as a list, so they can be used
# both for the total and for the loop below
tarobjs = [tar.gettarinfo(name) for name in toadd]
total = sum(tarobj.size for tarobj in tarobjs)
complete = 0.0
for name, tarobj in zip(toadd, tarobjs):
    sys.stdout.write("\rPercent Complete: {0:3.0f}%".format(complete))
    sys.stdout.flush()
    tar.add(name)
    if total:
        complete += tarobj.size / total * 100
sys.stdout.write("\rPercent Complete: {0:3.0f}%\n".format(complete))
sys.stdout.write("Job Done!\n")
milkypostman
2

Find or write a file-like object that wraps a real file and provides progress reporting, and pass it to TarFile.addfile() so that you know how many bytes have been requested for inclusion in the archive. You may have to use or implement throttling in case TarFile tries to read the whole file at once.
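A minimal sketch of such a wrapper follows, assuming every path is a regular file. ProgressFile, add_files and on_progress are hypothetical names invented for this example. As far as I can tell, CPython's tarfile copies member data by calling read() in fixed-size chunks, so a wrapper that only implements read() should be enough and the callback fires many times per large file, but treat that as an implementation detail rather than a documented guarantee.

```python
import os
import tarfile


class ProgressFile:
    """Wrap a binary file object and report every read() to a callback."""

    def __init__(self, fileobj, callback):
        self._fileobj = fileobj
        self._callback = callback

    def read(self, size=-1):
        data = self._fileobj.read(size)
        if data:
            self._callback(len(data))  # number of bytes just handed to tarfile
        return data


def add_files(tar_path, paths, on_progress):
    """Archive regular files, calling on_progress(done_bytes, total_bytes)."""
    total = sum(os.path.getsize(p) for p in paths)
    done = 0

    def bump(nbytes):
        nonlocal done
        done += nbytes
        on_progress(done, total)

    with tarfile.open(tar_path, "w") as tar:
        for path in paths:
            info = tar.gettarinfo(path)  # header built from the file on disk
            with open(path, "rb") as f:
                tar.addfile(info, ProgressFile(f, bump))
    return done, total
```

Because the callback receives byte counts rather than file counts, this gives smooth progress even when the archive consists of a few very large files.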

Ignacio Vazquez-Abrams
2

I have recently written a wrapper library that provides a progress callback. Have a look at it on GitHub:

https://github.com/thomaspurchas/tarfile-Progress-Reporter

Feel free to ask for help if you need it.

thomas
0

How are you adding files to the tar file? Is it through "add" with recursive=True? You could build the list of files yourself and call "add" on them one by one, showing the progress as you go. If you're building from a stream/file, then it looks like you could wrap that fileobj to see the read status and pass that into addfile.

It does not look like you will need to modify tarfile.py at all.
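The one-by-one approach could look roughly like the sketch below, which also derives an ETA from the average rate so far (remaining bytes divided by bytes per second). list_members, eta_seconds, create_tar and report are names made up for this example. It assumes a directory tree on disk, and progress only advances per file.

```python
import os
import tarfile
import time


def list_members(top):
    """Return (path, size) for top and everything beneath it, parents first."""
    members = [(top, 0)]
    for root, dirs, files in os.walk(top):
        for name in dirs:
            members.append((os.path.join(root, name), 0))
        for name in files:
            path = os.path.join(root, name)
            regular = os.path.isfile(path) and not os.path.islink(path)
            members.append((path, os.path.getsize(path) if regular else 0))
    return members


def eta_seconds(done, total, elapsed):
    """Estimate remaining seconds from the average rate so far, or None."""
    if done <= 0 or elapsed <= 0:
        return None
    rate = done / elapsed  # bytes per second
    return (total - done) / rate


def create_tar(tar_path, top, report=None):
    """Add each member individually, calling report(done, total, eta)."""
    members = list_members(top)
    total = sum(size for _path, size in members)
    done = 0
    start = time.monotonic()
    with tarfile.open(tar_path, "w") as tar:
        for path, size in members:
            tar.add(path, recursive=False)  # we do the recursion ourselves
            done += size
            if report is not None:
                elapsed = time.monotonic() - start
                report(done, total, eta_seconds(done, total, elapsed))
    return done, total
```

The ETA is None until some bytes have been written, and it will jump around early on because the average rate is based on very little data.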

Andrew Dalke