Is there any way to initialize a ZipFile object by passing in the literal bytes of the zip file, instead of having it read a filename? I'm building a RESTful app that doesn't need to ever touch the disk; it just opens the file, does some work on it, re-zips it and sends it on.

Just to be clear: what I'd like is to be able to *open* a zip file from a binary string. As in, I pass in a string containing binary data, and I receive a ZipFile object. The StringIO code I've seen so far deals with writing a ZipFile into a string, but it doesn't seem to provide a way to avoid writing a temporary file. – limp_chimp Sep 23 '13 at 19:47
3 Answers
In comments on the other answers, you say you want to do this:
> open a binary string as if it were a zip file. Open it, read/write to files inside of it, and then close it
You just do the same thing as in the other answers, except you create a `StringIO.StringIO` (or `cStringIO.StringIO` or `io.BytesIO`) that's pre-filled with the binary string, and extract the string in the end. `StringIO` and friends take an optional initial string for their constructor, and have a `getvalue` method to extract the string when you're done. The documentation is very simple, and worth reading.
So, sticking as close to alecxe's answer as possible:
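    from zipfile import ZipFile
    try:
        import cStringIO as StringIO
    except ImportError:
        import StringIO

    # original_zip_data is the binary string holding the existing archive.
    # cStringIO.StringIO(data) returns a read-only object, so start with an
    # empty buffer and write the data into it; this works for both modules.
    in_memory = StringIO.StringIO()
    in_memory.write(original_zip_data)
    zf = ZipFile(in_memory, "a")
    zf.writestr("file.txt", "some text contents")
    zf.close()
    new_zip_data = in_memory.getvalue()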
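For Python 3, the same idea can be sketched with `io.BytesIO`. This is only an illustration under stated assumptions: the seed archive and the member name `existing.txt` are made up here to stand in for `original_zip_data`, which in a real app would come from the request.

```python
import io
from zipfile import ZipFile

# Build a small archive in memory to stand in for original_zip_data
# (hypothetical sample content, only for illustration).
seed = io.BytesIO()
with ZipFile(seed, "w") as zf:
    zf.writestr("existing.txt", "already here")
original_zip_data = seed.getvalue()

# Wrap the bytes in a BytesIO and open the archive in append mode.
in_memory = io.BytesIO(original_zip_data)
with ZipFile(in_memory, "a") as zf:
    zf.writestr("file.txt", "some text contents")

# getvalue() returns the whole buffer, including the appended member.
new_zip_data = in_memory.getvalue()

# Re-open the result from bytes to check which members it contains.
with ZipFile(io.BytesIO(new_zip_data)) as zf:
    names = zf.namelist()
```

Nothing here touches the disk; `new_zip_data` is a plain `bytes` object that can be sent straight back in a response.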
However, note that `ZipFile` can't really write to a zip archive in-place, except for the special case of appending new files to it. This is just as true for in-memory zip archives as on-disk. You can often get away with overwriting a file in the archive by appending a new file with the same path, but that's usually a bad idea (especially if you're creating these things to be sent over the internet).
So, what you probably want to do is exactly the same as when you want to modify a file: create a separate output file, copy the things you need from the input file and write the new things as you go along. It's just that in this case, the input and output files are both `ZipFile` objects wrapping `StringIO` objects.
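A rough Python 3 sketch of that copy-and-rewrite approach, using `io.BytesIO` for both ends. The helper name `rewrite_zip` and its `replacements` argument are hypothetical, invented for this example, and the sample archive contents are made up too.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

def rewrite_zip(original_zip_data, replacements):
    """Return new zip bytes with the members in `replacements` swapped or added.

    `replacements` maps archive path -> new bytes content (hypothetical API).
    """
    out_buf = io.BytesIO()
    with ZipFile(io.BytesIO(original_zip_data)) as src:
        with ZipFile(out_buf, "w", ZIP_DEFLATED) as dst:
            # Copy every member that is not being replaced.
            for info in src.infolist():
                if info.filename in replacements:
                    continue  # the new version is written below
                dst.writestr(info, src.read(info))
            # Write the new or replacement members.
            for name, data in replacements.items():
                dst.writestr(name, data)
    return out_buf.getvalue()

# Hypothetical sample input, built in memory for the demonstration.
seed = io.BytesIO()
with ZipFile(seed, "w") as zf:
    zf.writestr("a.txt", "old a")
    zf.writestr("b.txt", "keep b")

new_zip_data = rewrite_zip(seed.getvalue(), {"a.txt": b"new a"})
with ZipFile(io.BytesIO(new_zip_data)) as zf:
    result = {name: zf.read(name) for name in zf.namelist()}
```

Because the output archive is written from scratch, each path appears exactly once, which avoids the duplicate-entry problem that comes with appending a file under an existing name.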

Sure, use (c)StringIO instead: http://docs.python.org/2/library/stringio.html Also, you should use BytesIO for Python 3. It does exist for 2.6 and 2.7 though.
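As a minimal read-only sketch with `io.BytesIO` on Python 3: the helper `read_members` and the sample member `hello.txt` are hypothetical names used only for illustration.

```python
import io
import zipfile

def read_members(zip_bytes):
    """Return {member name: member bytes} for an archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

# Hypothetical sample archive, built in memory purely for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi there")

members = read_members(buf.getvalue())
```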

- @Marcin: In Python 2.x, `StringIO.StringIO` is fine. In 3.x, yes, you'd want to use `io.BytesIO` instead. – abarnert Sep 23 '13 at 18:57
- @abarnert Right. So using `StringIO` just introduces a portability issue for no reason. – Marcin Sep 23 '13 at 19:00
- @Marcin: Sure, but using `BytesIO` also introduces a portability issue, since it doesn't exist in Python 2.5 (and, IIRC, is significantly slower than `StringIO`/`cStringIO` before 2.7.2 or so… but I could be remembering wrong, and it's less likely to matter). Put another way: would you argue that you should always use `io.open` in 2.x instead of `open` for ordinary files? – abarnert Sep 23 '13 at 19:02
- @abarnert At this point, pretty much yes, unless pre-2.7 compatibility is known to be required. We're now 3 years into 2.7 being the current release series. There's very little reason for anyone to be using 2.6. (I don't consider having an out of date version installed by your distro to be a good reason). – Marcin Sep 23 '13 at 19:12
- @Marcin: I'm all for pushing people to use 3.3 whenever possible, and failing that to drop 2.5 compat and try to write dual-version software when you can get away with it. But if you, e.g., have an out of date version installed by the distro used by your hosting company, you can't get away with it. That's still, sadly, the reality in 2013. So you can't just tell people to write 3.x-compatible code if they're using 2.x; you have to explain the differences. – abarnert Sep 23 '13 at 19:27
- @Marcin: Also, if your project is going to need `2to3` anyway, it's often more reasonable to write the `2to3`-compatible code than the directly-3.x-compatible code. – abarnert Sep 23 '13 at 19:30
- @abarnert Python isn't large. I've yet to use a hosting service where installing up-to-date software wasn't feasible. I also don't think maintaining multiple codebases is a wonderful idea. – Marcin Sep 23 '13 at 19:32
- This addresses one half of what I'm looking for, but not the other. I want to be able to *open* a binary string as if it were a zip file. Open it, read/write to files inside of it, and then close it -- without ever actually touching the disk. Maybe I missed something but I don't think that this covers that. – limp_chimp Sep 23 '13 at 19:34
- @limp_chimp: This is trivial: You can pass an initial value to the `StringIO`/`BytesIO` constructor, and you can get the final string back out at the end with the `getvalue` method. If this isn't clear enough, I'll write an answer. – abarnert Sep 23 '13 at 19:56
Here's an example using (c)StringIO:
    from zipfile import ZipFile
    try:
        import cStringIO as StringIO
    except ImportError:
        import StringIO

    in_memory = StringIO.StringIO()
    zf = ZipFile(in_memory, "a")
    zf.writestr("file.txt", "some text contents")
    zf.close()
- Wouldn't BytesIO be better? Specifically because the code will then be portable between 2 and 3. – Marcin Sep 23 '13 at 18:57