I'm trying to save a bunch of NumPy arrays with savez, keyed by the absolute path of the file the data came from. However, when I use load to retrieve the data, the leading slashes have been removed from the keys.

>>> import numpy as np
>>> data = {}
>>> data['/foo/bar'] = np.array([1, 2, 3])
>>> data.keys()
['/foo/bar']
>>> np.savez('/tmp/test', **data)
>>> data2 = np.load('/tmp/test.npz')
>>> data2.keys()
['foo/bar']

Is this behavior expected from numpy.savez? Is there a workaround or am I doing something wrong?

j314erre
  • An `npz` file is a `zip` archive. Keys are archive file names. Look at the archive with a system tool. It may be sanitizing your names to fit that role. – hpaulj May 25 '17 at 01:25
  • yep, seems like this is related https://stackoverflow.com/questions/9258069/numpy-savez-interprets-my-keys-as-filenames-ioerror – j314erre May 25 '17 at 01:40
  • Who has a `/foo` root directory? – hpaulj May 25 '17 at 04:26

1 Answer

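Looks like the stripping is done by the Python zipfile module. The documentation note quoted here describes it for extraction, but the archive listing at the end of this answer shows the leading slash is already missing from the stored name, so it also happens on writing: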

https://docs.python.org/2/library/zipfile.html

Note If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows illegal characters (:, <, >, |, ", ?, and *) replaced by underscore (_).

Writing is done in np.lib.npyio._savez, first to a tmpfile and then to the archive with zipf.write(tmpfile, arcname=fname).

In [98]: np.savez('test.npz',**{'/foo/bar':arr})
In [99]: !unzip -lv test.npz
Archive:  test.npz
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     152  Stored      152   0% 2017-05-24 19:58 ef792502  foo/bar.npy
--------          -------  ---                            -------
     152              152   0%                            1 file
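
The stripping on write can be reproduced with zipfile alone, without NumPy. This is a minimal sketch that assumes the archive member is added with ZipFile.write and an arcname, as described above; the helper name stored_name is made up for illustration.

```python
import io
import os
import tempfile
import zipfile


def stored_name(arcname):
    """Add a small temp file to an in-memory zip under `arcname`
    and return the member name that zipfile actually stored."""
    # Create a throwaway file on disk, because ZipFile.write copies from a path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"payload")
    try:
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            zf.write(tmp.name, arcname=arcname)
        # Reopen the archive for reading and list its single member.
        with zipfile.ZipFile(buf) as zf:
            return zf.namelist()[0]
    finally:
        os.remove(tmp.name)


# The leading slash is dropped from the stored member name.
print(stored_name("/foo/bar.npy"))
```

Because np.load derives its keys from the member names, a key that was stripped at write time cannot be recovered on load.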
hpaulj
  • Thanks, this answers it. An answer to the linked question suggests base64-encoding and decoding the keys, which gives me a reasonable workaround: https://stackoverflow.com/questions/9258069/numpy-savez-interprets-my-keys-as-filenames-ioerror – j314erre May 25 '17 at 05:23
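
A rough sketch of the base64 workaround mentioned in the comment above. The helper names encode_key and decode_key are hypothetical and are not taken from the linked answer. The URL-safe alphabet is used so that the encoded key contains no slashes.

```python
import base64


def encode_key(path):
    """Turn a file path into a slash-free string usable as an npz key."""
    return base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")


def decode_key(key):
    """Recover the original file path from an encoded npz key."""
    return base64.urlsafe_b64decode(key.encode("ascii")).decode("utf-8")


encoded = encode_key("/foo/bar")
print(encoded)               # contains no "/" characters
print(decode_key(encoded))   # the original path, leading slash included
```

With these helpers, saving would look something like np.savez('/tmp/test', **{encode_key(k): v for k, v in data.items()}), and after np.load the original paths come back by calling decode_key on each loaded key.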