
With an unfortunately outdated numpy version 1.8.2, I get the following behavior:

I have a dictionary with eight sparse CSR matrices as values.

>>> tmp = [ (D[key][select,:].T.sum(0))[:,:,None] for key in D ];

Up to this point there is no problem. The list contains dense numpy matrices with shape (1, len(select), 1), and len(select) is less than 300. Memory consumption is only around 3%, with almost 7 GB of RAM free.
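To make the setup concrete, here is a small reproduction sketch with hypothetical stand-ins for `D` and `select` (random CSR matrices from `scipy.sparse`); note that `.sum(0)` on a sparse matrix returns `np.matrix`, not a plain ndarray:

```python
import numpy as np
from scipy.sparse import random as sparse_random

select = list(range(5))  # stand-in for the real row selection (< 300 rows)
# eight hypothetical CSR matrices standing in for the real dictionary D
D = {k: sparse_random(300, 40, density=0.05, format='csr', random_state=k)
     for k in range(8)}

# .sum(0) on a sparse matrix yields an np.matrix of shape (1, len(select))
sums = [D[key][select, :].T.sum(0) for key in D]
print(type(sums[0]), sums[0].shape)
```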

>>> result = np.concatenate(tmp,axis=2);

Within the blink of an eye I get a Segmentation Fault ('Speicherzugriffsfehler') in the terminal; htop shows no sign of memory filling up. I would also expect consumption to peak at roughly twice the previous amount, which was practically nothing. Nevertheless, I can repeat this as often as I want, and it always gives me a SegFault.

I would like to rule out that it is a problem of my implementation.

UPDATE: After updating numpy slightly, to version 1.10, the problem no longer occurs. Perhaps some severe bug in 1.8.2 that no one cares about, as it is completely outdated...

Radio Controlled
  • Have you tried `concatenate` on a subset of `tmp`? I'd also check the `shape` of all elements. If `tmp` contains dense arrays, then the fact that `D[key]` was originally a `csr` matrix shouldn't matter. If memory was a problem, I'd expect a `memory error`, not a segmentation fault, but the age of your version might make a difference. – hpaulj Apr 20 '19 at 16:29
  • Unfortunately those also don't yield a result. But now it has worked once with exactly the same code. Next time again SegFault... Could it even be my hardware? – Radio Controlled Apr 20 '19 at 17:18
  • I suspect it had something to do with the elements to be concatenated being matrices and thereby assumed to be 2d... – Radio Controlled Apr 21 '19 at 06:47

1 Answer


Looking at your code, there is something strange going on (even in 1.16).

Start with a sample sparse matrix:

In [365]: M                                                                          
Out[365]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 20 stored elements in Compressed Sparse Row format>
In [366]: M[0,:].T                                                                   
Out[366]: 
<10x1 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Column format>
In [367]: M[0,:].T.sum(0)       

The row or column sum of a sparse matrix produces an np.matrix:

Out[367]: matrix([[1.91771869]])
In [368]: M[0,:].T.sum(0)[:,:,None]                                              
Out[368]: matrix([[[1.91771869]]])

We shouldn't be able to expand an np.matrix to 3d. Could this be causing problems in concatenate? Not now, but it might have in earlier versions:

In [369]: np.concatenate([M[0,:].T.sum(0)[:,:,None]])   
Out[369]: matrix([[1.91771869]])
In [370]: _368.shape                                                                 
Out[370]: (1, 1, 1)  
In [371]: np.concatenate([_368,_368])                                                
Out[371]: matrix([[1.91771869, 1.91771869]])
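For comparison, plain ndarrays keep the third axis when concatenated explicitly along `axis=2` (a small sketch):

```python
import numpy as np

a = np.array([[1.91771869]])[:, :, None]  # plain ndarray, shape (1, 1, 1)
out = np.concatenate([a, a], axis=2)      # third axis is preserved
print(out.shape)
```

With `np.matrix` inputs the result is silently flattened back to 2d, as seen above, because the matrix subclass cannot represent more than two dimensions.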

Just a few days ago I saw a question that produced a 3d np.matrix, when it shouldn't have.

Why does indexing this Numpy matrix cause an error?

hpaulj
  • I changed it now to `np.concatenate([np.array(M[0,:].T.sum(0))[:,:,None]])`, but I can imagine this mistake can happen relatively easily to others, too... – Radio Controlled Apr 22 '19 at 08:02
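That workaround can be sketched end to end with hypothetical stand-ins for `D` and `select` (random CSR matrices from `scipy.sparse`): converting each np.matrix to a plain ndarray before adding the third axis lets `concatenate` build the intended 3d result.

```python
import numpy as np
from scipy.sparse import random as sparse_random

select = list(range(5))  # stand-in for the real row selection
D = {k: sparse_random(300, 40, density=0.05, format='csr', random_state=k)
     for k in range(8)}

# np.asarray turns the (1, len(select)) np.matrix into a true ndarray,
# so [:, :, None] yields a genuine 3d array of shape (1, len(select), 1)
tmp = [np.asarray(D[key][select, :].T.sum(0))[:, :, None] for key in D]
result = np.concatenate(tmp, axis=2)
print(result.shape)
```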