
I am quite new to Python but I've started using it for some data analysis and now I love it. Before, I used C, which I find just terrible for file I/O.

I am working on a script which computes the radial distribution function for a set of N=10000 (ten thousand) points in a 3D box with periodic boundary conditions (PBC). Basically, I have a file of 10000 lines that looks like this:

0.037827 0.127853 -0.481895
12.056849 -12.100216 1.607448
10.594823 1.937731 -9.527205
-5.333775 -2.345856 -9.217283
-5.772468 -10.625633 13.097802
-5.290887 12.135528 -0.143371
0.250986 7.155687 2.813220

which represents the coordinates of the N points. What I have to do is compute the distance between every pair of points (so I have to consider all 49995000 combinations of 2 elements) and then do some operation on it.

Of course the most taxing part of the program is the loop over the 49995000 combinations.

My current function looks like this:

    from math import sqrt

    g = [0 for i in range(Nbins)]   # one histogram bin per radial shell

    for i in range(Nparticles):
        for j in range(i+1, Nparticles):

            # compute the distance and apply PBC (minimum-image convention)
            dx = coors[i][0] - coors[j][0]
            if dx > halfSide:
                dx -= boxSide
            elif dx < -halfSide:
                dx += boxSide

            dy = coors[i][1] - coors[j][1]
            if dy > halfSide:
                dy -= boxSide
            elif dy < -halfSide:
                dy += boxSide

            dz = coors[i][2] - coors[j][2]
            if dz > halfSide:
                dz -= boxSide
            elif dz < -halfSide:
                dz += boxSide

            d2 = dx**2 + dy**2 + dz**2

            # bin the distance if the pair is within the cutoff radius
            if d2 < (cutoff*boxSide)**2:
                g[int(sqrt(d2)/dr)] += 2

Notice: coors is a nested array, created using loadtxt() on the data file.

I basically recycled a function that I used in another program, written in C.

I am not using itertools.combinations() because I have noticed that the program runs slightly slower with it for some reason (one iteration takes around 111 s with it, versus around 106 s with this implementation).
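For reference, the itertools.combinations variant I compared against looked roughly like this (just a sketch, using the same variables as the code above):

    from itertools import combinations
    from math import sqrt

    for p, q in combinations(coors, 2):

        # same minimum-image wrapping as above, component by component
        dx = p[0] - q[0]
        if dx > halfSide:
            dx -= boxSide
        elif dx < -halfSide:
            dx += boxSide

        dy = p[1] - q[1]
        if dy > halfSide:
            dy -= boxSide
        elif dy < -halfSide:
            dy += boxSide

        dz = p[2] - q[2]
        if dz > halfSide:
            dz -= boxSide
        elif dz < -halfSide:
            dz += boxSide

        d2 = dx**2 + dy**2 + dz**2

        if d2 < (cutoff*boxSide)**2:
            g[int(sqrt(d2)/dr)] += 2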

This function takes around 106 s to run, which is terrible considering that I have to analyze some 500 configuration files.

Now, my question is: is there a general way to make this kind of loop run faster in Python? I guess the corresponding C code would be faster, but I would like to stick to Python because it is so much easier to use.

I would like to stress that even though I'm certainly looking for a particular solution to my problem, I would like to know how to iterate more efficiently when using Python in general.

PS Please try to explain the code in your answers as much as possible (if you write any) because I'm a newbie in Python.

Update

First, I want to say that I know that, since there is a cutoff, I could write a more efficient algorithm by dividing the box into smaller cells and only computing distances between points in neighboring cells, but for now I would only like to speed up this particular algorithm.
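Roughly, the cell idea I have in mind would look something like this (just the bookkeeping part, sketched and untested; nCells, cellSide and cells are only illustration names):

    # Sketch (untested): sort the points into cubic cells that are at least one
    # cutoff radius wide, so that only points in the same or in neighbouring
    # cells ever need to be compared.
    nCells = int(1.0 / cutoff)      # cells per box edge (cutoff*boxSide is the cutoff radius)
    cellSide = boxSide / nCells

    cells = {}
    for idx, (x, y, z) in enumerate(coors):
        # shift coordinates from [-halfSide, halfSide] to [0, boxSide] and bin them
        key = (int((x + halfSide) / cellSide) % nCells,
               int((y + halfSide) / cellSide) % nCells,
               int((z + halfSide) / cellSide) % nCells)
        cells.setdefault(key, []).append(idx)

    # The distance loop would then only visit pairs whose cell indices differ
    # by at most 1 in each direction (modulo nCells, because of the PBC).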

I also want to say that using Cython (I followed this tutorial) I managed to speed everything up a little bit, as expected (77 s instead of the 106 s it took before).

valerio
  • If you're using python2.x, you'll want to use `xrange` rather than `range`. If that doesn't get it going fast enough, you'll probably need to start using something like `numpy` and figure out how you can do the operations using numpy's vectorized functions. – mgilson Jun 20 '16 at 16:40
  • Do you absolutely have to use nested for loops? That's usually a code smell and should be avoided – Mo H. Jun 20 '16 at 16:40
  • If numpy or pandas doesn't speed things up enough, Python also allows you to execute C code--you might want to check into that as well. – 7stud Jun 20 '16 at 16:43
  • I'd try some basic math to avoid extra conditionals. For example: `if(dx>halfSide): dx-=boxSide elif(dx<-halfSide): dx+=boxSide` looks *a lot* like it could be simplified with `if abs(dx) < halfSide`. – EOF Jun 20 '16 at 16:46
  • Try to avoid loops completely. `numpy` is very well suited for this sort of thing. – Mad Physicist Jun 20 '16 at 16:52
  • Once the for loops are optimized, if `cutoff` is large, it is likely that `if(d2<(cutoff*boxSide)**2): g[int(sqrt(d2)/dr)]+=2` spends a large part of the time. Computing a square root and a division is time consuming. The division can be easily avoided. Is binning by radius absolutely required? Otherwise, try `one_dx2=1./(dx*dx) ... for, for ... g[int(d2*one_dx2)]+=2` If `cutoff` is small, there are bounding-box based improvements within the grasp of your hand! – francis Jun 20 '16 at 21:05

2 Answers


If memory is not an issue (and it probably isn't given that the actual amount of data won't be different from what you are doing now), you can use numpy to do the math for you and put everything into an NxN array (around 800MB at 8 bytes/float).

Given the operations your code is trying to do, I do not think you need any loops outside numpy:

    import numpy

    g = numpy.zeros((Nbins,))
    coors = numpy.array(coors)

    # compute the pairwise coordinate differences and apply PBC (minimum image)
    dx = numpy.subtract.outer(coors[:, 0], coors[:, 0])
    dx[dx < -halfSide] += boxSide
    dx[dx > halfSide] -= boxSide
    dy = numpy.subtract.outer(coors[:, 1], coors[:, 1])
    dy[dy < -halfSide] += boxSide
    dy[dy > halfSide] -= boxSide
    dz = numpy.subtract.outer(coors[:, 2], coors[:, 2])
    dz[dz < -halfSide] += boxSide
    dz[dz > halfSide] -= boxSide

    d2 = dx**2 + dy**2 + dz**2
    # Avoid counting diagonal elements: inf would do as well as nan
    numpy.fill_diagonal(d2, numpy.nan)

    # Each unordered pair appears twice in the full NxN matrix, so this holds
    # twice as many entries as the if-statement in the loops would have triggered
    cond = numpy.sqrt(d2[d2 < (cutoff*boxSide)**2]) / dr
    # See http://stackoverflow.com/a/24100418/2988730
    # Add 1 per occurrence: since each pair shows up twice, this matches the loop's "+= 2"
    numpy.add.at(g, cond.astype(numpy.int_), 1)

The point is to always stay away from looping in Python when dealing with large amounts of data. Python loops are inefficient. They perform lots of book-keeping operations that slow down the math code.

Libraries such as numpy and dynd provide operations that run the loops you want "under the hood", usually bypassing the bookkeeping with a C implementation. The advantage is that you can do the Python-level book-keeping once for each enormous array and then get on with processing the raw numbers instead of dealing with Python wrapper objects (numbers are full-blown objects in Python, there are no primitive types).
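As a toy illustration of the pattern (not your radial-distribution computation, just a sketch of explicit loop vs. vectorized):

    import numpy

    a = numpy.random.rand(10**6)
    b = numpy.random.rand(10**6)

    # Explicit Python loop: every iteration creates and unwraps float objects
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y)**2

    # Vectorized: the same sum in one pass over the raw buffers, in compiled code
    total_vec = numpy.sum((a - b)**2)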

In this particular example, I have recast your code to use a sequence of array-level operations that do the same thing that your original code did. This requires restructuring how you think about the steps a little, but should save a lot on runtime.

Mad Physicist
  • Thank you, very clear answer. There is a problem, though: I tried to run your code and I got the following error: at `dx = numpy.subtract.outer(coors[:, 0], coors[:, 0])` I get `list indices must be integers, not tuple` – valerio Jun 21 '16 at 10:33
  • @valerio92. That is because I assumed that your `coors` array was an array rather than a nested Python list. There are a couple of things you can do. The easiest would probably be to convert `coors` to a `numpy` array: `coors = numpy.array(coors)`. You can also look into the `pandas` library. It will easily load your file since it is basically a CSV. Pandas provides an extension to `numpy` that makes certain aspects of data analysis much easier. – Mad Physicist Jun 21 '16 at 13:26
  • I used `loadtxt` to create it so it came like this (a nested array). I will try to make it an array so that I can try your implementation. – valerio Jun 21 '16 at 14:18
  • `numpy.array(coors)` should work fine. I've even updated the answer since you made it explicit in the question. A list would be OK, but unfortunately you have a list of lines instead of columns. Such is life. – Mad Physicist Jun 21 '16 at 14:20
  • Now, at `np.add.at(g, int(cond), 2)`, I get `only length-1 arrays can be converted to Python scalars`. Unfortunately my understanding of your code is really poor, so it is difficult for me to deal with the errors... – valerio Jun 21 '16 at 14:28
  • That's fine. I would recommend reading the docs. I will try to reproduce the error. – Mad Physicist Jun 21 '16 at 14:44
  • My mistake. Updated. I forgot `cond` was an array, tried casting it to a single int instead of converting the type. – Mad Physicist Jun 21 '16 at 14:48
  • Awesome!! It took 10 seconds!! I wish I could understand your code better so I could learn how to do such things in the future...Is there any tutorial to learn how to program properly using numpy? – valerio Jun 21 '16 at 15:02
  • As a matter of fact, yes. I would read through the official docs starting here: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html. After that, browse through all of the links that seem relevant. After that, look up the functions that I used here (just Google "numpy" plus the function name and the man-page will come up). Once you have some of the background down, I would recommend SO and Google as your primary resources. Most of the basic "How do I ..." have been hashed out pretty well here. – Mad Physicist Jun 21 '16 at 15:32
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/115220/discussion-between-valerio92-and-mad-physicist). – valerio Jun 21 '16 at 16:07

You've got an interesting problem there. There are some common guidelines for high-performance Python; these were written with Python 2 in mind but should, for the most part, carry over to Python 3.

  • Profile your code. Using %timeit and %%timeit in jupyter/ipython is quick and easy for interactive sessions, but cProfile and line_profiler are valuable for finding bottlenecks.
  • This is a link to a short essay from the Python documentation that covers the basics and that I found helpful: https://www.python.org/doc/essays/list2str
  • Numpy is a great package for vectorized operations. Note that numpy arrays can be slower to work with than small to medium sized lists, but the memory savings are huge, and the speed gain over lists is substantial when working with multi-dimensional arrays. Also, if you start observing cache misses and page faults with pure Python, the numpy benefit will be even greater.
  • I have been using Cython recently with a fair amount of success, in cases where the built-in functions of numpy/scipy don't quite cut it.
  • Check out scipy, which has a huge library of functions for scientific computing. There's a lot to explore; e.g., the scipy.spatial.distance.pdist function computes all nC2 pairwise distances quickly. In the test run below, pairwise distances for 10k points completed in 375 ms. 100k points would probably break my machine without refactoring, though.

    import numpy as np
    from scipy.spatial.distance import pdist
    xyz_list = np.random.rand(10000, 3)
    xyz_list
    Out[2]: 
    array([[ 0.95763306,  0.47458207,  0.24051024],
           [ 0.48429121,  0.12201472,  0.80701931],
           [ 0.26035835,  0.76394588,  0.7832222 ],
           ..., 
           [ 0.07835084,  0.8775841 ,  0.20906537],
           [ 0.73248369,  0.60908474,  0.57163023],
           [ 0.68945879,  0.19393467,  0.23771904]])
    In [10]: %timeit xyz_pairwise = pdist(xyz_list)
    1 loop, best of 3: 375 ms per loop
    
    In [12]: xyz_pairwise = pdist(xyz_list)
    In [13]: len(xyz_pairwise)
    Out[13]: 49995000
    
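If you then need the histogram g(r) as in the question, the pdist output can be binned in one call with numpy.histogram, along these lines (a sketch continuing the session above; Nbins and dr are just illustration values, and note that pdist does not apply the periodic boundary conditions, so the minimum-image wrapping would still have to be handled separately, e.g. with the numpy approach in the other answer):

    import numpy as np

    # Sketch: bin the pairwise distances into Nbins bins of width dr
    Nbins = 100     # illustration value
    dr = 0.01       # illustration bin width
    counts, edges = np.histogram(xyz_pairwise, bins=Nbins, range=(0.0, Nbins*dr))
    g = 2*counts    # pdist counts each unordered pair once; x2 matches the question's "+= 2"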

Happy exploring!

clocker