
I have some code that makes heavy use of np.fft.rfft and np.fft.irfft, to the point where these calls are the bottleneck for optimisation.

Is there any chance of going faster than this, and if so, what are my best options? Thoughts that occur to me:

  • Cython - I've heard it's very fast, but would it actually help here?
  • digging into numpy - rfft calls _raw_fft, which does a lot of checking and then calls fftpack.cfftf. The profiler tells me only 80% of the time is spent in fftpack.cfftf, so stripping the wrapping down to just the bits I need could save a little time.
  • find a faster DFT algorithm somewhere?
  • buy more computers
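Before picking an option, it's worth having a reproducible microbenchmark of the round trip itself. This is a minimal sketch (the array size and repeat count are arbitrary choices, not from the original question):

```python
import timeit
import numpy as np

# a representative even-length input; adjust to match your real workload
x = np.random.default_rng(0).standard_normal(2**16)

# time the rfft + irfft round trip that dominates the profile
t = timeit.timeit(lambda: np.fft.irfft(np.fft.rfft(x)), number=100)
print(f"rfft + irfft round trip: {t / 100 * 1e3:.3f} ms")

# sanity check: for even-length input the round trip reconstructs x
assert np.allclose(np.fft.irfft(np.fft.rfft(x), n=x.size), x)
```

Any alternative (Cython wrapper, scipy, FFTW) can then be dropped into the same harness for a like-for-like comparison.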

So really the question boils down to:

  1. Does anyone with Cython experience know whether it would be worth trying here - or can it not make numpy any faster?
  2. Are there any faster packages out there? How much faster is possible?
Corvus
  • I found a partial answer in another question and put it below - but if people think this question should be closed/deleted I won't be offended. (I left it since one person has upvoted the question; I'll leave it to others to decide if this is a helpful question.) – Corvus Aug 30 '13 at 15:36
  • Given the duplicate I'll delete this question tomorrow if no-one says anything interesting etc. – Corvus Aug 30 '13 at 15:42
  • Do you actually need to compute the response for all frequencies? If you're only looking for a few select frequencies, you could significantly increase your speed by writing a custom FFT or even creating a digital filter to resonate at the desired frequencies. – DrRobotNinja Aug 30 '13 at 18:42
  • Just to double check you compiled scipy with an optimized BLAS? – Daniel Aug 30 '13 at 18:50
  • @Ophion I hadn't, in part because I hadn't tried using scipy yet - would this make a difference to numpy? Also, do you have a link to explain a little more what you mean - I've never compiled any python package - I just download them! – Corvus Aug 30 '13 at 21:42
  • @corone Ah whoops - compile numpy with an optimized BLAS such as the [MKL](http://software.intel.com/en-us/articles/the-intel-math-kernel-library-and-its-fast-fourier-transform-routines). An easy way to test it is to grab Anaconda Python and try the MKL optimization free for a month from [here](https://store.continuum.io) to see what difference it makes. Out of curiosity, can you parallelize your code? – Daniel Aug 30 '13 at 21:59
  • @Ophion the overall problem is embarrassingly parallel, and so will be split across many cores/machines. So I'm only looking for single threaded optimisations to the FFT part. I'll have a look at those - thanks! – Corvus Aug 31 '13 at 10:41
  • "I have some code that is heavily using np.fft.rfft and np.fft.irfft, such that this is the bottleneck for optimisation." Well start here. What exactly are you doing with the FFTs? Can you pad to convenient sizes that make the FFT go faster? Can you do Zoom FFT if you don't need all the frequencies? – endolith Oct 18 '16 at 15:31
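The padding suggestion in the last comment is easy to try: FFT libraries are fastest on lengths with small prime factors, and np.fft.rfft will zero-pad for you via its n argument. A minimal sketch, padding to the next power of two (a hypothetical choice of "convenient size"; note zero-padding changes the frequency-bin spacing, which is fine for convolution-style uses but not if you need the exact original bins):

```python
import numpy as np

n = 1_000_003          # an awkward length with a large prime factor
x = np.random.default_rng(1).standard_normal(n)

# pad up to the next power of two; rfft zero-pads when n > len(x)
n_fast = 1 << (n - 1).bit_length()
X = np.fft.rfft(x, n=n_fast)
print(X.shape)  # (n_fast // 2 + 1,)
```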

1 Answer


I've found a question/answer that actually answers part of this:

https://stackoverflow.com/a/8481916/1900520

It shows that scipy ships another FFT implementation that is quite a bit faster, and that there is a library called FFTW that is faster still (up to about 3x in those benchmarks).

So that just leaves the question of whether Cython would make this go any faster.
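For trying FFTW from Python, the pyfftw package exposes a numpy-compatible interface, so it can be swapped in with minimal changes. A hedged sketch (the module and function names are pyfftw's, and the code falls back to numpy if pyfftw isn't installed):

```python
import numpy as np

try:
    import pyfftw
    import pyfftw.interfaces.numpy_fft as fft
    pyfftw.interfaces.cache.enable()  # reuse FFTW "plans" across repeated calls
except ImportError:
    from numpy import fft  # drop-in fallback with the same call signatures

x = np.random.default_rng(2).standard_normal(4096)
X = fft.rfft(x)
y = fft.irfft(X, n=x.size)
assert np.allclose(y, x)
```

FFTW gets its speed partly from planning, so caching or reusing plans across many same-sized transforms matters for the single-threaded case described here.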

Corvus