1

I want to use the mkl to compute 1D FFT of a 2D array stored as a 1D array. for example,

for (int j=0; j<NJ; j++) //rows
{
  for (int i=0; i<NI; i++) //columns
   {
     Pre_2D_array[i+j*NI].x=1.0;
     Pre_2D_array[i+j*NI].y=2.0;
   }
}

I want to compute 1D FFT of the Pre_2D_array in row dimension. The only way I can think out is reshaping the array and the doing the FFT like this,

   for (int i=0; i<NI; i++) //columns
    {
      for (int j=0; j<NJ; j++) //rows
       {
         2D_array[j+i*NJ]=Pre_2D_array[i+j*NI];
       }
    }

DFTI_DESCRIPTOR_HANDLE desc_x = 0;
DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE,  NJ);
DftiCommitDescriptor(desc_x);

DftiComputeForward(desc_x, 2D_array);

Althougth this can get the right answer. But doing the transposition(reshaping) of the original array wastes too much time when the array is big. Is there any way to do the FFT without reshaping the array? Or any fast way to reshape the array as fast as possible?

the cpuinfo is:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 79
model name  : Intel(R) Xeon(R) CPU E5-2648L v4 @ 1.80GHz
stepping    : 1
microcode   : 0xb000022
cpu MHz     : 1795.882
cache size  : 35840 KB
physical id : 0
siblings    : 14
core id     : 0
cpu cores   : 14
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 20
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips    : 3591.76
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
Jie.Chen
  • 73
  • 9

3 Answers3

1

As far as I can tell, the Intel MKL doesn't provide the capability to perform FFTs in data with a stride between data elements.

However, FFTW does. Per 4.4.1 Advanced Complex DFTs of the FFTW documentation:

fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
                             fftw_complex *in, const int *inembed,
                             int istride, int idist,
                             fftw_complex *out, const int *onembed,
                             int ostride, int odist,
                             int sign, unsigned flags);

This routine plans multiple multidimensional complex DFTs, and it extends the fftw_plan_dft routine (see Complex DFTs) to compute howmany transforms, each having rank rank and size n. In addition, the transform data need not be contiguous, but it may be laid out in memory with an arbitrary stride. To account for these possibilities, fftw_plan_many_dft adds the new parameters howmany, {i,o}nembed, {i,o}stride, and {i,o}dist. The FFTW basic interface (see Complex DFTs) provides routines specialized for ranks 1, 2, and 3, but the advanced interface handles only the general-rank case.

howmany is the (nonnegative) number of transforms to compute. The resulting plan computes howmany transforms, where the input of the k-th transform is at location in+k*idist (in C pointer arithmetic), and its output is at location out+k*odist. Plans obtained in this way can often be faster than calling FFTW multiple times for the individual transforms. The basic fftw_plan_dft interface corresponds to howmany=1 (in which case the dist parameters are ignored).

Each of the howmany transforms has rank rank and size n, as in the basic interface. In addition, the advanced interface allows the input and output arrays of each transform to be row-major subarrays of larger rank-rank arrays, described by inembed and onembed parameters, respectively. {i,o}nembed must be arrays of length rank, and n should be elementwise less than or equal to {i,o}nembed. Passing NULL for an nembed parameter is equivalent to passing n (i.e. same physical and logical dimensions, as in the basic interface.)

The stride parameters indicate that the j-th element of the input or output arrays is located at j*istride or j*ostride, respectively. (For a multi-dimensional array, j is the ordinary row-major index.) When combined with the k-th transform in a howmany loop, from above, this means that the (j,k)-th element is at j*stride+k*dist. (The basic fftw_plan_dft interface corresponds to a stride of 1.)

For in-place transforms, the input and output stride and dist parameters should be the same; otherwise, the planner may return NULL.

The page conveniently supplies a (somewhat confusing) example of performing 1D FFTs on the columns of a 2D array:

Transform each column of a 2d array with 10 rows and 3 columns:

   int rank = 1; /* not 2: we are computing 1d transforms */
   int n[] = {10}; /* 1d transforms of length 10 */
   int howmany = 3;
   int idist = odist = 1;
   int istride = ostride = 3; /* distance between two elements in 
                                 the same column */
   int *inembed = n, *onembed = n;

Also see How do I use fftw_plan_many_dft on a transposed array of data? for more examples.

Andrew Henle
  • 32,625
  • 3
  • 24
  • 56
  • Thank you for providing another way to solve it. But now I only want to use MKL to solve it. @francis showed that the Intel MKL also provide the function to perform FFTs in data with a stride between data elements. – Jie.Chen Mar 10 '19 at 02:53
1

The FFTW library has introduced arguments istride or ostride in functions like fftw_plan_many_dft() to avoid transposing the array. The last example on that page is a DFT on the second dimension.

Similarly, the Intel math kernel library introduces data layout configuration parameters such as DFTI_INPUT_STRIDES and DFTI_OUTPUT_STRIDES or DFTI_NUMBER_OF_TRANSFORMS.

The DFT over the second dimension may look like (I did not tested it):

DftiCreateDescriptor(&desc_x, DFTI_PREC, DFTI_COMPLEX, 1, NJ);
DftiSetValue(desc_x, DFTI_NUMBER_OF_TRANSFORMS, NI);
DftiSetValue(desc_x, DFTI_INPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_OUTPUT_STRIDES, &NI);
DftiSetValue(desc_x, DFTI_INPUT_DISTANCE,  1);
DftiSetValue(desc_x, DFTI_OUTPUT_DISTANCE,  1);
DftiCommitDescriptor(desc_x);

DFTI_OUTPUT_STRIDES is ignored for in-place transforms (DFTI_PLACEMENT=DFTI_INPLACE).

francis
  • 9,525
  • 2
  • 25
  • 41
  • It will return "Segment fault" if I set the desc_x according to your codes. And I cann't find the problem. – Jie.Chen Mar 09 '19 at 08:39
  • Looking at the documentation, I just rode that `DFTI_INPUT_STRIDES` is supposed to be an integer array. Could you try `DftiSetValue(desc_x, DFTI_INPUT_STRIDES, &NI);` and `DftiSetValue(desc_x, DFTI_OUTPUT_STRIDES, &NI);`? – francis Mar 09 '19 at 10:28
  • Oh,I forgot it! Your answer really works for me! Thank you. @francis – Jie.Chen Mar 10 '19 at 02:49
0

You cannot do 1D FFTs along higher dimensions of your data. You need to do transposition first so that the FT dimension is the one, where the data is contiguous in RAM.

However, it is not as bad as you might think. On a multi core machine, you can easily setup some threads, whose sole job is to pre / post arrange FT data.

Kaveh Vahedipour
  • 3,412
  • 1
  • 14
  • 22