
I'm trying to loop through a big 3D array of structures, and it runs very slowly. I then switched from the 3D array to a 1D array, but without success.

I use the structure below to describe the parameters of one cell of a 3D mesh:

struct cellStruct
{
    double v1;
    // more variables here
    double v15;
    double v16;
    double v17;
    double v18;
};

Please take a look at the two approaches I used.

  1. 3D arrays

    #define Nx 500
    #define Ny 500
    #define Nz 500
    
    cellStruct ***cell;
    cell = new cellStruct **[Nx];
    for(int i=0;i<Nx;i++)
    {
        cell[i]=new cellStruct *[Ny];
        for(int j=0;j<Ny;j++)
            cell[i][j]=new cellStruct [Nz];
    }
    
    for (i = 0; i< Nx; ++i)
        for (j = 0; j< Ny; ++j)
            for (k = 0; k< Nz; ++k)
            {
                // big algorithm that uses array like in string below
                cell[i][j][k+1].v1 = cell[i][j+1][k-1].v2 *
                                     cell[i+1][Ny-1][k+1].v5;
            }
    
  2. 1D array

    #define cell(i,j,k) (cells[(i)*Ny*Nz + (j)*Nz + (k)])
    cellStruct *cells = new cellStruct [Nx*Ny*Nz];
    for (i = 1; i< Nx-1; ++i)
        for (j = 1; j< Ny-1; ++j)
            for (k = 1; k< Nz-1; ++k)
            {
                cell(i,j,k+1).v1 = cell(i,j+1,k-1).v2 * cell(i+1,Ny-1,k+1).v5;
            }
    

The program runs even more slowly in case 2. How else can I improve my approach to working with a big 3D array? Using float variables speeds up the calculation by a factor of two, but I want more accuracy. Would it be better to use a structure with pointers to the variables inside, like the one below?

struct cells
{
    double ***v1;
    // ...
    double ***v15;
    double ***v16;
    double ***v17;
    double ***v18;
};
JohnPisun
  • http://stackoverflow.com/questions/9951603/c-improving-cache-performance-in-a-3d-array?rq=1 – Peter G. Feb 08 '14 at 08:37
  • http://stackoverflow.com/questions/7734693/improve-c-function-performance-with-cache-locality/7735362#7735362 – Peter G. Feb 08 '14 at 08:38
  • If your sizes are all preprocessor definitions, then you might as well declare `cellStruct[Nx][Ny][Nz]`. Also, please note that in the first example, your program will likely crash due to the `for (k = 0; k < Nz; ++k)` loop: the `k+1`, `k-1` and `i+1` accesses go out of bounds. – barak manos Feb 08 '14 at 08:38
  • Have you compiled your code with optimizations on? This may give a performance boost. Also, since your cell struct contains only doubles, you could try writing it as an array too, so that `cell[i][j+1][k-1].v2` becomes `cell[i][j+1][k-1][2]`; maybe this has more optimization potential... – mb84 Feb 08 '14 at 14:58
  • I was recently helping with similar performance problems, so look at my answer for insights. Switching to floats halves the amount of data to process, so bullet 3 is better suited for you. PS: pointer magic will not gain you anything useful in your case. – Spektre Feb 11 '14 at 11:04

2 Answers


Well, 500^3 is quite a size -> 125M cells.

  • so static allocation is out of the question
  • each double is 8 bytes, so each double member takes 8 B x 125M = 1 GB across all cells (G is 1000^3, not 1024^3, in this case !!!)
  • so that is ~1 GB for each double variable inside the cell struct, or ~18 GB for the whole data set with 18 doubles per cell
  • so first define what "slow runtime" means for processing that many GB of data

You can only do the following:

1. rewrite the computation to be more effective

  • so that the data being computed fits at least inside the L1 cache
  • this means computing everything in chunks (loop tiling); see the sketch after this list
  • of course, if you do not reuse cells repeatedly, then there is nothing to improve with this
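For illustration, a minimal loop-tiling sketch, assuming the 1-D `cells` array and the `cell(i,j,k)` macro from the question; the tile size `BS` is a made-up placeholder you would tune so that one tile's working set fits in cache:

#include <algorithm> // for std::min

const int BS = 16; // tile edge length (placeholder; tune for your cache)

// walk the mesh tile by tile so each small block stays cache-resident
for (int ii = 1; ii < Nx - 1; ii += BS)
    for (int jj = 1; jj < Ny - 1; jj += BS)
        for (int kk = 1; kk < Nz - 1; kk += BS)
            for (int i = ii; i < std::min(ii + BS, Nx - 1); ++i)
                for (int j = jj; j < std::min(jj + BS, Ny - 1); ++j)
                    for (int k = kk; k < std::min(kk + BS, Nz - 1); ++k)
                        cell(i, j, k + 1).v1 = cell(i, j + 1, k - 1).v2 *
                                               cell(i + 1, Ny - 1, k + 1).v5;

Tiling only pays off when the algorithm reuses neighbouring cells; for a single pass like the statement above, the gain will be limited.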

2. use multithreading

  • try to parallelize your computations; see the sketch after this list
  • but your data is so huge that this approach can even slow things down, so be careful !!!
  • when your data does not fit inside the caches, your computational power is more like that of a somewhat beefier 386 machine !!!
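As an illustration only, here is a minimal OpenMP sketch (compile with -fopenmp on GCC/Clang); it assumes the 1-D `cells` array and macro from the question, and that no iteration writes data another iteration reads:

#include <omp.h>

// split the outer i loop across threads; each thread writes only its own rows
#pragma omp parallel for
for (int i = 1; i < Nx - 1; ++i)
    for (int j = 1; j < Ny - 1; ++j)
        for (int k = 1; k < Nz - 1; ++k)
            cell(i, j, k + 1).v1 = cell(i, j + 1, k - 1).v2 *
                                   cell(i + 1, Ny - 1, k + 1).v5;

A loop over gigabytes of data is memory-bound, so measure: the speed-up may flatten out after a few threads.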

3. pack the input data

  • the best option for you is some cell-packing algorithm
  • are your cells voxels or not?
  • adjacent cells should then be similar (inside regions)
  • I strongly recommend RLE (run-length encoding); see the sketch after this list
  • it is fast and very efficient for this kind of data (at least for the kind of data I assume you have)
  • if your data is not suited to RLE, then divide the cells into regions and use DCT/iDCT (as in JPEG compression)
  • packing/unpacking the data should greatly improve your computation times
  • because after packing, your data set could be very much smaller than it is now
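For illustration, a minimal RLE sketch over one flat array of doubles (the `RleRun` type and the `rleEncode` helper are made up for this example; it assumes exact equality between adjacent cells is common, otherwise compare within a tolerance or fall back to the DCT idea above):

#include <cstddef>
#include <vector>

// one run: `count` consecutive cells all hold the same `value`
struct RleRun { std::size_t count; double value; };

// encode a flat array of doubles into runs of equal adjacent values
std::vector<RleRun> rleEncode(const double *data, std::size_t n)
{
    std::vector<RleRun> runs;
    for (std::size_t i = 0; i < n; )
    {
        std::size_t j = i + 1;
        while (j < n && data[j] == data[i]) ++j; // extend the current run
        runs.push_back({ j - i, data[i] });
        i = j;
    }
    return runs;
}

You would run this per member array (which fits naturally with the struct-of-arrays layout from the other answer); how much you gain depends entirely on how uniform your regions are.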
Spektre

Since you want to improve your cache efficiency, converting an array of structures to a structure of arrays will help you.

I am almost sure you will have to convert your triple-indirect pointers to 1-D arrays too, in order to make the struct-of-arrays idea effective.

struct cellStruct
{
    double* v1; // you can use std::vector<double> instead of double*
    // more variables here
    double* v15;
    double* v16;
    double* v17;
    double* v18;
};

Since your calculation only uses v1, v2 and v5, it's best not to pull the other variables into the cache at all. With the struct-of-arrays layout, v1, v2, v3, etc. live in separate memory regions, so you don't force the cache to load the unused v3, v4, v6, ...

Some syntax tweaks:

#define CELL_ACCESS(cells,vn,i,j,k) (cells.vn[(i)*Ny*Nz + (j)*Nz + (k)])
cellStruct cells;
cells.v1 = new double[Nx*Ny*Nz]; // if you use std::vector, adjust code accordingly
cells.v2 = new double[Nx*Ny*Nz];
...
for (i = 1; i< Nx-1; ++i)
    for (j = 1; j< Ny-1; ++j)
        for (k = 1; k< Nz-1; ++k)
        {
            CELL_ACCESS(     cells, v1, i,   j,    k+1) =
                CELL_ACCESS( cells, v2, i,   j+1,  k-1) *
                CELL_ACCESS( cells, v5, i+1, Ny-1, k+1);
        }
anatolyg