
I'm trying to loop through a big 3D array of structures, and it runs very slowly. I then switched from the 3D array to a 1D array, but without success.

I use the structure below to describe the parameters of one cell of a 3D mesh:

struct cellStruct
{
    double v1;
    // more variables here
    double v15;
    double v16;
    double v17;
    double v18;
};

Please take a look at the two approaches I used.

  1. 3D arrays

    #define Nx 500
    #define Ny 500
    #define Nz 500
    
    cellStruct ***cell;
    cell = new cellStruct **[Nx];
    for(int i=0;i<Nx;i++)
    {
        cell[i]=new cellStruct *[Ny];
        for(int j=0;j<Ny;j++)
            cell[i][j]=new cellStruct [Nz];
    }
    
    for (i = 0; i< Nx; ++i)
        for (j = 0; j< Ny; ++j)
            for (k = 0; k< Nz; ++k)
            {
                // big algorithm that uses array like in string below
                cell[i][j][k+1].v1 = cell[i][j+1][k-1].v2 *
                                     cell[i+1][Ny-1][k+1].v5;
            }
    
  2. 1D array

    #define cell(i,j,k) (cells[(i)*Ny*Nz + (j)*Nz + (k)])
    cellStruct *cells = new cellStruct [Nx*Ny*Nz];
    for (i = 1; i< Nx-1; ++i)
        for (j = 1; j< Ny-1; ++j)
            for (k = 1; k< Nz-1; ++k)
            {
                cell(i,j,k+1).v1 = cell(i,j+1,k-1).v2 * cell(i+1,Ny-1,k+1).v5;
            }
    

The program runs even more slowly in case 2. How else can I improve my approach to working with a big 3D array? Using float variables speeds up the calculation by a factor of two, but I want more accuracy. Would it be better to use a structure with pointers to the variables inside, like the one below?

struct cells
{
    double ***v1;
    // ...
    double ***v15;
    double ***v16;
    double ***v17;
    double ***v18;
};
JohnPisun
  • http://stackoverflow.com/questions/9951603/c-improving-cache-performance-in-a-3d-array?rq=1 – Peter G. Feb 08 '14 at 08:37
  • http://stackoverflow.com/questions/7734693/improve-c-function-performance-with-cache-locality/7735362#7735362 – Peter G. Feb 08 '14 at 08:38
  • If your sizes are all preprocessor definitions, then you might as well declare `cellStruct[Nx][Ny][Nz]`. Also, please note that in the first example, your program will likely crash due to the `for (k = 0; k < Nz; ++k)` loop: the `k+1`, `k-1` and `i+1` accesses go out of bounds. – barak manos Feb 08 '14 at 08:38
  • Have you compiled your code with optimizations on? This may give a performance boost. Also, since your cell struct contains only doubles, you could try writing it as an array too, so that `cell[i][j+1][k-1].v2` becomes `cell[i][j+1][k-1][2]`; maybe this has more optimization potential... – mb84 Feb 08 '14 at 14:58
  • I was recently helping with similar performance problems, so look at my answer for insights. Switching to floats halves the amount of data to process, so bullet 3 is better suited for you. PS: pointer magic will not gain you anything useful in your case. – Spektre Feb 11 '14 at 11:04

2 Answers


Well, 500^3 is quite a size -> 125M cells.

  • so static allocation is out of the question
  • each double is 8 bytes, so each double member takes 8 B x 125M = 1 GB across all cells (G is 1000^3, not 1024^3, in this case !!!)
  • so that is ~1 GB for each double variable inside the cell struct, or ~18 GB for the whole data set with 18 doubles per cell
  • so first define what "slow runtime" means for processing that many GB of data

You can only do the following:

1. rewrite the computation to be more effective

  • so that the data being computed fits at least inside the L1 cache
  • this means computing everything in chunks (loop tiling); see the sketch after this list
  • of course, if you do not reuse cells repeatedly, then there is nothing to improve with this
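For illustration, a minimal loop-tiling sketch, assuming the 1-D `cells` array and the `cell(i,j,k)` macro from the question; the tile size `BS` is a made-up placeholder you would tune so that one tile's working set fits in cache:

#include <algorithm> // for std::min

const int BS = 16; // tile edge length (placeholder; tune for your cache)

// walk the mesh tile by tile so each small block stays cache-resident
for (int ii = 1; ii < Nx - 1; ii += BS)
    for (int jj = 1; jj < Ny - 1; jj += BS)
        for (int kk = 1; kk < Nz - 1; kk += BS)
            for (int i = ii; i < std::min(ii + BS, Nx - 1); ++i)
                for (int j = jj; j < std::min(jj + BS, Ny - 1); ++j)
                    for (int k = kk; k < std::min(kk + BS, Nz - 1); ++k)
                        cell(i, j, k + 1).v1 = cell(i, j + 1, k - 1).v2 *
                                               cell(i + 1, Ny - 1, k + 1).v5;

Tiling only pays off when the algorithm reuses neighbouring cells; for a single pass like the statement above, the gain will be limited.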

2. use multithreading

  • try to parallelize your computations; see the sketch after this list
  • but your data is so huge that this approach can even slow things down, so be careful !!!
  • when your data does not fit inside the caches, your computational power is more like that of a somewhat beefier 386 machine !!!
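As an illustration only, here is a minimal OpenMP sketch (compile with -fopenmp on GCC/Clang); it assumes the 1-D `cells` array and macro from the question, and that no iteration writes data another iteration reads:

#include <omp.h>

// split the outer i loop across threads; each thread writes only its own rows
#pragma omp parallel for
for (int i = 1; i < Nx - 1; ++i)
    for (int j = 1; j < Ny - 1; ++j)
        for (int k = 1; k < Nz - 1; ++k)
            cell(i, j, k + 1).v1 = cell(i, j + 1, k - 1).v2 *
                                   cell(i + 1, Ny - 1, k + 1).v5;

A loop over gigabytes of data is memory-bound, so measure: the speed-up may flatten out after a few threads.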

3. pack the input data

  • the best option for you is some cell-packing algorithm
  • are your cells voxels or not?
  • adjacent cells should then be similar (inside regions)
  • I strongly recommend RLE (run-length encoding); see the sketch after this list
  • it is fast and very efficient for this kind of data (at least for the kind of data I assume you have)
  • if your data is not suited to RLE, then divide the cells into regions and use DCT/iDCT (as in JPEG compression)
  • packing/unpacking the data should greatly improve your computation times
  • because after packing, your data set could be very much smaller than it is now
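For illustration, a minimal RLE sketch over one flat array of doubles (the `RleRun` type and the `rleEncode` helper are made up for this example; it assumes exact equality between adjacent cells is common, otherwise compare within a tolerance or fall back to the DCT idea above):

#include <cstddef>
#include <vector>

// one run: `count` consecutive cells all hold the same `value`
struct RleRun { std::size_t count; double value; };

// encode a flat array of doubles into runs of equal adjacent values
std::vector<RleRun> rleEncode(const double *data, std::size_t n)
{
    std::vector<RleRun> runs;
    for (std::size_t i = 0; i < n; )
    {
        std::size_t j = i + 1;
        while (j < n && data[j] == data[i]) ++j; // extend the current run
        runs.push_back({ j - i, data[i] });
        i = j;
    }
    return runs;
}

You would run this per member array (which fits naturally with the struct-of-arrays layout from the other answer); how much you gain depends entirely on how uniform your regions are.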
Spektre

Since you want to improve your cache efficiency, converting an array of structures to a structure of arrays will help you.

I am almost sure you will have to convert your triple-indirect pointers to 1-D arrays too, in order to make the struct-of-arrays idea effective.

struct cellStruct
{
    double* v1; // you can use std::vector<double> instead of double*
    // more variables here
    double* v15;
    double* v16;
    double* v17;
    double* v18;
};

Since your calculation only uses v1, v2 and v5, it's best not to pull the other variables into the cache at all. With the struct-of-arrays layout, v1, v2, v3, etc. live in separate memory regions, so you don't force the cache to load the unused v3, v4, v6, ...

Some syntax tweaks:

#define CELL_ACCESS(cells,vn,i,j,k) (cells.vn[(i)*Ny*Nz + (j)*Nz + (k)])
cellStruct cells;
cells.v1 = new double[Nx*Ny*Nz]; // if you use std::vector, adjust code accordingly
cells.v2 = new double[Nx*Ny*Nz];
...
for (i = 1; i< Nx-1; ++i)
    for (j = 1; j< Ny-1; ++j)
        for (k = 1; k< Nz-1; ++k)
        {
            CELL_ACCESS(     cells, v1, i,   j,    k+1) =
                CELL_ACCESS( cells, v2, i,   j+1,  k-1) *
                CELL_ACCESS( cells, v5, i+1, Ny-1, k+1);
        }
anatolyg