
I am trying to parallelize a C function using CUDA. I noticed that several structs are being passed as pointers to this function. To move to the unified memory view, I have identified the malloc() calls and replaced them with cudaMallocManaged().

But now there is an allocation that uses memalign(), and I want to achieve the same thing for it that cudaMallocManaged() achieved for malloc().

Does such an equivalent exist? If not, what needs to be done?
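For context, this is the kind of replacement I mean (a minimal sketch; the size `n` is illustrative, the real code passes struct pointers):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    size_t n = 1024;   /* illustrative element count */
    float *data;

    /* before: data = (float *)malloc(n * sizeof(float)); */
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* data is now accessible from both host code and device kernels */

    cudaFree(data);    /* before: free(data); */
    return 0;
}
```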

This is how the memalign() allocation line looks:

float *data = (float*) memalign(16, some_integer*sizeof(float));
Tlacenka
  • According to the [cuda c programming guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/#data-transfer-between-host-and-device) memory allocated with cuda allocation functions is always aligned to at least 256 bytes. As far as I know you cannot specify other alignments. – havogt Aug 13 '15 at 12:09

1 Answer


You should be able to register an existing host memory buffer like this:

float *data = (float*) memalign(16, some_integer*sizeof(float));
cudaHostRegister((void *)data, some_integer*sizeof(float), cudaHostRegisterDefault);

After registration, data should behave much like memory allocated with cudaMallocManaged(). Check the return value of the cudaHostRegister() call; if it fails, you have chosen an incompatible alignment.
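Putting the two lines together with error checking (a sketch, not your exact code; `some_integer` is assumed to be defined elsewhere, and running this requires an NVIDIA GPU):

```cuda
#include <cuda_runtime.h>
#include <malloc.h>   /* memalign (glibc) */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t some_integer = 1024;   /* illustrative value */
    float *data = (float *)memalign(16, some_integer * sizeof(float));

    /* Pin and register the existing host buffer so the device can
       access it. Always check the return value: registration can
       fail for an incompatible buffer. */
    cudaError_t err = cudaHostRegister((void *)data,
                                       some_integer * sizeof(float),
                                       cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    /* ... launch kernels that read/write data ... */

    cudaHostUnregister(data);   /* undo registration before freeing */
    free(data);
    return 0;
}
```

Note the matching cudaHostUnregister() call: the buffer must be unregistered before it is freed with the host allocator's free().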

talonmies