
I am trying to parallelize a C function using CUDA. I noticed that several structs are being passed as pointers to this function. To move to the unified memory view, I have identified the malloc() calls and replaced them with cudaMallocManaged().

But now there is an allocation that uses memalign(), and I want to achieve the same thing for it that cudaMallocManaged() achieved for malloc().

Does such an equivalent exist? If not, what needs to be done?
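For context, this is the kind of replacement I mean (a minimal sketch; the size `n` is illustrative, the real code passes struct pointers):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    size_t n = 1024;   /* illustrative element count */
    float *data;

    /* before: data = (float *)malloc(n * sizeof(float)); */
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* data is now accessible from both host code and device kernels */

    cudaFree(data);    /* before: free(data); */
    return 0;
}
```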

This is how the memalign() allocation line looks:

float *data = (float*) memalign(16, some_integer*sizeof(float));
Tlacenka
  • According to the [cuda c programming guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/#data-transfer-between-host-and-device) memory allocated with cuda allocation functions is always aligned to at least 256 bytes. As far as I know you cannot specify other alignments. – havogt Aug 13 '15 at 12:09

1 Answer


You should be able to register an existing host memory buffer like this:

float *data = (float*) memalign(16, some_integer*sizeof(float));
cudaHostRegister((void *)data, some_integer*sizeof(float), cudaHostRegisterDefault);

After registration, data should behave much like memory allocated with cudaMallocManaged(). Check the return value of the cudaHostRegister() call; if it fails, you have chosen an incompatible alignment.
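Putting the two lines together with error checking (a sketch, not your exact code; `some_integer` is assumed to be defined elsewhere, and running this requires an NVIDIA GPU):

```cuda
#include <cuda_runtime.h>
#include <malloc.h>   /* memalign (glibc) */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t some_integer = 1024;   /* illustrative value */
    float *data = (float *)memalign(16, some_integer * sizeof(float));

    /* Pin and register the existing host buffer so the device can
       access it. Always check the return value: registration can
       fail for an incompatible buffer. */
    cudaError_t err = cudaHostRegister((void *)data,
                                       some_integer * sizeof(float),
                                       cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    /* ... launch kernels that read/write data ... */

    cudaHostUnregister(data);   /* undo registration before freeing */
    free(data);
    return 0;
}
```

Note the matching cudaHostUnregister() call: the buffer must be unregistered before it is freed with the host allocator's free().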

talonmies