I am trying to parallelize a C function using CUDA. I noticed that there are several structs which are being passed as pointers to this function.
Using unified memory, I have identified the malloc() calls and changed them to cudaMallocManaged().
But now there is an allocation using memalign(). I want to achieve for it what cudaMallocManaged() did for malloc().
Does such an equivalent exist? If not, what needs to be done?
This is how the memalign() allocation line looks:
float *data = (float*) memalign(16, some_integer*sizeof(float));
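To make the requirement concrete: as far as I understand, the only thing memalign(16, ...) adds over malloc() is that the returned address is a multiple of 16. Below is a minimal sketch of how that property could be checked on the host. The names is_aligned and alloc_aligned_floats are hypothetical helpers I made up for illustration (they are not in my real code), and memalign() from <malloc.h> is assumed to be available, as it is with glibc.

```c
#include <assert.h>   /* assert() */
#include <malloc.h>   /* memalign() (glibc; assumed available) */
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>   /* free(), size_t */

/* Hypothetical helper: returns 1 if ptr is a multiple of `alignment` bytes. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical wrapper around the allocation line from the question.
 * Returns NULL on failure; the caller releases the buffer with free(). */
static float *alloc_aligned_floats(size_t some_integer)
{
    float *data = (float *) memalign(16, some_integer * sizeof(float));
    /* The property I rely on: a non-NULL result is 16-byte aligned. */
    assert(data == NULL || is_aligned(data, 16));
    return data;
}
```

The same is_aligned() check could be applied to whatever pointer a replacement allocator returns, to confirm that it still satisfies the 16-byte requirement.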