Background: I have a C99 routine that needs temporary storage of varying datatypes with varying alignment requirements. Currently I call posix_memalign multiple times, which a) introduces lots of overhead and b) does not guarantee that my temporaries have good memory locality. I cannot pack the temporaries into a single struct because their sizes are only known at runtime.
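For comparison, the current approach might look like the minimal sketch below. It assumes a POSIX system and 16-byte alignment for SSE, shows only two of the temporaries, and `alloc_separately` is a hypothetical name. Each temporary costs its own allocator call, and nothing places the blocks next to each other.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign under -std=c99 */
#include <stdlib.h>

/* One posix_memalign call per temporary (hypothetical helper).
   The returned blocks may land anywhere in the heap. */
int alloc_separately(size_t m, size_t n, double **d, float **f)
{
    void *pd = NULL;
    void *pf = NULL;

    /* 16-byte alignment so SSE aligned loads/stores are legal */
    if (posix_memalign(&pd, 16, m * sizeof(double)) != 0)
        return -1;
    if (posix_memalign(&pf, 16, n * sizeof(float)) != 0) {
        free(pd);
        return -1;
    }
    *d = pd;
    *f = pf;
    return 0;  /* caller must free(*d) and free(*f) separately */
}
```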
Question: I want to call malloc (or something similar) once, with a size large enough that I can carve the block up into individual pointers with the required alignments. Is there a canonical way to accomplish this within C99?
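One common shape for this is a small "bump" carver over a single block: reserve the worst case up front, then hand out aligned sub-blocks in order. This is a sketch under stated assumptions, not a standard C99 facility: `carve_t` and `carve_take` are hypothetical names, and the code assumes `uintptr_t` exists and that the integer value of a converted pointer reflects its alignment (true on mainstream platforms, but implementation-defined in C99).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical carver over one malloc'd block. */
typedef struct {
    unsigned char *next;  /* first unused byte */
    unsigned char *end;   /* one past the last byte of the block */
} carve_t;

/* Return a pointer to `size` bytes aligned to `align` (any positive
   value), or NULL if the remaining space is too small. Requests are
   served in order, so the sub-blocks appear in memory in call order. */
void *carve_take(carve_t *c, size_t size, size_t align)
{
    size_t avail = (size_t)(c->end - c->next);
    /* bytes needed to reach the next multiple of align */
    size_t pad = (align - (uintptr_t)c->next % align) % align;

    if (pad > avail || size > avail - pad)
        return NULL;

    void *out = c->next + pad;
    c->next += pad + size;
    return out;
}
```

To size the block, sum `size + align - 1` over every request; that covers the worst-case padding for each one. Because a type's alignment always divides its size, `sizeof(T)` can stand in for a C99-portable (if sometimes stricter than necessary) alignment of `T`.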
A possible (but unlikely) answer for a concrete example: say I want to allocate enough space for a char[3], a double[ell] with 16-byte alignment, an int[m], and a float[n] with 16-byte alignment, and that I require them in memory in that order. Please ignore that the order is stupid and the example contrived. My actual use case is a control struct followed by several temporary arrays of mixed integer and floating-point types, with alignments allowing SSE operations.
Using the ideas from "How to allocate aligned memory only using the standard library?", one might do:
```c
#include <stdint.h>  /* uintptr_t */
#include <stdlib.h>  /* malloc, free, size_t */

// Several unnecessary values (e.g. alignment_c) are holdovers
// from the macros generating this logic.
// ell, m, and n are sizes known only at runtime.
// Note: __alignof__ is a GCC/Clang extension, not C99.
const size_t    datasize_c  = 3*sizeof(char);
const size_t    alignment_c = __alignof__(char);
const size_t    pad_c       = alignment_c - 1;
const uintptr_t mask_c      = ~(uintptr_t)(alignment_c - 1);
const size_t    datasize_d  = ell*sizeof(double);
const size_t    alignment_d = __alignof__(double) > 16 ? __alignof__(double) : 16;
const size_t    pad_d       = alignment_d - 1;
const uintptr_t mask_d      = ~(uintptr_t)(alignment_d - 1);
const size_t    datasize_i  = m*sizeof(int);
const size_t    alignment_i = __alignof__(int);
const size_t    pad_i       = alignment_i - 1;
const uintptr_t mask_i      = ~(uintptr_t)(alignment_i - 1);
const size_t    datasize_f  = n*sizeof(float);
const size_t    alignment_f = __alignof__(float) > 16 ? __alignof__(float) : 16;
const size_t    pad_f       = alignment_f - 1;
const uintptr_t mask_f      = ~(uintptr_t)(alignment_f - 1);
// Worst case: each array may need up to (alignment - 1) bytes of padding.
const size_t p_parcel = (datasize_c + pad_c)
                      + (datasize_d + pad_d)
                      + (datasize_i + pad_i)
                      + (datasize_f + pad_f);
void * const p = malloc(p_parcel);
// check if p is NULL here, before carving it up
// Round each start address up; each array begins after the previous one ends.
char   * c = (void *) (((uintptr_t)(p      ) + pad_c) & mask_c);
double * d = (void *) (((uintptr_t)(c + 3  ) + pad_d) & mask_d);
int    * i = (void *) (((uintptr_t)(d + ell) + pad_i) & mask_i);
float  * f = (void *) (((uintptr_t)(i + m  ) + pad_f) & mask_f);
// use (c, d, i, f), then free p
```
I believe this approach is functionally correct, but is there a better, cleaner, or shorter way?
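One possibly cleaner variant for the real use case (a control struct followed by SSE-aligned arrays) is to align only the base and compute every other position as an offset rounded up from 0, so no pointer masking is needed. This sketch assumes POSIX `posix_memalign`, assumes 16 is a multiple of both the struct's alignment and `sizeof(int)`, uses a hypothetical `control_t` and `control_create`, and omits overflow checks on the size arithmetic.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign under -std=c99 */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical control struct standing in for the real one. */
typedef struct {
    size_t m, n;
    double *work_d;  /* m doubles, 16-byte aligned */
    int    *work_i;  /* n ints */
    float  *work_f;  /* n floats, 16-byte aligned */
} control_t;

/* Round an offset (not an address) up to a multiple of a. */
static size_t round_up(size_t off, size_t a)
{
    return (off + a - 1) / a * a;
}

/* Lay everything out by offset from a 16-byte-aligned base, then
   allocate once. Any offset that is a multiple of a divisor of 16
   yields an address with that alignment. */
control_t *control_create(size_t m, size_t n)
{
    size_t off_d = round_up(sizeof(control_t), 16);
    size_t off_i = round_up(off_d + m * sizeof(double), sizeof(int));
    size_t off_f = round_up(off_i + n * sizeof(int), 16);
    size_t total = off_f + n * sizeof(float);
    void *base = NULL;

    if (posix_memalign(&base, 16, total) != 0)
        return NULL;

    control_t *ctl = base;
    ctl->m = m;
    ctl->n = n;
    ctl->work_d = (double *)((char *)base + off_d);
    ctl->work_i = (int *)((char *)base + off_i);
    ctl->work_f = (float *)((char *)base + off_f);
    return ctl;
}
```

A single `free(ctl)` releases the struct and all three arrays, which sit contiguously in the requested order.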
I think approaches using the struct hack aren't feasible, because a single malloc of one struct hack can guarantee the alignment of only one array. I would still need three malloc calls for three separate struct hacks.
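For reference, C99 spells the struct hack as a flexible array member. This minimal sketch (hypothetical `dvec_t` and `dvec_create`) illustrates the limitation described above: only the last member can have a runtime length, and that array is aligned for its element type, not necessarily to 16 bytes.

```c
#include <stdlib.h>

/* C99 flexible array member: only the final member may be
   runtime-sized, so a second array cannot be declared here. */
typedef struct {
    size_t n;
    double data[];  /* aligned for double, not necessarily to 16 */
} dvec_t;

dvec_t *dvec_create(size_t n)
{
    /* sizeof *v excludes the flexible member; add its storage */
    dvec_t *v = malloc(sizeof *v + n * sizeof v->data[0]);
    if (v != NULL)
        v->n = n;
    return v;  /* one free(v) releases header and data */
}
```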
Lastly, I'd be happy to provide the macros that generate that mess if anyone wants them.