0

Let's say I have an associative array keyed by unsigned int; values could be of any fixed-size type. There is some pre-defined maximum no. of instances.

API usage example: MyStruct * valuePtr = get(1234); and put(6789, &myStructInstance); ...basic.

I want to minimise cache misses as I read entries rapidly and at random from this array, so I pre-malloc(sizeof(MyType) * MAX_ENTRIES) to ensure locality of reference inasmuch as possible.

Genericism is important for the values array. I've looked at C pseudo-generics, but prefer void * for simplicity; however, not sure if this is at odds with performance goals. Ultimately, I would like to know what is best for performance.

How should I implement my associative array for performance? Thoughts thus far...

  • Do I pass the associative array a single void * pointer to the malloced values array and allow it to use that internally (for which we would need to guarantee a matching keys array size)? Can I do this generically, since the type needs(?) to be known in order to index into values array?
  • Do I have a separate void * valuePtrs[] within the associative array, then have these pointers point to each element in the malloced values array? This would seem to avoid need to know about concrete type?
  • Do I use C pseudo-generics and thus allow get() to return a specific value type? Surely in this case, the only benefit is not having to explicitly cast, e.g. MyStruct* value = (MyStruct*) get(...)... the array element still has to be dereferenced and so has the same overhead?

And, in general, does the above approach to minimising cahce misses appear to make sense?

Community
  • 1
  • 1
Engineer
  • 8,529
  • 7
  • 65
  • 105
  • Please consider showing how you imagine the API to use the array, it helps make it more concrete. – unwind Feb 10 '15 at 15:53
  • @unwind As you requested, see second para. – Engineer Feb 10 '15 at 15:56
  • Yeah ... And for those string examples, do you envision the stored data inside the array to be just the string pointer, not the string itself? If it's just the pointer, then of course you'll use `void *` since that can point at anything, and your array then becomes a "pointer array". – unwind Feb 10 '15 at 15:57
  • @unwind Good point, actually, let me adapt this since strings are a bad example. Most generally, the values array would be an array of struct. EDIT: updated accordingly. – Engineer Feb 10 '15 at 15:58
  • Yes, strings are a bad example for thing like this. Now, without generics I don't see how you *can* have a function whose return type is dynamic like you show. It'd have to be `bool get(void *value, const void *key)` or something like that in order to copy a value whose size is not known at compile-time. – unwind Feb 10 '15 at 16:07
  • @unwind, Yes, idiotic of me: NOT a value array. API usage updated. The point is that this pointer dereference should be reasonably cheap if locality of reference is good. So I am willing to go with deref rather than accessing values directly. Maybe this will enable you to supply a fuller answer. – Engineer Feb 10 '15 at 17:09

2 Answers2

1

In both cases, the performance is basically the same. In the first one (void* implementation), you will need to look up the value + dereference the pointer. So these are two instructions.
In the other implementation, you will need to multiply the index with the size of the values. So this implementation also asks for two instructions. However, the first implementation will be easier and more clean to implement. Also, the array is fully transparent; the user will not need to know what kind of structures are in the array.

  • Thanks Ruben. In fact, the associative array lookup is already about as performant as it will get for my use case, where worst-case read access time is more important than anything else... I've opted for a binary search lookup. So I'm afraid this answer isn't very helpful as it stands. If you could elaborate on `void *` adding another dereference, I can at least +1 you. – Engineer Feb 10 '15 at 16:03
  • Either you use the void* array, this having a fully generic value-array. If you want to have values, you will have to follow the pointer to it's location, thus adding instructions. Also an option (to get a semi generic value-array), is to make it a structure, which contains the value-array (with the values, not the pointer to the values), and the size of the entries in this table. This way, you can calculate the position of your entries without knowing what kind of structure is in the array. The first one is the better option. Sorry, my first answer was wrong. – Ruben Van Dijck Feb 10 '15 at 16:14
1

See solutions categorised below, in terms of pros and cons (thanks to Ruben for assisting my thinking)... I've implemented Option 2 and 5 for my use case, which is somewhat generalised; I recommend Option 4 if you need a very specific, one-off data structure. Option 3 is most flexible while being trivial to code, and is also the slowest. Option 4 is the quickest. Option 5 is a little slow but with flexibility on the array size and ease of general use.


Associative array struct points to array of typed pointers:

pros no failure value required, explicit casts not required, does not need compile-time size of array

cons costly double deref, requires generic library code


Associative array struct holds array of void * pointers:

pros no failure value required, no generic library code

cons costly double deref, explicit casts following get(), needs compile time size of array if VLAs are not used


Associative array struct points to array of void * values:

pros no generic library code, does not need compile-time size of array

cons costly triple deref, explicit casts following get(), requires offset calc which requires sizeof value passed in explicitly


Associative array struct holds array of typed values:

pros cheap single deref, explicit casts not required, keys and entries allocated contiguously

cons requires generic library code, failure value must be supplied, needs compile time size of array if VLAs are not used


Associative array struct points to array of typed values:

pros explicit casts not required, flexible array size

cons costly double deref, requires generic library code, failure value must be supplied, needs compile time size of array if VLAs are not used

Engineer
  • 8,529
  • 7
  • 65
  • 105