We are migrating an ActivePivot application to a new server (4 sockets Intel Xeon, 512GB of memory). After deploying we launched our application benchmark (that's a mix of large OLAP queries concurrent to real-time transactions). The measured performance is almost twice slower than on our previous server, that has similar processors but twice less cores and twice less memory.
We have investigated the differences between the two servers, and it appears the big one has a NUMA architecture (non uniform memory acccess). Each CPU socket is physically close to 1/4 of the memory, but further away from the rest of it... The JVM that runs our application allocates a large global heap, there is a random fraction of that heap on each NUMA node. Our analysis is that the memory access pattern is pretty random and CPU cores frequently waste time accessing remote memory.
We are looking after more feedback about leveraging ActivePivot on NUMA severs. Can we configure ActivePivot cubes, or thread pools, change our queries, configure the operating system?