
I want to use CUB to sort an array within each block. I launch the kernel with multiple blocks; each block has 32 threads, and each thread holds an array of 27 integers. According to CUB's GitHub page, the standard block-wide sort looks like this:

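    #include <cub/cub.cuh>

    __global__ void foo(...)
    {
        int cells[27];  // 27 keys per thread, held in registers
        typedef cub::BlockRadixSort<int, 32, 27> BlockRadixSort;  // 32 threads x 27 keys
        __shared__ typename BlockRadixSort::TempStorage temp_storage;
        BlockRadixSort(temp_storage).Sort(cells);  // cells is now sorted across the whole block
        ...
    }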

Later in the kernel, I need the cells in shared memory, like this:

    __global__ void foo(...)
    {
        __shared__ int cells[32 * 27];  // all 32 threads' cells, stored contiguously
        ...
    }

Is it possible with CUB to sort arrays that already reside in shared memory, or do I have to copy all the arrays into shared memory after the sort?
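A minimal sketch of the second option (sort in registers, then copy into shared memory), assuming the 32-thread, 27-key layout above. `BlockRadixSort::Sort` leaves the keys in a *blocked* arrangement, so thread `t` holds the sorted ranks `t * 27` through `t * 27 + 26` and can simply write them to `cells[t * 27 + i]`. Aliasing the sort's `TempStorage` with the `cells` array in a `union` is my own assumption for saving shared memory, not necessarily what the linked answer below does; the `__syncthreads()` after the sort is what makes that reuse safe. The names `sort_tiles_to_smem`, `sort_tiles`, `kThreads`, `kItems`, and `kTile` are hypothetical, and the global-memory load is deliberately naive (not coalesced).

```cuda
#include <cub/cub.cuh>
#include <cassert>
#include <vector>

constexpr int kThreads = 32;                 // threads per block (from the question)
constexpr int kItems   = 27;                 // keys per thread (from the question)
constexpr int kTile    = kThreads * kItems;  // 864 keys sorted per block

// Each block sorts one kTile-sized tile and leaves it, sorted, in shared memory.
__global__ void sort_tiles_to_smem(const int* d_in, int* d_out)
{
    using BlockRadixSort = cub::BlockRadixSort<int, kThreads, kItems>;

    // Hypothetical layout: the sort's scratch space and the cells array are
    // never needed at the same time, so they share one allocation.
    __shared__ union SharedStorage {
        BlockRadixSort::TempStorage sort;
        int cells[kTile];
    } smem;

    const int tile = blockIdx.x * kTile;

    // Blocked arrangement: thread t owns keys t*kItems .. t*kItems + kItems - 1.
    int keys[kItems];
    for (int i = 0; i < kItems; ++i)
        keys[i] = d_in[tile + threadIdx.x * kItems + i];

    BlockRadixSort(smem.sort).Sort(keys);  // keys stay in a blocked arrangement
    __syncthreads();                       // every thread is done with smem.sort

    for (int i = 0; i < kItems; ++i)
        smem.cells[threadIdx.x * kItems + i] = keys[i];
    __syncthreads();                       // smem.cells now holds the sorted tile

    // Later work would read smem.cells here; this sketch just copies it out.
    for (int i = threadIdx.x; i < kTile; i += kThreads)
        d_out[tile + i] = smem.cells[i];
}

// Host helper: sorts each kTile-sized run of h_in independently.
// CUDA error checking is omitted for brevity.
std::vector<int> sort_tiles(const std::vector<int>& h_in)
{
    assert(h_in.size() % kTile == 0);
    const size_t bytes     = h_in.size() * sizeof(int);
    const int    num_tiles = static_cast<int>(h_in.size() / kTile);

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    sort_tiles_to_smem<<<num_tiles, kThreads>>>(d_in, d_out);

    std::vector<int> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_in);
    return h_out;
}
```

Calling `sort_tiles(h)` with a host vector whose length is a multiple of 864 returns each 864-key tile sorted independently. The copy into `smem.cells` costs one extra pass over registers, which is usually cheap compared with the sort itself.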

Alternatively, is there a way to store all the cells in global memory and have a CUB device-wide function sort them, but in separate segments of a fixed size?
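For the global-memory route, CUB provides `cub::DeviceSegmentedRadixSort::SortKeys`, which sorts independent segments described by begin and end offsets. A minimal sketch, assuming every segment is exactly 32 × 27 = 864 keys and the total length is a multiple of that; the helper `segmented_sort` and the constant `kSegment` are hypothetical. Passing `d_offsets` and `d_offsets + 1` as the begin and end iterators describes contiguous segments with a single offsets array. Like other CUB device-wide algorithms, it is called twice: once with a null temp pointer to query the scratch size, then again to do the sort.

```cuda
#include <cub/cub.cuh>
#include <cassert>
#include <vector>

constexpr int kSegment = 32 * 27;  // keys per segment (one former block tile)

// Sorts each consecutive kSegment-sized run of h_keys independently.
// CUDA error checking is omitted for brevity.
std::vector<int> segmented_sort(const std::vector<int>& h_keys)
{
    assert(h_keys.size() % kSegment == 0);
    const int num_items    = static_cast<int>(h_keys.size());
    const int num_segments = num_items / kSegment;

    // offsets[s] is where segment s begins; offsets[s + 1] is where it ends.
    std::vector<int> h_offsets(num_segments + 1);
    for (int s = 0; s <= num_segments; ++s) h_offsets[s] = s * kSegment;

    int *d_in = nullptr, *d_out = nullptr, *d_offsets = nullptr;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMalloc(&d_out, num_items * sizeof(int));
    cudaMalloc(&d_offsets, (num_segments + 1) * sizeof(int));
    cudaMemcpy(d_in, h_keys.data(), num_items * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, h_offsets.data(), (num_segments + 1) * sizeof(int),
               cudaMemcpyHostToDevice);

    // First call: d_temp is null, so CUB only reports the scratch size it needs.
    void*  d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceSegmentedRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out,
                                            num_items, num_segments,
                                            d_offsets, d_offsets + 1);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call: performs the segmented sort into d_out.
    cub::DeviceSegmentedRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out,
                                            num_items, num_segments,
                                            d_offsets, d_offsets + 1);

    std::vector<int> h_out(num_items);
    cudaMemcpy(h_out.data(), d_out, num_items * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp);
    cudaFree(d_offsets);
    cudaFree(d_out);
    cudaFree(d_in);
    return h_out;
}
```

This avoids writing a sorting kernel at all, but the sorted segments then live in global memory, so a later kernel would still have to load each segment into shared memory if it needs it there.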

John
  • This [answer](http://stackoverflow.com/questions/21807872/making-cub-blockradixsort-on-chip-entirely/22074209#22074209) may be of interest. – Robert Crovella Apr 24 '17 at 16:34
  • Wow, this static cast looks pretty ugly, but it seems like it should work. Thank you very much! – John Apr 24 '17 at 18:25
