I don't really understand what the value of the multiProcessorCount property represents, because I'm having trouble grasping the CUDA architecture.
I'm sorry if some of the following statements appear naive. From what I have understood so far, here are the hardware "layers":
- A CUDA processor is a grid of building blocks.
- A building block is composed of two or more streaming multiprocessors.
- A streaming multiprocessor is composed of many streaming processors, also called cores.
- A streaming processor is "massively" threaded, meaning that it implements many hardware-managed threads. One streaming processor (one core) can really compute only one thread at a time, but it has many "hardware threads" that can load data while waiting for their turn to be computed by the SP.
On the software side:
- A block is composed of threads, and is executed by a streaming multiprocessor.
- If one launches more blocks than there are streaming multiprocessors on the card, I guess the blocks wait in some sort of queue to be executed.
- Software threads are distributed to streaming processors, which distribute them to their hardware threads. As in the previous case, if one launches more threads than the streaming processors can handle with their hardware threads, the software threads wait in a queue.
In both cases, the maximum number of threads and blocks that one is allowed to launch is independent of the number of streaming multiprocessors, streaming processors, and hardware threads per streaming processor that actually exist on the card. Those notions are software!
Am I at least close to reality?
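To make the queueing guess above concrete, here is a minimal sketch of the experiment I have in mind. It launches many more blocks than the card has multiprocessors and checks that each block still ran. The kernel name `markBlock`, the helper `countCompleted`, and the factor of 64 blocks per multiprocessor are my own choices for illustration, not anything prescribed by CUDA. I would expect every block to report back, as long as the launch stays within the grid and block size limits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block records that it ran by writing 1 into its own slot.
__global__ void markBlock(int *ran) {
    if (threadIdx.x == 0) {
        ran[blockIdx.x] = 1;
    }
}

// Host-side helper (hypothetical name): counts how many blocks reported back.
int countCompleted(const int *ran, int numBlocks) {
    int done = 0;
    for (int i = 0; i < numBlocks; ++i) {
        done += ran[i];
    }
    return done;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No usable CUDA device found\n");
        return 1;
    }

    // Assumption for the experiment: far more blocks than multiprocessors.
    const int numBlocks = 64 * prop.multiProcessorCount;
    const int threadsPerBlock = 128;

    int *dRan = nullptr;
    cudaMalloc(&dRan, numBlocks * sizeof(int));
    cudaMemset(dRan, 0, numBlocks * sizeof(int));

    markBlock<<<numBlocks, threadsPerBlock>>>(dRan);
    cudaDeviceSynchronize();

    int *hRan = new int[numBlocks];
    cudaMemcpy(hRan, dRan, numBlocks * sizeof(int), cudaMemcpyDeviceToHost);

    printf("multiprocessors: %d, blocks launched: %d, blocks completed: %d\n",
           prop.multiProcessorCount, numBlocks, countCompleted(hRan, numBlocks));

    delete[] hRan;
    cudaFree(dRan);
    return 0;
}
```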
With that being said, what does the multiProcessorCount property give? On my 610M, it says I have only one multiprocessor... Does that mean that I have only one streaming multiprocessor? Would I then have a building block composed of only one streaming multiprocessor? That seems impossible to me. And it would mean that I can execute only one block at a time! Besides, when the specifications of my card say that I have 48 CUDA cores, are they talking about streaming processors?
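For reference, this is roughly how I read the property, as a minimal sketch using `cudaGetDeviceProperties`. As far as I can tell, `cudaDeviceProp` has no field for the number of cores per multiprocessor, so the value 48 below is an assumption taken from my card's spec sheet, and `totalCores` is just a helper name I made up for the multiplication.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: total cores if every multiprocessor has the same
// number of cores. The per-multiprocessor count is NOT reported by the
// runtime; the caller has to supply it.
int totalCores(int multiProcessorCount, int coresPerMultiprocessor) {
    return multiProcessorCount * coresPerMultiprocessor;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No usable CUDA device found\n");
        return 1;
    }

    // Hardware-side values reported by the runtime.
    printf("device:                     %s\n", prop.name);
    printf("compute capability:         %d.%d\n", prop.major, prop.minor);
    printf("multiProcessorCount:        %d\n", prop.multiProcessorCount);
    printf("warpSize:                   %d\n", prop.warpSize);

    // Launch limits, which do not depend on multiProcessorCount.
    printf("maxThreadsPerBlock:         %d\n", prop.maxThreadsPerBlock);
    printf("maxThreadsPerMultiProcessor: %d\n", prop.maxThreadsPerMultiProcessor);
    printf("maxGridSize:                %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Assumption: 48 cores per multiprocessor, from my card's spec sheet.
    const int assumedCoresPerSM = 48;
    printf("cores (assuming %d per multiprocessor): %d\n",
           assumedCoresPerSM,
           totalCores(prop.multiProcessorCount, assumedCoresPerSM));
    return 0;
}
```

With one multiprocessor and 48 cores assumed per multiprocessor, the product matches the 48 CUDA cores in my card's specifications, which is what makes me suspect that "CUDA cores" and streaming processors are the same thing.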