I was looking at some Game of Life GPU code and could not understand why `ceil` is used here:

  dim3 cpyBlockSize(BLOCK_SIZE,1,1);

  dim3 cpysimulationRowssimulationSize((int) ceil (size/(float) cpyBlockSize.x), 1, 1);
  dim3 cpysimulationColssimulationSize((int) ceil ((size+2) / (float) cpyBlockSize.x), 1, 1);
talonmies
killgore_bot
    `ceil` is rounding up here. It guarantees that there are enough blocks (and therefore enough threads) to cover the entire working set. This is a pretty basic CUDA concept, so you will find many descriptions of this rounding up approach when choosing the number of CUDA blocks to launch. [Here](https://stackoverflow.com/questions/26217294/should-i-check-the-number-of-threads-in-kernel-code/26217725#26217725) is one example write-up. – Robert Crovella Apr 22 '19 at 22:50

1 Answer

It is hard to tell without more context, but this is my best guess:

Integer division truncates (it rounds down for positive values), but the required behavior appears to be that `size / cpyBlockSize.x` is rounded up. So they cast `cpyBlockSize.x` to `float`, which makes the division produce a fractional result, and then round that result up with `ceil` to get the expected value.

For example, in most* programming languages (C included):

> print (5/2)
> 2

> print (5/(float)2)
> 2.5

> print (ceil(5/(float)2))
> 3
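
Applied to the block-count calculation from the question, the same idea can be sketched in host-side C++ as below. This is only an illustration: the helper names `blocksWithCeil`, `blocksTruncated` and `blocksIntegerCeil` are hypothetical and do not appear in the original code, and the integer-only variant is a common alternative, not something the question's author used.

```cpp
#include <cmath>

// Hypothetical helper: the question's approach. Cast to float so the
// division keeps its fractional part, then round up with ceil.
int blocksWithCeil(int size, int blockSize) {
    return static_cast<int>(std::ceil(size / static_cast<float>(blockSize)));
}

// Hypothetical helper: plain integer division, which truncates and can
// leave the trailing elements without a block.
int blocksTruncated(int size, int blockSize) {
    return size / blockSize;
}

// Hypothetical helper: integer-only round-up, avoiding the float cast.
// Assumes size >= 0 and blockSize > 0.
int blocksIntegerCeil(int size, int blockSize) {
    return (size + blockSize - 1) / blockSize;
}
```

With an assumed `size` of 1000 and a block size of 256, truncating division gives 3 blocks (768 threads, so 232 elements are never touched), while both rounding-up variants give 4 blocks (1024 threads). The last block then has more threads than remaining elements, which is why kernels launched this way usually guard each access with a bounds check on the thread index.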
talonmies
JamieT