
I am trying to parallelize the following snippet of code using OpenMP. The portion shown is basically a version of the Thomas algorithm (a tridiagonal matrix solver) used in an implicit solver for fluid flow/turbulence simulations. I have stripped it down from its exact form to make the question easier to understand.

REAL(KIND=DP), INTENT(INOUT), DIMENSION(1:10) :: A, C_old, C_new, D
INTEGER :: K

!$OMP PARALLEL DO SCHEDULE(STATIC) NUM_THREADS(3) &
!$OMP SHARED(A, C_old, C_new, D) PRIVATE(K)

      ! Forward sweep: each C_new(K) needs the freshly updated C_new(K-1)
      DO K = 2,9
         C_new(K) = C_old(K)/(A(K)*C_new(K-1))
      END DO

!$OMP END PARALLEL DO

C(1) and C(10) are 0 because they sit on the fixed boundaries (the top and bottom walls of a channel), so there is no need to update them.

As is evident, the calculation of C(2) needs the information of C(1), the calculation of C(3) needs the information of the MODIFIED C(2), and so on. In other words, it is a forward sweep (the algorithm is here: https://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm). Hence, for three threads, I want the iterations distributed across the threads as [1,2,3,4], [4,5,6,7], [7,8,9,10]. But with a plain SCHEDULE(STATIC) I cannot control this, and the distribution might be [1,2,3], [4,5,6], [7,8,9,10] across the 3 threads. That would give erroneous results because the information at grid points 4 and 7 is not passed on to the other threads.
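For reference, the forward sweep of the linked Thomas algorithm looks roughly like the sketch below (hypothetical arrays A, B, C for the sub-, main- and super-diagonals, D for the right-hand side, and N for the system size; these are not the exact arrays of the stripped-down snippet above). Each step reads the values just computed at K-1, which is what blocks a naive loop-level parallelization:

      REAL(KIND=DP), DIMENSION(1:N) :: A, B, C, D, C_prime, D_prime
      INTEGER :: K

      C_prime(1) = C(1)/B(1)
      D_prime(1) = D(1)/B(1)
      DO K = 2, N
         ! Both updates read C_prime(K-1) and D_prime(K-1) from the previous iteration
         C_prime(K) = C(K)/(B(K) - A(K)*C_prime(K-1))
         D_prime(K) = (D(K) - A(K)*D_prime(K-1))/(B(K) - A(K)*C_prime(K-1))
      END DO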

Please let me know if there is a way to force the indices to be solved by a certain thread without any race. Say,

thread #1 solve indices [1,2,3,4]
thread #2 solve indices [4,5,6,7]
etc.
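Something like the following sketch (assuming the OMP_LIB runtime module; hard-coded ranges for illustration only) would pin each range of K to a fixed thread, but as far as I can tell it does not remove the problem: points 4 and 7 are touched by two threads, and C_new(K-1) may not yet be updated when a thread starts its range.

      USE OMP_LIB

      REAL(KIND=DP), DIMENSION(1:10) :: A, C_old, C_new
      INTEGER :: K, TID

!$OMP PARALLEL NUM_THREADS(3) SHARED(A, C_old, C_new) PRIVATE(K, TID)
      TID = OMP_GET_THREAD_NUM()
      ! C(1) and C(10) stay fixed, so the interior ranges are 2-4, 4-7, 7-9
      IF (TID == 0) THEN
         DO K = 2, 4
            C_new(K) = C_old(K)/(A(K)*C_new(K-1))
         END DO
      ELSE IF (TID == 1) THEN
         DO K = 4, 7
            C_new(K) = C_old(K)/(A(K)*C_new(K-1))
         END DO
      ELSE
         DO K = 7, 9
            C_new(K) = C_old(K)/(A(K)*C_new(K-1))
         END DO
      END IF
!$OMP END PARALLEL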

RTh
  • AFAIK, it is not possible, unless you use tests on the id of threads. And it will not solve your problem. You will have a race on (for instance) C(4) and hence a non deterministic behavior. And C(5) will (probably) use the old value of C(4), not the one computed at iteration 4. This kind of recurrence requires a complex recursive rewrite of the algorithm. See for instance https://stackoverflow.com/questions/18719257/parallel-cumulative-prefix-sums-in-openmp-communicating-values-between-thread – Alain Merigot Nov 04 '19 at 12:17
  • You could use the "ordered" clause on the loop to ensure that iterations occur in order, but, fundamentally, there is no parallelism here, and the loop has few iterations and each iteration has only a small amount of work, so it seems extremely unlikely that parallelising at this level will ever be beneficial. If this is repeated, then you can use tasks and task dependencies to achieve parallelism between the outer iterations (assuming that C_NEW becomes C_OLD next time, as soon as C_OLD(1) here is assigned, you could start work in the next outer iteration). – Jim Cownie Nov 05 '19 at 09:27
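For completeness, the "ordered" clause mentioned in the last comment would look roughly like the sketch below (an untested sketch, not a recommendation). The ORDERED region forces its contents to execute in iteration order, so C_new(K-1) is already updated when iteration K runs, but since the whole loop body sits inside it the loop is effectively serialized:

!$OMP PARALLEL DO SCHEDULE(STATIC) NUM_THREADS(3) ORDERED &
!$OMP SHARED(A, C_old, C_new) PRIVATE(K)
      DO K = 2, 9
!$OMP ORDERED
         ! Runs strictly in iteration order, preserving the recurrence
         C_new(K) = C_old(K)/(A(K)*C_new(K-1))
!$OMP END ORDERED
      END DO
!$OMP END PARALLEL DO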

0 Answers