I am trying to parallelize the following snippet of code using OpenMP. It is essentially a version of the Thomas algorithm (the tridiagonal matrix algorithm) used in an implicit solver for fluid-flow/turbulence simulations. I have stripped it down from its exact form to make the question easier to follow.
REAL(KIND=DP), INTENT(INOUT), DIMENSION(1:10) :: A, C_old, C_new, D
INTEGER :: K
!$OMP PARALLEL DO SCHEDULE(STATIC) NUM_THREADS(3)&
!$OMP SHARED(A, C_old, C_new, D)
DO K = 2,9
  C_new(K) = C_old(K)/(A(K)*C_new(K-1))   ! forward sweep: uses the just-updated C_new(K-1)
END DO
!$OMP END PARALLEL DO
C(1) and C(10) are 0 because they are at the fixed boundaries (the top and bottom walls of a channel), so there is no need to update them.
As is evident, the calculation of C(2) needs C(1), the calculation of C(3) needs the MODIFIED C(2), and so on. In other words, it is a forward sweep (the algorithm is described at https://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm). Hence, for three threads, I want the iterations distributed across the threads as [1,2,3,4], [4,5,6,7], [7,8,9,10]. With a plain SCHEDULE(STATIC) I cannot control this, and the distribution might instead be [1,2,3], [4,5,6], [7,8,9,10] across the 3 threads. That would give erroneous results, because the information at grid points 4 and 7 is not passed on to the neighbouring threads.
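(As a side note, a small stand-alone probe like the one below, which is only my own sketch and not part of the solver, can print which thread actually executes which K under the default SCHEDULE(STATIC), so the exact chunk boundaries for a given compiler can be checked.)

PROGRAM STATIC_PROBE
   USE OMP_LIB
   IMPLICIT NONE
   INTEGER :: K
   ! Report the thread-to-iteration mapping of the default static schedule
   !$OMP PARALLEL DO SCHEDULE(STATIC) NUM_THREADS(3)
   DO K = 2, 9
      PRINT '(A,I0,A,I0)', 'thread ', OMP_GET_THREAD_NUM(), ' executes K = ', K
   END DO
   !$OMP END PARALLEL DO
END PROGRAM STATIC_PROBE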
Is there a way to force particular indices to be solved by a particular thread, without creating a race? Say,
thread #1 solves indices [1,2,3,4],
thread #2 solves indices [4,5,6,7], and so on.
A rough sketch of the kind of manual distribution I have in mind is below.
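This is only a sketch, not working code: it pins an explicit index range to each thread through OMP_GET_THREAD_NUM() and a hand-written bounds table (the bounds and the initial values are placeholders I made up, and they are non-overlapping here just to show the mechanism). Even with this, the forward-sweep dependence means a thread's first iteration reads a C_new value that another thread may not have written yet, which is exactly the race I am asking about.

PROGRAM MANUAL_CHUNKS
   USE OMP_LIB
   IMPLICIT NONE
   INTEGER, PARAMETER :: DP = KIND(1.0D0)
   REAL(KIND=DP), DIMENSION(1:10) :: A, C_old, C_new, D
   INTEGER, DIMENSION(0:2) :: K_LO = (/ 2, 5, 8 /)   ! illustrative lower bounds per thread
   INTEGER, DIMENSION(0:2) :: K_HI = (/ 4, 7, 9 /)   ! illustrative upper bounds per thread
   INTEGER :: K, TID

   A = 2.0_DP; C_old = 1.0_DP; C_new = 1.0_DP; D = 0.0_DP   ! placeholder data

   !$OMP PARALLEL NUM_THREADS(3) PRIVATE(TID, K) SHARED(A, C_old, C_new, D, K_LO, K_HI)
   TID = OMP_GET_THREAD_NUM()
   DO K = K_LO(TID), K_HI(TID)
      ! RACE: for TID > 0 the first iteration reads C_new(K_LO(TID)-1),
      ! which the previous thread may not have computed yet.
      C_new(K) = C_old(K)/(A(K)*C_new(K-1))
   END DO
   !$OMP END PARALLEL
END PROGRAM MANUAL_CHUNKS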