It seems many people have similar issues with OpenMP but I couldn't find a solution to my problem.
I'm using this simple test program:
    PROGRAM Parallel_Hello_World
      USE OMP_LIB
      integer :: i
      real :: start, finish, B

      call cpu_time(start)
      !$OMP PARALLEL
      !$OMP DO
      DO i = 1,2000000
        B = cos(cos(cos(sin(cos(sqrt(sqrt(sqrt( cos( real(i) ) ))))))))
        B = cos(cos(cos(sin(cos(sqrt(sqrt(sqrt( cos( B ) ))))))))
        B = cos(cos(cos(sin(cos(sqrt(sqrt(sqrt( cos( B ) ))))))))
      END DO
      !$OMP END DO
      !$OMP END PARALLEL
      call cpu_time(finish)
      print '("Time = ",f6.3," seconds.")',finish-start
    END
I'm confused about where the overhead is coming from. Even when I increase the number of sin/cos/sqrt operations, the run with fewer threads always wins.
Timings (average):

    export OMP_NUM_THREADS=1    Time = 1.58 seconds.
    export OMP_NUM_THREADS=8    Time = 2.376 seconds.
Compile command: ifort para.f90 -o para.exe -qopenmp -O2
The Intel compiler is a 2020 release.
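
One thing worth checking is the timer itself. As far as I understand, cpu_time reports CPU time, which with common compilers is typically summed over all threads, whereas omp_get_wtime reports elapsed wall-clock time. Below is a minimal sketch of the same loop timed both ways. It is not the original program: the name timing_check, the private(b) clause, the reduction variable total, and the abs() in the first expression (which keeps the sqrt argument non-negative) are assumptions added for illustration.

```fortran
program timing_check
  use omp_lib
  implicit none
  integer :: i
  real :: cpu_start, cpu_finish, b
  double precision :: wall_start, wall_finish, total

  total = 0.0d0
  call cpu_time(cpu_start)       ! CPU time; typically summed over all threads
  wall_start = omp_get_wtime()   ! elapsed wall-clock time

  ! Assumption: b is made private so threads do not overwrite each other's
  ! value, and total is accumulated so the compiler cannot discard the loop.
  !$omp parallel do private(b) reduction(+:total)
  do i = 1, 2000000
    ! abs() is an addition: cos(real(i)) can be negative
    b = cos(cos(cos(sin(cos(sqrt(sqrt(sqrt(abs(cos(real(i)))))))))))
    b = cos(cos(cos(sin(cos(sqrt(sqrt(sqrt(cos(b)))))))))
    b = cos(cos(cos(sin(cos(sqrt(sqrt(sqrt(cos(b)))))))))
    total = total + b
  end do
  !$omp end parallel do

  call cpu_time(cpu_finish)
  wall_finish = omp_get_wtime()

  print '("CPU time  = ",f8.3," seconds.")', cpu_finish - cpu_start
  print '("Wall time = ",f8.3," seconds.")', wall_finish - wall_start
  print *, 'checksum =', total   ! printed only to keep the result live
end program timing_check
```

Compiled with the same flags as above and run with OMP_NUM_THREADS=1 and then 8, comparing the two printed times should indicate whether the slowdown is real or only an artifact of how the time is measured.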