I am trying to refactor an OpenMP-based program and have run into a severe scalability issue. The following (admittedly not very meaningful) OpenMP program seems to reproduce the problem. Of course, the tiny sample code can be rewritten as a nested for-loop and using