I have blocks of memory that can be quite large (larger than the L2 cache), and sometimes I must set them to all zeros. memset is fine in serial code, but what about parallel code? Does anybody have experience with whether calling memset from concurrent threads actually speeds things up for large arrays? Or even with using simple OpenMP parallel for loops?
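To make the threaded variant concrete, here is a minimal sketch of the kind of thing I mean. The helper name parallel_zero and the chunk parameter are made up for illustration, and nothing here is measured or tuned:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero n bytes at buf, split into chunk-sized pieces,
   with one memset call per piece.
   Build with OpenMP enabled (e.g. -fopenmp on GCC/Clang) to distribute the
   pieces across threads; without it the pragma is ignored and the loop
   simply runs serially. */
static void parallel_zero(void *buf, size_t n, size_t chunk)
{
    assert(chunk > 0);
    unsigned char *p = buf;

    /* Number of pieces, rounded up. A signed index keeps the loop
       acceptable to older OpenMP versions. */
    long long pieces = (long long)(n / chunk + (n % chunk != 0));

    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < pieces; ++i) {
        size_t off = (size_t)i * chunk;
        /* The last piece may be shorter than chunk. */
        size_t len = (n - off < chunk) ? (n - off) : chunk;
        memset(p + off, 0, len);
    }
}
```

A call would look like parallel_zero(data, nbytes, 1 << 20), where the 1 MiB chunk size is an arbitrary guess. Whether this beats a single memset call on a large block is exactly what I am asking.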