I have been trying to get an idea of the impact of having an array in L1 cache versus memory by timing a routine that scales and sums the elements of an array using the following code (I am aware that I should just scale the result by 'a' at the end; the point is to do both a multiply and an add within the loop - so far, the compiler hasn't figured out to factor out 'a'): I have been trying to get an idea of the impact