I want to use the hardware performance counters on Intel and AMD x86_64 multicore processors to count the number of stores retired by a program. I want each thread to count its own retired stores separately. Can this be done, and if so, how can I do it in C/C++?