This was originally sent to me by a friend who works on massively parallel medical imaging software:

I thought I'd take a little time out of my day to rail, once again, against the incompetency of those [software developers at Microsoft]. Consider the following innocent-looking bit of code:

    #include <omp.h>
    #include <vector>

    int main(int argc, char* argv[])
    {
        #pragma omp parallel
        {
            std::vector<int> A(1024 * 1024);
        }
    }

For the OpenMP-uneducated, the inner code block will be executed in parallel by one thread per CPU on your system. In my case that is 8 threads (dual quad-core).

If you run this bit of code in VTune and look at which black hole your clock cycles disappear down, you'll find an unusually large number of them being gobbled up by "ntoskrnl.exe". And if you dive down into this file, you'll find that a good portion of those cycles are attributable to a kernel function named ExfAcquirePushLockExclusive(). Wha