Skip to main content

Crazy Parallel Madness

This was originally sent to me from a friend who works on massively parallel medical imaging software:

I thought I'd take a little time out of my day to rail, once again, against the incompetency of those [software developers at Microsoft]. Consider the following innocent looking bit of code:

  #include <omp.h>
  #include <vector>

  int main(int argc, char* argv[])
  {
    #pragma omp parallel
    {
      std::vector<int> A(1024 * 1024);
    }
  }

For the OpenMP-uneducated, the inner code block will be executed
in parallel by one thread per CPU on your system. In my case that is 8 threads (dual quad-core). If you run this bit of code in VTune and look at which black hole your clock-cycles disappear down, you'll find an unusually large number of them being gobbled up by "ntoskrnl.exe". And, if you dive down into this file, you'll find that a good portion of those cycles are attributable to a kernel function named ExfAcquirePushLockExclusive().

What happens in the above code segment? Eight threads are created, each running on a separate core. All eight proceed to allocate and then zero-fill 4MB worth of memory. The zero-fill occurs in this case because std::vector always initializes its contents. Because Microsoft writes their software for the [average consumer] who are loath to spend the $25 it would take to outfit their system with more than 256 MB of RAM, the NT kernel conveniently doesn't assign you any physical memory when you allocate the 4MB array. Instead it waits until you actually write to a page in that array. Our code segment, then, is actually eight threads executing:

  Allocate 4MB
  Loop from 1 to 1024
      Page fault resulting in the allocation of one page (4KB) of physical memory
      Write 1024 zeroes to that page

The coup de gras is that those [Microsoft developers] decided it would be just fine if each page fault required the locking of some sort of internal kernel structure that is shared between all the cores. Don't know exactly what because the details of the kernel are, of course, hidden from my prying eyes. But I do know the end result - massive lock contention and performance that sucks ass.

Now the above example is obviously contrived. But Bill spent a substantial bit of time in November digging into why, when optimizing the 3D volume loading/decompression of our software, he kept seeing a good 30% of the CPU cycles swallowed up by this particular black hole. So this particular issue is not simply academic. I'm writing about it now because one of my colleagues just ran into a slightly different manifestation of exactly the same problem. His trials have freshly aggravated my own wound.

This solution, while simple in execution, is insane in its necessity. Whenever I have a significantly sized data structure, or data structures, which is to be filled rapidly by multiple concurrent threads, I must, after allocation, perform what I've coined a "page touching" operation on it. This is exactly what the name implies… I have a single thread march over the entire extent of the memory, at page-sized intervals, and write a single zero value into each page. After the page touching, my parallel algorithm can proceed to fill the data structure without the performance loss that results from the lock contention.

Popular posts from this blog

Host Species Barrier to Influenza Virus Infections

The title of this entry was taken from a paper written by Thijs Kuiken, Edward C. Holmes, John McCauley, Guus F. Rimmelzwaan, Catherine S. Williams, and Bryan T. Grenfell. This paper appeared in SCIENCE Volume 312, pp 394 – 397. If you have the gumption to really know how viral infections cross the species barrier, then this is the paper for you. It’s written as a “perspective” rather than as a technical publication, which means there isn’t a bunch of jargon in it. You can also contact the authors of the paper at t.kuiken@erasmusmc.nl . A particularly interesting quote taken from the paper: “It is well established that, as the proportion of susceptibles in the population, s , drops (as individuals become infected, then recover), the number of secondary cases per infection, R , also drops: R = s * R0 . If R is less than 1, as is currently the case for H5N1 virus in humans, an infection will not cause a major epidemic.” (pg. 312) The value, R0 , “is the number of secondary cases produced...

UNTITLED

I like people who can talk straight and take it standing. There's not enough straight talkers in the world, and certainly not enough in the USA. It seems as though our opinions are illegal if they are not in-line with the normative line of acceptance. That truly seems Orwellian to me. That said, though, this blog is more about race and ignorance than about the Thought Police. There does not exist a more sensitive and inflammatory topic than race . You should read the Wikipedia entry on race as it pertains to humans. It may enlighten you somewhat. The USA has two presidential candidates in its 2008 Presidential race. One of them is sort of a pinkish-white color, and the other is something of a brown color. The pinkish-white one has an American heritage with clear ancestry back to Northern Europeans. The brownish colored one has an Indonesian heritage with some suspected ancestry back to Africa, although he also has European ancestry. Call them whatever race you want. Where I have ...

The Spinning Brain

Intuition is a phenomenon of the biological brain that doesn't have any physical explanation. Many people experience intuition with varying degrees of success. There are a variety of theories regarding intuition [1] and some people regard intuition with much caution [2] . Yet, I am happily in the camp that has learned to respect my intuition as it has proven time and time again to be correct. Recently, though, I'd been thinking about intuition and soothsaying . There are many cases of people who claim to see the future, whatever that might be. Maybe there is something to be said about this mystical phenomenon. Maybe there is a real physical process at work that we just haven't thought of yet. To this end, I am proposing a theory about human intuition. This theory, though requires some background in quantum mechanics . Specifically, quantum entanglement . I'm not the only person who has theorized about quantum entanglement and its role in biological congnition and th...