Skip to main content

Binary Sort Bug?

I was forwarded a link today from a friend that explains a bug in a binary sort implementation.

Here is the blog entry

First it's important to note that the bug is not in the binary sort algorithm, but just in the short sighted implementations of binary sort that pervade the industry. When binary sort was first proposed as an implementation, we were using 8-bit processors and the only big iron systems out there were multi-million dollar systems that could barely process a million records, let alone 2.2 billion.

The bug, as pointed out in the blog entry, is in the calculation of the pivot. The implementations use (high+low)/2 as their pivot, which then fails when you are sorting billions of records because high+low quickly overflows.

The real simple solution is to use the arithmetic lessons we learned in the 3rd grade: pivot = high/2 + low/2. This will work because you're not adding the two out of range values together, so you won't have an arithmetic overflow during your pivot calculation. The easier calculation is "pivot = high >> 1 + low >> 1".

The real solution is to use a 64-bit system and use long ints (QWORD) for your index variables intead of DWORDS (32-bit). Any other solution is just a band-aid that will eventually experience an overflow again.

Popular posts from this blog

Clustered Foolishness

I had morning coffee with a well respected friend of mine recently. Aside from chatting about the usual wifery and family, we touched on the subject of clustered indices and SQL Server performance. A common misconception in the software industry is that a clustered index will make your database queries faster. In fact, most cases will demonstrate the polar opposite of this assumption. The reason for this misconception is a misunderstanding of how the clustered index works in any database server. A clustered index is a node clustering of records that share a common index value. When you decide on an index strategy for your data, you must consider the range of data to be indexed. Remember back to your data structures classes and what you were taught about hashtable optimizations. A hashtable, which is another way of saying a database index, is just a table of N values that organizes a set of M records in quickly accessible lists that are of order L, where L is significantly less than M. ...

Deadly Information

Remember back to 2006 when a young girl killed herself [1] , [4] after being tricked and harassed by a faux boy she found on the Web using MySpace. The trial against the faux boy, an adult woman (Lori Drew), did not result in prosecution for the death of Megan, much to the dismay of many.  Yet, today we read about another trial where someone is being accused of second degree murder because they may have mentioned something slanderous about another person who was later killed by a hit man [2] . In this case, though, the person on trial is a former FBI agent who was working deep cover to infiltrate organized crime. In both cases, someone released information to third parties that resulted in the death of another person.  Neither defendant in either of these cases actually committed the act of murder, though. In the case of the FBI agent, though, the murder charge is being taken seriously. Yet, in the MySpace slander case, the murder charge was not taken seriously. How are t...

Faster Climate Change

CNN reports that a WWF study has found that global climate change is happening faster than predicted in 2007 and that there will not be any arctic ice by 2013, or 2040. [1] Then it goes on to say that global sea level will increase by 1.08 meters by the end of the century, which is 2100, 92 years from now. Quite honestly, nobody really cares what is going to happen to the planet in 98 years. Why? Because in 98 years we (as humans) will either: (1) Obliterate ourselves because God told us to do it. (2) Eat eachother because there will no longer be any land available to grow crops and sustain living quarters for our 50 billion people. (3) Suffocate because our planet will no longer smell nice thanks to 50 billion people producing lots of solid waste in our oceans. (4) Leave the planet because there will no longer be enough fresh water to sustain our lives. Wait a minute. Consider (4) for a moment. Where can we get an abundance of fresh water TODAY? Anyone? Yeah, the arctic! It's goin...