Last updated
Last updated
When using the HASHBYTES() function in SQL Server to hash multiple columns for matching, use CONCAT() and separate values with a pipe.
Usually, the values in columns are disparate enough that you really do not have to worry. For example, it was years before I actually discovered an edge case. However, I did discover at least one scenario where concatenating the columns was not enough to develop a unique record. This occurred in a table with few columns and small amounts of data.
A better approach is to just CONCAT() columns with a pipe between values. Since this character is rarely used, it lowers the probability of having different values hash the same.
A Walk Through Of The Problem And Solution
Here is the edge case scenario that got me into trouble.
In [ ]:
As you can see, record 1 and record 2 are two entirely different records. But look what happens when we hash the row!
In [ ]:
UH OH! Two different records hashing the same? That's not cool! Let's fix this. This is why we use pipe!
As you can see, the addition of the pipe makes it so the two records can be clearly delineated.
In [ ]: