Lesson 64. Using Pipe To Hash Multiple Columns For Matching

When using the HASHBYTES() function in SQL Server to hash multiple columns for matching, use CONCAT() and separate values with a pipe.

Usually, the values in columns are disparate enough that you really do not have to worry. For example, it was years before I actually discovered an edge case. However, I did discover at least one scenario where concatenating the columns was not enough to develop a unique record. This occurred in a table with few columns and small amounts of data.

A better approach is to just CONCAT() columns with a pipe between values. Since this character is rarely used, it lowers the probability of having different values hash the same.

Examples

A Walk Through Of The Problem And Solution

Here is the edge case scenario that got me into trouble.

In [ ]:

USE demo

DECLARE @SampleStageTable AS TABLE(ID INT, VALUE1 NVARCHAR(10), VALUE2 NVARCHAR(10))

INSERT INTO @SampleStageTable(ID, VALUE1, VALUE2)
SELECT 1, '012','345'
UNION
SELECT 2, '01','2345'


SELECT * FROM @SampleStageTable

As you can see, record 1 and record 2 are two entirely different records. But look what happens when we hash the row!

In [ ]:

USE demo

DECLARE @SampleStageTable AS TABLE(ID INT, VALUE1 NVARCHAR(10), VALUE2 NVARCHAR(10))

INSERT INTO @SampleStageTable(ID, VALUE1, VALUE2)
SELECT 1, '012','345'
UNION
SELECT 2, '01','2345'


SELECT ID, VALUE1, VALUE2, HASHBYTES('MD5', CONCAT(VALUE1, VALUE2)) AS ROWHASH
FROM @SampleStageTable

UH OH! Two different records hashing the same? That's not cool! Let's fix this. This is why we use pipe!

As you can see, the addition of the pipe makes it so the two records can be clearly delineated.

In [ ]:

USE demo

DECLARE @SampleStageTable AS TABLE(ID INT, VALUE1 NVARCHAR(10), VALUE2 NVARCHAR(10))

INSERT INTO @SampleStageTable(ID, VALUE1, VALUE2)
SELECT 1, '012','345'
UNION
SELECT 2, '01','2345'


SELECT ID, VALUE1, VALUE2, HASHBYTES('MD5', CONCAT(VALUE1,'|', VALUE2)) AS ROWHASH
FROM @SampleStageTable

Last updated