Lesson 63. Using HASHBYTES() To Compare Character Strings

If you have to compare string values on a regular basis, and especially if you have to compare more than one column at a time, instead of comparing the columns directly, hash the columns first. Hashing creates a string based on the encryption algorithm you select. You can pass in concatenated string values. The ultimate benefit is that all the algorithms generate a value that is a set number of characters. This comes in handy for storing this value when it comes time for loading junk dimensions into data warehouses.

You cannot use HASHBYTES() on NVARCHAR(MAX) in SQL Server. HASHBYTES has a 4,000-character limit for NVARCHAR. Technically, you can get around this programmatically, but if your data is longer than 4,000 characters, I’d suggest looking for another way to identify the record.

Examples

A Walk Through Of The Problem And Solution

Here is a demonstration that shows no matter how big the string is, HASHBYTES() algos always resolve to the same number of characters.

In [ ]:

USE demo

DECLARE @Statement1 NVARCHAR(255)
DECLARE @Statement2 NVARCHAR(255)

SET @Statement1 = 'Army And Navy Play For Second'
SET @Statement2 = 'Rock Chalk Jayhawk'

PRINT LEN(@Statement1)
PRINT LEN(@Statement2)

PRINT HASHBYTES('MD5', @Statement1)
PRINT HASHBYTES('MD5', @Statement2)

PRINT LEN(HASHBYTES('MD5', @Statement1))
PRINT LEN(HASHBYTES('MD5', @Statement2))

Now let us see how you can use this to compare string values. This assumes the database is using a case sensitive collation.

As you can see, WalMart and walmart do not hash the same so they are different values. This is a toy example, but you can see how comparing a hash is a lot more efficient than comparing the raw string values.

In [ ]:

USE demo

DECLARE @Statement1 NVARCHAR(255)
DECLARE @Statement2 NVARCHAR(255)

SET @Statement1 = 'WalMart'
SET @Statement2 = 'walmart'


PRINT HASHBYTES('MD5', @Statement1) --0xDE2E98EE55B1B249B711300DE7047C75
PRINT HASHBYTES('MD5', @Statement2) --0xC48604C9A656E09D87E99B820499D430

PreviousLesson 62. A Deep Dive On How The Where Clause Functions NextLesson 64. Using Pipe To Hash Multiple Columns For Matching

Last updated 3 years ago

Examples

A Walk Through Of The Problem And Solution

Here is a demonstration that shows no matter how big the string is, HASHBYTES() algos always resolve to the same number of characters.

In [ ]:

USE demo

DECLARE @Statement1 NVARCHAR(255)
DECLARE @Statement2 NVARCHAR(255)

SET @Statement1 = 'Army And Navy Play For Second'
SET @Statement2 = 'Rock Chalk Jayhawk'

PRINT LEN(@Statement1)
PRINT LEN(@Statement2)

PRINT HASHBYTES('MD5', @Statement1)
PRINT HASHBYTES('MD5', @Statement2)

PRINT LEN(HASHBYTES('MD5', @Statement1))
PRINT LEN(HASHBYTES('MD5', @Statement2))

Now let us see how you can use this to compare string values. This assumes the database is using a case sensitive collation.

In [ ]:

USE demo

DECLARE @Statement1 NVARCHAR(255)
DECLARE @Statement2 NVARCHAR(255)

SET @Statement1 = 'WalMart'
SET @Statement2 = 'walmart'


PRINT HASHBYTES('MD5', @Statement1) --0xDE2E98EE55B1B249B711300DE7047C75
PRINT HASHBYTES('MD5', @Statement2) --0xC48604C9A656E09D87E99B820499D430