Last updated
Last updated
If you have to compare string values on a regular basis, and especially if you have to compare more than one column at a time, instead of comparing the columns directly, hash the columns first. Hashing creates a string based on the encryption algorithm you select. You can pass in concatenated string values. The ultimate benefit is that all the algorithms generate a value that is a set number of characters. This comes in handy for storing this value when it comes time for loading junk dimensions into data warehouses.
You cannot use HASHBYTES() on NVARCHAR(MAX) in SQL Server. HASHBYTES has a 4,000-character limit for NVARCHAR. Technically, you can get around this programmatically, but if your data is longer than 4,000 characters, I’d suggest looking for another way to identify the record.
A Walk Through Of The Problem And Solution
Here is a demonstration that shows no matter how big the string is, HASHBYTES() algos always resolve to the same number of characters.
In [ ]:
Now let us see how you can use this to compare string values. This assumes the database is using a case sensitive collation.
As you can see, WalMart and walmart do not hash the same so they are different values. This is a toy example, but you can see how comparing a hash is a lot more efficient than comparing the raw string values.
In [ ]: