Lesson 61. Why You Cannot Join On Null Values

This is a fairly common and insidious error, because you might not catch the problem when you are working with a large amount of data. NULL means unknown, and one unknown cannot be said to equal another: the comparison NULL = NULL evaluates to UNKNOWN rather than TRUE, so a join predicate never matches records on NULL values. The best thing to do is not to have NULLs in candidate key fields.
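
As a minimal sketch of that behavior, the batch below joins two hypothetical table variables (@a and @b, which are not part of this lesson's demo data) that each contain one row with a NULL key. Assuming the default ANSI_NULLS setting, only the rows with id = 1 should come back.

```sql
-- Hypothetical tables for illustration only
DECLARE @a TABLE(id INT, label VARCHAR(20))
DECLARE @b TABLE(id INT, label VARCHAR(20))

INSERT INTO @a(id, label)
VALUES (1, 'a-one'), (NULL, 'a-null')

INSERT INTO @b(id, label)
VALUES (1, 'b-one'), (NULL, 'b-null')

-- NULL = NULL evaluates to UNKNOWN, so the two NULL-keyed rows never pair up
SELECT a.label AS a_label, b.label AS b_label
FROM @a a
INNER JOIN @b b
ON a.id = b.id
```

The 'a-null' and 'b-null' rows drop out silently, which is why the problem is easy to miss in a large dataset.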

When I was working on this problem and mentioned that this seemed weird, another data engineer had this to say:

No, it is not weird. It actually follows from the definitions in the ANSI/ISO standards. The infixed notation that we use now was actually a result of an article published by Chris Date in one of the database newsstand magazines of the time (he and I had columns in DBMS and Database Programming and Design, and they were both owned by the same publisher at one point).

The actual syntax was not proposed by Date, but by one of the ANSI X3H2 committee members. The rule is that infixed joins are evaluated strictly left to right, and the ON clause associates with the nearest join. If I have a really complicated set of inner joins, I prefer to use the original set-oriented notation (no infixed operators). People that use infixed inner join have a mindset locked into + versus "big Sigma" from mathematics.
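
For readers unfamiliar with the two notations the quote contrasts, here is a minimal sketch. The @customers and @orders table variables and their contents are hypothetical and exist only for this illustration. Both queries should return the same rows.

```sql
-- Hypothetical tables, used only to contrast the two join notations
DECLARE @customers TABLE(customer_id INT, customer_name VARCHAR(20))
DECLARE @orders TABLE(order_id INT, customer_id INT)

INSERT INTO @customers(customer_id, customer_name)
VALUES (1, 'Ann'), (2, 'Bob')

INSERT INTO @orders(order_id, customer_id)
VALUES (10, 1), (11, 1), (12, 2)

-- Infixed notation: the join operator sits between the tables and ON holds the predicate
SELECT o.order_id, c.customer_name
FROM @orders o
INNER JOIN @customers c
ON o.customer_id = c.customer_id

-- Original set-oriented notation: list the tables, then state the predicate in WHERE
SELECT o.order_id, c.customer_name
FROM @orders o, @customers c
WHERE o.customer_id = c.customer_id
```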

Examples

A Walk Through Of The Problem And Solution

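My expectation here is to get back all the genders, with toy data only where a record matches. However, the left outer join fills the toys columns with NULL for any gender that has no matching toy, and when I then filter on toys in the WHERE clause, those NULLs fail the filter. The unmatched rows are discarded, so my left outer join now behaves as an inner join. In this example only 'boy' comes back.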


USE demo

DECLARE @genders TABLE(gender_id INT, gender VARCHAR(20))
DECLARE @toys TABLE(toy_id INT, gender_id INT, toy_name VARCHAR(20))

INSERT INTO @genders(gender_id, gender)
SELECT 1, 'boy'
UNION ALL
SELECT 2, 'girl'
UNION ALL
SELECT 3, 'both'

INSERT INTO @toys(toy_id, gender_id, toy_name)
SELECT 1, 1, 'GI JOE'
UNION ALL
SELECT 2,2, 'My Little Pony'


SELECT g.*
FROM @genders g
LEFT OUTER JOIN @toys t
ON g.gender_id = t.gender_id
WHERE t.toy_name NOT IN(
'My Little Pony'
) 
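
To see why the filter removes rows, it can help to run the join without the WHERE clause and label how each joined row would fare. The sketch below redeclares the same demo data so the batch runs on its own; the where_outcome column is a hypothetical name added for illustration, and the default ANSI_NULLS setting is assumed.

```sql
-- Same demo data as above, declared again so this batch is self-contained
DECLARE @genders TABLE(gender_id INT, gender VARCHAR(20))
DECLARE @toys TABLE(toy_id INT, gender_id INT, toy_name VARCHAR(20))

INSERT INTO @genders(gender_id, gender)
VALUES (1, 'boy'), (2, 'girl'), (3, 'both')

INSERT INTO @toys(toy_id, gender_id, toy_name)
VALUES (1, 1, 'GI JOE'), (2, 2, 'My Little Pony')

-- No WHERE clause here: every joined row is shown, along with how the filter would treat it
SELECT g.gender, t.toy_name,
CASE WHEN t.toy_name NOT IN ('My Little Pony') THEN 'kept' ELSE 'removed' END AS where_outcome
FROM @genders g
LEFT OUTER JOIN @toys t
ON g.gender_id = t.gender_id
```

The 'girl' row is removed because its toy is the excluded one, and the 'both' row is removed because its toy_name is NULL, so the NOT IN test evaluates to UNKNOWN rather than TRUE.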

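Here is the correct approach to the problem: move the filter from the WHERE clause into the ON clause, so that it restricts which toys can match without discarding the genders that have no match.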


USE demo

DECLARE @genders TABLE(gender_id INT, gender VARCHAR(20))
DECLARE @toys TABLE(toy_id INT, gender_id INT, toy_name VARCHAR(20))

INSERT INTO @genders(gender_id, gender)
SELECT 1, 'boy'
UNION ALL
SELECT 2, 'girl'
UNION ALL
SELECT 3, 'both'

INSERT INTO @toys(toy_id, gender_id, toy_name)
SELECT 1, 1, 'GI JOE'
UNION ALL
SELECT 2,2, 'My Little Pony'


SELECT g.*, t.toy_name
FROM @genders g
LEFT OUTER JOIN @toys t
ON g.gender_id = t.gender_id
AND t.toy_name NOT IN ('My Little Pony')

Here is an approach that filters first by encapsulating the filter logic in a CTE.


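USE demo

DECLARE @genders TABLE(gender_id INT, gender VARCHAR(20))
DECLARE @toys TABLE(toy_id INT, gender_id INT, toy_name VARCHAR(20))

INSERT INTO @genders(gender_id, gender)
VALUES (1, 'boy'), (2, 'girl'), (3, 'both')

INSERT INTO @toys(toy_id, gender_id, toy_name)
VALUES (1, 1, 'GI JOE'), (2, 2, 'My Little Pony')

;WITH mycte AS (
SELECT * FROM @toys WHERE toy_name NOT IN ('My Little Pony'))

SELECT g.*, t.toy_name
FROM @genders g
LEFT OUTER JOIN mycte t
ON g.gender_id = t.gender_id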

Last updated 3 years ago