This is a continuation of the SQL Screw-ups series that stemmed from my Nashville .NET User Group talk on 03/14/2019. Slides and setup details are on the first post in the series.
The definition of null is unknown according to Microsoft documentation. That is, null represents an unknown value. It took an embarrassingly long time for me to catch a common mistake I was sometimes making regarding null. Here’s the scoop.
Say you have a column named “Comment” that allows nulls. You want to find rows that have a null comment, so you write the following query.
SELECT Comment, * FROM Sales.SalesOrderHeader WHERE Comment = NULL;
Your query returns 0 rows. To double check, you write a query for the orders that do have a comment.
SELECT Comment, * FROM Sales.SalesOrderHeader WHERE Comment <> NULL;
Your query returns 0 rows. This doesn’t seem right because your test database contains thousands of orders. One of these queries should return some results! Do note that all orders in the AdventureWorks2014 database have null comments. You can update some orders with non-null comments like this:
UPDATE Sales.SalesOrderHeader SET Comment = '' WHERE SalesOrderID BETWEEN 43850 AND 43859 UPDATE Sales.SalesOrderHeader SET Comment = 'Thank you for subscribing to my blog!' WHERE SalesOrderID BETWEEN 43870 AND 43879
It should not surprise you that both queries return 0 rows if you have had much professional experience with SQL. You know that null is not = to any value. You also know that null is not <> to any value. Read this another way: null is not equal to any value, and null is not not equal to any value. Since null represents a unknown value, the result of comparing null to any value is unknown. Take the C# conditional
count == 5
and tell me the result. Actually pause right here and post your answer as a comment on this post before reading on.
The answer is, “I don’t know.” In the context of this post, the value of ‘count’ is unknown. I’m not saying that ‘count’ is null – I’m saying that you don’t know the value of ‘count’ because I haven’t told you the value of ‘count’. There is literally no way to compare a known value to an unknown value and arrive at a result other than ‘unknown’.
But what about comparing null to itself. If two columns contain null, they must be equal. Nope! Using my previous C# conditional example, evaluate this one and post your answer in the comments:
count == upperBound
You don’t know the value of ‘count’, and you don’t know the value of ‘upperBound’. How can you possibly tell me anything about either of their values or how their values compare to each other? You can’t. Likewise, SQL Server cannot compare values that it doesn’t know.
The way to find orders with a null comment is:
SELECT Comment, * FROM Sales.SalesOrderHeader WHERE Comment IS NULL
A non-null comment:
SELECT Comment, * FROM Sales.SalesOrderHeader WHERE Comment IS NOT NULL
Technically, an order with a null comment means the order could have a comment, but we either don’t know if it has a comment or we don’t know what the comment is. In the business that uses our database, though, we know that the salesperson will enter the order comment if there is one. In this case, in our domain, for our app, we know that a null comment and an empty string comment mean the same thing. SQL Server doesn’t know that they are logically the same thing for this column, in our domain, in our app. There are a couple solutions to writing simple queries that let SQL Server know that null and ” should mean the same thing for the comment column.
You can write queries that treat null the same as ”.
SELECT Comment, * FROM Sales.SalesOrderHeader WHERE ISNULL(Comment, '') = '' SELECT Comment, * FROM Sales.SalesOrderHeader WHERE ISNULL(Comment, '') <> ''
My preference, however, is to not even allow nulls in the column. Since null and ” mean the same thing, they are ambiguous. Ambiguity is no friend of a developer. SQL Server will tell you real quick if you try to filter or sort on an ambiguous column. But it doesn’t know that null and ” are ambiguous values for the comment column. My solution is to update all the nulls in the comment column to ”, alter the comment column to not allow nulls, and add a default constraint to the comment column with a value of ”. Now, you can be sure the order comment is an empty string (or white space) if there is no comment, and you don’t have to worry about nulls.
This isn’t the end of the story, though. Say we take the first approach of writing queries that treat null like ”. We still allow nulls in the column. If my application code automatically generates orders as part of a process, and these auto-generated orders are always created with a comment of ‘System-generated Order’, then I might filter those orders out of certain queries with the following predicate:
WHERE Comment <> 'System-generated Order'
Ok, this would be a bad practice for reasons beyond the scope of this post, but humor me and go with it for now. This predicate filters out system-generated orders and orders with a null comment. Wha?! But how?! Well, null is not <> to any value. This also applies to IN and NOT IN and any other situation in which you compare null. The proper predicate would be:
WHERE ISNULL(Comment, '') <> 'System-generated Order'
So that’s it. My mistake wasn’t that I didn’t know how to get the nulls and the not nulls. That is a pretty easy one to sleuth out when neither query returns results. My mistake was that I didn’t consider the same rule when comparing column values to a literal value.
I’m actually interested in the number of people who have messed this one up or who are still messing it up. I still see it quite commonly in the day-to-day, so I wonder if it is just me and the people who are affected by my aura, or if it’s common across the industry. Feel free to comment to let me and others know that you do or don’t mess this up. Oh, and don’t miss next week’s screw-up on outer join filters!