SQL Screw-ups: Outer Join Filters

This is a continuation of the SQL Screw-ups series that stemmed from my Nashville .NET User Group talk on 03/14/2019. Slides and setup details are on the first post in the series.

Think about the last time you needed apply additional filtering to a query based on information from some other table. You probably JOINed the additional table or added an EXISTS predicate in the WHERE clause. Let’s discuss ways you might’ve screwed it up.

You receive a request to show customers assigned to a particular salesperson. Easy peasy, you write the following query. I’m not convinced this is an accurate use of the data, but we’ll roll with it because it demonstrates the concept.

SELECT *
FROM Sales.Customer
LEFT OUTER JOIN Sales.Store ON
        Store.BusinessEntityID = Customer.StoreID
WHERE Store.SalesPersonID = 281

This gives you all the customers for the store where salesperson 281 works. Cool! Blog post over? Not quite.

Upon further inspection of the data, you realize that some of the stores do not have a business entity ID set, so the salesperson ID in the result of the outer join is null. At this point, you can refer back to last week’s post on how the meaning of null is actually unknown.

Since the salesperson is technically allowed to serve customers that do not have a store assigned, you’ve arbitrarily limited the reach for any salesperson using the report you’re writing. The simple fix is to account for null for every column introduced as a result of an outer join. Even if the column definition says the column cannot contain null, the fact that it’s possible for no match to found on the join condition means the value can be null (unknown). Here’s your new an improved query.

SELECT *
FROM Sales.Customer
LEFT OUTER JOIN Sales.Store ON
        Store.BusinessEntityID = Customer.StoreID
WHERE ISNULL(Store.SalesPersonID, 281) = 281

An even better option in a lot of cases (depending on your team’s culture, standards, habits, etc) is to not introduce a join if you don’t need to select data from the additional table. I prefer to leave filtering to the where clause and eliminate the chance of a bad join duplicating results.

SELECT *
FROM Sales.Customer
WHERE EXISTS(SELECT *
             FROM Sales.Store
             WHERE Store.BusinessEntityID = Customer.StoreID
               AND ISNULL(Store.SalesPersonID, 281) = 281)

Now, you have a filter that is in the where clause, as expected. I’ve heard really passionate arguments for both ways; what are your thoughts on filtering via inner join conditions?