SQL Screw-ups: Type Mismatches

This is a continuation of the SQL Screw-ups series that stemmed from my Nashville .NET User Group talk on 03/14/2019. Slides and setup details are on the first post in the series.

SQL Server helps us out by performing implicit type conversions when it can, but how helpful is this actually? What you don’t know can hurt you when it comes to type conversion . Let’s discuss the type conversions that you may not know are happening and some tips for avoiding them.

First, let’s establish what type conversion is. Microsoft has really good documentation on type conversions, so I’ll point you there. Basically, you can compare, assign, and select data that is not the expected type or is not homogenous by converting one side of the data to the type of the other side of data. You’ve likely heard that you can’t compare apples to oranges. However, if you can find a way to convert the oranges to apples, you can then compare the original apples to the orangey apples. The trick is determining whether you can convert oranges to apples, whether it makes sense to do so, and what the performance implications are. As the docs point out, CAST and CONVERT can both be used to convert from one type to another. I prefer CAST since it complies with ISO standards, but CONVERT can be super convenient, so I don’t rule it out altogether.

One of the most common implicit type conversions that I’ve seen is char to varchar and back. SQL Server largely doesn’t care about comparisons and assignments between these two types as long as you don’t exceed the maximum size of the column. The same goes for varchar and nvarchar. So why bother to know what types you’re dealing with if it’s all strings?

In the following query, City is nvarchar, and ‘Nashville’ is varchar. When you execute the query, SQL Server automatically (or implicitly) converts ‘Nashville’ from varchar to nvarchar without being instructed to do so. Certain types have precedence over other types. This is how SQL Server determines which type wins. nvarchar has higher precedence than varchar, so nvarchar wins.

SELECT *
FROM Person.Address
WHERE City = 'Nashville'

It’s great that SQL Server takes care of the conversion for us! Or is it? The fact is that these implicit conversions are a performance hit. Yes, it takes resources to do the conversion. Until you experience performance problems in production, you may not see the need to address this issue, but I encourage you to write better SQL by making sure your types match organically, rather than by conversion. So how do we solve this problem? Maybe the first question should be how do we know there is a type mismatch in the above query?

I suppose the answer is to learn more about SQL Server. By default, ‘Nashville’ is varchar because it is surrounded by single quotes. There is no way for you to know that other than just knowing it. Fortunately, it’s pretty common knowledge. You can easily change your query so that it is nvarchar by preceding it with an N. This is the resulting query.

SELECT *
FROM Person.Address
WHERE City = N'Nashville'

Great, that was easy! But how do we know that City is nvarchar and not char or varchar? We simply query the database for the schema.

SELECT ISC.TABLE_NAME
      ,ISC.COLUMN_NAME
      ,ISC.DATA_TYPE
      ,ISC.CHARACTER_MAXIMUM_LENGTH AS MAX_LEN
      ,ISC.IS_NULLABLE
      ,ISC.NUMERIC_PRECISION
      ,ISC.NUMERIC_SCALE
      ,ISC.COLUMN_DEFAULT AS Col_Default
FROM INFORMATION_SCHEMA.COLUMNS ISC
WHERE ISC.TABLE_NAME = 'Address'
ORDER BY ISC.TABLE_NAME
        ,ISC.ORDINAL_POSITION

The above query tells us most of the information we need to know about all the columns in the Address table, including the type for the City column. This is a stripped down version of a more complex query that I use often. I left the ISC alias just because you’ll likely want to extend the query, and I’m nice like that.

So what else? Check out the precedence table in the Microsoft docs. You can likely find other implicit conversions that are happening in your queries without your knowledge. You can also have explicit conversions, but those are easy to find since you can search for CAST and CONVERT in your queries. In some cases, you can simply change the syntax to reduce the need for conversions. In some cases, you can change the parameter type in your parameterized query so it matches the data type in the column. You may need to alter column definitions so columns that store the same logical type of information have the same physical type defined. In some cases, you may not be able to do anything about it, and that’s ok.

Depending on your backlog, you may not even want to go looking for implicit conversions in your queries. For existing code, this is an optimization, so it likely doesn’t make sense to change it if you don’t need to squeeze more performance out of your queries. For new code, it likely makes sense to write it well the first time so you don’t have to come back and optimize it later. It’s usually better to be proactive when you can and write better code up front than to be reactive and have fix the code that you rushed in at the last minute.