This is a continuation of the SQL Screw-ups series that stemmed from my Nashville .NET User Group talk on 03/14/2019. Slides and setup details are on the first post in the series.
Imagine you need to calculate the extended price for records in the Sales.SalesOrderDetail table. A simple way to get the extended price for an order line item is to multiple the order quantity by the unit price.
Here is a query of order details for sales order 43659:
SELECT SalesOrderDetailID as Line ,OrderQty ,ProductID ,UnitPrice ,OrderQty * UnitPrice as Extended FROM Sales.SalesOrderDetail WHERE SalesOrderID = 43659 ORDER BY SalesOrderID ,SalesOrderDetailID
This works very well, as you can see from the results:
Line OrderQty ProductID UnitPrice Extended ---- -------- --------- ----------- --------- 1 1 776 2024.994 2024.994 2 3 777 2024.994 6074.982 3 1 778 2024.994 2024.994 4 1 771 2039.994 2039.994 5 1 772 2039.994 2039.994 6 2 773 2039.994 4079.988 7 1 774 2039.994 2039.994 8 3 714 28.8404 86.5212 9 1 716 28.8404 28.8404 10 6 709 5.70 34.20 11 2 712 5.1865 10.373 12 4 711 20.1865 80.746
So, you write queries for a couple batch processes, a few reports, and a view or two. They all calculate extended price the same way. Then, your requirements change. The extended price must now be rounded to two decimal places. You set out to find all the queries you wrote that calculates “Extended”, or did you call it “ExtPrc” because you were in a hurry? Never mind that, you’re sure you always wrote “OrderQty * UnitPrice” … or did you miss a space in one of the queries?
This is a simple use-case for a user-defined function in SQL Server. In this case, you could write a scalar-valued function to calculate the extended price. Then, you only have to change the calculation in one place if it ever changes again.
CREATE FUNCTION Sales.CalculateExtendedFromQuantityUnitPrice ( @Quantity int ,@UnitPrice decimal(12,6) ) RETURNS money AS BEGIN DECLARE @Extended money; SELECT @Extended = ROUND(@Quantity * @UnitPrice, 2); RETURN @Extended; END
Now, your original query becomes:
SELECT SalesOrderDetailID ,OrderQty ,ProductID ,UnitPrice ,Sales.CalculateExtendedFromQuantityUnitPrice(OrderQty, UnitPrice) as Extended FROM Sales.SalesOrderDetail WHERE SalesOrderID = 43659 ORDER BY SalesOrderID ,SalesOrderDetailID
You commit the changes and deploy the new version. Clients are pleased.
While you’re refactoring this SQL, you explored other ways you can save time in the future, but you didn’t quite have time to implement the “better” solution that you came up with. You realize that rather than passing the quantity and unit price to your function, you can simply pass in the order ID and the line ID. After all, it’s on the server – it can fetch the quantity and unit price. You think about altering the function, but you want to compare estimated execution plans between the two versions of the function before assuming they’ll execute in the same way. So, you write a new scalar-valued function:
CREATE FUNCTION Sales.CalculateExtendedFromOrderID ( @OrderID int ,@OrderDetailID int ) RETURNS money AS BEGIN DECLARE @Extended money; SELECT @Extended = ROUND(OrderQty * UnitPrice, 2) FROM Sales.SalesOrderDetail WHERE SalesOrderID = @OrderID AND SalesOrderDetailID = @OrderDetailID; RETURN @Extended; END
You duplicate your query, replace the function call in the second query to use the new function you wrote, and you request an estimated execution plan from SQL Server. The result is 50%-50%. The functions have the same performance impact, so you go with the single-parameter function.
You test your changes, and everything checks out fine. The new version is deployed to clients, and you go home and enjoy a nice glass of warm milk. Your phone rings, and clients are very unhappy. The system has slowed to a crawl, and no one can do their work. Batch processes and reports time out, but order entry works just fine. You recall that performance was the same for both versions of the function in the execution plan. You can’t imagine what could cause the performance issues.
You decide to run your original query that gets the details of order 43659 against a scrubbed copy of a client’s production data. At this point, you have three versions of the query. One with no function call, one with a call to the function that takes quantity and unit price as parameters, and one with a call to the function that takes order ID as a parameter. All three versions of your query run in less than one millisecond in SSMS. The execution plan indicates that all three queries perform the same.
You decide to run all three queries again, but without filtering to a single order. Query 1 executes in less than one millisecond. Query 2 executes in less than one millisecond. And, finally, query 3 executes in … oh my … query 3 is still executing.
SELECT SalesOrderDetailID as Line ,OrderQty ,ProductID ,UnitPrice ,OrderQty * UnitPrice as Extended FROM Sales.SalesOrderDetail ORDER BY SalesOrderID ,SalesOrderDetailID SELECT SalesOrderDetailID ,OrderQty ,ProductID ,UnitPrice ,Sales.CalculateExtendedFromQuantityUnitPrice(OrderQty, UnitPrice) as Extended FROM Sales.SalesOrderDetail ORDER BY SalesOrderID ,SalesOrderDetailID SELECT SalesOrderDetailID ,OrderQty ,ProductID ,UnitPrice ,Sales.CalculateExtendedFromOrderID(SalesOrderID, SalesOrderDetailID) as Extended FROM Sales.SalesOrderDetail ORDER BY SalesOrderID ,SalesOrderDetailID
It turns out that the function with one parameter is querying the SalesOrderDetail table for each row in the SalesOrderDetail table, whereas the function with two parameters does not query additional data. The performance impact was not noticed when querying detail of a single order, and it wasn’t even noticed when querying all orders in your test database. However, your clients have way more orders than your test database. A load test is the only way for QA to find this sort of issue.
But what about the estimated execution plan? The estimated execution plan will consider user-defined functions as 0% cost, which is what makes user-defined functions tricky. Note that Microsoft SQL Server 2017 will actually calculate the relative cost of scalar-valued functions, but your clients are still on SQL Server 2008. You can take one of two approaches to this problem. a) Don’t write heavy scalar-valued functions. b) Don’t use scalar-valued functions for queries that return many rows. I choose option a. My goal is to never query data inside a scalar-valued function. That way I can use the function wherever I need it without worrying.
A more complex version of this scenario actually happened with some new features at my work. The function was much more complex, queried multiple tables, and called nested scalar-valued functions (which also queried multiple tables and called nested scalar-valued functions … and so on and so forth). I think it went four levels deep. Our solution was to build the guts of the functions into a derived table that we join to the main query that originally called the scalar-valued function. The resulting derived table is over 300 lines of SQL.
Be sure to check next week’s post for a lesson in how not to use stored procedures!