SQL Screw-ups: Scalar-Valued Functions

This is a continuation of the SQL Screw-ups series that stemmed from my Nashville .NET User Group talk on 03/14/2019. Slides and setup details are on the first post in the series.

Imagine you need to calculate the extended price for records in the Sales.SalesOrderDetail table. A simple way to get the extended price for an order line item is to multiply the order quantity by the unit price.

Here is a query of order details for sales order 43659:

SELECT SalesOrderDetailID as Line
      ,OrderQty
      ,ProductID
      ,UnitPrice
      ,OrderQty * UnitPrice as Extended
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = 43659
ORDER BY SalesOrderID
        ,SalesOrderDetailID

This works very well, as you can see from the results:

Line OrderQty ProductID UnitPrice   Extended 
---- -------- --------- ----------- ---------
1    1        776       2024.994    2024.994
2    3        777       2024.994    6074.982
3    1        778       2024.994    2024.994
4    1        771       2039.994    2039.994
5    1        772       2039.994    2039.994
6    2        773       2039.994    4079.988
7    1        774       2039.994    2039.994
8    3        714         28.8404     86.5212
9    1        716         28.8404     28.8404
10   6        709          5.70       34.20
11   2        712          5.1865     10.373
12   4        711         20.1865     80.746

So, you write queries for a couple of batch processes, a few reports, and a view or two. They all calculate extended price the same way. Then, your requirements change. The extended price must now be rounded to two decimal places. You set out to find all the queries you wrote that calculate “Extended”, or did you call it “ExtPrc” because you were in a hurry? Never mind that, you’re sure you always wrote “OrderQty * UnitPrice” … or did you miss a space in one of the queries?

This is a simple use-case for a user-defined function in SQL Server. In this case, you could write a scalar-valued function to calculate the extended price. Then, you only have to change the calculation in one place if it ever changes again.

CREATE FUNCTION Sales.CalculateExtendedFromQuantityUnitPrice
(
    @Quantity int
   ,@UnitPrice decimal(12,6)
)
RETURNS money
AS
BEGIN
  DECLARE @Extended money;
  SELECT @Extended = ROUND(@Quantity * @UnitPrice, 2);
  RETURN @Extended;
END
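
As a quick sanity check, the function can be called with literal values before it is wired into a query. The values below are taken from line 8 of the order above (quantity 3 at a unit price of 28.8404); this is only a sketch of how you might test it by hand.

```sql
-- Quantity 3 at a unit price of 28.8404 (line 8 of order 43659).
-- The unrounded product is 86.5212, so the rounded result should be 86.52.
SELECT Sales.CalculateExtendedFromQuantityUnitPrice(3, 28.8404) AS Extended;
```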

Now, your original query becomes:

SELECT SalesOrderDetailID
      ,OrderQty
      ,ProductID
      ,UnitPrice
      ,Sales.CalculateExtendedFromQuantityUnitPrice(OrderQty, UnitPrice) as Extended
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = 43659
ORDER BY SalesOrderID
        ,SalesOrderDetailID

You commit the changes and deploy the new version. Clients are pleased.

While refactoring this SQL, you explored other ways to save time in the future, but you didn’t quite have time to implement the “better” solution that you came up with. Now you realize that rather than passing the quantity and unit price to your function, you can simply pass in the order ID and the line ID. After all, the function runs on the server, so it can fetch the quantity and unit price itself. You think about altering the function, but you want to compare estimated execution plans between the two versions of the function before assuming they’ll execute in the same way. So, you write a new scalar-valued function:

CREATE FUNCTION Sales.CalculateExtendedFromOrderID
(
    @OrderID int
   ,@OrderDetailID int
)
RETURNS money
AS
BEGIN
  DECLARE @Extended money;

  SELECT @Extended = ROUND(OrderQty * UnitPrice, 2)
  FROM Sales.SalesOrderDetail
  WHERE SalesOrderID = @OrderID
    AND SalesOrderDetailID = @OrderDetailID;

  RETURN @Extended;
END

You duplicate your query, replace the function call in the second query with a call to the new function, and request an estimated execution plan from SQL Server. The result is 50%-50%: each query is estimated at half the cost of the batch. The functions appear to have the same performance impact, so you go with the function that takes the order ID and line ID.

You test your changes, and everything checks out fine. The new version is deployed to clients, and you go home and enjoy a nice glass of warm milk. Your phone rings, and clients are very unhappy. The system has slowed to a crawl, and no one can do their work. Batch processes and reports time out, but order entry works just fine. You recall that performance was the same for both versions of the function in the execution plan. You can’t imagine what could cause the performance issues.

You decide to run your original query that gets the details of order 43659 against a scrubbed copy of a client’s production data. At this point, you have three versions of the query: one with no function call, one with a call to the function that takes quantity and unit price as parameters, and one with a call to the function that takes the order ID and line ID as parameters. All three versions of your query run in less than one millisecond in SSMS. The execution plan indicates that all three queries perform the same.

You decide to run all three queries again, but without filtering to a single order. Query 1 executes in less than one millisecond. Query 2 executes in less than one millisecond. And, finally, query 3 executes in … oh my … query 3 is still executing.

SELECT SalesOrderDetailID as Line
      ,OrderQty
      ,ProductID
      ,UnitPrice
      ,OrderQty * UnitPrice as Extended
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderID
        ,SalesOrderDetailID

SELECT SalesOrderDetailID
      ,OrderQty
      ,ProductID
      ,UnitPrice
      ,Sales.CalculateExtendedFromQuantityUnitPrice(OrderQty, UnitPrice) as Extended
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderID
        ,SalesOrderDetailID

SELECT SalesOrderDetailID
      ,OrderQty
      ,ProductID
      ,UnitPrice
      ,Sales.CalculateExtendedFromOrderID(SalesOrderID, SalesOrderDetailID) as Extended
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderID
        ,SalesOrderDetailID

It turns out that the function that takes the order ID and line ID queries the SalesOrderDetail table once for each row in the SalesOrderDetail table, whereas the function that takes quantity and unit price does not query any additional data. The performance impact was not noticed when querying the details of a single order, and it wasn’t even noticed when querying all orders in your test database. However, your clients have way more orders than your test database. A load test is the only way for QA to find this sort of issue.
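
If you want to see the difference for yourself, one rough approach is to time the two function-based queries with SET STATISTICS TIME. The sketch below is an assumption about how you might measure it, not the method used above: it wraps each function call in SUM so the function still runs once per row while SSMS only has to display a single value. CPU and elapsed times appear on the Messages tab.

```sql
SET STATISTICS TIME ON;

-- Function that only does arithmetic on the values passed in.
SELECT SUM(Sales.CalculateExtendedFromQuantityUnitPrice(OrderQty, UnitPrice)) AS Total
FROM Sales.SalesOrderDetail;

-- Function that queries Sales.SalesOrderDetail again for every row.
SELECT SUM(Sales.CalculateExtendedFromOrderID(SalesOrderID, SalesOrderDetailID)) AS Total
FROM Sales.SalesOrderDetail;

SET STATISTICS TIME OFF;
```

Run it against a database with a realistic number of rows; on a small test database the gap may be too small to notice.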

But what about the estimated execution plan? The estimated execution plan treats scalar-valued user-defined functions as 0% cost, no matter how much work they do, which is what makes them tricky. Note that newer versions of SQL Server (2019 and later) can inline many scalar-valued functions into the calling query, which makes their cost visible in the plan, but your clients are still on SQL Server 2008.

You can take one of two approaches to this problem:

a) Don’t write heavy scalar-valued functions.
b) Don’t use scalar-valued functions for queries that return many rows.

I choose option a. My goal is to never query data inside a scalar-valued function. That way I can use the function wherever I need it without worrying.
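
If you want the calculation in one place without a scalar-valued function at all, one alternative worth considering is an inline table-valued function, which SQL Server expands into the calling query much like a view. The sketch below is not from the original talk; the function name Sales.ExtendedPrice is hypothetical.

```sql
-- Hypothetical inline table-valued function; the name is an assumption.
CREATE FUNCTION Sales.ExtendedPrice
(
    @Quantity int
   ,@UnitPrice decimal(12,6)
)
RETURNS TABLE
AS
RETURN
(
    SELECT CAST(ROUND(@Quantity * @UnitPrice, 2) AS money) AS Extended
);
GO  -- batch separator understood by SSMS and sqlcmd

-- CROSS APPLY calls the function for each row, but the optimizer
-- folds the expression into the query plan instead of invoking it row by row.
SELECT d.SalesOrderDetailID AS Line
      ,d.OrderQty
      ,d.ProductID
      ,d.UnitPrice
      ,e.Extended
FROM Sales.SalesOrderDetail AS d
CROSS APPLY Sales.ExtendedPrice(d.OrderQty, d.UnitPrice) AS e
ORDER BY d.SalesOrderID
        ,d.SalesOrderDetailID;
```

The trade-off is a slightly clumsier call site (CROSS APPLY instead of a plain function call in the select list).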

A more complex version of this scenario actually happened with some new features at my work. The function was much more complex, queried multiple tables, and called nested scalar-valued functions (which also queried multiple tables and called nested scalar-valued functions … and so on and so forth). I think it went four levels deep. Our solution was to build the guts of the functions into a derived table that we join to the main query that originally called the scalar-valued function. The resulting derived table is over 300 lines of SQL.
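
The real derived table is far too long to show here, but the shape of the rewrite can be sketched against the example from this post. Instead of calling Sales.CalculateExtendedFromOrderID once per row, the calculation is done for all rows at once in a derived table that is joined back to the main query. The aliases d and x are only illustrative, and in this toy case the join is redundant because the values already live on the same row.

```sql
SELECT d.SalesOrderDetailID AS Line
      ,d.OrderQty
      ,d.ProductID
      ,d.UnitPrice
      ,x.Extended
FROM Sales.SalesOrderDetail AS d
INNER JOIN
(
    -- Stands in for the "guts" of the scalar-valued function,
    -- computed for every row in one pass instead of once per row.
    SELECT SalesOrderID
          ,SalesOrderDetailID
          ,CAST(ROUND(OrderQty * UnitPrice, 2) AS money) AS Extended
    FROM Sales.SalesOrderDetail
) AS x
    ON  x.SalesOrderID = d.SalesOrderID
    AND x.SalesOrderDetailID = d.SalesOrderDetailID
ORDER BY d.SalesOrderID
        ,d.SalesOrderDetailID;
```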

Be sure to check next week’s post for a lesson in how not to use stored procedures!
