How necessary are the use of cursors in SQL Server? In a previous article I showed you how you could write your own custom procedures to loop through a collection of objects, such as tables or databases, and execute statements against those objects. That article used cursors exclusively to loop through the objects. Sometimes cursors are necessary, especially when executing stored procedures against separate records in a SQL Server table. However, when it comes to using queries to return result sets, most of the time cursors can be avoided entirely. Today I am going to look at a tricky problem where cursors would traditionally be used, and how you can use features packaged in SQL Server 2005 to avoid them.
The problem
The scenario I will look at today will involve an Inventory table, which holds information regarding initial beginning inventory per product, along with subsequent inventory transactions. To make things a bit more simple, I'll assume that first record for a given date will contain the beginning inventory, each record after the initial record will indicate Inventory being moved.
The required report is a listing of the Products by Date and the amount of Inventory remaining at the end of the day. Management wants to know when inventory is getting low for specific products before the entire supply has been depleted. The code snippet below will create my Inventory table and load some sample data into it.
IF OBJECT_ID('Inventory') IS NOT NULL
DROP TABLE Inventory;
GO
CREATE TABLE [dbo].Inventory
(
InventoryID [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[Product] [varchar](150) NULL,
[InventoryDate] [datetime] NULL,
[InventoryCount] INT NULL
)
GO
INSERT INTO Inventory
(Product, InventoryDate, InventoryCount)
SELECT 'Computer', DATEADD(d, -5, GETDATE()), 5000
UNION ALL
SELECT 'Computer', DATEADD(d, -4, GETDATE()), 4000
UNION ALL
SELECT 'Computer', DATEADD(d, -3, GETDATE()), 3000
UNION ALL
SELECT 'Computer', DATEADD(d, -2, GETDATE()), 2000
UNION ALL
SELECT 'Computer', DATEADD(d, -1, GETDATE()), 1000
INSERT INTO Inventory
(Product, InventoryDate, InventoryCount)
SELECT 'BigScreen', DATEADD(d, -5, GETDATE()), 5000
UNION ALL
SELECT 'BigScreen', DATEADD(d, -4, GETDATE()), 2000
UNION ALL
SELECT 'BigScreen', DATEADD(d, -3, GETDATE()), 1900
UNION ALL
SELECT 'BigScreen', DATEADD(d, -2, GETDATE()), 1800
UNION ALL
SELECT 'BigScreen', DATEADD(d, -1, GETDATE()), 1000
INSERT INTO Inventory
(Product, InventoryDate, InventoryCount)
SELECT 'PoolTable', DATEADD(d, -5, GETDATE()), 5000
UNION ALL
SELECT 'PoolTable', DATEADD(d, -4, GETDATE()), 4500
UNION ALL
SELECT 'PoolTable', DATEADD(d, -3, GETDATE()), 3900
UNION ALL
SELECT 'PoolTable', DATEADD(d, -2, GETDATE()), 3800
UNION ALL
SELECT 'PoolTable', DATEADD(d, -1, GETDATE()), 2800
The tricky part about this report comes when trying to determine how much inventory is left after each transaction. For the PoolTable product above, there are 5,000 units available for sale at the end of the first day. At the end of the second day, 4,500 of those units have been sold, leaving only 500 units. At the end of the third day, 3,900 units have been sold, which means that more inventory needs to be purchased. Because the next result is totally dependent on the preceding results, it makes sense to try to use a cursor to loop through each records and store values in variables and temp tables, and just report the result at the end. However, with some crafty TSQL, cursors can be avoided.
In the example below, I use a common-table expression (CTE) and the DENSE_RANK() windowing function, two new features in SQL Server 2005, to re-curse through the results and return the final output as one TSQL statement.
;WITH RecursiveCTE(RowNumber, Product, InventoryCount, InventoryDate, RemainingInventory, Ranking)
AS
(
SELECT *
FROM
(
SELECT
RowNumber = CAST(DENSE_RANK() OVER(PARTITION BY product ORDER BY InventoryDate ASC) AS INT),
Product, InventoryCount, InventoryDate, ValueColumn = InventoryCount, Ranking = 0
FROM Inventory sho
) G
WHERE G.RowNumber = 1
UNION ALL
SELECT
r.RowNumber, r.Product, r.InventoryCount, r.InventoryDate, c.RemainingInventory - r.InventoryCount, c.Ranking + 1
FROM RecursiveCTE c
JOIN
(
SELECT RowNumber = CAST(DENSE_RANK() OVER(PARTITION BY Product ORDER BY InventoryDate ASC) AS INT),*
FROM Inventory
)r ON c.Product = r.Product AND r.RowNumber = c.RowNumber + 1
)
SELECT Product, InventoryDate, InventoryCount, RemainingInventory
FROM RecursiveCTE
ORDER BY Product, InventoryDate
Using a recursive CTE is not the only way to accomplish the desired results. You can also make use of subqueries to return the same thing. In fact, the subquery used below significantly outperforms the recursive CTE example mentioned above.
SELECT First.Product,First.InventoryDate, First.InventoryCount, Outage= 2*MAX(Second.InventoryCount)-SUM(Second.InventoryCount)
FROM
(
SELECT RowNumber = ROW_NUMBER() OVER (ORDER BY Product,InventoryDate ASC),*
FROM Inventory
) First
INNER JOIN
(
SELECT RowNumber = ROW_NUMBER() OVER (ORDER BY Product,InventoryDate ASC),*
FROM Inventory
) Second
ON First.Product = Second.Product AND First.RowNumber >= Second.RowNumber
GROUP BY
First.Product, First.InventoryDate, First.InventoryCount
ORDER BY
First.Product, First.InventoryDate ASC
Why not use a cursor?
The two examples used above are by no means simple TSQL queries. You need a strong foundation of TSQL and of some new SQL Server 2005 features to avoid cursors. But, why should you avoid using a cursor for this problem? They are a little more simple to write and understand because all of the processing happens to one record at a time. However, the use of cursors require more code to write, typically more memory to accomplish the same task, and cursors are typically slower because they only handle one record at a time. The example code below uses cursors to achieve the same as the two queries above, but at the cost of a lot more code and a lot slower execution time.
IF OBJECT_ID('tempdb..#InventoryTemp') IS NOT NULL
DROP TABLE #InventoryTemp
DECLARE @First BIT, @RemainingInventory INT
DECLARE @Product VARCHAR(20), @InventoryID INT, @InventoryCount INT
SELECT * INTO #InventoryTemp
FROM Inventory
ALTER TABLE #InventoryTemp
ADD RemainingInventory INT
DECLARE ProductCursor CURSOR FAST_FORWARD FOR
SELECT DISTINCT Product FROM Inventory
OPEN ProductCursor
FETCH NEXT FROM ProductCursor
INTO @Product
WHILE (@@FETCH_STATUS = 0)
BEGIN
SELECT @First = 1
DECLARE InventoryCursor CURSOR FAST_FORWARD FOR
SELECT InventoryID, InventoryCount
FROM #InventoryTemp
WHERE Product = @Product
ORDER BY InventoryDate ASC
OPEN InventoryCursor
FETCH NEXT FROM InventoryCursor
INTO @InventoryID, @InventoryCounT
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @RemainingInventory = ISNULL(@RemainingInventory, @InventoryCount)
BEGIN
UPDATE #InventoryTemp
SET RemainingInventory = CASE WHEN @First = 1 THEN InventoryCount ELSE @RemainingInventory - @InventoryCount END
WHERE InventoryID = @InventoryID
SELECT @RemainingInventory = RemainingInventory
FROM #InventoryTemp
WHERE InventoryID = @InventoryID
END
SET @First = 0
FETCH NEXT FROM InventoryCursor
INTO @InventoryID, @InventoryCount
END
CLOSE InventoryCursor
DEALLOCATE InventoryCursor
FETCH NEXT FROM ProductCursor
INTO @Product
END
CLOSE ProductCursor
DEALLOCATE ProductCursor
SELECT * FROM #InventoryTemp
Conclusion
Cursors aren't all bad. In fact, they can make some database problems a lot easier to solve. But, the next time you're saddled with a problem and your first thought is to use a cursor to solve it, take a step back and really examine the problem. There is a chance that you can use a more efficient set-based approach to solve your problem.





1
Tim Chapman - 20/03/09
If you like this article, check out our new site at www.sqlservernation.com.
Thanks,
Tim
» Report offensive content
2
WoodyGuthrie - 20/03/09
Probably the best most straight forward "how not to use cursors" article I've seen. I am curious about how you came up with the formula in the sub-query section to get the remaining inventory. Very clever.
Outage = 2*MAX(Second.InventoryCount)-SUM(Second.InventoryCount)
Thanks.
» Report offensive content
3
John Fox - 20/03/09
I often have forsaken complex Sql statements and used cursors for one reason ... flexibility. In our business, we do a lot of batch processing where tables are updated based on data in a different table, or from batch files. It is common for me to use a cursor for some of these because I can
* Invoke multiple commit points and reduce the chance of data contention since there are no large update statements
* Provide a finer level of data checks without having to use temp tables or multiple passes through the data. I can create a select statement using outer joins, then check for data integrity and null values in one pass, report bad data, and process good data.
* Decrease the impact of a duplicate key insert. How many times have we created a large insert statement only to find out after running it for 6 months that sometimes it doesn't work and now we have thousands of rows we have to peruse to figure out the condition that is causing the duplicate row. Using a cursor provides a means of reporting *EXACTLY* which data item is causing the problem even if no problem is every expected. All my cursors create a key string at the beginning so any SQL error can be succinctly reported and handled and the program can continue. The rejects can be examined the next day rather than at 3am.
* Code that may be larger, but is easier to follow and comment without using items that many SQL developers may not be familiar with.
* Auditability. I can provide fine level statistics on what rows resulted in what updates. This helps during the debug process and also later on if there are concerns whether or not the program is operating properly. Being able to say 'I read 10,000 rows. Of those, 9,000 were processed as insert, and 998 were updates, and 2 were errors. This resulted in 2,548 rows being changed and 12,427 rows being added' saves tremendous amount of time when debugging during development/QC cycles or when troubleshooting production issues. Or, as is often the case, being able to prove the program worked in production because I can trace all the data changes that were made.
While it is great to search for the lowest resource usage statements, if something runs in 4.9 minutes instead of 5, the question becomes is it worth it if higher costs are found elsewhere, such as in maintainability or flexibility. I don't always use cursors, my guess is it's about 50/50 and depends on a lot of criteria. I have rarely seen any significant difference in memory usage or performance that warranted such a decision over other factors.
» Report offensive content
4
Bruce W Cassidy - 20/03/09
I believe what would have validated your argument about taking the time to write more intricate code is showing some performance statistics.
In the real world, inventories are generally modelled by large combinations of tables, and using cursors to scan through moving inventory (even if the tables are well indexes) can be extremely slow. That's a valid reason to look at an alternative to cursors. But the proof would be in comparing the performance results.
Creating complex code just for the sake of avoiding cursors on the other hand violates the "keep it simple" rule. There needs to be a reason to do things the complex way.
While your article is a good presentation for alternatives to something traditionally done with cursors, it doesn't actually give any solid reasoning as to why the alternative is necessary.
Maybe a follow-up with some performance comparisons?
» Report offensive content
5
Jeff Moden - 20/03/09
Heh... you need to do a little more testing... In this case, the cursor is actually far superior to the other 2 solutions in EVERY category except I/O and most of that is just logical reads. Here're some stats on just a paltry 1,000 rows in the 'Computer' category of the test data...
Duration (ms) CPU (ms) Reads Writes Max Internal Rowcount
Recursive CTE 10,785 9,672 15,020 0 100,000
Sub-Query 2,834 2,437 10,079 8 100,000
Cursor 1,715 1,422 133,481 6 1,000
The other thing is that the recursive CTE is the only one that is guaranteed to return the data in date order because of where the windowing function are... the other two methods need ORDER BY's to correctly order the data should they be out of order. Those necessary ORDER BY's are not included in the test results above.
Finally, you have to set the MAX RECURSION option to get a result set out of the recursive CTE with even only 1,000 rows.
There are, in fact, some set based methods to do running totals. None of the methods you have portrayed here are set based methods... not even the second method because they all touch the same row more than once. In the case of the first two methods, they touch the rows in a cartesion join to themselves.
Bottom line is, none of the non-cursor (I can't call them set based because they're not... they're actually RBAR on steroids) examples in this article will beat the cursor for performance on the running total problem. As a friendly suggestion, before you make statements like cursors normally being slower for this type of thing, you should actually do a test with more than 15 rows of data.
» Report offensive content
6
qutesanju - 18/04/09
pls help me for below sql query
» Report offensive content
7
qutesanju - 18/04/09
pls help me on this SQL
» Report offensive content