Thursday, 1 February 2018

SQL Server Ranking Functions

There are four ranking functions included with SQL Server (starting with SQL Server 2005). Those functions; ROW_NUMBER, RANK, DENSE_RANK, and NTILE; can be used to rank the rows of your result set over a partition. This article will present those functions and examples of their use.

Ranking Functions


Ranking functions are a subset of the built in functions in SQL Server. They are used to provide a rank of one kind or another to a set of rows in a partition. The partition can be the full result set, if there is no partition. Otherwise, the partitioning of the result set is defined using the partition clause of the OVER clause. The order of the ranking within the partition is defined by the order clause of OVER. Order is required even though partition is optional. The OVER clause syntax can be found here.
In the following, each ranking function will be discussed and an example shown. The examples for this article use tables in the AdventureWorks2008R2 database.

ROW_NUMBER

The ROW_NUMBER ranking function is the simplest of the ranking functions. Its purpose in life is to provide consecutive numbering of the rows in the result set by the order selected in the OVER clause for each partition specified in the OVER clause.
If no partition is specified, ROW_NUMBER will provide a consecutive numbering of the rows based on the order clause. If a partition is provided, the numbering is consecutive within the partition and begins again at 1 when the partition changes.

Example
Here we have a query to find the top 10 products that sold in 2008. ROW_NUMBER is used here to designate the ordering from highest qty ordered to lowest. The order clause in the OVER clause tells it to rank the rows by the sum of all the order quantities for that product.

SELECT TOP(10) ProductID, SUM(OrderQty) AS ProductOrderCount, ROW_NUMBER() OVER (ORDER BY SUM(OrderQty) DESC) AS Row
FROM Sales.SalesOrderDetail sod
JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
WHERE OrderDate BETWEEN '1/1/2008' AND '12/31/2008'
GROUP BY ProductID
ORDER BY Row

Results
ProductID ProductOrderCount Row
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
870 3519 1
712 2376 2
711 2133 3
708 2064 4
707 2053 5
873 2051 6
864 1834 7
921 1782 8
884 1654 9
877 1475 10
(10 row(s) affected)

To give you an example of what ROW_NUMBER would do if there was a partition involved; consider this example where TerritoryID was added to the result set. In this case, we have to retrieve the top 20 so you can see the row number start over when the partition changes. The partition is by ProductID and the order by clause of the select statement changed to order the result set by ProductID and then Row. The GROUP BY clause also changed to include Territory ID. So, we are getting a summation of the OrderQty for each product in each territory. That result set then has the ROW_NUMBER applied to each product in the order of most quantity ordered in each territory. After the ranking is applied, the result set is ordered by ProductID and Row and the top 20 rows are returned.

SELECT TOP(20) ProductID, soh.TerritoryID,
SUM(OrderQty) AS ProductOrderCount,
ROW_NUMBER() OVER ( PARTITION BY sod.ProductID ORDER BY SUM(OrderQty) DESC) AS Row
FROM Sales.SalesOrderDetail sod
JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
JOIN Sales.SalesTerritory st
ON soh.TerritoryID = st.TerritoryID
WHERE OrderDate BETWEEN '1/1/2008' AND '12/31/2008'
GROUP BY ProductID, soh.TerritoryID
ORDER BY ProductID, Row

Results

ProductID TerritoryID ProductOrderCount Row
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
707 4 355 1
707 6 324 2
707 9 283 3
707 10 249 4
707 1 243 5
707 7 217 6
707 8 211 7
707 3 66 8
707 5 59 9
707 2 46 10
708 4 374 1
708 6 332 2
708 9 288 3
708 1 236 4
708 10 223 5
708 8 208 6
708 7 182 7
708 2 92 8
708 3 65 9
708 5 64 10
(20 row(s) affected)

RANK

The function RANK provides consecutive numbering except in the case of a tie. This function is used when you want all tied rows to have the same ranking, but the next row after the tied rows to have the rank it would have been assigned if there had been no tie. In other words, the numbers assigned by RANK are not necessarily consecutive.

Example
Let's take the first example from above (products ordered in 2008) and apply RANK instead of ROW_NUMBER. To get a result worth looking at, however, we have to limit the products to those having a ProductOrderCount between 400 and 500. You can see in the results that two of the rows had ProductOrderCount equal to 488. Therefore, with RANK, they both received the same ranking, in this case 2, and the next number is skipped. In this example the 2nd and 3rd items have a ranking of 2 and the fourth item has a ranking of 4.
- Advertisement -
SELECT ProductID, SUM(OrderQty) AS ProductOrderCount,
RANK() OVER (ORDER BY SUM(OrderQty) DESC) AS Ranking
FROM Sales.SalesOrderDetail sod
JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
WHERE OrderDate BETWEEN '1/1/2008' AND '12/31/2008'
GROUP BY ProductID
HAVING SUM(OrderQty) BETWEEN 400 AND 500
ORDER BY Ranking

Results
ProductID ProductOrderCount Ranking
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
928 499 1
954 488 2
966 488 2
974 471 4
972 464 5
933 463 6
801 453 7
977 447 8
797 442 9
860 409 10
(10 row(s) affected)

DENSE_RANK

As with RANK, DENSE_RANK will divvy out the same rank to tying rows. In contrast, however, DENSE_RANK will not skip any numbers. So if the first two rows tie, they will both receive 1 for the rank. The third row will receive 2 for its rank, unlike with RANK where it would have been assigned 3.

Example
To display the difference, here is the same example query as above. Changing RANK to DENSE_RANK renders the same results with one distinction. Rows 2 and 3 are still tied and therefore receive the same rank of 2; however the fourth row now receives the rank of 3 instead of 4.

SELECT ProductID, SUM(OrderQty) AS ProductOrderCount,
DENSE_RANK() OVER (ORDER BY SUM(OrderQty) DESC) AS Ranking
FROM Sales.SalesOrderDetail sod
JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
WHERE OrderDate BETWEEN '1/1/2008' AND '12/31/2008'
GROUP BY ProductID
HAVING SUM(OrderQty) BETWEEN 400 AND 500
ORDER BY Ranking

Results
ProductID ProductOrderCount Ranking
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
928 499 1
954 488 2
966 488 2
974 471 3
972 464 4
933 463 5
801 453 6
977 447 7
797 442 8
860 409 9
(10 row(s) affected)

NTILE

When you want to group rows equally across a partition, the NTILE function is the one you want. For this ranking function you need to provide it the number of groups into which you want the rows equally divided. Should the row count not be equally divisible by the number of groups, there will be a group or groups with one more row than the rest. Those groups with the extra row will come first in the ranking.
For instance, if you had 26 rows and desired 5 groups, the first group would contain 6 rows and the remaining 4 groups would contain 5 rows each.

Example
Again using the same query to identify products with a total quantity ordered in 2008 between 400 and 500, this time we are grouping them into 4 groups using NTILE and supplying the group count in the function call. Since this query returns 10 rows and 10 is not equally divided into 4 groups, you can see in the results that the first two groups contain 3 rows each and the last two groups contain just two rows each.

SELECT ProductID, SUM(OrderQty) AS ProductOrderCount,
NTILE(4) OVER (ORDER BY SUM(OrderQty) DESC) AS Ranking
FROM Sales.SalesOrderDetail sod
JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
WHERE OrderDate BETWEEN '1/1/2008' AND '12/31/2008'
GROUP BY ProductID
HAVING SUM(OrderQty) BETWEEN 400 AND 500
ORDER BY Ranking

Results
ProductID ProductOrderCount Ranking
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
928 499 1
954 488 1
966 488 1
974 471 2
972 464 2
933 463 2
801 453 3
977 447 3
797 442 4
860 409 4
(10 row(s) affected)

Conclusion

SQL Server includes a subset of built-in functions called ranking functions. These ranking functions provide various means of ranking and grouping result sets by utilizing the OVER clause to identify the partitioning of the result set and the order for the ranking.



No comments:

Post a Comment