In the previous post, I covered how to filter the results of a SELECT statement using the WHERE clause. Now I will go over one of the most basic ways that you can use T-SQL to derive information from a data set – aggregate functions and the GROUP BY clause. The Northwind sample database will be used for this tutorial.
What are Aggregate Functions?
Aggregate functions are functions that calculate a value based on the a portion or the entire result set. Here is an overview of some of the built in aggregate functions that I will be covering in this post.
|AVG||Returns the average of the values in the group|
|COUNT||Returns a count of the values in the group|
|MAX||Returns the maximum value in the expression|
|MIN||Returns the minimum value in the expression|
|SUM||Returns the sum of the values in the expression|
One thing to note here is that all of there functions ignore NULLs. Reference this page for a full list of built in aggregate functions.
What is the GROUP BY Clause?
The GROUP BY clause is used to apply an aggregate function to a subset of a query. Suppose that we wanted to find the most expensive item in our product catalog from each supplier. We can use the MAX function to find the most expensive product in the entire catalog as shown below:
SELECT MAX(UnitPrice) FROM Products
But this doesn’t give us any additional information at all. The query will even error out if you try to add the supplier to the query as it is. To find the most expensive product for each supplier, we would have to use the GROUP BY clause.
SELECT SupplierID, MAX(UnitPrice) AS 'Price' FROM Products GROUP BY SupplierID
Now let’s do a quick survey of the remaining aggregate functions.
Here is a demonstration of the usage of AVG on the Products table. It calculates the average product cost for each supplier.
SELECT SupplierID, AVG(UnitPrice) AS 'Price' FROM Products GROUP BY SupplierID
Here is a demonstration of COUNT on the Products table. It calculates how many employees there are with the Sales Representative title.
SELECT COUNT(EmployeeID) FROM [Northwind].[dbo].[Employees] WHERE Title = 'Sales Representative'
What if you were in a situation where there were many of the same item listed in a table? For example, in the Order Details table, the Product ID field repeats for each order. If you wanted to know how many products were sold where more than 10 of those products were purchased at a time, you couldn’t just use count on its own. Instead you would need to include the DISTINCT keyword to count the products rather than the occurrence of the products.
SELECT COUNT(DISTINCT ProductID) FROM [Northwind].[dbo].[Order Details] WHERE Quantity > 10
I have already shown the usage of MAX but MIN works in much the same way. It is important to keep in mind that the MAX/MIN functions work on DATE types too. Here is an example that finds the start date of the longest working Sales Representative:
SELECT MIN(HireDate) AS 'StartDate' FROM Employees WHERE Title = 'Sales Representative'
As the name would suggest, SUM calculates the sum of a column value or an expression. In the code below, I am calculating the total billing amount for each order with any existing discount applied.
SELECT OrderID, SUM((UnitPrice * Quantity) - ((UnitPrice * Quantity) * Discount)) AS OrderTotal FROM [Northwind].[dbo].[Order Details] GROUP BY OrderID ORDER BY OrderTotal