T-SQL Fundamentals 3 – Aggregate Functions and the GROUP BY Clause

In the previous post, I covered how to filter the results of a SELECT statement using the WHERE clause. Now I will go over one of the most basic ways that you can use T-SQL to derive information from a data set – aggregate functions and the GROUP BY clause. The Northwind sample database will be used for this tutorial.

What are Aggregate Functions?

Aggregate functions are functions that calculate a value based on the a portion or the entire result set. Here is an overview of some of the built in aggregate functions that I will be covering in this post.

FunctionUsage
AVGReturns the average of the values in the group
COUNTReturns a count of the values in the group
MAXReturns the maximum value in the expression
MINReturns the minimum value in the expression
SUMReturns the sum of the values in the expression

One thing to note here is that all of there functions ignore NULLs. Reference this page for a full list of built in aggregate functions.

What is the GROUP BY Clause?

The GROUP BY clause is used to apply an aggregate function to a subset of a query. Suppose that we wanted to find the most expensive item in our product catalog from each supplier. We can use the MAX function to find the most expensive product in the entire catalog as shown below:

SELECT MAX(UnitPrice) FROM Products

But this doesn’t give us any additional information at all. The query will even error out if you try to add the supplier to the query as it is. To find the most expensive product for each supplier, we would have to use the GROUP BY clause.

SELECT 
	SupplierID, 
	MAX(UnitPrice) AS 'Price' 
FROM Products 
GROUP BY SupplierID

Now let’s do a quick survey of the remaining aggregate functions.

AVG

Here is a demonstration of the usage of AVG on the Products table. It calculates the average product cost for each supplier.

SELECT 
	SupplierID, 
	AVG(UnitPrice) AS 'Price' 
FROM Products 
GROUP BY SupplierID

COUNT

Here is a demonstration of COUNT on the Products table. It calculates how many employees there are with the Sales Representative title.

SELECT COUNT(EmployeeID)
FROM [Northwind].[dbo].[Employees]
WHERE Title = 'Sales Representative'

What if you were in a situation where there were many of the same item listed in a table? For example, in the Order Details table, the Product ID field repeats for each order. If you wanted to know how many products were sold where more than 10 of those products were purchased at a time, you couldn’t just use count on its own. Instead you would need to include the DISTINCT keyword to count the products rather than the occurrence of the products.

SELECT 
COUNT(DISTINCT ProductID) 
FROM [Northwind].[dbo].[Order Details]
WHERE Quantity > 10

MAX/MIN

I have already shown the usage of MAX but MIN works in much the same way. It is important to keep in mind that the MAX/MIN functions work on DATE types too. Here is an example that finds the start date of the longest working Sales Representative:

SELECT 
	MIN(HireDate) AS 'StartDate'
FROM Employees
WHERE Title = 'Sales Representative'

SUM

As the name would suggest, SUM calculates the sum of a column value or an expression. In the code below, I am calculating the total billing amount for each order with any existing discount applied.

SELECT 
	OrderID,
	SUM((UnitPrice * Quantity) - ((UnitPrice * Quantity) * Discount)) AS OrderTotal
FROM [Northwind].[dbo].[Order Details]
GROUP BY OrderID
ORDER BY OrderTotal

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: