Arranging SQL data that you can effectively analyse requires an understanding of how to use certain SQL clauses and operators. These tips will help you figure out how to build statements that will give you the results you want.

Arranging data in a manner that's meaningful can be a challenge. Sometimes all you need is a simple sort. Often, you need more -- you need groups you can analyse and summarise. Fortunately, SQL offers a number of clauses and operators for sorting, grouping, and summarising. The following tips will help you discern when to sort, when to group, and when and how to summarize. For detailed information on each clause and operator, see Books Online.

#1: Bring order with a sort

More often than not, all your data really needs is a little order. SQL's ORDER BY clause organises data in alphabetic or numeric order. Consequently, similar values sort together in what appear to be groups. However, the apparent groups are a result of the sort; they aren't true groups. ORDER BY displays each record whereas a group may represent multiple records.

#2: Reduce similar values into a group

The biggest difference between sorting and grouping is this: Sorted data displays all the records (within the confines of any limiting criteria) and grouped data doesn't. The GROUP BY clause reduces similar values into one record. For instance, a GROUP BY clause can return a unique list of ZIP codes from a source that repeats those values:

SELECT ZIP FROM Customers GROUP BY ZIP

Include only those columns that define the group in both the GROUP BY and SELECT column lists. In other words, the SELECT list must match the GROUP BY list, with one exception: The SELECT list can include aggregate functions. (GROUP BY doesn't allow aggregate functions.)

Keep in mind that GROUP BY won't sort the resulting groups. To arrange groups alphabetically or numerically, add an ORDER BY clause (# 1). In addition, you can't refer to an aliased field in the GROUP BY clause. Group columns must be in the underlying data, but they don't have to appear in the results.

#3: Limit data before it's grouped

You can limit the data that GROUP BY groups by adding a WHERE clause. For instance, the following statement returns a unique list of ZIP codes for just Kentucky customers:

SELECT ZIP FROM Customers WHERE State = 'KY' GROUP BY ZIP

It's important to remember that WHERE filters data before the GROUP BY clause evaluates it.

Like GROUP BY, WHERE doesn't support aggregate functions.

#4: Return all groups

When you use WHERE to filter data, the resulting groups display only those records you specify. Data that fits the group's definition but does not meet the clause's conditions won't make it to a group. Include ALL when you want to include all data, regardless of the WHERE condition. For instance, adding ALL to the previous statement returns all of the ZIP groups, not just those in Kentucky:

SELECT ZIP FROM Customers WHERE State = 'KY' GROUP BY ALL ZIP

As is, the two clauses are in conflict, and you probably wouldn't use ALL in this way. ALL comes in handy when you use an aggregate to evaluate a column. For example, the following statement counts the number of customers in each Kentucky ZIP, while also displaying other ZIP values:

SELECT ZIP, Count(ZIP) AS KYCustomersByZIP FROM Customers WHERE State = 'KY' GROUP BY ALL ZIP

The resulting groups comprise all ZIP values in the underlying data. However, the aggregate column (KYCustomersByZIP) would display 0 for any group other than a Kentucky ZIP.

Remote queries don't support GROUP BY ALL.

#5: Limit data after it's grouped

The WHERE clause (# 3) evaluates data before the GROUP BY clause does. When you want to limit data after it's grouped, use HAVING. Often, the result will be the same whether you use WHERE or HAVING, but it's important to remember that the clauses are not interchangeable. Here's a good guideline to follow when you're in doubt: Use WHERE to filter records; use HAVING to filter groups.

Usually, you'll use HAVING to evaluate a group using an aggregate. For instance, the following statement returns a unique list of ZIP codes, but the list might not include every ZIP code in the underlying data source:

SELECT ZIP, Count(ZIP) AS CustomersByZIP FROM Customers GROUP BY ZIP HAVING Count(ZIP) = 1

Only those groups with just one customer make it to the results.

#6: Get a closer look at WHERE and HAVING

If you're still confused about when to use WHERE and when to use HAVING, apply the following guidelines:

  • WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
  • HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.

#7: Summarize grouped values with aggregates

Grouping data can help you analyse your data, but sometimes you'll need a bit more information than just the groups themselves. You can add an aggregate function to summarise grouped data. For instance, the following statement displays a subtotal for each order:

SELECT OrderID, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY OrderID

As with any other group, the SELECT and GROUP BY lists must match. Including an aggregate in the SELECT clause is the only exception to this rule.

#8: Summarise the aggregate

You can further summarise data by displaying a subtotal for each group. SQL's ROLLUP operator displays an extra record, a subtotal, for each group. That record is the result of evaluating all the records within each group using an aggregate function. The following statement totals the OrderTotal column for each group:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH ROLLUP

The ROLLUP row for a group with two OrderTotal values of 20 and 25 would display an OrderTotal of 45. The first record in a ROLLUP result is unique because it evaluates all of the group records. That value is a grand total for the entire recordset.

ROLLUP doesn't support DISTINCT in aggregate functions or the GROUP BY ALL clause.

#9: Summarise each column

The CUBE operator goes a step further than ROLLUP by returning totals for each value in each group. The results are similar to ROLLUP, but CUBE includes an additional record for each column in the group. The following statement displays a subtotal for each group and an additional total for each customer:

SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

CUBE gives the most comprehensive summarisation. It not only does the work of both the aggregate and ROLLUP, but also evaluates the other columns that define the group. In other words, CUBE summarises every possible column combination.

CUBE doesn't support GROUP BY ALL.

#10: Bring order to summaries

When the results of a CUBE are confusing (and they usually are), add the GROUPING function as follows:

SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

The results include two additional values for each row:

  • The value 1 indicates that the value to the left is a summary value--the result of the ROLLUP or CUBE operator.
  • The value 0 indicates that the value to the left is a detail record produced by the original GROUP BY clause.

Do you need help with SQL Server? Gain advice from Builder AU forums

Comments

1

Jay - 27/07/07

Very neat & good article

» Report offensive content

2

Tim Wehr - 27/07/07

In tip #2, your statement, "In other words, the SELECT list must match the GROUP BY list" isn't exactly true. It IS POSSIBLE to GROUP BY a field that does NOT appear in the SELECT list (althougth the resulting data may be less than useful, as you will not know which group each datum belongs to).

The INVERSE of your statement IS TRUE. With the exception of agregates that you mentioned, "The GROUP BY list MUST match the SELECT list."

It is a small point, but may have significance in specific cases.

» Report offensive content

3

Jeff Duff - 28/07/07

Very concise - boils a lot of reference book material down to its key components.

Nice job.

» Report offensive content

4

Tom - 28/07/07

Very useful article - thank you.

Tim - Your comment above is much appreciated, but I'm confused.
"It is possible to group by a field that does not appear in the SELECT list"
vs.
"The GROUP BY list must match the SELECT list"

These two statements seem to contradict each other. Can you show an example?

Thanks

» Report offensive content

5

Akber Alwani - 28/07/07

HI Tom, you can do so like you can write the query and not put that column in select list but the representation of data will be meaning less. Just use this query to Northwind database come with SQL serever

SELECT count(customerid) FROM customers
group by region

» Report offensive content

6

Neil - 30/07/07

Here is a interesting use for Coalesce with RollUp and possibly Cube operators. In your example number 10 you used the Grouping Function to display a 1 or a 0 in the column thus defining it either as a summary or a detail record.

You can use the colaesce funtion on this query to return values such as total instead of 1 see my example code for more details.

SELECT Coalesce(Customer,'TOTAL'), OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE 

» Report offensive content

7

Neil - 30/07/07

Here is a interesting use for Coalesce with RollUp and possibly Cube operators. In your example number 10 you used the Grouping Function to display a 1 or a 0 in the column thus defining it either as a summary or a detail record.

You can use the colaesce funtion on this query to return values such as total instead of 1 see my example code for more details.
SELECT Coalesce(Customer,'TOTAL'), OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

» Report offensive content

8

Pragadheeswaran - 05/02/09

Author(s) is/are doing a very nice work. Keep it Up. Keep going

» Report offensive content

9

Nabeel amanat - 02/04/09

i need help here guys.
is just that i have created a query joining two tables but when it shows up on screen the id is repeated twice as the data in another table has two value related to the data in first table can any one explain me how i can group them by id
when every i add
GROUP BY companyid, keyperson.name
it gives me an error saying it cannot parse the content
as i am having two keyperson for one company but the company comes twice for each key person
will appreciate if some one can help me out with this

» Report offensive content

10

Rob Mayes - 03/02/10

Is it possiblle to sort a aggreate function such as sum(fieldname) or count(*) with the group by clause? I would like to sort the highest count to the lowest.

» Report offensive content

11

Jason P - 30/03/10

Nice post.

Rob- Yes, just add the sum(fieldname) or count(*) as the sort in the ORDER BY.

» Report offensive content

Leave a comment

You must read and type the 6 chars within 0..9 and A..F

* indicates mandatory fields.

11

Jason P - 30/03/10

Nice post. Rob- Yes, just add the sum(fieldname) or count(*) as the sort in the ORDER BY. ... more

10

Rob Mayes - 02/03/10

Is it possiblle to sort a aggreate function such as sum(fieldname) or count(*) with the group by clause? I would ... more

9

Nabeel amanat - 04/02/09

i need help here guys. is just that i have created a query joining two tables but when it shows up on ... more

Log in


Sign up | Forgot your password?

What's on?

  • Optus Deal

    Broadband + home phone + PlayStation®3 in a single package price!