SQL Server GROUP BY / ORDER BY

sql-server tsql

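Any time you use an aggregate function (COUNT, SUM, MAX, etc.), every other column or expression in your SELECT list that is not wrapped in an aggregate must be included in the GROUP BY clause. In SQL Server the same applies to ORDER BY: once the rows are grouped, you can only sort by grouped expressions or aggregates. That is why datepart(mm, orderdate) has to be in the GROUP BY before you can order by it. COUNT in your example returns the number of rows that share the same values for datepart(yy, orderdate), datename(mm, orderdate) and datepart(mm, orderdate).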

An example:

```
SELECT col1, col2, col3, MAX(col4)
FROM MyTable
GROUP BY col1, col2, col3
```

Say this returned:

```
1,2,3,9
1,2,5,9
```

If you changed your query to this:

```
SELECT col1, col2, MIN(col3), MAX(col4)
FROM MyTable
GROUP BY col1, col2
```

It would return:

```
1,2,3,9
```

Notice that I applied an aggregate function (MIN) to col3, which is what allowed me to remove col3 from the GROUP BY clause.
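The two queries above can be reproduced with Python's built-in sqlite3 module. This is a sketch under two assumptions: the three rows in MyTable are hypothetical sample data (chosen so the first query returns the two rows shown), and SQLite stands in for SQL Server. SQLite is more permissive than SQL Server about non-aggregated columns, so this only checks the results, not the error SQL Server would raise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER)")
# Hypothetical sample rows: two share col3 = 3, one has col3 = 5.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 4), (1, 2, 3, 9), (1, 2, 5, 9)],
)

# One output row per distinct (col1, col2, col3) combination.
by_three = conn.execute(
    "SELECT col1, col2, col3, MAX(col4) FROM MyTable "
    "GROUP BY col1, col2, col3 ORDER BY col3"
).fetchall()

# col3 is now aggregated, so only (col1, col2) defines the groups.
by_two = conn.execute(
    "SELECT col1, col2, MIN(col3), MAX(col4) FROM MyTable GROUP BY col1, col2"
).fetchall()

print(by_three)  # [(1, 2, 3, 9), (1, 2, 5, 9)]
print(by_two)    # [(1, 2, 3, 9)]
```

Removing col3 from the GROUP BY merges the two groups into one, and MIN(col3) picks a single value for the merged group.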

Author: Hans Rudel

Updated on June 21, 2022

Comments

Hans Rudel about 2 years
I have the following query:
```
select
    datepart(yy, orderDate) as 'year',
    datename(mm, OrderDate) as 'month',
    count(*) as 'Orders'
from orders  -- (yeah, I know it's missing the schema; it's from a book)
group by
    datepart(yy, orderdate),
    datename(mm, orderdate)
order by
    datepart(yy, orderdate),
    datename(mm, orderdate);  -- month name is a string, so this sorts alphabetically
```
This returns three columns, but datename(mm, orderdate) returns a string, so ordering by it sorts the months alphabetically (August before January, and so on).
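To see why sorting by the month name misbehaves, compare sorting the names as strings with sorting them by month number. A minimal Python sketch; the hard-coded English names assume what datename(mm, ...) returns under an English language setting (DATENAME output depends on the session language).

```python
# English month names, assumed to match datename(mm, ...) under us_english.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Month number for each name: the value datepart(mm, ...) would give.
month_number = {name: i for i, name in enumerate(MONTHS, start=1)}

# String comparison goes letter by letter, so "April" and "August" sort first.
alphabetical = sorted(MONTHS)

# Sorting on the month number restores calendar order.
chronological = sorted(alphabetical, key=month_number.__getitem__)

print(alphabetical[:4])         # ['April', 'August', 'December', 'February']
print(chronological == MONTHS)  # True
```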

The solution to this was the following:
```
select
    datepart(yy, orderDate) as 'year',
    datename(mm, OrderDate) as 'month',
    count(*) as 'Orders'
from orders  -- (yeah, I know it's missing the schema; it's from a book)
group by
    datepart(yy, orderdate),
    datename(mm, orderdate),
    datepart(mm, orderdate)
order by
    datepart(yy, orderdate),
    datepart(mm, orderdate);  -- month number, so this sorts chronologically
```
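The corrected query can be tried out in SQLite from Python. This is a sketch, not SQL Server: SQLite has no DATEPART or DATENAME, so strftime() supplies the year and the zero-padded month number, and datename_mm is a hypothetical helper registered with create_function() to mimic DATENAME(mm, ...). The orders rows are invented sample data.

```python
import sqlite3
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def datename_mm(iso_date):
    # Hypothetical stand-in for DATENAME(mm, d): English month name of an ISO date string.
    return MONTHS[date.fromisoformat(iso_date).month - 1]

conn = sqlite3.connect(":memory:")
conn.create_function("datename_mm", 1, datename_mm)
conn.execute("CREATE TABLE orders (orderdate TEXT)")  # ISO 'YYYY-MM-DD' strings
conn.executemany("INSERT INTO orders VALUES (?)",
                 [("2008-01-15",), ("2008-01-20",), ("2008-08-03",), ("2009-02-11",)])

rows = conn.execute("""
    SELECT strftime('%Y', orderdate) AS year,
           datename_mm(orderdate)    AS month,
           COUNT(*)                  AS orders
    FROM orders
    GROUP BY strftime('%Y', orderdate),   -- year
             datename_mm(orderdate),      -- month name (for display)
             strftime('%m', orderdate)    -- month number (for sorting)
    ORDER BY strftime('%Y', orderdate),
             strftime('%m', orderdate)
""").fetchall()

print(rows)  # [('2008', 'January', 2), ('2008', 'August', 1), ('2009', 'February', 1)]
```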
I'm still a little confused about the GROUP BY / ORDER BY clauses and how they actually work.

As far as I understand, GROUP BY creates a work table with four columns (this might be wrong): datepart(yy, orderdate), datename(mm, orderdate), datepart(mm, orderdate), and a count column.

Each time it encounters an orderdate that is already in the work table, it increments the count; otherwise, it adds a new row?
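The work-table idea is close to what a hash-based aggregate does, although SQL Server's optimizer may instead sort the rows and aggregate them in one pass (a Stream Aggregate) rather than use a Hash Match aggregate. A minimal Python sketch of the hash approach, assuming the lookup key is the whole (year, month name, month number) tuple rather than the raw orderdate; the dates are hypothetical.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical sample order dates.
order_dates = [date(2008, 1, 15), date(2008, 8, 3), date(2008, 1, 20), date(2009, 2, 11)]

# The "work table": one entry per distinct group key, holding a running count.
work_table = {}
for d in order_dates:
    # Group key = the three GROUP BY expressions, not the raw order date.
    key = (d.year, MONTHS[d.month - 1], d.month)
    # Key already present: increment; otherwise add a new row with count 1.
    work_table[key] = work_table.get(key, 0) + 1

# ORDER BY runs after grouping: sort the finished groups by (year, month number).
result = sorted(work_table.items(), key=lambda item: (item[0][0], item[0][2]))
for (year, month_name, _month_num), count in result:
    print(year, month_name, count)
```

Two different dates in January 2008 land in the same group because their keys match, even though the orderdate values differ.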

Originally, I thought I could remove datename(mm, orderdate) from the GROUP BY clause, but the book said that's not possible.
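SQL Server rejects the query if datename(mm, orderdate) is selected but neither grouped nor aggregated (wrapping it in an aggregate such as MAX(...), like the MIN(col3) trick in the answer, would also satisfy the rule). Adding it to the GROUP BY costs nothing, because the month name is fully determined by the month number and never splits a group further. A small Python check of that claim over a hypothetical two-year range of dates:

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Every day from 2008-01-01 to 2009-12-31 (hypothetical order dates).
start = date(2008, 1, 1)
dates = [start + timedelta(days=i) for i in range(731)]

# Groups when grouping by (year, month number) only.
groups_without_name = {(d.year, d.month) for d in dates}
# Groups when the month name is added to the key as well.
groups_with_name = {(d.year, MONTHS[d.month - 1], d.month) for d in dates}

# Same number of groups: the name adds no new distinctions.
print(len(groups_without_name), len(groups_with_name))  # 24 24
```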

If someone could step through what actually happens behind the scenes, or point me to a resource that explains this in more detail, I'd appreciate it.

Thanks for your help.
Abe Miessler about 12 years

It really doesn't matter how many columns the table has; what matters is how many you are selecting and how many of those selected columns have an aggregate function applied to them.
Hans Rudel about 12 years

One last question: if I removed count(*) from the SELECT list, I would just get two columns back, ordered as before, right? (I'm assuming I should use DISTINCT instead.)
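Without count(*), a GROUP BY over the remaining expressions returns the same rows as SELECT DISTINCT over them. One SQL Server caveat: with SELECT DISTINCT, ORDER BY items must appear in the select list, so the month number would have to be selected too. A sketch in SQLite via Python, with hypothetical rows and strftime() standing in for DATEPART:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderdate TEXT)")  # ISO 'YYYY-MM-DD' strings
conn.executemany("INSERT INTO orders VALUES (?)",
                 [("2008-01-15",), ("2008-01-20",), ("2008-08-03",), ("2009-02-11",)])

# GROUP BY with no aggregate: one row per distinct (year, month number) pair.
grouped = conn.execute("""
    SELECT strftime('%Y', orderdate), strftime('%m', orderdate)
    FROM orders
    GROUP BY strftime('%Y', orderdate), strftime('%m', orderdate)
    ORDER BY 1, 2
""").fetchall()

# DISTINCT over the same select list; the sort columns are in the select list.
distinct = conn.execute("""
    SELECT DISTINCT strftime('%Y', orderdate), strftime('%m', orderdate)
    FROM orders
    ORDER BY 1, 2
""").fetchall()

print(grouped == distinct)  # True
```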
Andriy M about 12 years

Did you actually mean the second query would return 1,2,3,18?
Abe Miessler about 12 years

Nope. I'm doing MAX on the final column, not a SUM.
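The MAX versus SUM difference is easy to check. A tiny SQLite sketch via Python, assuming MyTable holds exactly the two rows from the first result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER)")
# Assumption: MyTable contains exactly the two rows from the first result.
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", [(1, 2, 3, 9), (1, 2, 5, 9)])

# MAX keeps the largest col4 in the group; SUM adds them up.
with_max = conn.execute(
    "SELECT col1, col2, MIN(col3), MAX(col4) FROM MyTable GROUP BY col1, col2"
).fetchone()
with_sum = conn.execute(
    "SELECT col1, col2, MIN(col3), SUM(col4) FROM MyTable GROUP BY col1, col2"
).fetchone()

print(with_max)  # (1, 2, 3, 9)
print(with_sum)  # (1, 2, 3, 18)
```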
Ben almost 6 years

The claim that all other columns must always be included in the GROUP BY is not really true. Every relevant (selected) column must be either inside an aggregate function or in the GROUP BY; irrelevant columns are simply ignored. It's not uncommon for the source table to have far more columns than the query ever touches, and nobody aggregates on all the irrelevant columns in the source table.
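This can be illustrated with a table that has an extra column the query never mentions (customer, a hypothetical name): it needs neither an aggregate nor a place in the GROUP BY. A sketch in SQLite via Python with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'customer' is never referenced below; 'amount' is only used inside SUM().
conn.execute("CREATE TABLE orders (orderdate TEXT, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("2008-01-15", "alice", 10.0),
    ("2008-01-20", "bob", 25.0),
    ("2008-08-03", "alice", 5.0),
])

# Only the selected, non-aggregated expression is grouped; 'customer' is ignored.
rows = conn.execute("""
    SELECT strftime('%Y-%m', orderdate) AS ym, COUNT(*) AS orders, SUM(amount) AS total
    FROM orders
    GROUP BY strftime('%Y-%m', orderdate)
    ORDER BY ym
""").fetchall()

print(rows)  # [('2008-01', 2, 35.0), ('2008-08', 1, 5.0)]
```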