SQL Server GROUP BY / ORDER BY
Any time you use an aggregate function (COUNT, SUM, MAX, etc.), every other column in the SELECT list that is not itself inside an aggregate must appear in the GROUP BY clause. COUNT in your example is returning the number of records that have the same value for datepart(yy, orderdate), datename(mm, orderdate), and datepart(mm, orderdate).
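To make the counting concrete, here is a minimal sketch that runs the query from the question against a handful of rows. The table variable @orders and its four dates are invented for illustration (the book's orders table is not shown in the question), and the month names assume the session language is us_english.

```sql
-- Hypothetical sample data standing in for the book's orders table.
DECLARE @orders TABLE (orderdate date);

INSERT INTO @orders (orderdate)
VALUES ('20110105'),  -- January 2011
       ('20110120'),  -- January 2011
       ('20110803'),  -- August 2011
       ('20120110');  -- January 2012

-- Each distinct combination of the three GROUP BY expressions forms one group;
-- count(*) is the number of rows that fell into that group.
SELECT datepart(yy, orderdate) AS [year],
       datename(mm, orderdate) AS [month],
       count(*)                AS Orders
FROM @orders
GROUP BY datepart(yy, orderdate), datename(mm, orderdate), datepart(mm, orderdate)
ORDER BY datepart(yy, orderdate), datepart(mm, orderdate);
-- 2011, January, 2
-- 2011, August,  1
-- 2012, January, 1
```

The two January 2011 rows share the same values for all three expressions, so they collapse into a single output row with a count of 2.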
An example:
SELECT col1, col2, col3, MAX(col4)
FROM MyTable
GROUP BY col1, col2, col3
Say this returned:
1,2,3,9
1,2,5,9
If you changed your query to this:
SELECT col1, col2, MIN(col3), MAX(col4)
FROM MyTable
GROUP BY col1, col2
It would return:
1,2,3,9
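The two result sets above can be reproduced with a small script. This is only a sketch: the temp table #MyTable is a stand-in for MyTable, and its four rows are invented so that the outputs match the ones shown.

```sql
-- Hypothetical stand-in for MyTable; the rows are invented for illustration.
CREATE TABLE #MyTable (col1 int, col2 int, col3 int, col4 int);

INSERT INTO #MyTable (col1, col2, col3, col4)
VALUES (1, 2, 3, 9),
       (1, 2, 3, 7),
       (1, 2, 5, 9),
       (1, 2, 5, 4);

-- Three grouping columns: one output row per distinct (col1, col2, col3).
SELECT col1, col2, col3, MAX(col4) AS max_col4
FROM #MyTable
GROUP BY col1, col2, col3
ORDER BY col3;
-- 1, 2, 3, 9
-- 1, 2, 5, 9

-- Two grouping columns: all four rows now fall into a single group.
SELECT col1, col2, MIN(col3) AS min_col3, MAX(col4) AS max_col4
FROM #MyTable
GROUP BY col1, col2;
-- 1, 2, 3, 9

DROP TABLE #MyTable;
```

The ORDER BY in the first query is only there to make the row order predictable; without it, SQL Server is free to return the two groups in either order.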
Notice that I added an aggregate function (MIN) to col3, so I was able to remove col3 from my GROUP BY clause.
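The same idea can be applied to the query in the question. As a sketch (not taken from the book, and using an invented @orders table variable in place of the real orders table): if datename(mm, orderdate) is wrapped in an aggregate, it no longer has to appear in the GROUP BY, and grouping by the numeric month alone is enough.

```sql
-- Hypothetical sample data standing in for the book's orders table.
DECLARE @orders TABLE (orderdate date);

INSERT INTO @orders (orderdate)
VALUES ('20110105'), ('20110120'), ('20110803');

-- Group by year and month number only. Every row in a group has the same
-- month name, so MIN() simply returns that name.
SELECT datepart(yy, orderdate)      AS [year],
       MIN(datename(mm, orderdate)) AS [month],
       count(*)                     AS Orders
FROM @orders
GROUP BY datepart(yy, orderdate), datepart(mm, orderdate)
ORDER BY datepart(yy, orderdate), datepart(mm, orderdate);
```

Whether this reads better than keeping datename(mm, orderdate) in the GROUP BY is a matter of taste; both forms produce one row per year and month.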
Hans Rudel
Updated on June 21, 2022
Comments
-
Hans Rudel about 2 years
I have the following query:
select datepart(yy, orderDate) as 'year', datename(mm, OrderDate) as 'month', count(*) as 'Orders'
from orders -- yes, I know the schema is missing; it's from a book
group by datepart(yy, orderdate), datename(mm, orderdate)
order by datepart(yy, orderdate), datename(mm, orderdate);
which returns 3 columns, but datename(mm, orderdate) returns a string, and therefore ordering by it puts August before January, etc. The solution to this was the following:
select datepart(yy, orderDate) as 'year', datename(mm, OrderDate) as 'month', count(*) as 'Orders'
from orders -- yes, I know the schema is missing; it's from a book
group by datepart(yy, orderdate), datename(mm, orderdate), datepart(mm, orderdate)
order by datepart(yy, orderdate), datepart(mm, orderdate);
I'm still a little confused about the group by / order by sections and how they actually work.
As far as I understand it, group by creates a work table with 4 columns (that might be wrong): datepart(yy, orderdate), datename(mm, orderdate), datepart(mm, orderdate), and a count column. Each time it encounters an orderdate that it already has in the work table, it increases the count; otherwise it adds a new row?
Originally I thought I could remove the DateName(mm, orderdate) from the group by section, but the book said that's not possible.
If someone could step through what actually happens behind the scenes, or point out a resource that explains how this works in a little more detail, I'd appreciate it.
Thanks for your help.
-
Abe Miessler about 12 years
It really doesn't matter how many columns the table has; what matters is how many you are selecting and how many of those selected columns have an aggregate function applied to them.
-
Hans Rudel about 12 years
One last question: if I removed the count(*) from the select statement, I would just get 2 columns returned, and they would be ordered as they were before, right? (I'm assuming I should use DISTINCT instead.)
-
Andriy M about 12 years
Did you actually mean the second query would return 1,2,3,18?
-
Abe Miessler about 12 years
Nope. I'm doing MAX on the final column, not a SUM.
-
Ben almost 6 years
Saying that all other columns must always be included in the GROUP BY is not really true. Every relevant column must be either inside an aggregate function or in the GROUP BY; irrelevant columns are simply ignored. It is not uncommon for the source table to have far more columns than the query is concerned with, and nobody aggregates on all the irrelevant columns present in the source table.