What is the difference between group by, distinct, Union for selecting distinct values for multiple columns?

11,485

Solution 1

Starting with what I think is the simplest, DISTINCT, really is just that. It returns the distinct combinations of rows. Think of this dataset:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I
A         B         C   <- duplicate of row 1

This will return 3 rows because the 4th row in the dataset exactly matches the first row. Result:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I

The GROUP BY is frequently used for summaries and other calculations select COL1, SUM(COL2) from table group by column1;

For this dataset:

COL1      COL2
A         5
A         6
B         2
C         3
C         4
C         5

would return

COL1     SUM(COL2)
A        11
B        2
C        12

a UNION just takes results from different queries and presents them as 1 result set:

Table1
COL1
A

Table2
COLX
B

Table3
WHATEVER_COLUMN_NAME
Giddyup

select COL1 from Table1
UNION
select COLX from Table2
UNION 
select WHATEVER_COLUMN_NAME from Table3;

Result Set:

A
B
Giddyup

When performing a union, the column datatypes must match up. You can't UNION a number column with a char column (unless you explicitly perform a data conversion)

Solution 2

Lets assume this is your db data:

column1 | column2 | column3
1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

First query

In the first example you will get all column combinations from the db (as GROUP BY 1,2,3 does nothing) including duplicates, so it will return:

1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

2nd query

Second example takes unique values for column tuples so you will end with

1       | 2       | 1
1       | 2       | 2
3       | 1       | 2

3rd query

Last query takes all values from three columns and then it removes duplicates from that set. So you will get all values from any of the tables. In the end this will return

1
2
3

Does this makes it clear?

Solution 3

Lets go with a sample set of data

orderid    customer orderdate
1          B        July 29
2          A        Aug 1
3          A        Aug 4
4          C        Aug 5
5          B        Aug 6
6          A        Aug 11

Distinct basically returns a single instance of a given record with no duplicates of the entire set of columns in the result set. Ex: "select distinct customer from orders" would return "A", "B", "C" defaulted in alpha order of column(s) chosen.

Group by is to do aggregations within a given set of fields in a query. Ex:

select customer, count(*) as NumberOfOrders from Orders group by 1

Would result with...
A    3
B    2
C    1

You can also apply distinct (only once), within a query, but within a given group..

select customer, count(*) as NumberOfOrders, count( distinct {month of orderdate} ) as CustomerMonths from orders group by customer

Would result with
A    3    1  (all orders were in August)
B    2    2  (had orders in July and August)
C    1    1  (only one order in August)

Unions are queries that must be the exact same result format, column names and sequence of fields. Lets say you have an orders table that is the exact same structure as an archived version of data too. You only keep current data over the most current year, all historical is pushed to archive. If you wanted to get ALL order activity for a given customer in one query, you would want to do a union

select customerid, orderdate, amount from CurrentOrders where customerid = ?? order by 2 descending UNION select customerid, orderdate, amount from ArchivedOrders where customerid = ??

The ORDER by clause of the first select will drive the results all all subsequent records being pulled into the results. Its like SQL saying go to table one, get all that qualify, then sort. Then, go to table two, get all that qualify there and pull into the existing sorted list from table one. Final result is ALL records.

HTH

Share:
11,485

Related videos on Youtube

Enjoy coding
Author by

Enjoy coding

Procedural Programmer with knowledge in C, C++, C#, VB, VC++, JCL, COBOL, MYSQL, Teradata, Business Objects, Microstrategy, R, AutoIt, AutoHotkey, HTML, CSS, JavaScript, JQUERY, AJAX, PHP, Perl, CodeIgniter. Thirsty for more Procedural languages.

Updated on June 04, 2022

Comments

  • Enjoy coding
    Enjoy coding almost 2 years

    This question explained about a way of getting distinct combination of multiple columns. But I want to know the difference between the methods of DISTINCT, UNION, GROUP BY keyword method for this purpose. I am getting different results when using them. My queries are like this Query 1.

    select 
    column1,
    column2,
    column3
    from table
    group by 1,2,3
    

    Query 2.

    select distinct 
    column1,
    column2,
    column3
    from table
    

    Query 3.

    SELECT DISTINCT(ans) FROM (
        SELECT column1 AS ans FROM sametable
        UNION
        SELECT column2 AS ans FROM sametable
        UNION
        SELECT column3 AS ans FROM sametable
    ) AS Temp
    

    I am getting different number of rows for above queries(Edit: The first two are giving equal number of rows but last one is giving differnetly). Can any body explain what the above queries are doing? Especially the third one?

    EDIT: Note that I am doing UNION on same table. In that case what will happen?

  • Mr. Shiny and New 安宇
    Mr. Shiny and New 安宇 over 14 years
    In some dialects of SQL GROUP BY 1, 2, 3 means "Group by the first, second and third column" and thus would be equivalent to the second query
  • Enjoy coding
    Enjoy coding over 14 years
    Well.. I asked in general sql. But my problem is in teradata sql.
  • Enjoy coding
    Enjoy coding over 14 years
    Thanks. But what will happen if you do UNION on the same table?. Group by 1,2,3 means group by column1,column2, column3 in teradata sql. So the 1st and second query returns same.
  • Enjoy coding
    Enjoy coding over 14 years
    Thanks. Actually I am confused about the UNION on same table two times as in the table. How will the result set come when we do UNION on same table?
  • Enjoy coding
    Enjoy coding over 14 years
    Thank you very much. Your explanation of group by is fantastic. The last 3 paras of your answer was very much helpful. But What will happen when you do UNION on same table? Also "group by 1,2,3" and "distinct" will give the same results. Thanks
  • RaYell
    RaYell over 14 years
    Group by 1,2,3 is specific to some SQL dialects. Some may group by first, second and third column, others may ignore it. Union on the same table will combine the values from the three columns into one column that can be filtered later on. I think that those fields may have to be of the same/similar type. I'm not sure if you can make a union of INT and TEXT fields.
  • Brett McCann
    Brett McCann over 14 years
    In your example you were union'ing off of different columns. So you are in effect, taking row based data, and presenting it as column based. If you had a table with 4 text columns and 1 row of data, and you selected each column using unions, your result set would be 1 column with 4 rows of data.
  • DRapp
    DRapp over 14 years
    Don't know... never had an instance to need that... However, what you may do is a second select based on the result of the first using the same group by to "roll-up" common elements coming from the second instance table.
  • Charleh
    Charleh almost 12 years
    An important point to note is that a union discards duplicates, whilst a UNION ALL keeps duplicates - sqlfiddle.com/#!3/83d93/1