How does SQL's DISTINCT clause work?

Solution 1

DISTINCT filters out duplicate rows from your result, comparing only the fields you return.

A really simplified way to look at it is:

  • It builds your overall result set (including duplicates) based on your FROM and WHERE clauses
  • It sorts that result set based on the fields you want to return
  • It removes any duplicate values in those fields
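The three steps above can be sketched in plain Python. This is only a mental model under stated assumptions: the `joined_rows` data is hypothetical, and a real engine is free to pick a different strategy (hashing instead of sorting, for example) as long as the result is the same.

```python
# Hypothetical joined rows as (u_name, r_name); "alice" has two roles.
joined_rows = [
    ("alice", "admin"),
    ("bob", "editor"),
    ("alice", "editor"),
]

# Step 1: the full result set, projected onto the returned field (u_name).
projected = [(u_name,) for u_name, _r_name in joined_rows]

# Step 2: sort so that equal rows end up next to each other.
projected.sort()

# Step 3: keep a row only if it differs from the previously kept row.
distinct_rows = []
for row in projected:
    if not distinct_rows or distinct_rows[-1] != row:
        distinct_rows.append(row)

print(distinct_rows)  # [('alice',), ('bob',)]
```

Note that `r_name` never takes part in the comparison, because it is not one of the returned fields.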

DISTINCT is semantically equivalent to a GROUP BY in which every returned field appears in the GROUP BY clause.
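A quick way to check the GROUP BY equivalence is to run both forms against the same data. The sketch below uses Python's built-in `sqlite3` module with an in-memory database rather than SQL Server, purely so it can be run anywhere; the sample rows are made up, and the table definitions are simplified from the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (u_id INTEGER PRIMARY KEY, u_name TEXT, u_password TEXT);
    CREATE TABLE roles (r_id INTEGER PRIMARY KEY, r_name TEXT);
    CREATE TABLE users_l_roles (
        u_id INTEGER REFERENCES users(u_id),
        r_id INTEGER REFERENCES roles(r_id)
    );
    -- Made-up sample data: alice has two roles, bob has one.
    INSERT INTO users VALUES (1, 'alice', 'pw1'), (2, 'bob', 'pw2');
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO users_l_roles VALUES (1, 1), (1, 2), (2, 2);
""")

join = """
    FROM users
    INNER JOIN users_l_roles ON users.u_id = users_l_roles.u_id
    INNER JOIN roles ON users_l_roles.r_id = roles.r_id
"""
with_distinct = conn.execute("SELECT DISTINCT u_name" + join).fetchall()
with_group_by = conn.execute("SELECT u_name" + join + "GROUP BY u_name").fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted lists.
print(sorted(with_distinct) == sorted(with_group_by))  # True
```

Both queries return one row per user name. The results match, but the optimizer may still choose different execution plans for the two forms.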

Solution 2

DISTINCT simply de-duplicates the resultant recordset after all other query operations have been performed. This article has more detail.
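To see that the de-duplication happens on the final, projected rows and not on every column of every joined table, here is a small sketch using Python's built-in `sqlite3` module (SQLite rather than SQL Server, chosen only so the example is runnable). The data is hypothetical: two different users share the name 'alice'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (u_id INTEGER PRIMARY KEY, u_name TEXT, u_password TEXT);
    CREATE TABLE roles (r_id INTEGER PRIMARY KEY, r_name TEXT);
    CREATE TABLE users_l_roles (u_id INTEGER, r_id INTEGER);
    -- Hypothetical data: users 1 and 3 are different people named alice.
    INSERT INTO users VALUES (1, 'alice', 'pw1'), (2, 'bob', 'pw2'), (3, 'alice', 'pw3');
    INSERT INTO roles VALUES (1, 'admin'), (2, 'editor');
    INSERT INTO users_l_roles VALUES (1, 1), (1, 2), (2, 2), (3, 2);
""")

join = """
    FROM users
    INNER JOIN users_l_roles ON users.u_id = users_l_roles.u_id
    INNER JOIN roles ON users_l_roles.r_id = roles.r_id
"""

def count_rows(select_list):
    # Run the same join with a different SELECT list and count the rows.
    return len(conn.execute("SELECT " + select_list + join).fetchall())

print(count_rows("u_name"))                         # 4: one row per user/role link
print(count_rows("DISTINCT u_name"))                # 2: u_id and r_name are ignored
print(count_rows("DISTINCT u_name, r_name"))        # 3: pairs are compared
print(count_rows("DISTINCT users.u_id, u_name"))    # 3: both alices are kept
```

If DISTINCT compared every field of every joined table, the second query would return four rows instead of two.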

Author by

korzeniow

Updated on June 19, 2022

Comments

  • korzeniow
    korzeniow about 2 years

    I'm looking for an explanation of how the DISTINCT clause works in SQL (SQL Server 2008, if that makes a difference) in a query that joins multiple tables.

    In other words, how does the SQL engine handle a query with a DISTINCT clause?

    The reason I'm asking is that a far more experienced colleague told me that SQL applies DISTINCT to every field of every table. That seems unlikely to me, but I want to make sure.

    For example, given these three tables:

    CREATE TABLE users
    (
        u_id INT PRIMARY KEY,
        u_name VARCHAR(30),
        u_password VARCHAR(30)
    );

    CREATE TABLE roles
    (
        r_id INT PRIMARY KEY,
        r_name VARCHAR(30)
    );

    CREATE TABLE users_l_roles
    (
        u_id INT FOREIGN KEY REFERENCES users(u_id),
        r_id INT FOREIGN KEY REFERENCES roles(r_id)
    );
    

    And given this query:

    SELECT          u_name
    FROM            users 
    INNER JOIN      users_l_roles ON users.u_id = users_l_roles.u_id
    INNER JOIN      roles ON users_l_roles.r_id = roles.r_id 
    

    Assuming a user has two roles, the query above returns two records with the same user name.

    But this query with DISTINCT:

    SELECT DISTINCT u_name
    FROM            users 
    INNER JOIN      users_l_roles ON users.u_id = users_l_roles.u_id
    INNER JOIN      roles ON users_l_roles.r_id = roles.r_id 
    

    will return only one user name.

    The question is whether SQL compares all the fields from all the joined tables (u_id, u_name, u_password, r_id, r_name), or compares only the fields named in the query (u_name) and de-duplicates on those?

  • Steam
    Steam over 10 years
    I learned all this just now by making a mistake, as shown here: stackoverflow.com/questions/20750181/count-with-distinct. In the end, I used GROUP BY instead of DISTINCT.