SQL/mysql - Select distinct/UNIQUE but return all columns?
Solution 1
You're looking for a group by:
select *
from table
group by field1
In PostgreSQL, this can also be written with a distinct on clause:
select distinct on (field1) *
from table
On most platforms, however, neither of the above will work because the behavior on the other columns is unspecified. (The first works in MySQL, if that's what you're using, provided the ONLY_FULL_GROUP_BY SQL mode is disabled; it is enabled by default since MySQL 5.7.5.)
You could fetch the distinct fields and stick to picking a single arbitrary row each time.
On some platforms (e.g. PostgreSQL, Oracle, T-SQL) this can be done directly using window functions:
select *
from (
select *,
row_number() over (partition by field1 order by field2) as row_number
from table
) as rows
where row_number = 1
On others (e.g. MySQL before 8.0, SQLite before 3.25), you'll need to write subqueries that join the entire table with itself, so it's not recommended.
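As a rough sketch of the self-join approach mentioned above, here is a small Python script using the standard sqlite3 module so it can be run as-is. The table name items, its columns and the sample rows are made up for illustration. The idea is to compute the smallest field2 per field1 in a subquery, then join back to the table to recover the full rows.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (field1 TEXT, field2 INTEGER, field3 TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", 2, "x"), ("a", 1, "y"), ("b", 5, "z"), ("b", 7, "w")],
)

# The subquery finds the smallest field2 for each field1; joining it back
# to the table keeps exactly the rows holding that minimum.
rows = conn.execute(
    """
    SELECT t.*
    FROM items AS t
    JOIN (
        SELECT field1, MIN(field2) AS min_field2
        FROM items
        GROUP BY field1
    ) AS m
      ON t.field1 = m.field1 AND t.field2 = m.min_field2
    ORDER BY t.field1
    """
).fetchall()

for row in rows:
    print(row)
```

Note that if two rows share the same minimum field2 within a group, this query returns both of them, so the join columns need to identify a single row if exactly one row per group is required.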
Solution 2
From the phrasing of your question, I understand that you want to select the distinct values for a given field and for each such value to have all the other column values in the same row listed. Most DBMSs will not allow this with either DISTINCT or GROUP BY, because the result is not determined.
Think of it like this: if your field1 occurs more than once, what value of field2 will be listed (given that you have the same value for field1 in two rows but two distinct values of field2 in those two rows)?
You can, however, use aggregate functions (explicitly for every field that you want to be shown) together with GROUP BY instead of DISTINCT:
SELECT field1, MAX(field2), COUNT(field3), SUM(field4), ....
FROM table GROUP BY field1
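To make the aggregate approach concrete, here is a minimal sketch run through Python's sqlite3 module. The table items and its rows are invented for the example; only the shape of the query follows the answer above.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (field1 TEXT, field2 INTEGER, field3 TEXT, field4 INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [("a", 1, "x", 10), ("a", 3, None, 20), ("b", 2, "y", 5)],
)

# Every non-grouped column is wrapped in an aggregate, so each output
# value is well defined. COUNT(field3) counts only non-NULL values.
summary = conn.execute(
    """
    SELECT field1, MAX(field2), COUNT(field3), SUM(field4)
    FROM items
    GROUP BY field1
    ORDER BY field1
    """
).fetchall()

for row in summary:
    print(row)
```

Keep in mind that the aggregated values in one output row may come from different source rows, so this does not return an actual row of the table.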
Solution 3
If I understood your problem correctly, it's similar to one I just had. You want to be able to limit DISTINCT to a specified field, rather than applying it to all the data.
If you use GROUP BY without an aggregate function, whichever field you GROUP BY will be your DISTINCT field.
If you make your query:
SELECT * from table GROUP BY field1;
It will show all your results based on a single instance of field1.
For example, suppose you have a table with name, address and city. If a single person has multiple addresses recorded, but you just want a single address for the person, you can query as follows:
SELECT * FROM persons GROUP BY name;
The result will be that only one instance of that name will appear with its address, and the others will be omitted from the resulting table. Caution: if your fields hold atomic values such as firstName and lastName, you want to group by both.
SELECT * FROM persons GROUP BY lastName, firstName;
because if two people have the same last name and you only group by lastName, one of those persons will be omitted from the results. You need to take those things into consideration. Hope this helps.
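The difference between grouping by one column and by both can be checked with a short sketch. It uses Python's sqlite3 module because SQLite, like MySQL with ONLY_FULL_GROUP_BY disabled, accepts non-aggregated columns in a GROUP BY query; the persons rows below are invented for illustration, and which address is kept for a person is arbitrary.

```python
import sqlite3

# Hypothetical data: John Smith has two addresses, and Jane shares his last name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (firstName TEXT, lastName TEXT, address TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", "1 Main St", "Boston"),
        ("John", "Smith", "9 Oak Ave", "Denver"),
        ("Jane", "Smith", "4 Elm Rd", "Austin"),
        ("Bob", "Jones", "7 Pine Ln", "Miami"),
    ],
)

# Grouping by lastName alone merges John and Jane Smith into one row.
by_last = conn.execute("SELECT * FROM persons GROUP BY lastName").fetchall()

# Grouping by both columns keeps one row per distinct person.
by_both = conn.execute(
    "SELECT * FROM persons GROUP BY lastName, firstName"
).fetchall()

print(len(by_last), "rows when grouping by lastName")
print(len(by_both), "rows when grouping by lastName, firstName")
```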
Solution 4
That's a really good question. I have read some useful answers here already, but probably I can add a more precise explanation.
Reducing the number of query results with a GROUP BY statement is easy as long as you don't query additional information. Let's assume you have the following table 'locations'.
--country-- --city--
France Lyon
Poland Krakow
France Paris
France Marseille
Italy Milano
Now the query
SELECT country FROM locations
GROUP BY country
will result in:
--country--
France
Poland
Italy
However, the following query
SELECT country, city FROM locations
GROUP BY country
...throws an error in MS SQL, because how could your computer know which of the three French cities "Lyon", "Paris" or "Marseille" you want to read in the field to the right of "France"?
In order to correct the second query, you must add this information. One way to do this is to use the functions MAX() or MIN(), selecting the biggest or smallest value among all candidates. MAX() and MIN() are not only applicable to numeric values, but also compare the alphabetical order of string values.
SELECT country, MAX(city) FROM locations
GROUP BY country
will result in:
--country-- --city--
France Paris
Poland Krakow
Italy Milano
or:
SELECT country, MIN(city) FROM locations
GROUP BY country
will result in:
--country-- --city--
France Lyon
Poland Krakow
Italy Milano
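The MAX() and MIN() results above can be reproduced with a small sketch. It runs the same two queries against the 'locations' data through Python's sqlite3 module; the ORDER BY country clause is an addition here, only to make the output order predictable.

```python
import sqlite3

# The 'locations' table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (country TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [
        ("France", "Lyon"),
        ("Poland", "Krakow"),
        ("France", "Paris"),
        ("France", "Marseille"),
        ("Italy", "Milano"),
    ],
)

# MAX() and MIN() compare strings as well as numbers.
max_cities = conn.execute(
    "SELECT country, MAX(city) FROM locations GROUP BY country ORDER BY country"
).fetchall()
min_cities = conn.execute(
    "SELECT country, MIN(city) FROM locations GROUP BY country ORDER BY country"
).fetchall()

print(max_cities)
print(min_cities)
```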
These functions are a good solution as long as you are fine with selecting your value from either end of the alphabetical (or numeric) order. But what if this is not the case? Let us assume that you need a value with a certain characteristic, e.g. starting with the letter 'M'. Now things get complicated.
The only solution I could find so far is to put your whole query into a subquery, and to construct the additional column outside of it by hand:
SELECT
countrylist.*,
(SELECT TOP 1 city
FROM locations
WHERE
country = countrylist.country
AND city like 'M%'
)
FROM
(SELECT country FROM locations
GROUP BY country) countrylist
will result in:
--country-- --city--
France Marseille
Poland NULL
Italy Milano
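TOP 1 is specific to MS SQL. As a sketch of the same idea on another engine, the version below uses LIMIT 1 in SQLite via Python's sqlite3 module. The ORDER BY clauses are additions that are not in the query above; they make the chosen city and the row order deterministic.

```python
import sqlite3

# The 'locations' table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (country TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [
        ("France", "Lyon"),
        ("Poland", "Krakow"),
        ("France", "Paris"),
        ("France", "Marseille"),
        ("Italy", "Milano"),
    ],
)

# For each distinct country, a correlated subquery picks one city
# starting with 'M'; countries without such a city get NULL (None).
picked = conn.execute(
    """
    SELECT
        countrylist.country,
        (SELECT city
         FROM locations
         WHERE country = countrylist.country
           AND city LIKE 'M%'
         ORDER BY city
         LIMIT 1) AS city
    FROM (SELECT country FROM locations GROUP BY country) AS countrylist
    ORDER BY countrylist.country
    """
).fetchall()

for row in picked:
    print(row)
```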
Solution 5
SELECT c2.field1 ,
field2
FROM (SELECT DISTINCT
field1
FROM dbo.TABLE AS C
) AS c1
JOIN dbo.TABLE AS c2 ON c1.field1 = c2.field1
aryaxt
Updated on June 08, 2020
Comments
-
aryaxt almost 4 years
SELECT DISTINCT field1, field2, field3, ...... FROM table
I am trying to accomplish the following SQL statement, but I want it to return all columns. Is this possible? Something like:
SELECT DISTINCT field1, * from table
-
Conrad Frix almost 13 years
I think you forgot an alias
row_number() over (partition by field1) row_number
-
Ankur-m over 11 years
That won't do the job. You have selected the distinct column in the subquery but the where clause gets all those columns with that value. So the query is as good as writing 'select * from table' unless 'field' column is a unique column in which case the distinct on that column isn't required at all.
-
Ankur-m over 11 years
The query won't parse for me and gives an error:
The ranking function "row_number" must have an ORDER BY clause
We need to add order by clause after partition by field1. So the correct query will be:
select * from ( select *, row_number() over (partition by field1 order by orderbyFieldName) as row_number from table ) as rows where row_number = 1
-
Robbert almost 11 years
To make this a good answer, you should include a little more detail about what you mean.
-
stalk almost 9 years
+1 for this solution. So we can do
SELECT field1, MIN(field2), MIN(field3), MIN(field4), .... FROM table GROUP BY field1
and field2, 3, 4, ... are not required to be integers (or other digits), they can be char fields as well
-
Joaquin Iurchuk over 8 years
Thanks! I was in the same problem and the solution was the
GROUP BY
-
Viuu -a over 8 years
GROUP BY clause must match selected fields. else it will throw error like
field2 must appear in the GROUP BY clause or be used in an aggregate function
-
signonsridhar over 7 years
Was working nicely until I got stuck at a boolean column. MIN(Dynamic) column values get modified to false even if it was true.. Any other aggregate function available to address boolean – signonsridhar 6 mins ago. Sum(dynamic) changed false to 1
-
Garrett Simpson over 7 years
This does not answer the question, the OP was trying to get all the data of the table but remove rows containing duplicates of a single field
-
Garrett Simpson over 7 years
As mentioned in the accepted answer, this wouldn't work for most incarnations of SQL, only for MySQL
-
Garrett Simpson over 7 years
too complicated and specific to one implementation of SQL
-
Garrett Simpson over 7 years
Great suggestion, led me to my solution which I think is more universal -- take a look!
-
meta4 over 7 years
Also in Oracle (Oracle SQL Developer) you can not specify
select *, row_number() over (partition by field1 order by field2) as row_number from table
You have to explicitly use table name/alias in select query:
select table.*, row_number() over (partition by field1 order by field2) as row_number from table
-
Talha over 6 years
Why is there a C alias when it can work without it? In the line
FROM dbo.TABLE AS C
-
jarlh about 6 years
"Answers to questions tagged with SQL should use ISO/IEC standard SQL."
-
Denis de Bernardy about 6 years
@jarlh: Might be ... today. As you may notice, this answer is almost 7 years old, a point in time where that wasn't the case insofar as I can recollect from back when I was active. You're welcome to retag and/or edit the answer if you feel it's necessary.
-
Stormy about 6 years
I believe this is due to my use of RedGate SQLPrompt. The way I have it configured, it always adds aliases - even if unnecessary. It's there "just in case"
-
Drew about 6 years
@signonsridhar cast your boolean to an int and use sum; e.g.
sum(cast(COL as int)) > 0
-
Dr. House about 6 years
select distinct on field1 * from table;
This doesn't seem like correct syntax for mysql - at least not anymore.
-
Shin Kim over 5 years
It is no different from
SELECT * FROM table;
What's more, it is slow.
-
Sherif over 5 years
Please, try your answer first.
-
Chilianu Bogdan almost 5 years
select distinct on (field1) * from table
works also in PostgreSQL
-
Brandon Printiss almost 4 years
This worked for me!! It's worth noting tho, if you are using fetch_array() then you will need to call each row via an index label rather than implicitly calling the row name. There aren't enough characters in this for me to write out the example I have :X sorry!!
-
Michael Fever almost 4 years
This looked promising for me but it still brought back all the rows, not the distinct field1. :(
-
Michael Fever almost 4 years
Works for MSSQL
-
NehaK about 3 years
Cannot group on fields selected with '*'.
-
coderboi about 3 years
i think this only works in mysql, not postgresql
-
coderboi about 3 years
@ChilianuBogdan thank you, you are a life saver! This should be the answer right here. Very compact.
-
ddruganov over 2 years
doesnt work in mysql5.5, gives: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column