SQLITE: merging rows into single row if they share a column

17,614

Solution 1

Use the aggregate function group_concat(X) for that:

SELECT (a.LastName || " " || a.FirstName) AS AttendeeName
     , a.PhotoURI
     , group_concat(c.CompanyName) AS Companies
     , group_concat(c.CompanyId)   AS CompanyIds
FROM   Attendee AS a
JOIN   CompanyAttendeeRelation AS ca ON ca.AttendeeId = a.AttendeeId
JOIN   Company                 AS c  ON c.CompanyId = ca.CompanyId
GROUP  BY a.LastName, a.Firstname, a.PhotoURI;

(Using table aliases to make it shorter and easier to read.)

NULL values are excluded from the result. The manual:

the concatenation of all non-NULL values

The order of elements in CompanyIds and Companies is arbitrary, according to the manual:

The order of the concatenated elements is arbitrary.

Also note that "arbitrary" is not the same as "random". group_concat, like other aggregate functions, processes the set of rows in the order received. Without any ORDER BY, that order is dictated by whatever query plan is executed. There is no natural order in tables of relational databases (you cannot rely on insert order at all). But both instances of group_concat() in the same SELECT list process rows in the same order so that the 1st ID in CompanyIds corresponds to the 1st name in Companies.

You can impose your order with ORDER BY in a subquery. It's an implementation detail, but it's highly unlikely to change. Like:

SELECT (LastName || " " || FirstName) AS AttendeeName
     , PhotoURI
     , group_concat(CompanyName) AS Companies
     , group_concat(CompanyId)   AS CompanyIds
FROM  (
   SELECT a.LastName, a.FirstName, a.PhotoURI, c.CompanyName, c.CompanyId
   FROM   Attendee AS a
   JOIN   CompanyAttendeeRelation AS ca ON ca.AttendeeId = a.AttendeeId
   JOIN   Company                 AS c  ON c.CompanyId = ca.CompanyId
   ORDER  BY 1,2,3,4,5  -- or whatever you need
   ) AS sub
GROUP  BY LastName, Firstname, PhotoURI;

The manual about the (optional) ordinal numbers in ORDER BY:

If the ORDER BY expression is a constant integer K then the expression is considered an alias for the K-th column of the result set (columns are numbered from left to right starting with 1).

Use the GROUP BY list as leading ORDER BY expressions for best results.

Don't do anything with the derived table after ordering that might rearrange it (like joining the subquery to another table etc.)

Finally, note that similar aggregate functions in other RDBMS can behave slightly differently. Related:

Solution 2

The answer from this post will help you turn

Name     | company
---------+----------
Doe John | company A
Doe John | company B

into

Name     | company-1 | company-2
---------+-----------+----------
Doe John | company A | company B

Solution 3

I'm thinking a inner select might help, like:

CREATE View AttendeeTableView AS

SELECT  (LastName || " " || FirstName) as AttendeeName,  

(
  select CompanyName
FROM    Attendee A_innner 
JOIN    CompanyAttendeeRelation CAR  /* is this where company name is? */  
ON      on CAR.AttendeeId = A.AttendeeId /* if not remove the joins and CAR */
WHERE   A_inner.last_name = A_outer.last_name and
        A_inner.first_name = A_outer.first_name
),
PhotoURI,
CAR.CompanyId,
CAR.AttendeeId 


FROM    Attendee A_outer 
JOIN    CompanyAttendeeRelation CAR_outer  
ON      on CAR_outer.AttendeeId = A_outer.AttendeeId 

GROUP by LastName,FirstName
ORDER BY LastName, FirstName;
Share:
17,614
SaldaVonSchwartz
Author by

SaldaVonSchwartz

Updated on June 22, 2022

Comments

  • SaldaVonSchwartz
    SaldaVonSchwartz almost 2 years

    From a previous post, I have the following view in sqlite3:

    CREATE View AttendeeTableView AS
    
    SELECT  (LastName || " " || FirstName) as AttendeeName,  
            CompanyName, 
            PhotoURI,
            CompanyAttendeeRelation.CompanyId,
            CompanyAttendeeRelation.AttendeeId 
    
    FROM    Attendee 
    JOIN    CompanyAttendeeRelation on CompanyAttendeeRelation.AttendeeId = Attendee.AttendeeId 
    
    ORDER BY LastName;
    

    Now, since the data is generated from a many-to-many relation between Attendee and Company, I can get results such as:

    Doe John | company A | johnPic.png | 1 | 15
    Doe John | company B | johnPic.png | 2 | 15
    

    What I'd like to do is, in cases where there's more than one company (like above), create a query that outputs:

    Doe John | company A company B | johnPic.png | 1 2 | 15
    

    And another that outputs:

    Doe John | company A | company B | johnPic.png | 1 | 2 | 15
    

    So I need to know essentially how to merge a specific column for rows that have different values in that table.

    Any ideas?

    Just in case, company A company B in the first query is obviously text concatenation, That is, something along the lines of (row1.CompanyName || " " || row2.CompanyName)

  • Jason
    Jason about 5 years
    The manual said The order of the concatenated elements is arbitrary. Does that mean that the concatenated CompanyIds doesn't necessarily match in order the Companies?
  • Erwin Brandstetter
    Erwin Brandstetter about 5 years
    @Jason: I added more to address that (among other things).