LEFT JOIN after GROUP BY?

35,421

Solution 1

There've been some good answers so far, but I would adopt a slightly different method quite similar to what you described originally

SELECT
    songsWithTags.*,
    COALESCE(SUM(v.vote),0) AS votesUp,
    COALESCE(SUM(1-v.vote),0) AS votesDown
FROM (
    SELECT
        s.*,
        COLLATE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
    FROM Songs s
    LEFT JOIN Songs_Tags st
        ON st.id_song = s.id
    GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song

GROUP BY songsWithTags.id DESC

In this the subquery is responsible for collating songs with tags into a 1 row per song basis. This is then joined onto Votes afterwards. I also opted to simply sum up the v.votes column as you have indicated it is 1 or 0 and therefore a SUM(v.votes) will add up 1+1+1+0+0 = 3 out of 5 are upvotes, while SUM(1-v.vote) will sum 0+0+0+1+1 = 2 out of 5 are downvotes.

If you had an index on votes with the columns (id_song,vote) then that index would be used for this so it wouldn't even hit the table. Likewise if you had an index on Songs_Tags with (id_song,id_tag) then that table wouldn't be hit by the query.

edit added solution using count

SELECT
    songsWithTags.*,
    COUNT(CASE WHEN v.vote=1 THEN 1 END) as votesUp,
    COUNT(CASE WHEN v.vote=0 THEN 1 END) as votesDown
FROM (
    SELECT
        s.*,
        COLLATE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
    FROM Songs s
    LEFT JOIN Songs_Tags st
        ON st.id_song = s.id
    GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song

GROUP BY songsWithTags.id DESC

Solution 2

Try this:

SELECT
    s.*,
    GROUP_CONCAT(DISTINCT st.id_tag) AS tags_ids,
    COUNT(DISTINCT CASE WHEN v.vote=1 THEN id_vote ELSE NULL END) AS votesUp,
    COUNT(DISTINCT CASE WHEN v.vote=0 THEN id_vote ELSE NULL END) AS votesDown
FROM Songs s
    LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
    LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC

Solution 3

Your code results in a mini-Cartesian product because you are doing two Joins in 1-to-many relationships and the 1 table is on the same side of both joins.

Convert to 2 subqueries with groupings and then Join:

SELECT
    s.*,
    COALESCE(st.tags_ids, '') AS tags_ids,
    COALESCE(v.votesUp, 0)    AS votesUp,
    COALESCE(v.votesDown, 0)  AS votesDown
FROM 
        Songs AS s
    LEFT JOIN 
        ( SELECT 
              id_song,
              GROUP_CONCAT(id_tag) AS tags_ids
          FROM Songs_Tags 
          GROUP BY id_song
        ) AS st
      ON s.id = st.id_song
    LEFT JOIN 
        ( SELECT
              id_song,
              COUNT(CASE WHEN v.vote=1 THEN id_vote END) AS votesUp,
              COUNT(CASE WHEN v.vote=0 THEN id_vote END) AS votesDown
          FROM Votes 
          GROUP BY id_song
        ) AS v 
      ON s.id = v.id_song
ORDER BY s.id DESC
Share:
35,421
Lem0n
Author by

Lem0n

Updated on April 14, 2020

Comments

  • Lem0n
    Lem0n about 4 years

    I have a table of "Songs", "Songs_Tags" (relating songs with tags) and "Songs_Votes" (relating songs with boolean like/dislike).

    I need to retrieve the songs with a GROUP_CONCAT() of its tags and also the number of likes (true) and dislikes (false).

    My query is something like that:

    SELECT
        s.*,
        GROUP_CONCAT(st.id_tag) AS tags_ids,
        COUNT(CASE WHEN v.vote=1 THEN 1 ELSE NULL END) as votesUp,
        COUNT(CASE WHEN v.vote=0 THEN 1 ELSE NULL END) as votesDown,
    FROM Songs s
        LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
        LEFT JOIN Votes v ON (s.id=v.id_song)
    GROUP BY s.id
    ORDER BY id DESC
    

    The problem is that when a Song has more than 1 tag, it gets returned more then once, so when I do the COUNT(), it returns more results.

    The best solution I could think is if it would be possible to do the last LEFT JOIN after the GROUP BY (so now there would be only one entry for each song). Then I'd need another GROUP BY m.id.

    Is there a way to accomplish that? Do I need to use a subquery?

  • Lem0n
    Lem0n about 12 years
    I like this solution, specially the fact that it doesn't hit the DB for tags or votes... but I'll only stick to the COUNT() instead of SUM() because semantically it makes more sense IMO (after all, I'm counting upvotes and downvotes)
  • Lem0n
    Lem0n about 12 years
    I'm not sure this is the best solution, but I like that it solves the problem with (apparently) the least ammount of changes/redesigning.
  • Lem0n
    Lem0n about 12 years
    isn't doing 3 SELECTs slower? or maybe this code will be faster when I have a lot of tags for the same song?
  • ypercubeᵀᴹ
    ypercubeᵀᴹ about 12 years
    Will you trust me if I say this is faster? Test (all queries that give correct results) with your data and distribution, your server and its settings, with various table sizes and then choose :)
  • Lem0n
    Lem0n about 12 years
    Actually, it doesn't solve it. If more people vote "like" (true), it will count as at most ONE like.
  • My Other Me
    My Other Me about 12 years
    Lem0n: It worked in my tests, but I might have slightly different data structures to you. Note that the count is of id_vote, not of 1...
  • Lem0n
    Lem0n about 12 years
    ah, it makes sense. but id_vote is actually just a boolean true/false (yeah, bad name). maybe I could COUNT the full row like (id_song, id_user, vote)?
  • Sliq
    Sliq almost 5 years
    @Lem0n 100 superfast queries can be for sure much much faster than 1 query! :)