How to use actual row count (COUNT(*)) in WHERE clause without writing the same query as subquery?

mysql sql sqlite count

26,301

Solution 1

In MySQL, you can only do what you tried:

SELECT id, fruit, pip 
FROM   plant 
WHERE  (
        SELECT COUNT(*) 
        FROM   plant
       ) = 2;

or this variation:

SELECT id, fruit, pip 
FROM   plant 
  JOIN
      (
        SELECT COUNT(*) AS cnt 
        FROM   plant
       ) AS c
    ON c.cnt = 2;

Whether the 1st or the 2nd is more efficient, depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.

In other DBMSs, that have window functions, you can also do the first query that @Andomar suggests.

Here is a suggestion to avoid the bottleneck of calculating the derived table twice, once to get the rows and once more to get the count. If the derived table is expensive to be calculated, and its rows are thousands or millions, calculating them twice only to throw them away, is a problem, indeed. This may improve efficiency as it will limit the intermediately (twice) calculated rows to 3:

SELECT  p.*
FROM
    ( SELECT id, fruit, pip 
      FROM   plant 
      LIMIT 3
    ) AS p
  JOIN
    ( SELECT COUNT(*) AS cnt
      FROM   
        ( SELECT 1 
          FROM   plant 
          LIMIT 3
        ) AS tmp
    ) AS c
    ON c.cnt = 2 ;

Solution 2

After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.

On another DBMS, you could use a Windowing function:

select  *
from    (
        select  *
        ,       count(*) over () as cnt
        from    plant
        ) as SubQueryAlias
where   cnt = 2

But the over clause is not supported on MySQL.

old wrong anser below

The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.

To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:

SELECT id, fruit, pip 
FROM   plant 
GROUP BY
       id, fruit, pip 
HAVING COUNT(*) = 2;

Solution 3

The other answers do not fulfill the original question which was to filter the results "without using a subquery".

You can actually do this by using a variable in 2 consecutive MySQL statements:

SET @count=0;

SELECT * FROM
(
    SELECT id, fruit, pip, @count:=@count+1 AS count
    FROM   plant 
    WHERE
) tmp
WHERE @count = 2;

26,301

Author by

nawfal

You should accept the answer if it helped you by selecting the green tick mark on the top left of the answer. That's the encouraged practice at SO. Not only it helps future visitors, it encourages users to answer your questions. You should post what you have tried and tell us where you're stuck. That gives us more detail, and we can quickly give an answer by copy-pasting the code with minor modifications. Questions like I-need-this,-give-me-code doesnt work in SO. In the current format the question will be closed. You may read http://stackoverflow.com/faq additionally before posting. Do a good search on SO to ensure you're not asking a duplicated question, or else your question will be closed. Quick tag search

Updated on July 29, 2022

Comments

nawfal almost 2 years

I have something like this:

SELECT id, fruit, pip 
FROM   plant 
WHERE  COUNT(*) = 2;

This weird query is self explanatory I guess. COUNT(*) here means the number of rows in plant table. My requirement is that I need to retrieve values from specified fields only if total number of rows in table = 2. This doesn't work but: invalid use of aggregate function COUNT.

I cannot do this:

SELECT COUNT(*) as cnt, id, fruit, pip 
FROM   plant 
WHERE  cnt = 2;

for one, it limits the number of rows outputted to 1, and two, it gives the same error: invalid use of aggregate function.

What I can do is instead:

SELECT id, fruit, pip 
FROM   plant 
WHERE  (
        SELECT COUNT(*) 
        FROM   plant
       ) = 2;

But then that subquery is the main query re-run. I'm presenting here a small example of the larger part of the problem, though I know an additional COUNT(*) subquery in the given example isn't that big an overhead.

Edit: I do not know why the question is downvoted. The COUNT(*) I'm trying to get is from a view (a temporary table) in the query which is a large query with 5 to 6 joins and additional where clauses. To re-run the query as a subquery to get the count is inefficient, and I can see the bottleneck as well.

Here is the actual query:

SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo, 
   IF(AE.Completed = 1, 'Completed', 'Incomplete'), 
   (
    SELECT COUNT(DISTINCT(FK_QId)) 
    FROM   attempt_question AS AQ
    WHERE  FK_ExcAttemptId = @excAttemptId
   ) AS Inst_Count, 
   (
    SELECT    COUNT(DISTINCT(AQ.FK_QId)) 
    FROM      attempt_question AS AQ
    JOIN      `question` AS Q 
          ON  Q.PK_Id = AQ.FK_QId                      
    LEFT JOIN actions AS A                             
           ON A.FK_QId = AQ.FK_QId
    WHERE     AQ.FK_ExcAttemptId = @excAttemptId
         AND (
                  Q.Type = @descQtn 
              OR  Q.Type = @actQtn 
              AND A.type = 'CTVI.NotImplemented'       
              AND A.IsDelete = @status                 
              AND (
                    SELECT COUNT(*) 
                    FROM   actions 
                    WHERE  FK_QId = A.FK_QId 
                       AND type != 'CTVI.NotImplemented'
                       AND IsDelete = @status
                   ) = 0
             )
   ) AS NotEvalInst_Count,  
   (
    SELECT COUNT(DISTINCT(FK_QId)) 
    FROM   attempt_question AS AQ
    WHERE  FK_ExcAttemptId = @excAttemptId 
       AND Mark = @mark
   ) AS CorrectAns_Count, 
   E.AllottedTime, AE.TimeTaken
FROM   attempt_exercise AS AE
JOIN   ctvi_exercise_tblexercise AS E 
    ON AE.FK_EId = E.PK_EId
JOIN   ctvi_user_table AS U 
    ON AE.FK_UId = U.PK_Id
JOIN   ctvi_grade AS G 
    ON AE.FK_GId = G.PK_GId
WHERE  AE.PK_Id = @excAttemptId
-- AND COUNT(AE.*) = @number --the portion in contention.

Kindly ignore the above query and guide me to right direction from the small example query I posted, thanks.

nawfal over 11 years

Ando, yes and hence I canceled my vote. let me see again to your edit
nawfal over 11 years

As I mentioned, making it a subquery is so inelegant and could be inefficient with 5 to 6 joins and where clauses already running on the main query.
Andomar over 11 years

In MYSQL, if you use count without a group by, MySQL groups on the empty set. It will return only one row. So AFAIK, you can't get at the total rowcount and more than one row in the same query.
Bohemian over 11 years

+1 for showing compassion in wading through the copious and largely irrelevant content in the question
ypercubeᵀᴹ over 11 years

The second query with HAVING COUNT(*) = 2 is not going to return expected results. If id is the primary key, all COUNT(*) values will be 1, whatever the number of rows in the table.
nawfal over 11 years

@Bohemian I was forced to add irrelevant content when I got a negative vote to make things clearer for a few. My initial question was compact and concise I believe :)
nawfal over 11 years

what's the role of limit 3 here?
ypercubeᵀᴹ over 11 years

To avoid retrieving twice the thousand (or millions) rows of the derived table. It doesn't matter to you if they are 3 or 3000 or 1000000, right? You don't want them if they are 3 or more.
nawfal over 11 years

Haha, that's seems to be so innovative, I get it :)
ypercubeᵀᴹ over 11 years

@nawfal: Can you share details about the timings of the different queries?
nawfal over 11 years

I did some testing with few thousand records. They all performed more or less the same! If you want pinpoint accuracy after a number of testing, here is it: The last one you posted (with limits) performed marginally well with each query taking around 300ms. The first and second code blocks were ditto identical, but for a hair-breadth win sql1 beat sql2 by a score of 305 ms to 308 ms!!
nawfal over 8 years

This I believe will be inefficient since it will scan through the entire rows and at last check the @count against 2. Am I right?
Stack Man over 8 years

I'm not sure about the efficiency costs but I think the best approach is to only be concerned with efficiency when you actually encounter bottlenecks. Until then, in my opinion, readability and maintainability has higher priority and having similar subqueries creates unnecessary complexity.