SQL: tuple comparison

21,352

Solution 1

Just do:

SELECT colA
     , colB
     , colC
FROM mytable
WHERE ( ('A',  'B',  'C') <= (colA, colB, colC ) )
  AND ( (colA, colB, colC) <= ('D',  'E',  'F' ) )
ORDER BY colA, colB, colC
LIMIT 1
;

It works just fine. And I suspect is should be pretty fast, too.


This is equivalent but it may have better performance, depending on your tables:

SELECT m.colA
     , m.colB
     , m.colC
FROM mytable m
WHERE ( ('A',  'B',  'C') <= (m.colA, m.colB, m.colC) )
  AND ( (m.colA, m.colB, m.colC) <= ('D',  'E',  'F') )
  AND NOT EXISTS
  ( SELECT 1
    FROM mytable b
    WHERE (b.colA, b.colB, b.colC) < (m. colA, m.colB, m.colC)
      AND ( ('A',  'B',  'C') <= (b.colA, b.colB, b.colC) )
  );

Solution 2

---EDIT---: (Previous wrong trials removed)

2nd try (not really relational algebra).

This works but only when the fields are char(1):

SELECT colA, colB, colC
FROM mytable
WHERE CONCAT(colA, colB, colC)
      BETWEEN CONCAT('A', 'B', 'C')
      AND CONCAT('D', 'E', 'F')
ORDER BY colA, colB, colC
LIMIT 1 ; 

I thought that a view that shows all combinations of tuples from mytable that are less than or equal to tuples of the same table might be helpful, as it can be used for other comparisons:

CREATE VIEW lessORequal AS
( SELECT a.colA AS smallA
       , a.colB AS smallB
       , a.colC AS smallC
       , b.colA AS largeA
       , b.colB AS largeB
       , b.colC AS largeC
  FROM mytable a
    JOIN mytable b
      ON (a.colA < b.colA)
         OR ( (a.colA = b.colA)
               AND ( (a.colB < b.colB)
                     OR (a.colB = b.colB
                        AND a.colC <= b.colC)
                   )
            )
  ) ;

Using similar technique, this solves the question. It works with any kind of fields (int, float, char of any length). It's going to be kind of awkard and complicated though if one tries to add more fields.

SELECT colA, colB, colC
FROM mytable m
WHERE ( ('A' < colA)
        OR ( ('A' = colA)
              AND ( ('B' < colB)
                    OR ('B' = colB
                       AND 'C' <= colC)
                  )
           )
      )
  AND ( (colA < 'D')
         OR ( (colA = 'D')
              AND ( (colB < 'E')
                    OR (colB = 'E'
                       AND colC <= 'F')
                  )
            )
      )
ORDER BY colA, colB, colC
LIMIT 1 ; 

One also define a function:

CREATE FUNCTION IslessORequalThan( lowA CHAR(1)
                                 , lowB CHAR(1)
                                 , lowC CHAR(1)
                                 , highA CHAR(1)
                                 , highB CHAR(1)
                                 , highC CHAR(1)
                                 )
RETURNS boolean
RETURN ( (lowA < highA)
         OR ( (lowA = highA)
               AND ( (lowB < highB)
                     OR ( (lowB = highB)
                          AND (lowC <= highC)
                        )
                   )
            )
       );

and use it to solve the same or similar problems. This solves the question again. The query is elegant but a new function has to be created if the type or number of fields is changed.

SELECT colA
     , colB
     , colC
FROM mytable 
WHERE IslessORequalThan(  'A',  'B',  'C', colA, colB, colC )
  AND IslessORequalThan( colA, colB, colC,  'D',  'E',  'F' )
ORDER BY colA, colB, colC
LIMIT 1;

Until then, and because the condition

(colA, colB, colC) BETWEEN ('A', 'B', 'C') AND ('D', 'E', 'F')

was not allowed in MySQL, I thought that

('A', 'B', 'C') <= (colA, colB, colC)

was not allowed as well. But I was wrong.

Share:
21,352
bukzor
Author by

bukzor

Updated on April 13, 2020

Comments

  • bukzor
    bukzor about 4 years

    In my current application, I need to be able to do this type of query:

    SELECT MIN((colA, colB, colC)) 
    FROM mytable
    WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')
    

    and get the answer of (333, 'B', 'B'), given this data:

    +------+------+------+
    | colA | colB | colC |
    +------+------+------+
    |   99 | A    | A    |
    |  200 | A    | Z    |
    |  200 | B    | B    |
    |  333 | B    | B    |
    |  333 | C    | D    |
    |  333 | C    | E    |
    |  333 | D    | C    |
    | 1000 | E    | G    |
    | 1000 | F    | A    |
    +------+------+------+
    

    What is the most efficient way to accomplish this in real SQL? Please keep in mind that this is a toy example, and that my actual application has tables with varying columns and data types, and hundreds of million of rows. I use MySQL, if that helps. You can also assume that these columns have a PRIMARY or UNIQUE index on them.

    If the solution is easily extensible to more/less columns, that's even better.


    Tuple Comparison:

    Several have asked so I should put this in the question. Tuples are ordered lexicographically, meaning that the sequences are ordered the same as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.

    It's worth noting that SQL (or at least mysql) implements this correctly:

    mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
    +--------------------------------------------------------------------------+
    | (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
    +--------------------------------------------------------------------------+
    |                                                                        1 |
    +--------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    

    Here's the necessary SQL to create the example:

    create table mytable select 333 colA, 'B' colB, 'B' colC;
    insert into mytable values (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'), 
        (200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'), (333, 'D', 'C'),
        (99, 'A', 'A');
    alter table mytable add unique index myindex (colA, colB, colC);
    

    Adding this index seems to cause the table to be sorted lexicographically, which is interesting. This isn't true in our production system.

  • ypercubeᵀᴹ
    ypercubeᵀᴹ about 13 years
    Count all rows that satisfy the same criteria?
  • bukzor
    bukzor about 13 years
    This answer works, but in my experience, OR's are a performance disaster. Do you think there's a better way?
  • ypercubeᵀᴹ
    ypercubeᵀᴹ about 13 years
    I think you should test it to find out if it's disaster or not. I can't think of anything else at the moment.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ about 13 years
    Oh boy, I justtried something in MySQL and it works. I'll post a new answer!
  • bukzor
    bukzor about 13 years
    perfect! You should remove your other answer.
  • Cade Roux
    Cade Roux about 13 years
    If <= works on tuples, it seems like BETWEEN should work, because it's supposed to simply be equivalent to that operation.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ about 13 years
    @Cade: my point exactly, "since BETWEEN did not work, I hadn't tried <= at first".