Finding distinct values of non Primary Key column in CQL Cassandra

17,893

Solution 1

User undefined_variable makes two good points:

  • In Cassandra, you need to build your data model to match your query patterns. This sometimes means duplicating your data into additional tables, to attain the desired level of query flexibility.
  • DISTINCT only works on partition key columns (and static columns).

So one way to get this to work is to build a table specifically to support that query:

CREATE TABLE users_by_lname (
    lname text,
    fname text,
    user_id int,
    PRIMARY KEY (lname, fname, user_id)
);

After I run your INSERTs against this new query table, this works:

aploetz@cqlsh:stackoverflow> SELECT DISTINCT lname FROM users_by_lname;

 lname
-------
 smith
   doe

(2 rows)

Notes: In this table, all rows with the same partition key (lname) will be sorted by fname, as fname is a clustering key. I added user_id as an additional clustering key, just to ensure uniqueness.
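
To make the partition/clustering layout above concrete, here is a minimal in-memory sketch in Python. It is only a toy model of users_by_lname, not how Cassandra stores data internally; the names `users`, `users_by_lname` and `distinct_lnames` are illustrative. It shows why DISTINCT on the partition key is cheap: only the partition keys need to be listed, not the rows inside each partition.

```python
from collections import defaultdict

# Rows from the question: (user_id, fname, lname).
users = [
    (1745, "john", "smith"),
    (1744, "john", "doe"),
    (1746, "john", "smith"),
]

# Toy model of users_by_lname: one entry per partition key (lname),
# each holding its rows ordered by the clustering columns (fname, user_id).
users_by_lname = defaultdict(list)
for user_id, fname, lname in users:
    users_by_lname[lname].append((fname, user_id))
for rows in users_by_lname.values():
    rows.sort()

# SELECT DISTINCT lname only needs the partition keys, not the rows inside.
distinct_lnames = set(users_by_lname)
```

Both 'smith' rows land in the same partition and stay distinct only because user_id is part of the primary key; without it, the second 'john smith' would overwrite the first.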

Solution 2

There is no such functionality in Cassandra. DISTINCT is possible on the partition key only. You should design your data model based on your query requirements. Otherwise, you have to process the data in application logic (Spark may be useful).
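
As a rough sketch of the "application logic" route, the Python snippet below deduplicates a column on the client side. The helper `distinct_values` and the hard-coded `rows` list are assumptions for illustration; in a real application the rows would come from a driver query such as SELECT lname FROM users, or from Spark.

```python
from typing import Iterable, Iterator


def distinct_values(rows: Iterable[dict], column: str) -> Iterator[str]:
    """Yield each value of `column` the first time it is seen."""
    seen = set()
    for row in rows:
        value = row[column]
        if value not in seen:
            seen.add(value)
            yield value


# Stand-in for the result of a full scan such as `SELECT lname FROM users`.
rows = [{"lname": "smith"}, {"lname": "doe"}, {"lname": "smith"}]
result = list(distinct_values(rows, "lname"))
```

Keep in mind that this approach reads every row of the table, so it can be expensive on large tables; the dedicated query table from Solution 1 avoids that scan.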

Author by

Avi

Updated on June 14, 2022

Comments

  • Avi almost 2 years

    I use the following code for creating table:

    CREATE KEYSPACE mykeyspace
    WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
    USE mykeyspace;
    CREATE TABLE users (
      user_id int PRIMARY KEY,
      fname text,
      lname text
    );
    INSERT INTO users (user_id,  fname, lname)
      VALUES (1745, 'john', 'smith');
    INSERT INTO users (user_id,  fname, lname)
      VALUES (1744, 'john', 'doe');
    INSERT INTO users (user_id,  fname, lname)
      VALUES (1746, 'john', 'smith');
    

    I would like to find the distinct value of lname column (that is not a PRIMARY KEY). I would like to get the following result:

     lname
    -------
     smith
    

    I tried SELECT DISTINCT lname FROM users; but since lname is not part of the PRIMARY KEY, I get the following error:

    cqlsh:mykeyspace> SELECT DISTINCT lname FROM users;
    InvalidRequest: code=2200 [Invalid query] message="SELECT DISTINCT queries must
    only request partition key columns and/or static columns (not lname)"
    

    How can I get the distinct values from lname?

  • Avi about 8 years
    Thanks. I do use Spark. Can you please elaborate on how it can be done? I think distinct is something very common. Isn't there ready-made code in Spark (Scala) I can use for it?
  • undefined_variable about 8 years
    Hi, I don't use Scala, but here are a few links that may help: stackoverflow.com/questions/30959955/… and stackoverflow.com/questions/24312113/…