Hive collect_list() does not collect NULL values

17,900

This function works like this, but I've found the following workaround. Add a case when statement to your query to check and keep NULLs.

SELECT col1, 
    col2, 
    collect_list(CASE WHEN col3 IS NULL THEN 'NULL' ELSE col3 END) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2

Now, because you had a string element ('NULL') the whole result set is an array of strings. At the end just convert the array of strings to an array of double values.

Share:
17,900
lalith kkvn
Author by

lalith kkvn

Updated on June 05, 2022

Comments

  • lalith kkvn
    lalith kkvn almost 2 years

    I am trying to collect a column with NULLs along with some values in that column...But collect_list ignores the NULLs and collects only the ones with values in it. Is there a way to retrieve the NULLs along with other values ?

    SELECT col1, col2, collect_list(col3) as col3
    FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
    GROUP BY col1, col2;
    

    Actual col3 values

    0.9
    NULL
    NULL
    0.7
    0.6 
    

    Resulting col3 values

    [0.9, 0.7, 0.6]
    

    I was hoping that there is a hive solution that looks like this [0.9, NULL, NULL, 0.7, 0.6] after applying the collect_list.

  • lalith kkvn
    lalith kkvn over 8 years
    I used a different method that does the same thing. Its still probably an array of strings which is an issue. What happens when I convert the null's to doubles ? SELECT col1, col2, collect_list(coalesce(col3, "NULL") as col3 FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3) GROUP BY col1, col2
  • Zsuzsa
    Zsuzsa over 8 years
    If you convert null's to doubles the result is None (with this statement: cast(null as double)). However the collect_list is ignoring both NULL and None values.