Hive collect_list() does not collect NULL values
17,900
This function works like this, but I've found the following workaround. Add a case when statement to your query to check and keep NULLs.
SELECT col1,
col2,
collect_list(CASE WHEN col3 IS NULL THEN 'NULL' ELSE col3 END) as col3
FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3)
GROUP BY col1, col2
Now, because you had a string element ('NULL') the whole result set is an array of strings. At the end just convert the array of strings to an array of double values.
Author by
lalith kkvn
Updated on June 05, 2022Comments
-
lalith kkvn almost 2 years
I am trying to collect a column with
NULL
s along with some values in that column...Butcollect_list
ignores theNULL
s and collects only the ones with values in it. Is there a way to retrieve theNULL
s along with other values ?SELECT col1, col2, collect_list(col3) as col3 FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3) GROUP BY col1, col2;
Actual col3 values
0.9 NULL NULL 0.7 0.6
Resulting col3 values
[0.9, 0.7, 0.6]
I was hoping that there is a hive solution that looks like this
[0.9, NULL, NULL, 0.7, 0.6]
after applying the collect_list. -
lalith kkvn over 8 yearsI used a different method that does the same thing. Its still probably an array of strings which is an issue. What happens when I convert the null's to doubles ?
SELECT col1, col2, collect_list(coalesce(col3, "NULL") as col3 FROM (SELECT * FROM table_1 ORDER BY col1, col2, col3) GROUP BY col1, col2
-
Zsuzsa over 8 yearsIf you convert null's to doubles the result is None (with this statement: cast(null as double)). However the collect_list is ignoring both NULL and None values.