Hive : casting array<string> to array<int> in query

16,646

Solution 1

Re-assemble array using explode() and collect_list().

Initial String array example:

hive> select array('1','2','3') string_array;
OK
string_array
["1","2","3"]
Time taken: 1.109 seconds, Fetched: 1 row(s)

Convert array:

hive> select collect_list(cast(array_element as int)) int_array --cast and collect array
       from( select explode(string_array) array_element         --explode array
               from (select array('1','2','3') string_array     --initial array
                    )s 
           )s;

Result:

OK
int_array
[1,2,3]
Time taken: 44.668 seconds, Fetched: 1 row(s)

And if you want to add more columns in your insert+select query then use lateral view [outer]:

select col1, col2, collect_list(cast(array_element as int)) int_array
 from
(
select col1, col2 , array_element         
  from table
       lateral view outer explode(string_array) a as array_element         
)s
group by col1, col2
;

Solution 2

Brickhouse jar will do this a lot faster than casting them and collecting it back as a list . Add this jar to a hdfs location.
Use the link below to download the brick house jar

add jar hdfs://hadoop-/pathtojar/brickhouse-0.7.1.jar;   
create temporary function cast_array as 'brickhouse.udf.collect.CastArrayUDF';   
select cast_array(columns, 'int') AS columname from table;  
select cast_array(columns, 'string') AS columname from table
Share:
16,646
Pierre Galland
Author by

Pierre Galland

Updated on June 12, 2022

Comments

  • Pierre Galland
    Pierre Galland almost 2 years

    I have two tables :

    create table a (
    `1` array<string>);
    
    create table b (
    `1` array<int>);
    

    and I want to put the table a in table b (table b is empty) :

    insert into table b
    select * from a;
    

    when doing so I get the following error :

    FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into
    target table because column number/types are different 'b': Cannot
    convert column 0 from array<string> to array<int>.
    

    whereas I would not get this error if the fields were only of types string and int.

    Is there a way to do the cast with arrays ?

  • dolphinZhang
    dolphinZhang over 5 years
    Thank you! I'll try it!
  • dolphinZhang
    dolphinZhang over 5 years
    `select col1, col2, collect_list(cast(array_element as int)) int_array from ( select col1, col2 , array_element from table lateral view outer explode(string_array) a as array_element )s group by col1, col2 ;``` solve my problem
  • ningyuwhut
    ningyuwhut over 3 years
    select col1, col2, collect_list(cast(array_element as int)) int_array from ( select col1, col2 , array_element from table lateral view outer explode(string_array) a as array_element )s group by col1, col2 works. but I have a question that how the array element is kept the same order with the original array?
  • leftjoin
    leftjoin over 3 years
    @ningyuwhut In this example the order is not preserved (not guarantied). To have the same order use posexplode instead of explode to get position and use distribute by col1, col2 sort by position in the inner subquery. , like in this answer: stackoverflow.com/a/61936243/2700344