How to cast a string to array of struct in HiveQL

13,234

You don't need to "cast" anything, you just need to explode the array and then unpack the struct. I added an index to your data to make it more clear where things are ending up.

Data:

idx arr_of_structs
0   [{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
1   [{periode:20161212-20161217,nb:0}]

Query:

SELECT idx                          -- index
  , my_struct.periode AS periode    -- unpacks periode
  , my_struct.nb      AS nb         -- unpacks nb
FROM database.table
LATERAL VIEW EXPLODE(arr_of_structs) exptbl AS my_struct

Output:

idx     periode                 nb
0       20160118-20160205       1
0       20161130-20161130       1
0       20161130-20161221       1
1       20161212-20161217       0

It's a bit unclear from your question what the desired result is, but as soon as you update it I'll modify the query accordingly.


EDIT:

The above solution is incorrect, I didn't catch that your input is a STRING.

Query:

SELECT REGEXP_EXTRACT(tmp_arr[0], "([0-9]{8}-[0-9]{8})") AS periode
  , REGEXP_EXTRACT(tmp_arr[1], ":([0-9]*)")              AS nb
FROM (
  SELECT idx
    , pos
    , COLLECT_SET(tmp_col) AS tmp_arr
  FROM (
    SELECT idx
      , tmp_col
      , CASE WHEN PMOD(pos, 2) = 0 THEN pos+1 ELSE pos END AS pos
    FROM (
      SELECT *
        , ROW_NUMBER() OVER () AS idx
      FROM database.table ) x
    LATERAL VIEW POSEXPLODE(SPLIT(periode, ',')) exptbl AS pos, tmp_col ) y
  GROUP BY idx, pos) z

Output:

periode                 nb
20160118-20160205       1
20161130-20161130       1
20161130-20161221       1
20161212-20161217       0    
Share:
13,234

Related videos on Youtube

Sidi Mahmoud Ould Rhil
Author by

Sidi Mahmoud Ould Rhil

Updated on September 15, 2022

Comments

  • Sidi Mahmoud Ould Rhil
    Sidi Mahmoud Ould Rhil over 1 year

    I have a hive table with the column "periode", the type of the column is string.

    The column have values like the following:

    [{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
    [{periode:20161212-20161217,nb:0}]
    

    I want to cast this column in array<struct<periode:string, nb:int>>. The final goal is to have one raw by periode. For this I want to use lateral view with explode on the column periode. That's why I want to convert it to array<struct<string, int>>

    Thanks for help. Sidi

  • o-90
    o-90 over 7 years
    SPLIT(periode, "-") makes no sense; periode is a column name inside an array of structs.
  • hlagos
    hlagos about 7 years
    I have a hive table with the column "periode", the type of the column is string.....he has two columns..
  • o-90
    o-90 about 7 years
    No, this won't work. The OP specifically states their data is of type array<struct<periode:string, nb:int>>. You cannot call regex_replace() or split() on an array.
  • hlagos
    hlagos about 7 years
    he doesn't have an array... his data is everyrhing is one strng, you are assuming that it is in an array because the data looks like an array. He is saying that we would like cast the string to a structure... wait for more clarification if you need :)
  • hlagos
    hlagos about 7 years
    no problem, your solution is cleaner but much less efficient if we are talking about a lot of data, the solution without window functions doesn't reduce phase, which is an advantage if he is working with a lot of data