How to cast a string to array of struct in HiveQL
You don't need to "cast" anything, you just need to explode the array and then unpack the struct. I added an index to your data to make it more clear where things are ending up.
Data:
idx arr_of_structs
0 [{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}]
1 [{periode:20161212-20161217,nb:0}]
Query:
SELECT idx -- index
, my_struct.periode AS periode -- unpacks periode
, my_struct.nb AS nb -- unpacks nb
FROM database.table
LATERAL VIEW EXPLODE(arr_of_structs) exptbl AS my_struct
Output:
idx periode nb
0 20160118-20160205 1
0 20161130-20161130 1
0 20161130-20161221 1
1 20161212-20161217 0
It's a bit unclear from your question what the desired result is, but as soon as you update it I'll modify the query accordingly.
EDIT:
The above solution is incorrect, I didn't catch that your input is a STRING
.
Query:
SELECT REGEXP_EXTRACT(tmp_arr[0], "([0-9]{8}-[0-9]{8})") AS periode
, REGEXP_EXTRACT(tmp_arr[1], ":([0-9]*)") AS nb
FROM (
SELECT idx
, pos
, COLLECT_SET(tmp_col) AS tmp_arr
FROM (
SELECT idx
, tmp_col
, CASE WHEN PMOD(pos, 2) = 0 THEN pos+1 ELSE pos END AS pos
FROM (
SELECT *
, ROW_NUMBER() OVER () AS idx
FROM database.table ) x
LATERAL VIEW POSEXPLODE(SPLIT(periode, ',')) exptbl AS pos, tmp_col ) y
GROUP BY idx, pos) z
Output:
periode nb
20160118-20160205 1
20161130-20161130 1
20161130-20161221 1
20161212-20161217 0
Related videos on Youtube
Sidi Mahmoud Ould Rhil
Updated on September 15, 2022Comments
-
Sidi Mahmoud Ould Rhil over 1 year
I have a hive table with the column "periode", the type of the column is string.
The column have values like the following:
[{periode:20160118-20160205,nb:1},{periode:20161130-20161130,nb:1},{periode:20161130-20161221,nb:1}] [{periode:20161212-20161217,nb:0}]
I want to cast this column in
array<struct<periode:string, nb:int>>
. The final goal is to have one raw by periode. For this I want to use lateral view with explode on the column periode. That's why I want to convert it toarray<struct<string, int>>
Thanks for help. Sidi
-
o-90 over 7 years
SPLIT(periode, "-")
makes no sense;periode
is a column name inside an array of structs. -
hlagos about 7 yearsI have a hive table with the column "periode", the type of the column is string.....he has two columns..
-
o-90 about 7 yearsNo, this won't work. The OP specifically states their data is of type
array<struct<periode:string, nb:int>>
. You cannot callregex_replace()
orsplit()
on an array. -
hlagos about 7 yearshe doesn't have an array... his data is everyrhing is one strng, you are assuming that it is in an array because the data looks like an array. He is saying that we would like cast the string to a structure... wait for more clarification if you need :)
-
hlagos about 7 yearsno problem, your solution is cleaner but much less efficient if we are talking about a lot of data, the solution without window functions doesn't reduce phase, which is an advantage if he is working with a lot of data