AWS Glue: transform a struct into a DynamicFrame
I don't think AWS Glue provides a mapping method for this. After some struggling, I found that the transformation is relatively easy in PySpark. Here is the pseudocode:
- Retrieve the data source from the Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = ...)
- Convert it to a Spark DataFrame and transform it in Spark (explode and col come from pyspark.sql.functions)
mapped_df = datasource0.toDF().select(explode(col("Datapoints")).alias("collection")).select("collection.*")
- Convert it back to a DynamicFrame and continue with the rest of the ETL process (DynamicFrame comes from awsglue.dynamicframe)
mapped_datasource0 = DynamicFrame.fromDF(mapped_df, glueContext, "mapped_datasource0")
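To make the explode step concrete, here is a minimal plain-Python sketch of the same flattening: one output row per element of the "Datapoints" array, with the struct fields promoted to top-level columns. It is only an illustration and does not use Spark or Glue. The helper name explode_datapoints is hypothetical, and the sample record is the one from the question, closed off by hand because the original is truncated.

```python
# Plain-Python illustration of explode(col("Datapoints")) followed by
# select("collection.*"). No Spark or Glue is involved here.

# Sample record based on the question's CloudWatch output (assumed complete).
record = {
    "Label": "RequestCount",
    "Datapoints": [
        {"Timestamp": "2017-07-23T00:00:00Z", "Sum": 41960.0, "Unit": "Count"},
        {"Timestamp": "2017-07-30T00:00:00Z", "Sum": 46065.0, "Unit": "Count"},
        {"Timestamp": "2017-08-24T00:00:00Z", "Sum": 43915.0, "Unit": "Count"},
    ],
}


def explode_datapoints(records):
    """Yield one flat dict per element of each record's "Datapoints" array.

    Records whose array is missing, None, or empty produce no rows, which
    mirrors how Spark's explode drops such rows.
    """
    for rec in records:
        for point in rec.get("Datapoints") or []:
            # Copy the struct so each output row is independent of the input.
            yield dict(point)


rows = list(explode_datapoints([record]))
for row in rows:
    print(row)
```

Each resulting row has only the Timestamp, Sum, and Unit fields, which is the shape the question asks for. The Label column is dropped, just as it is in the select("collection.*") call above.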
Thanks to this reference
ryo
Updated on June 06, 2022
Comments
-
ryo almost 2 years
I am fairly new to AWS Glue. I am working on transforming raw CloudWatch JSON output into CSV with AWS Glue. The transformation script is pretty straightforward, but the documentation and examples don't seem to be comprehensive. The data structure is something like this:
{ "Label": "RequestCount", "Datapoints": [ { "Timestamp": "2017-07-23T00:00:00Z", "Sum": 41960.0, "Unit": "Count" }, { "Timestamp": "2017-07-30T00:00:00Z", "Sum": 46065.0, "Unit": "Count" }, { "Timestamp": "2017-08-24T00:00:00Z", "Sum": 43915.0, "Unit": "Count" },
The tricky part is transforming a single dynamic frame (Label: string, Datapoints: array) into a dynamic frame with the columns (Timestamp: string, Sum: double, Unit: string). I am not sure which DynamicFrame method to use.