AWS Glue: ETL to read S3 CSV files
I believe the issue is that you have subfolders within the testing-csv folder. Since you did not set recurse to true, Glue cannot find the files in the 2018-09-26 subfolder (or in any other subfolder).
You need to add the recurse option as follows:
inputGDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": ["s3://pinfare-glue/testing-csv"], "recurse": True}, format = "csv")
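To see why the flag matters, here is a rough pure-Python model of the difference. It is only a sketch and not how Glue is implemented internally. The file names, the 2018-09-27 folder, and the visible_keys helper are all made up for illustration; only the testing-csv/2018-09-26 layout comes from the question.

```python
# Hypothetical object keys under s3://pinfare-glue/ (file names are invented).
KEYS = [
    "testing-csv/2018-09-26/part-0001.csv",
    "testing-csv/2018-09-26/part-0002.csv",
    "testing-csv/2018-09-27/part-0001.csv",
]


def visible_keys(keys, prefix, recurse):
    """Return the keys a reader would pick up under `prefix`.

    With recurse=False only objects directly under the prefix count;
    with recurse=True objects in nested "subfolders" are included too.
    This is an illustrative model, not Glue's actual listing logic.
    """
    base = prefix.rstrip("/") + "/"
    selected = []
    for key in keys:
        if not key.startswith(base):
            continue  # object lives outside the requested path
        remainder = key[len(base):]
        if not remainder:
            continue  # skip a placeholder object for the folder itself
        if recurse or "/" not in remainder:
            selected.append(key)
    return selected


print(visible_keys(KEYS, "testing-csv", recurse=False))
print(visible_keys(KEYS, "testing-csv", recurse=True))
```

In this model the first call finds nothing, because every CSV sits one level below testing-csv, which matches the empty output described in the question. The second call returns all three keys.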
Also, regarding your question in the comments about crawlers: they help infer the schema of your data files. In your case a crawler does nothing, since you are creating the DynamicFrame directly from S3.
Jiew Meng
Updated on June 12, 2022
Comments
-
Jiew Meng almost 2 years
I want to use an ETL job to read data from S3, since with ETL jobs I can set the DPU count to hopefully speed things up.
But how do I do it? I tried
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

inputGDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": ["s3://pinfare-glue/testing-csv"]}, format = "csv")
outputGDF = glueContext.write_dynamic_frame.from_options(frame = inputGDF, connection_type = "s3", connection_options = {"path": "s3://pinfare-glue/testing-output"}, format = "parquet")
But it appears there is nothing written. My folder looks like:
What's incorrect? My output S3 location only has a file like:
testing_output_$folder$
-
Shawnzam over 3 years
"recurse": True}