How to combine multiple S3 files into one using AWS Glue


Solution 1

Review the AWS Glue examples, particularly the Join and Rationalize Data in S3 example. It shows how to use a Python script to join and filter data with Glue transforms.

Solution 2

If the files have the same column names and the same number of columns, Glue will automatically combine them.

Make sure the files you want to combine are in the same folder on S3 and that your Glue crawler points to that folder.
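The precondition in Solution 2 (same column names, same column count) can be illustrated locally. The sketch below is plain Python, not the Glue API: a crawler catalogs matching files as one table and does not rewrite them, whereas this code physically concatenates them. It assumes CSV files with a header row sitting in one local folder that stands in for the S3 prefix; the function names `schemas_match` and `combine_same_schema` are hypothetical.

```python
import csv
from pathlib import Path


def read_header(path):
    """Return the first row (the column names) of a CSV file, or [] if empty."""
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def schemas_match(paths):
    """True if every CSV has the same column names in the same order.

    Equal header lists also imply an equal number of columns.
    """
    headers = [read_header(p) for p in paths]
    return all(h == headers[0] for h in headers)


def combine_same_schema(folder, out_path):
    """Concatenate all *.csv files in `folder` into `out_path` with one header.

    Raises ValueError if the folder is empty or the files differ in columns.
    Returns the number of input files merged.
    """
    paths = sorted(Path(folder).glob("*.csv"))  # list is built before writing
    if not paths:
        raise ValueError(f"no CSV files in {folder}")
    if not schemas_match(paths):
        raise ValueError("files have different column names or column counts")
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(read_header(paths[0]))
        for p in paths:
            with open(p, newline="") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip this file's own header row
                writer.writerows(reader)
    return len(paths)
```

Rejecting mismatched headers up front mirrors the thread's point: files whose columns differ are not merged automatically.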

Author: prakash

Updated on June 15, 2022

Comments

  • prakash

    I need help combining multiple files in different company partitions in S3 into one file, with the company name as one of the columns.

    I am new to Glue and have not been able to find any information. I also spoke to support, and they said it is not supported. In DataStage, combining multiple files into one is a basic function. Please shed some light on this. Regards, Prakash

  • prakash
    I did try s_history = datasource0.toDF().repartition(1), but it did not work. As I said, there will be 60 files in the S3 folder, and I have created a job with job bookmarks enabled. The job runs fine but creates 60 files in the target directory. I want to combine all of them into one file. My guess is that the job is processing the files one by one rather than as a set. Please look into this scenario again.
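What the question asks for (many files across company partitions merged into one file, with the company name as a column) can be sketched locally under stated assumptions. The code is plain Python, not Glue or Spark: it assumes Hive-style partition folders named `company=<name>` (an assumed layout), CSV files that share one header, and a single local output file. Spark, which Glue jobs run on, does similar partition discovery for `key=value` paths, but `combine_partitions` and the `company=` naming here are hypothetical.

```python
import csv
from pathlib import Path


def combine_partitions(root, out_path, key="company"):
    """Merge CSVs under folders like root/company=acme/*.csv into one file.

    The text after "company=" in each folder name is appended as an extra
    column named `key`, so every row keeps track of which partition it came
    from. All files must share the same header. Returns the data row count.
    """
    prefix = f"{key}="
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for folder in sorted(Path(root).iterdir()):
            if not (folder.is_dir() and folder.name.startswith(prefix)):
                continue  # skip anything that is not a partition folder
            value = folder.name[len(prefix):]
            for path in sorted(folder.glob("*.csv")):
                with open(path, newline="") as f:
                    reader = csv.reader(f)
                    file_header = next(reader, None)
                    if file_header is None:
                        continue  # empty file, nothing to copy
                    if header is None:
                        header = file_header
                        writer.writerow(header + [key])
                    elif file_header != header:
                        raise ValueError(f"{path} has a different header")
                    for row in reader:
                        writer.writerow(row + [value])
                        rows_written += 1
    return rows_written
```

Because every input file is read inside one loop and written through one writer, the output is a single file regardless of how many partitions or files exist.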