Not able to write Spark SQL DataFrame to S3


After you get your keys, this is how to write out to S3 in Scala on Spark 2 using s3n.

spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "[access key]")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "[secret key]")
spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

df.write
.mode("overwrite")
.parquet("s3n://bucket/folder/parquet/myFile")

This is how to do it with s3a, which is preferred.

spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "[access key]")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "[secret key]")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

df.write
.mode("overwrite")
.parquet("s3a://bucket/folder/parquet/myFile")

See this post to understand the differences between s3, s3n, and s3a: in short, the old s3 scheme is a block-based store, while s3n and s3a store regular S3 objects, and s3a is the actively maintained successor to s3n.
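
For completeness, here is a self-contained sketch of the whole s3a path. Reading the keys from the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, the sample DataFrame, and the bucket name are my assumptions, not part of the answer above; it also assumes hadoop-aws (with its AWS SDK dependency) is on the classpath.

import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("write-to-s3a").getOrCreate()

    // Assumed: credentials exported as the standard AWS environment variables
    val hconf = spark.sparkContext.hadoopConfiguration
    hconf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    hconf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    hconf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    // Hypothetical stand-in for the DB2 extract in the question
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // "bucket" is a placeholder bucket name
    df.write.mode(SaveMode.Overwrite).parquet("s3a://bucket/folder/parquet/myFile")

    spark.stop()
  }
}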


Comments

  • Akki over 1 year

    I have installed Spark 2.0 on EC2, and I am using Spark SQL with Scala to retrieve records from DB2 that I want to write to S3, passing the access keys to the Spark context. Following is my code:

    val df = sqlContext.read.format("jdbc")
      .options(Map(
        "url" -> url,  // JDBC URL elided in the original post
        "user" -> username, "password" -> password,
        "dbtable" -> tablename, "driver" -> "com.ibm.db2.jcc.DB2Driver"))
      .option("query", "SELECT * from tablename limit 10")
      .load()
    df.write.save("s3n://data-analytics/spark-db2/data.csv")
    

    And it is throwing the following exception:

    org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>1E77C38FA2DB34DA</RequestId><HostId>V4O9sdlbHwfXNFtoQ+Y1XYiPvIL2nTs2PIye5JBqiskMW60yDhHhnBoCHPDxLnTPFuzyKGh1gvM=</HostId></Error>
    Caused by: org.jets3t.service.S3ServiceException: Service Error Message.
      at org.jets3t.service.S3Service.putObject(S3Service.java:2358)
      at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeEmptyFile(Jets3tNativeFileSystemStore.java:162)
    

    What is the exact problem here, given that I am also passing the access keys to the SparkContext? Is there any other way to write to S3?