Unable to load AWS credentials from any provider in the chain - error - when trying to load model from S3
Solution 1
You are using the wrong property names for the s3a connector.
see https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/#Authentication_properties
Specifically:
-
fs.s3a.access.key
your access key -
fs.s3a.secret.key
your secret key
Note in particular
- it's lower case
- there are dots/periods between access and key, secret and key
The mixedCaseOptions are from the s3n connector which is obsolete and has long been deleted from the hadoop codebase. the s3a connector will simply ignore them
Solution 2
The AWS Java SDK has a credential resolution logic/chain to properly resolve the AWS credentials to use when interfacing with AWS services.
See http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
This error means the SDK could not find credentials in any of the places the SDK looks at. Make sure the credentials exist in at least one of the places mentioned in the above link.
As a starting point, populate environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The AWS SDK for Java uses the EnvironmentVariableCredentialsProvider class to load these credentials.
Related videos on Youtube
yguw
Updated on June 04, 2022Comments
-
yguw almost 2 years
I have an MLLib model saved in a folder on S3, say bucket-name/test-model. Now, I have a spark cluster (let's say on a single machine for now). I am running the following commands to load the model:
pyspark --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
Then,
sc.setSystemProperty("com.amazonaws.services.s3.enableV4", "true") hadoopConf = sc._jsc.hadoopConfiguration() hadoopConf.set("fs.s3a.awsAccessKeyId", AWS_ACCESS_KEY) hadoopConf.set("fs.s3a.awsSecretAccessKey", AWS_SECRET_KEY) hadoopConf.set("fs.s3a.endpoint", "s3.us-east-1.amazonaws.com") hadoopConf.set("com.amazonaws.services.s3a.enableV4", "true") hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") from pyspark.ml.classification import RandomForestClassifier, RandomForestClassificationModel m1 = RandomForestClassificationModel.load('s3a://test-bucket/test-model')
and I get the following error:
Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/user/.local/lib/python3.6/site-packages/pyspark/ml/util.py", line 362, in load return cls.read().load(path) File "/home/user/.local/lib/python3.6/site-packages/pyspark/ml/util.py", line 300, in load java_obj = self._jread.load(path) File "/home/user/.local/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__ File "/home/user/.local/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco return f(*a, **kw) File "/home/user/.local/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o35.load. : com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521) at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031) at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:251) at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1343) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.take(RDD.scala:1337) at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1378) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.first(RDD.scala:1377) at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:615) at org.apache.spark.ml.tree.EnsembleModelReadWrite$.loadImpl(treeModels.scala:427) at org.apache.spark.ml.classification.RandomForestClassificationModel$RandomForestClassificationModelReader.load(RandomForestClassifier.scala:316) at org.apache.spark.ml.classification.RandomForestClassificationModel$RandomForestClassificationModelReader.load(RandomForestClassifier.scala:306) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748)
Honestly, these lines of code are taken from the web and I have no idea about storing and loading MLLib models on to S3. Any help here will be appreciated and also the next step for me is to do the same on a cluster of machines. So any heads up will also be appreciated.
-
Harsh Bafna over 4 yearstry adding export AWS_ACCESS_KEY_ID="XXX" export AWS_SECRET_ACCESS_KEY="YYYYY" in spark-env.sh
-
-
Gerardo Figueroa almost 4 yearsAnd what are the correct property names for the s3a connector?
-
stevel almost 4 yearsthey're the ones in the documentation I linked to. copying them here only ensures that stack overflow posts end up full of out of date facts
-
stevel almost 2 yearsno, this doesn't work. you are setting the wrong key names. it only appears to work because spark launcher picks up those same env vars and uses them correctly. please read the hadoop s3a docs. thanks