"Unable to load AWS credentials from any provider in the chain" error when trying to load a model from S3

Solution 1

You are using the wrong property names for the s3a connector.

See https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/#Authentication_properties

Specifically:

  • fs.s3a.access.key: your access key
  • fs.s3a.secret.key: your secret key

Note in particular:

  1. the names are lower case
  2. there are dots/periods between access and key, and between secret and key

The mixed-case option names are from the s3n connector, which is obsolete and has long since been deleted from the Hadoop codebase. The s3a connector will simply ignore them.
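
As an illustration, here is a minimal sketch of the snippet from the question below, rewritten with the correct property names (AWS_ACCESS_KEY, AWS_SECRET_KEY and the bucket path are placeholders taken from that question):

    # set the s3a credentials using the correct lower-case, dot-separated property names
    hadoopConf = sc._jsc.hadoopConfiguration()
    hadoopConf.set("fs.s3a.access.key", AWS_ACCESS_KEY)
    hadoopConf.set("fs.s3a.secret.key", AWS_SECRET_KEY)
    hadoopConf.set("fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")

    # then load the model through the s3a filesystem
    from pyspark.ml.classification import RandomForestClassificationModel
    m1 = RandomForestClassificationModel.load('s3a://test-bucket/test-model')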

Solution 2

The AWS Java SDK has a credential resolution chain that determines which AWS credentials to use when interfacing with AWS services.

See http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

This error means the SDK could not find credentials in any of the places it looks. Make sure credentials exist in at least one of the locations described in the link above.

As a starting point, populate environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The AWS SDK for Java uses the EnvironmentVariableCredentialsProvider class to load these credentials.
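
For example, a minimal sketch (assuming a Unix-like shell and placeholder key values) of exporting those variables before launching the PySpark shell from the question, so the credential chain can find them:

    # export the credentials before starting Spark so they are visible to the JVM
    export AWS_ACCESS_KEY_ID="your-access-key-id"
    export AWS_SECRET_ACCESS_KEY="your-secret-access-key"

    pyspark --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3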

Comments

  • yguw, almost 2 years ago

    I have an MLlib model saved in a folder on S3, say bucket-name/test-model. Now I have a Spark cluster (let's say on a single machine for now). I am running the following commands to load the model:

    pyspark --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
    

    Then,

    sc.setSystemProperty("com.amazonaws.services.s3.enableV4", "true")
    hadoopConf = sc._jsc.hadoopConfiguration()
    hadoopConf.set("fs.s3a.awsAccessKeyId", AWS_ACCESS_KEY)
    hadoopConf.set("fs.s3a.awsSecretAccessKey", AWS_SECRET_KEY)
    hadoopConf.set("fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
    hadoopConf.set("com.amazonaws.services.s3a.enableV4", "true")
    hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    from pyspark.ml.classification import RandomForestClassifier, RandomForestClassificationModel
    m1 = RandomForestClassificationModel.load('s3a://test-bucket/test-model')
    

    and I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/user/.local/lib/python3.6/site-packages/pyspark/ml/util.py", line 362, in load
        return cls.read().load(path)
      File "/home/user/.local/lib/python3.6/site-packages/pyspark/ml/util.py", line 300, in load
        java_obj = self._jread.load(path)
      File "/home/user/.local/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/home/user/.local/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
        return f(*a, **kw)
      File "/home/user/.local/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o35.load.
    : com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
        at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
        at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
        at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1343)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
        at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1378)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.first(RDD.scala:1377)
        at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:615)
        at org.apache.spark.ml.tree.EnsembleModelReadWrite$.loadImpl(treeModels.scala:427)
        at org.apache.spark.ml.classification.RandomForestClassificationModel$RandomForestClassificationModelReader.load(RandomForestClassifier.scala:316)
        at org.apache.spark.ml.classification.RandomForestClassificationModel$RandomForestClassificationModelReader.load(RandomForestClassifier.scala:306)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    

    Honestly, these lines of code are taken from the web and I have no idea about storing and loading MLlib models onto S3. Any help here would be appreciated. The next step for me is to do the same on a cluster of machines, so any heads-up on that would also be appreciated.

    • Harsh Bafna, over 4 years ago
      Try adding export AWS_ACCESS_KEY_ID="XXX" and export AWS_SECRET_ACCESS_KEY="YYYYY" in spark-env.sh.
  • Gerardo Figueroa, almost 4 years ago
    And what are the correct property names for the s3a connector?
  • stevel, almost 4 years ago
    They're the ones in the documentation I linked to. Copying them here only ensures that Stack Overflow posts end up full of out-of-date facts.
  • stevel, almost 2 years ago
    No, this doesn't work: you are setting the wrong key names. It only appears to work because the Spark launcher picks up those same env vars and uses them correctly. Please read the Hadoop s3a docs. Thanks.