List files on S3


Solution 1

Using the library here:

https://github.com/Rhinofly/play-s3

You should be able to do something like this:

import scala.concurrent.ExecutionContext.Implicits.global

val bucket = S3("bucketName")
val result = bucket.list // a Future holding either an error or the bucket items
result.map {
  case Left(error) => throw new Exception("Error: " + error)
  case Right(list) =>
    list.foreach {
      case BucketItem(name, isVirtual) => //...
    }
}

You'll have to tweak this a bit with regard to your credentials, but the library's examples show how to do that.
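The Left/Right handling above can be tried without any S3 access. The following is only a minimal sketch: Item and fakeList are hypothetical stand-ins for the library's BucketItem and bucket.list (they are not part of play-s3), and the error is assumed to be a plain String for simplicity.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ListingSketch {
  // Hypothetical stand-in for the library's BucketItem, for illustration only
  final case class Item(name: String, isVirtual: Boolean)

  // Hypothetical stand-in for bucket.list: a Future of either an error or the items
  def fakeList(): Future[Either[String, Seq[Item]]] =
    Future.successful(Right(Seq(Item("a.png", false), Item("b.png", false))))

  // Turn a Left into a failed Future and a Right into the list of item names
  def fileNames(listing: Future[Either[String, Seq[Item]]]): Future[Seq[String]] =
    listing.flatMap {
      case Left(error)  => Future.failed(new Exception("Error: " + error))
      case Right(items) => Future.successful(items.map(_.name))
    }

  // Blocking here only to show the result; in a real application keep the Future
  val names: Seq[String] = Await.result(fileNames(fakeList()), 5.seconds)
}
```

Returning a failed Future instead of throwing inside map keeps the error in the Future, so callers can deal with it using recover.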

Solution 2

With Scala you might now want to use Amazon's official SDK for Java, which provides the AmazonS3::listObjects method:

import scala.collection.JavaConverters._
import com.amazonaws.services.s3.model.ObjectListing

def keys(bucket: String): List[String] = nextBatch(s3Client.listObjects(bucket))

private def nextBatch(listing: ObjectListing, keys: List[String] = Nil): List[String] = {

  val pageKeys = listing.getObjectSummaries.asScala.map(_.getKey).toList

  if (listing.isTruncated)
    nextBatch(s3Client.listNextBatchOfObjects(listing), pageKeys ::: keys)
  else
    pageKeys ::: keys
}

Note the recursion on ObjectListing objects:

Because S3 lists the keys of a bucket in batches (using a pagination system), a single call such as s3Client.listObjects(bucket).getObjectSummaries.asScala.map(_.getKey) returns at most the first 1000 keys.

Hence the recursive call: it asks for the next page of keys for as long as ObjectListing::isTruncated is true, so that all keys in the bucket are collected.

Beware of memory usage if your bucket is huge, though, since all keys are accumulated in a single in-memory list.
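One way to limit memory use is to expose the keys as a lazy Iterator that fetches a page only when it is needed. This is a minimal sketch of the idea, not SDK code: Page and fetchPage are hypothetical stand-ins. With the SDK, fetchPage would wrap listObjects and listNextBatchOfObjects, and the token would be replaced by the previous ObjectListing.

```scala
object LazyKeysSketch {
  // Hypothetical model of one page of results: its keys plus a token for the next page
  final case class Page(keys: List[String], nextToken: Option[String])

  // Lazily walks all pages; fetchPage(None) is assumed to return the first page
  def lazyKeys(fetchPage: Option[String] => Page): Iterator[String] = {
    val pages = new Iterator[Page] {
      private var started = false
      private var token: Option[String] = None

      // There is a next page before the first fetch, or while a token is available
      def hasNext: Boolean = !started || token.isDefined

      def next(): Page = {
        val page = fetchPage(token)
        started = true
        token = page.nextToken
        page
      }
    }
    // Only the current page's keys are held in memory at any time
    pages.flatMap(_.keys)
  }
}
```

A caller can then process keys one by one, for example with lazyKeys(fetch).foreach(println), without ever building the full list.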


s3Client can be built as follows:

import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}

// BasicAWSCredentials takes the access key ID first, then the secret access key
val credentials = new BasicAWSCredentials(awsKey, awsAccessKey)
val s3Client: AmazonS3 = AmazonS3ClientBuilder
  .standard()
  .withCredentials(new AWSStaticCredentialsProvider(credentials))
  .build()

with these dependencies in build.sbt (check for the latest version):

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk-bom" % "1.11.391",
  "com.amazonaws" % "aws-java-sdk-s3"  % "1.11.391"
)

Solution 3

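import scala.concurrent.Await
import scala.concurrent.duration._

def listS3Files() = Action {
  // Block for up to 15 seconds until the bucket listing completes
  Await.result(S3("bucketName").list, 15.seconds).fold(
    error => {
      Logger.error("Error: " + error)
      Status(INTERNAL_SERVER_ERROR)
    },
    success => {
      Ok(success.seq.toString())
    }
  )
}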

Here's my working solution. Thanks to @cmbaxter.

Author: malmling

Updated on June 12, 2022

Comments

  • malmling almost 2 years

    I'm getting frustrated at not finding any good explanation of how to list all files in an S3 bucket.

    I have a bucket with about 20 images in it. All I want to do is list them. Someone says "just use the S3.list method". But without a special library there is no S3.list method. I have an S3.get method, which I can't get to work. Argh, I would appreciate it if someone told me how to simply get a list of all files (filenames) from an S3 bucket.

    val S3files = S3.get(bucketName: String, path: Option[String], prefix: Option[String], delimiter: Option[String])
    

    returns a Future[Response].

    I don't know how to use this S3.get. What would be the easiest way to list all files in my S3 bucket?

    Answers much appreciated!