Check if file exists in S3 Bucket


Solution 1

I was able to do it using rclone[1] as @derobert has suggested.

The command is very simple:

rclone check sourcepath remote:s3bucketname

Example:

Let's imagine you want to check if the S3 bucket (bucket name: tmp_data_test_bucket) has all the files that this directory has: /tmp/data/

Command:

rclone check /tmp/data/ remote:tmp_data_test_bucket

[1] http://rclone.org/
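
The check above can be combined with a copy step so that missing files are also uploaded. The following is only a sketch: the function name copy_if_missing is hypothetical, the paths and remote name are the ones from the example, and it assumes an rclone remote called "remote" is already configured. Note that rclone copy on its own already skips files that are identical on the destination, so the check mainly serves as a report.

```shell
# copy_if_missing SRC DST  (hypothetical helper, not part of rclone)
# Copies SRC to DST only when the one-way check reports differences.
copy_if_missing() {
  local src="$1" dst="$2"
  # --one-way only verifies that every file in SRC is present and
  # identical in DST; extra files in DST are ignored.
  if rclone check --one-way "$src" "$dst"; then
    echo "Nothing to copy"
  else
    # rclone copy transfers new or changed files and skips the rest.
    rclone copy "$src" "$dst"
  fi
}

# Example call (assumes the "remote" remote exists):
# copy_if_missing /tmp/data/ remote:tmp_data_test_bucket
```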

Solution 2

Run aws s3 ls on the actual filename. If the file exists, the exit code will be 0 and the filename will be displayed; otherwise, the exit code will be nonzero:

aws s3 ls s3://bucket/filename
if [[ $? -ne 0 ]]; then
  echo "File does not exist"
fi
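
As a comment on this page points out, aws s3 ls also succeeds when only a prefix of the name is given. One possible way around that is to compare the listed name exactly. This is a sketch, not a definitive implementation: the function name s3_object_exists is hypothetical, and it assumes object names contain no whitespace, because it relies on the name being the fourth column of the listing.

```shell
# s3_object_exists BUCKET KEY  (hypothetical helper)
# Returns 0 only if the listing contains an entry whose name is exactly
# the last path component of KEY, so a partial name does not match.
s3_object_exists() {
  local bucket="$1" key="$2"
  local name="${key##*/}"   # strip any leading "folder/" part
  # Each object line of "aws s3 ls" has the form: date time size name.
  # Assumption: the name has no spaces, so it is field 4.
  aws s3 ls "s3://${bucket}/${key}" 2>/dev/null |
    awk -v name="$name" '$4 == name { found = 1 } END { exit !found }'
}

# Example call:
# if ! s3_object_exists bucket filename; then echo "File does not exist"; fi
```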

Solution 3

The first answer is close, but if you use -e in the shebang, the script will exit as soon as the command fails, which you most likely do not want. It is better to count the output of the filtered listing with wc. So you can use the commands below:

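# Count the bytes of listing output that match the file name; 0 means no match.
# The pipeline's exit status is that of wc, so a failed grep does not trip -e.
wordcount=$(aws s3 ls "s3://${S3_BUCKET_NAME}/${folder}/" | grep -F "${file}" | wc -c)
echo "wordcount=${wordcount}"
if [[ "${wordcount}" -eq 0 ]]; then
  echo "File does not exist"   # do something
else
  echo "File exists"           # do something
fi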

Solution 4

Try the following:

aws s3api head-object --bucket "${S3_BUCKET}" --key "${S3_KEY}"

It retrieves the metadata of the object without retrieving the object itself. READ (s3:GetObject) access is required.
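
To cover the "copy the missing file" part of the question, head-object can be used in a loop over the local directory. The sketch below makes several assumptions: the function name upload_missing is hypothetical, each file is stored under a key equal to its file name, and any head-object failure (including permission or network errors) is treated as "missing".

```shell
# upload_missing DIR BUCKET  (hypothetical helper)
# Uploads every regular file in DIR whose name is not yet a key in BUCKET.
upload_missing() {
  local dir="$1" bucket="$2" file_path key
  for file_path in "$dir"/*; do
    [ -f "$file_path" ] || continue   # skip subdirectories and empty globs
    key="${file_path##*/}"            # assumption: key equals the file name
    # head-object exits nonzero when the object cannot be found.
    if aws s3api head-object --bucket "$bucket" --key "$key" >/dev/null 2>&1; then
      continue                        # already in the bucket
    fi
    echo "Uploading missing file: $key"
    aws s3 cp "$file_path" "s3://${bucket}/${key}"
  done
}

# Example call (bucket name is a placeholder):
# upload_missing /data/files my-bucket
```

This issues one request per file, which may be slow for tens of thousands of files; aws s3 sync is a single-command alternative that copies new or changed files.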


Author: Patrick B.

Updated on September 18, 2022

Comments

  • Patrick B.
    Patrick B. over 1 year

    This directory /data/files/ has thousands of files like:

    1test
    2test
    3test
    
    [...]
    
    60000test
    60001test
    

    I'm also sending them to an S3 bucket (AWS) using the AWS CLI. However, sometimes the S3 bucket is offline, and because of that a file is skipped.

    How can I check whether a file that exists in /data/files/ is also in the S3 bucket, and if not, copy the missing file to S3?

    I would prefer to do this using Bash. I am also open to replacing the AWS CLI with another tool if needed.

    • derobert
      derobert over 7 years
      There are a bunch of command-line tools that talk to S3 such as s3cmd and s4cmd and FUSE filesystems such as s3fs and s3ql. There are also things like rclone which probably solve your entire problem for you. What are you currently using to talk to S3?
    • Patrick B.
      Patrick B. over 7 years
      @derobert I'm using the AWS CLI. If you have an example to help, please feel free to answer the question.
    • derobert
      derobert over 7 years
      I'd think rclone copy /data/files whatever: would do everything for you... But anyway, you should edit your question to clarify which software you're using to talk to AWS. And if you're open to switching.
  • Benjamin W.
    Benjamin W. over 5 years
    $${file} will expand to the PID and {file}, probably not what you meant.
  • Donnie Cameron
    Donnie Cameron about 5 years
    The problem with this is that s3 ls will list the file and give a return code of 0 (success) even if you provide a partial path. For example, aws s3 ls s3://bucket/filen will list the file s3://bucket/filename.
  • Mike Q
    Mike Q over 4 years
    rclone is not a native call, I don't consider this the best solution.
  • Alexis Wilke
    Alexis Wilke almost 4 years
    This is great, especially since it returns JSON: if you want to find a specific field, it's easy to grab the value.