How to select files from AWS S3 using a wildcard character

Solution 1

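You need to add an "--exclude" filter before your "--include" filter.

By default the AWS CLI includes every file, so an "--include" filter on its own does not narrow anything down. It only adds files to a set that already contains all of them. Filters are applied in the order they appear, and later filters take precedence over earlier ones. So you need to exclude all the files first, and then include 2015*.xlsx.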

If you only want files matching the pattern "201502_nts_*.xlsx", you can run:

aws s3 cp s3://bp-dev/bp_source_input/ C:\Business_Panorama\nts\data\in --recursive --exclude * --include "201502_nts_*.xlsx"
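To see why the order matters, here is a minimal Python sketch of the filter rules described above. Every file starts out included, each --exclude/--include pattern is checked in command-line order, and the last pattern that matches decides. The sketch assumes that patterns are matched fnmatch-style against the key, relative to the source prefix, and that * can match "/". The helper apply_filters, the prefix, and the sample keys are hypothetical. This illustrates the rules and is not the AWS CLI's own code.

```python
from fnmatch import fnmatchcase


def apply_filters(keys, filters, prefix):
    """Return the keys that survive an ordered list of ("exclude"|"include", pattern) filters.

    Hypothetical helper mimicking the documented AWS CLI rules: everything is
    included by default, and a later matching filter overrides earlier ones.
    Patterns are relative to `prefix`; fnmatch's `*` also matches "/".
    """
    selected = []
    for key in keys:
        included = True  # default: every file is included
        for action, pattern in filters:
            if fnmatchcase(key, prefix + pattern):
                included = (action == "include")  # last matching filter wins
        if included:
            selected.append(key)
    return selected


prefix = "bp_source_input/"
keys = [
    "bp_source_input/201502_nts_act.xlsx",
    "bp_source_input/201502_nts_act_apac.xlsx",
    "bp_source_input/201501_nts_act.xlsx",
    "bp_source_input/readme.txt",
]

# --include alone: nothing was excluded, so every key is still selected.
print(apply_filters(keys, [("include", "201502_nts_*.xlsx")], prefix))

# --exclude "*" --include "...": exclude everything, then add the matches back.
print(apply_filters(keys, [("exclude", "*"), ("include", "201502_nts_*.xlsx")], prefix))
```

If you swap the two filters (include first, then exclude "*"), the exclude matches last for every key, so nothing is selected.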

Solution 2

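I had to put quotes around the * in --exclude *. On Unix-like shells, an unquoted * can be expanded into the names of files in the current directory before the AWS CLI ever sees it. With the quotes, the command looks like this: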

aws s3 cp s3://bp-dev/bp_source_input/ C:\Business_Panorama\nts\data\in --recursive --exclude "*" --include "201502_nts_*.xlsx"
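When the command is launched from a script rather than typed at a prompt, you can sidestep the quoting problem by passing the arguments as a list, so that no shell ever sees the *. The Python sketch below assumes a POSIX system with the aws executable on the PATH. The helper names, the local directory ./data/in, and the dry_run switch are hypothetical. The options --recursive, --exclude, --include and --dryrun are existing aws s3 cp options.

```python
import shlex
import subprocess


def build_cp_args(src, dst, include_pattern, dry_run=True):
    """Build the argument list for `aws s3 cp` (hypothetical helper).

    Each list element reaches the AWS CLI unchanged, so "*" is never
    expanded by a shell into local file names.
    """
    args = ["aws", "s3", "cp", src, dst, "--recursive",
            "--exclude", "*", "--include", include_pattern]
    if dry_run:
        args.append("--dryrun")  # only print what would be copied
    return args


def run_cp(args):
    """Run the command without a shell; needs the AWS CLI and credentials."""
    return subprocess.run(args, check=True)


args = build_cp_args("s3://bp-dev/bp_source_input/", "./data/in", "201502_nts_*.xlsx")

# shlex.join shows how the same command would have to be quoted in a POSIX shell.
print(shlex.join(args))
```

The printed string quotes '*' and '201502_nts_*.xlsx', which matches the fix above. The list form avoids needing any quoting at all.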

Solution 3

After many rounds of checking, and with help from bsnchan, I was able to use the --exclude and --include options of the aws s3 CLI. Make sure you put the spaces in the right places.

To copy a specific file:

aws s3 cp s3://itx-agj-cons-ww-bp-dev/bp_source_input/ C:\Business_Panorama\nts\data\in --recursive --exclude "*" --include "*%mth_cd%_%source%_all.xlsx"

(Note: %mth_cd% and %source% are variables used in the .bat file.)
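The pattern in this command is assembled from batch variables and starts with *. Below is a minimal Python sketch of the same idea. The values 201502 and nts for mth_cd and source, the prefix, and the sample keys are all hypothetical. It assumes fnmatch-style matching relative to the source prefix, where * also matches the "/" separator. Under that assumption, the leading * also picks up matching files in subfolders, while a pattern without it only matches files directly under the prefix.

```python
from fnmatch import fnmatchcase

# Hypothetical values standing in for the batch variables %mth_cd% and %source%.
mth_cd = "201502"
source = "nts"
prefix = "bp_source_input/"

# Equivalent of --include "*%mth_cd%_%source%_all.xlsx" after variable substitution.
pattern = f"*{mth_cd}_{source}_all.xlsx"

keys = [
    "bp_source_input/201502_nts_all.xlsx",
    "bp_source_input/archive/201502_nts_all.xlsx",
    "bp_source_input/201502_nts_act.xlsx",
]

# With the leading "*": matches at the top level and in subfolders, because "*" spans "/".
with_star = [k for k in keys if fnmatchcase(k, prefix + pattern)]

# Without the leading "*": only the file directly under the prefix matches.
without_star = [k for k in keys if fnmatchcase(k, prefix + pattern.lstrip("*"))]

print(with_star)
print(without_star)
```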

To check whether the file exists:

aws s3 ls s3://itx-agj-cons-ww-bp-dev/bp_source_input/ --recursive | FINDSTR "201502_nts_.*.xlsx"

(Note: this is for the Windows command line; on Unix, use grep instead.)
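Because ls does not accept --exclude/--include filters (as bsnchan explains in the comments), the listing has to be filtered afterwards, which is what FINDSTR and grep do here. If Python is available, the same check can be written once for both Windows and Unix. This sketch assumes that each object line of aws s3 ls --recursive output has the form date, time, size, key. The function matching_keys and the sample listing are hypothetical, and the filename pattern is a shell-style glob rather than a regex.

```python
from fnmatch import fnmatchcase
import posixpath


def matching_keys(listing, pattern):
    """Return keys from `aws s3 ls --recursive` output whose file name matches a glob.

    Hypothetical helper. Assumes object lines look like "DATE TIME SIZE KEY";
    splitting at most 3 times keeps keys that contain spaces intact.
    """
    keys = []
    for line in listing.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) < 4:
            continue  # skip blank or non-object lines
        key = parts[3]
        if fnmatchcase(posixpath.basename(key), pattern):
            keys.append(key)
    return keys


# Hypothetical sample output; in practice this text would come from aws s3 ls.
sample = """\
2015-03-02 10:15:42      18734 bp_source_input/201502_nts_act.xlsx
2015-03-02 10:16:05      20411 bp_source_input/201502_nts_act_apac.xlsx
2015-02-01 09:00:00      17002 bp_source_input/201501_nts_act.xlsx
"""

found = matching_keys(sample, "201502_nts_*.xlsx")
print(found)
print("file exists" if found else "file not found")
```

In practice, the output of aws s3 ls s3://bp-dev/bp_source_input/ --recursive could be piped into a small script that passes sys.stdin.read() to matching_keys.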

Thanks a lot.

Author: user3858193

Updated on September 29, 2020

Comments

  • user3858193 (over 3 years)

    I have many files in an S3 bucket and I want to copy only the files that have a start date of 2012. The command below copies all the files:

    aws s3 cp s3://bp-dev/bp_source_input/ C:\Business_Panorama\nts\data\in --recursive  --include "201502_nts_*.xlsx"
    
  • user3858193 (about 9 years)
    Hey, that worked for me. I have one more question: I want to run ls first to check whether the file exists, and only then copy it. This throws an error: aws s3 ls s3://bp-dev/bp_source_input/ --recursive --exclude --include "201502_nts_.xlsx"
  • bsnchan (about 9 years)
    The --exclude and --include filter flags only work for s3 object operations (such as cp, mv, rm). ls is a directory operation. You can run the ls command and pipe it to grep: aws s3 ls s3://bp-dev/bp_source_input/ --recursive | grep 201502_nts_*.xlsx
  • bsnchan (about 9 years)
    grep is a unix command (I shouldn't have made the assumption that you were on a *nix system). What kind of machine are you running the aws cli from?
  • user3858193 (about 9 years)
    It's from Windows. Sorry, my bad.
  • bsnchan (about 9 years)
    I think the Windows equivalent is findstr: aws s3 ls s3://bp-dev/bp_source_input/ --recursive | findstr 201502_nts_*.xlsx
  • user3858193 (about 9 years)
    aws s3 ls s3://bp-dev/bp_source_input/ --recursive | findstr 201502_nts_* works fine, but aws s3 ls s3://bp-dev/bp_source_input/ --recursive | findstr 201502_nts_*xlsx does not.
  • bsnchan (about 9 years)
    My mistake: findstr treats the search string as a regular expression, not as a wildcard pattern. I tried it out on a Windows machine, and findstr 201502_nts_.*.xlsx should work.
  • user3858193 (about 9 years)
    Hi @bsnchan, when I use --exclude it does not work. Can you please suggest something?

    C:\Users\admin_spanda20>aws s3 cp s3://bp-dev/bp_source_input/in C:\Business_Panorama\nts\data\in --recursive --exclude * --include "201502_nts_.xlsx"
    C:\Users\admin_spanda20>
    C:\Users\admin_spanda20>aws s3 cp s3://ibp-dev/bp_source_input/in C:\Business_Panorama\nts\data\in --recursive --include "201502_nts_act.xlsx"
    download: s3://bp-dev/bp_source_input/in/201502_nts_act_apac.xlsx to ..\..\Business_Panorama\nts\data\in\201502_nts_act_apac.xlsx
  • Pramit (over 7 years)
    I would also advise using the --dryrun flag, which displays what a command would do without actually running it and can help you avoid mistakes: aws s3 rm s3://mybucket/ --recursive --profile <profile_name> --exclude "*" --include "file_name_*" --dryrun
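The findstr results reported in these comments follow from the pattern being read as a regular expression rather than a wildcard. In findstr, as in grep, "." matches any single character and "*" means zero or more repetitions of the preceding character. Python's re module gives these two characters the same meaning, so the sketch below replays the findstr patterns from the comments against a hypothetical key. It illustrates the regex rules only; it does not run findstr, whose regex dialect is more limited than Python's.

```python
import re

# Hypothetical key, as it might appear in `aws s3 ls --recursive` output.
key = "bp_source_input/201502_nts_act.xlsx"

patterns = [
    "201502_nts_*",          # "_*" = zero or more underscores, so any "201502_nts..." matches
    "201502_nts_*xlsx",      # "xlsx" must follow "201502_nts" and its underscores directly
    "201502_nts_.*.xlsx",    # ".*" = any characters; the lone "." matches any one character
    r"201502_nts_.*\.xlsx",  # escaping the dot requires a literal "." before "xlsx"
]

# re.search, like findstr and grep, looks for a match anywhere in the line.
results = {pattern: re.search(pattern, key) is not None for pattern in patterns}

for pattern, found in results.items():
    print(f"{pattern:22} -> {'match' if found else 'no match'}")
```

This is why findstr 201502_nts_* appeared to work (it matches anything containing "201502_nts") while findstr 201502_nts_*xlsx found nothing.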