How to count the number of files in a bucket folder with gsutil


Solution 1

The gsutil ls command with the -l (long listing) and -R (recursive listing) options lists the entire bucket recursively and prints a total count of all objects, both files and directories, at the end:

$ gsutil ls -lR gs://pub
    104413  2011-04-03T20:58:02Z  gs://pub/SomeOfTheTeam.jpg
       172  2012-06-18T21:51:01Z  gs://pub/cloud_storage_storage_schema_v0.json
      1379  2012-06-18T21:51:01Z  gs://pub/cloud_storage_usage_schema_v0.json
   1767691  2013-09-18T07:57:42Z  gs://pub/gsutil.tar.gz
   2445111  2013-09-18T07:57:44Z  gs://pub/gsutil.zip
      1136  2012-07-19T16:01:05Z  gs://pub/gsutil_2.0.ReleaseNotes.txt
... <snipped> ...

gs://pub/apt/pool/main/p/python-socksipy-branch/:
     10372  2013-06-10T22:52:58Z  gs://pub/apt/pool/main/p/python-socksipy-branch/python-socksipy-branch_1.01_all.deb

gs://pub/shakespeare/:
        84  2010-05-07T23:36:25Z  gs://pub/shakespeare/rose.txt
TOTAL: 144 objects, 102723169 bytes (97.96 MB)

If you really just want the total, you can pipe the output to the tail command:

$ gsutil ls -lR gs://pub | tail -n 1
TOTAL: 144 objects, 102723169 bytes (97.96 MB)
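If you need just the number for scripting, you can go one step further and pull the count out of the TOTAL line with awk. A minimal sketch, with the gsutil output simulated by echo so the parsing step itself is runnable; in practice, replace the echo with the actual gsutil command:

```shell
# Parse the object count out of the TOTAL line.
# In practice, replace the echo with: gsutil ls -lR gs://pub
echo "TOTAL: 144 objects, 102723169 bytes (97.96 MB)" \
  | tail -n 1 \
  | awk '{print $2}'
# prints: 144
```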

UPDATE

gsutil now has a du command. This makes it even easier to get a count:

$ gsutil du gs://pub | wc -l
232
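One caveat, noted in the comments below: if the path contains subdirectories, du also prints a rollup line for each directory, which inflates the count. A hedged workaround, assuming directory rollup lines end in a trailing slash, is to filter those lines out before counting. The du output is simulated here with printf; in practice, pipe gsutil du gs://pub instead:

```shell
# Filter out subdirectory rollup lines (paths ending in "/")
# before counting. In practice:
#   gsutil du gs://pub | grep -v '/$' | wc -l
printf '%s\n' \
  '104413  gs://pub/SomeOfTheTeam.jpg' \
  '10372   gs://pub/apt/python-socksipy-branch_1.01_all.deb' \
  '114785  gs://pub/apt/' \
  | grep -v '/$' \
  | wc -l
# prints: 2
```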

Solution 2

If you have the option of not using gsutil, the easiest way is to check it in the Google Cloud Console. Go to Monitoring > Metrics explorer:

  • Resource type: GCS Bucket
  • Metric: Object count

Then, in the table below, you can see for each bucket the number of objects it contains.

Solution 3

You want gsutil ls -count -recursive in gs://bucket/folder? Alright: gsutil ls gs://bucket/folder/** lists just the full URLs of the files under gs://bucket/folder, without the footer and without the lines ending in a colon. Piping that to wc -l gives you the line count of the result.

gsutil ls gs://bucket/folder/** | wc -l
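Since the ** listing prints exactly one full gs:// URL per line, wc -l yields the file count. A sketch with the listing simulated by printf (the bucket paths are made up); in practice the first command is gsutil ls gs://bucket/folder/**:

```shell
# One URL per line in, one count out.
# In practice: gsutil ls gs://bucket/folder/** | wc -l
printf '%s\n' \
  'gs://bucket/folder/a.txt' \
  'gs://bucket/folder/sub/b.txt' \
  'gs://bucket/folder/sub/deep/c.txt' \
  | wc -l
# prints: 3
```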

Solution 4

gsutil ls -lR gs://Folder1/Folder2/Folder3/** | tail -n 1

Solution 5

As someone who had 4.5M objects in a bucket, I used gsutil du gs://bucket/folder | wc -l, which took ~24 minutes.

Updated on July 08, 2022

Comments

  • Admin
    Admin almost 2 years

    Is there an option to count the number of files in bucket-folders?

    Like:

    gsutil ls -count -recursive gs://bucket/folder
    
    Result:   666 files
    

I just want a total number of files to compare against the sync folder on my server.

    I couldn't find it in the manual.

  • Admin
    Admin over 10 years
Great, thanks ... just a little bit slow for 4 million files. Is this operation one call, or is it counted per bucket element? ... could become expensive .. :-)
  • jterrace
    jterrace over 10 years
    It does an object listing on the bucket, and pages through the results, I think 1000 at a time, so it will make N/1000 calls, where N is the number of objects you have. This is a class A operation per the pricing page.
  • Syed Mudabbir
    Syed Mudabbir over 8 years
Hello, just logged in to say thanks, this helped. I was trying to use find but that was not supported, so when searching for an alternative I stumbled upon your answer. It's been a great help.
  • booleys1012
    booleys1012 over 8 years
    the gsutil solution works great in gsutil v 4.15, @jterrace, but only if there are no "subdirectories" in the bucket/path you are listing. If there are subdirectories, du will roll up the size of the files below that directory and print a line to stdout for that directory (making the file count incorrect). Sorry for the late update to an old question.
  • mobcdi
    mobcdi almost 8 years
While gsutil ls -l works, is there a way in Windows (no tail or wc) to get a summary without needing to list the entire bucket contents?
  • dlamblin
    dlamblin about 7 years
    du and ls aren't counting as much as wc -l is.
  • Yogesh Patil
    Yogesh Patil over 6 years
@jterrace Great, thanks. It also counts each directory as an object and adds it to the count. Can we somehow count only files, excluding directories?
  • northtree
    northtree over 5 years
    Why use ** not just *?
  • dlamblin
    dlamblin over 5 years
    @northtree I think in this case it might be equivalent, but ** does work for multiple levels at once, so I think /folder/**/*.js would find all js files under any depth of directories after folder (except in folder itself) while /folder/*/*.js would only work for js files within a directory in folder.
  • REdim.Learning
    REdim.Learning about 5 years
    @jterrace looks like du is giving file sizes, not counts!
  • jterrace
    jterrace about 5 years
    @REdim.Learning - yes, but it prints one per line, which is why I pipe to wc -l
  • Miles Erickson
    Miles Erickson over 4 years
    @mobcdi If you have Git for Windows, you have Git Bash. Use that.
  • nroose
    nroose about 3 years
    Clearly GCP is using this to get more money from us. They clearly know the size and count. It should be available in the API. We should not accept less.
  • Yevgen Safronov
    Yevgen Safronov over 2 years
    this is an underappreciated answer.
  • ingernet
    ingernet over 2 years
    This is WAY faster than using gsutil if you aren't doing something programmatically and you just need the count, AND it doesn't dip into your Class A Operations quota.
  • Vishwas M.R
    Vishwas M.R about 2 years
    Especially helpful when your bucket has more than a million objects and the total size exceeds a few GBs.
  • Jérémy
    Jérémy about 2 years
    Of course, this only works if you want to count the amount of files in the entire bucket. You can't use this to check the amount of files in a specific folder inside the bucket.