Google Cloud Storage + Python : Any way to list obj in certain folder in GCS?
Solution 1
Update: the answer below applies to the older "Google API Client Library" for Python. If you're not already tied to that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). With the newer library, the equivalent of the code below is:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
    print(str(blob))
Answer for older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
import json

from apiclient import discovery

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1')  # add http=whatever param if auth

request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
while request is not None:
    response = request.execute()
    print(json.dumps(response, indent=2))
    request = client.objects().list_next(request, response)
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
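Both clients also accept a delimiter parameter alongside prefix, which makes a prefix listing behave like a directory listing: objects directly under the prefix come back as items, while deeper paths are collapsed into "subfolder" prefixes (exposed as iterator.prefixes on the newer client). As a pure-Python sketch of that grouping semantics, with made-up object names:

```python
def split_listing(blob_names, prefix, delimiter="/"):
    """Mimic what list_blobs(..., delimiter='/') does server-side:
    split a flat name listing into direct children of `prefix` and
    the one-level-deeper 'subfolder' prefixes."""
    files, subfolders = [], set()
    for name in blob_names:
        rest = name[len(prefix):]
        if delimiter in rest:
            # deeper object: keep only its first path segment as a "folder"
            subfolders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(name)
    return files, sorted(subfolders)

names = ["abc/myfolder/a.txt", "abc/myfolder/sub/b.txt", "abc/myfolder/sub2/c.txt"]
files, folders = split_listing(names, "abc/myfolder/")
print(files)    # ['abc/myfolder/a.txt']
print(folders)  # ['abc/myfolder/sub/', 'abc/myfolder/sub2/']
```

With the real client the server does this grouping for you: client.list_blobs('mybucket', prefix='abc/myfolder/', delimiter='/').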
Solution 2
This worked for me:
from google.cloud import storage

client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)
blobs = bucket.list_blobs()
for blob in blobs:
    print(blob.name)
The list_blobs() method returns an iterator over the blobs in the bucket. You can iterate over it to access every object; in this example I just print out the name of each object.
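Since the original question was about checking whether abc.txt exists in a certain folder, the names from that iterator can be fed into a small membership check. A sketch with a hypothetical helper (pass prefix=folder to list_blobs so only that folder's objects are fetched):

```python
def file_in_folder(blob_names, folder, filename):
    """True if `filename` sits directly under `folder`.

    `blob_names` can be any iterable of object names, e.g.
    (b.name for b in bucket.list_blobs(prefix=folder)).
    """
    folder = folder.rstrip("/") + "/"   # normalize trailing slash
    return (folder + filename) in set(blob_names)

names = ["abc/myfolder/abc.txt", "abc/myfolder/sub/abc.txt", "abc/other.txt"]
print(file_in_folder(names, "abc/myfolder", "abc.txt"))  # True
print(file_in_folder(names, "abc", "abc.txt"))           # False
```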
This documentation helped me a lot.
I hope I could help!
Solution 3
You might also want to look at gcloud-python and its documentation.
from gcloud import storage

connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')
for key in bucket:
    if key.name == 'abc.txt':
        print('Found it!')
        break
However, you might be better off just checking if the file exists:
if 'abc.txt' in bucket:
    print('Found it!')
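That membership-test style comes from the old gcloud library. In the current google-cloud-storage client, the equivalent single-object check is Blob.exists(), which costs one metadata request rather than a bucket listing. A minimal sketch (the helper name is mine; bucket and object names are placeholders):

```python
def file_exists(client, bucket_name, blob_name):
    """Check for one object with a single metadata request.

    `client` is a google.cloud.storage.Client (or anything that
    duck-types its bucket()/blob() chain). Blob.exists() avoids
    listing or downloading anything.
    """
    return client.bucket(bucket_name).blob(blob_name).exists()

# Usage (needs credentials; names are placeholders):
#   from google.cloud import storage
#   if file_exists(storage.Client(), "my-bucket", "abc/myfolder/abc.txt"):
#       print("Found it!")
```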
Solution 4
Install the Python package google-cloud-storage with pip (or via PyCharm) and use the code below:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
    print(str(blob))
Solution 5
I know this is an old question, but I stumbled over this because I was looking for the exact same answer. Answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to get into more detail.
When you run this:
from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
You will get a list of Blob objects, with just the name field populated, for all files in the given bucket, like this:
[<Blob: BUCKET_NAME, PREFIX, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
...]
If, like me, you want to 1) filter out the first item in the list because it does NOT represent a file (it's just the prefix), 2) get just the name string value, and 3) remove the PREFIX from the file name, you can do something like this:
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]
Complete code to get just the string file names from a storage bucket:
from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
blob_names = [blob.name[len(PREFIX):] for blob in blobs if blob.name != PREFIX]
print(f"blob_names = {blob_names}")
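The list-comprehension step can be isolated into a pure function, which makes the two filters (drop the zero-byte "folder" placeholder object, strip the prefix) easy to test without touching the API. A sketch with made-up names:

```python
def relative_names(blob_names, prefix):
    """Drop the placeholder 'folder' object (whose name equals the
    prefix) and strip `prefix` from the remaining object names."""
    return [name[len(prefix):] for name in blob_names if name != prefix]

names = [
    "abc/myfolder/",                  # the placeholder, not a real file
    "abc/myfolder/claim_757325.json",
    "abc/myfolder/claim_757390.json",
]
print(relative_names(names, "abc/myfolder/"))
# ['claim_757325.json', 'claim_757390.json']
```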
Comments
-
Reed_Xia almost 2 years
I'm going to write a Python program to check if a file is in a certain folder of my Google Cloud Storage. The basic idea is to get the list of all objects in a folder, a file name list, then check if the file abc.txt is in the file name list. Now the problem is, it looks like Google only provides one way to get the object list, which is uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
    print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
    print '  "%s"' % obj.get_contents_as_string()
The defect of uri.get_bucket() is that it appears to fetch all of the objects first, which is what I don't want; I just need the object name list for a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker. Could someone help answer? I appreciate every answer!
-
Reed_Xia about 10 yearsCould you advise how to define client? I already import json and apiclient, but it throws NameError: name 'client' is not defined. I checked the doc and did not find this part of the code, thank you!
-
Reed_Xia about 10 yearsI'm working on Windows 7, I failed to easy_install gcloud, finally it would end with warning: GMP or MPIR library not found; Not building Crypto.PublicKey._fastmath. error: Setup script exited with error: Unable to find vcvarsall.bat, could you advise? Thank you!
-
JJ Geewax about 10 yearsDo you have PyCrypto and all those installed? Windows installers for those are available online I believe.
-
ShanEllis about 10 yearsAdded a bit above with example syntax.
-
John about 5 yearsAnd if you want to filter files in a particular folder, use bucket.list_blobs(prefix="path")
-
CpILL about 4 yearsis there any way to speed this up? It's slow for millions of blobs
-
Ema Il over 2 yearsWorks great for older versions of the google api. Thanks