Python Google Drive API - list the entire drive file tree


Solution 1

To build a representation of the tree in your app, you need to do the following:

  1. Run a Drive List query to retrieve all Folders
  2. Iterate the result array and examine the parents property to build an in-memory hierarchy
  3. Run a second Drive List query to get all non-folders (i.e. files)
  4. For each file returned, place it in your in-memory tree
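
The four steps above can be sketched as follows. This is only a minimal sketch under stated assumptions: it assumes the v2 metadata shape used elsewhere on this page ('title', and 'parents' as a list of dicts that each carry an 'id'), and the names build_tree and iter_paths are made up for illustration. The two input lists are assumed to come from fully paginated Files List queries, one with mimeType='application/vnd.google-apps.folder' and one with mimeType!='application/vnd.google-apps.folder'.

```python
from collections import defaultdict


def build_tree(folders, files):
    """Group items by parent folder id.

    Returns (children, top_level): children maps a folder id to the list of
    items directly inside it; top_level holds items whose parent is not one
    of the listed folders (typically the children of the Drive root).
    """
    folder_ids = {folder['id'] for folder in folders}
    children = defaultdict(list)
    top_level = []
    for item in list(folders) + list(files):
        parent_ids = [p['id'] for p in item.get('parents', [])]
        known = [pid for pid in parent_ids if pid in folder_ids]
        if known:
            # Older files may still have several parents; attach to each.
            for pid in known:
                children[pid].append(item)
        else:
            top_level.append(item)
    return children, top_level


def iter_paths(children, items, prefix=''):
    """Yield (path, id) pairs depth-first, starting from the given items."""
    for item in items:
        path = prefix + '/' + item['title']
        yield path, item['id']
        if item['id'] in children:
            yield from iter_paths(children, children[item['id']], path)
```

With both lists in memory, the whole tree costs two (paginated) list queries instead of one query per folder, and a path lookup becomes a dictionary walk rather than a network call.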

If you simply want to check if file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

If it's unique, run a Files List query for title='file-A', then do a Files Get for each parent of each result and check whether any of them is named 'folder-B'.
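
A minimal sketch of that lookup, assuming a Drive v2 service object built with google-api-python-client (so results arrive under 'items' and names under 'title'). The function names escape_q and file_in_folder are hypothetical, and the trashed=false filter is an addition that the answer does not mention.

```python
def escape_q(value):
    """Escape backslashes and single quotes for use inside a Drive query string."""
    return value.replace('\\', '\\\\').replace("'", "\\'")


def file_in_folder(service, file_title, folder_title):
    """Return metadata of a file named file_title whose parent folder is
    named folder_title, or None if there is no such file."""
    query = "title='%s' and trashed=false" % escape_q(file_title)
    page_token = None
    while True:
        params = {'q': query}
        if page_token:
            params['pageToken'] = page_token
        response = service.files().list(**params).execute()
        for item in response.get('items', []):
            for parent in item.get('parents', []):
                # One Files Get per parent to learn the parent's name.
                folder = service.files().get(fileId=parent['id']).execute()
                if folder.get('title') == folder_title:
                    return item
        page_token = response.get('nextPageToken')
        if not page_token:
            return None
```

The cost is one list query plus one get per candidate parent, which stays small as long as few files share the name 'file-A'.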

You don't say if these files and folders are being created by your app, or by the user with the Google Drive Webapp. If your app is the creator of these files/folders there is a trick you can use to restrict your searches to a single root. Say you have

MyDrive/app_root/folder-C/folder-B/file-A

you can make all of folder-C, folder-B and file-A children of app_root

That way you can constrain all of your queries to include

and 'app_root_id' in parents

NB. A previous version of this answer highlighted that Drive folders were not constrained to an inverted tree hierarchy, because a single folder could have multiple parents. As of 2021, this is no longer true and a Drive File (including Folders, which are simply special files) can only be created with a single parent.

Solution 2

This will never work like that except for very small trees. You have to rethink your entire algorithm for a cloud app (you have written it like a desktop app where you own the machine), since it will time out easily. You need to mirror the tree beforehand (task queues and datastore), not just to avoid timeouts but also to avoid Drive rate limits, and keep it in sync somehow (register for push notifications, etc.). It's not easy at all. I've done a Drive tree viewer before.

Solution 3

An easy way to check if a file exists in a specific folder is: drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()
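
The same query can be chained to resolve a whole path such as folder1/folder2/test.txt with one request per path segment. The following is a rough sketch, assuming a Drive v2 service object from google-api-python-client; find_child and resolve_path are hypothetical names, and the trashed=false filter is an added assumption. Drive allows siblings with identical names, so this sketch simply takes the first match on the first result page.

```python
def find_child(service, parent_id, title):
    """Return the first item named title directly inside parent_id, or None."""
    safe_title = title.replace('\\', '\\\\').replace("'", "\\'")
    query = "'%s' in parents and title='%s' and trashed=false" % (parent_id, safe_title)
    response = service.files().list(q=query).execute()
    items = response.get('items', [])
    return items[0] if items else None


def resolve_path(service, path, root_id='root'):
    """Follow path segment by segment; return the last item's metadata or None."""
    current = None
    parent_id = root_id
    for name in [part for part in path.split('/') if part]:
        current = find_child(service, parent_id, name)
        if current is None:
            return None
        parent_id = current['id']
    return current
```

For example, resolve_path(drive_service, 'folder1/folder2/test.txt') issues three list requests and returns the metadata dictionary of test.txt, or None as soon as a segment is missing.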

To walk all folders and files:

import logging
import os
import socket
import sys

import googleDriveAccess

logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'

def getlist(ds, q, **kwargs):
  # run a files.list query and merge the 'items' of every result page
  result = None
  npt = ''
  while npt is not None:
    if npt != '': kwargs['pageToken'] = npt
    entries = ds.files().list(q=q, **kwargs).execute()
    if result is None: result = entries
    else: result['items'] += entries['items']
    npt = entries.get('nextPageToken')
  return result

def write_entry(outf, text):
  # the output file is opened in binary mode, so encode explicitly
  outf.write(text.encode('utf-8'))

def walk(ds, folderId, folderName, outf, depth):
  spc = ' ' * depth
  write_entry(outf, u'%s+%s\n%s  %s\n' % (spc, folderId, spc, folderName))
  # recurse into the sub-folders first
  q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, maxResults=200)
  for folder in entries['items']:
    walk(ds, folder['id'], folder['title'], outf, depth + 1)
  # then list the plain files of this folder
  q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, maxResults=200)
  for f in entries['items']:
    write_entry(outf, u'%s -%s\n%s   %s\n' % (spc, f['id'], spc, f['title']))

def main(basedir):
  da = googleDriveAccess.DAClient(basedir) # clientId=None, script=False
  with open(os.path.join(basedir, 'hierarchy.txt'), 'wb') as f:
    walk(da.drive_service, 'root', u'root', f, 0)

if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  try:
    main(os.path.dirname(__file__))
  except socket.gaierror as e:
    sys.stderr.write('socket.gaierror: %s\n' % e)

This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess

Author: Matteo Hertel

Updated on June 05, 2022

Comments

  • Matteo Hertel, almost 2 years

    I'm building a Python application that uses the Google Drive APIs. So far the development is going well, but I have a problem retrieving the entire Google Drive file tree. I need it for two purposes:

    1. Check whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.
    2. Build a visual file explorer. I know that Google provides its own (I can't remember the name now, but I know it exists), but I want to restrict the file explorer to specific folders.

    For now I have a function that fetches the root of Google Drive, and I can build the tree by recursively calling a function that lists the contents of a single folder, but it is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.

    Here is the function to get the root:

    def drive_get_root():
        """Retrieve a root list of File resources.
           Returns:
             List of dictionaries.
        """
        
        #build the service, the driveHelper module will take care of authentication and credential storage
        drive_service = build('drive', 'v2', driveHelper.buildHttp())
        # the result will be a list
        result = []
        page_token = None
        while True:
            try:
                param = {}
                if page_token:
                    param['pageToken'] = page_token
                files = drive_service.files().list(**param).execute()
                #add the files in the list
                result.extend(files['items'])
                page_token = files.get('nextPageToken')
                if not page_token:
                    break
            except errors.HttpError as _error:
                print('An error occurred: %s' % _error)
                break
        return result
    

    And here is the one to get the files from a folder:

    def drive_files_in_folder(folder_id):
        """Print files belonging to a folder.
           Args:
             folder_id: ID of the folder to get files from.
        """
        #build the service, the driveHelper module will take care of authentication and credential storage
        drive_service = build('drive', 'v2', driveHelper.buildHttp())
        # the result will be a list
        result = []
        #code from google, is working so I didn't touch it
        page_token = None
        while True:
            try:
                param = {}
    
                if page_token:
                    param['pageToken'] = page_token
    
                children = drive_service.children().list(folderId=folder_id, **param).execute()
    
                for child in children.get('items', []):
                    result.append(drive_get_file(child['id']))
    
                page_token = children.get('nextPageToken')
                if not page_token:
                    break
            except errors.HttpError as _error:
                print('An error occurred: %s' % _error)
                break
        return result
    

    For example, to check whether a file exists I'm currently using this:

    def drive_path_exist(file_path, items=None):
        """
        Recursively check whether the given path exists.
        Returns the file's metadata dictionary if found, otherwise False.
        """

        # if no list is given, start from the root of Google Drive
        if items is None:
            items = drive_get_root()

        # split the path to get the first element and look for it in the current list
        file_path = file_path.split("/")

        # if there is only one element in the path we are at the actual filename,
        # so if it is in this folder we can return it
        if len(file_path) == 1:
            for elem in items:
                if elem["title"] == file_path[0]:
                    # elem is a dictionary with all the file info
                    return elem
            return False

        # otherwise we have to keep searching inside the matching folder
        for elem in items:
            if elem["title"] == file_path[0]:
                # drop the first element, rejoin the rest of the path as a string
                # and recurse into the contents of the folder
                return drive_path_exist("/".join(file_path[1:]), drive_files_in_folder(elem["id"]))
        return False
    

    Any idea how to solve my problem? I saw a few discussions here on Stack Overflow, and in some answers people wrote that this is possible, but of course they didn't say how!

    Thanks

  • Matteo Hertel, about 10 years
    In fact this is a desktop app. I know that my current code will never work, but there must be an easy way to check if a file exists in a specific path. How did you do yours?
  • Zig Mandel, almost 10 years
    This doesn't answer the question, as it doesn't cover files in subsequent subdirectories.
  • boatcoder, about 3 years
    Doesn't answer the question at all.
  • user14073111, almost 3 years
    Hi @RTmy, how can your code be modified to return all the files in a dataframe with columns id and name, instead of printing each line?
  • Luk Aron, almost 3 years
    How is this an answer? It just states that the question is difficult...
  • Luk Aron, almost 3 years
    It seems that in the newest version of Drive, multiple parents are not allowed, so it is a tree.
  • pinoyyid, almost 3 years
    @LukAron Yes. This change came at the end of 2020. I'll update my answer. However, this currently only applies to new files, so an existing file created before 2021 could still have multiple parents. Google will, at some point, convert excess parents to shortcuts.