Reading all files in all directories

19,561

Python doesn't expand wildcards in filenames passed to open(); the pattern is treated as a literal filename. Use the glob module instead to load files from a single level of subdirectories, or use os.walk() to walk an arbitrary directory structure.
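
A small sketch of the difference, assuming Python 3 (where the IOError from the question is an alias of OSError and a missing file raises FileNotFoundError): open() looks for a file literally named `*.txt`, while glob.glob() first expands the pattern into the paths that actually exist. The helper name `demo` and the temporary files are made up for illustration.

```python
import glob
import os
import tempfile

def demo(root):
    """Show that open() does not expand wildcards but glob.glob() does."""
    pattern = os.path.join(root, '*.txt')
    try:
        # open() looks for a file literally named '*.txt', which does not exist here
        with open(pattern):
            pass
    except FileNotFoundError:
        literal_failed = True
    else:
        literal_failed = False
    # glob expands the pattern to the matching paths; sort for a stable order
    matches = sorted(glob.glob(pattern))
    return literal_failed, matches

with tempfile.TemporaryDirectory() as root:
    for name in ('a.txt', 'b.txt', 'notes.md'):
        with open(os.path.join(root, name), 'w') as f:
            f.write('example\n')
    literal_failed, matches = demo(root)
    print(literal_failed)                          # True
    print([os.path.basename(p) for p in matches])  # ['a.txt', 'b.txt']
```
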

Opening all text files in all subdirectories, one level deep:

import glob
import os

for filename in glob.iglob(os.path.join('Test', '*', '*.txt')):
    with open(filename) as f:
        # One file is open; handle it here. The next iteration opens the next file.
        contents = f.read()
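
A slightly fuller, runnable version of the one-level loop, as a sketch: the helper name `read_one_level` is hypothetical, and it assumes every matching file is UTF-8 text small enough to read into memory. It collects each file's contents in a dictionary keyed by path, so nothing is lost between loop iterations.

```python
import glob
import os
import tempfile

def read_one_level(root):
    """Return {path: contents} for every *.txt file exactly one directory below root."""
    contents = {}
    # 'root/*/*.txt' matches files in the immediate subdirectories only,
    # not files directly in root and not anything deeper.
    for filename in sorted(glob.iglob(os.path.join(root, '*', '*.txt'))):
        with open(filename, encoding='utf-8') as f:
            contents[filename] = f.read()
    return contents

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'deeper'))
    os.makedirs(os.path.join(root, 'b'))
    files = {
        'top.txt': 'skipped: directly in root',
        os.path.join('a', 'one.txt'): 'first',
        os.path.join('b', 'two.txt'): 'second',
        os.path.join('a', 'deeper', 'three.txt'): 'skipped: two levels deep',
    }
    for rel, text in files.items():
        with open(os.path.join(root, rel), 'w', encoding='utf-8') as f:
            f.write(text)
    result = read_one_level(root)
    # Show paths relative to root; only the files in a/ and b/ are included
    found = {os.path.relpath(p, root): text for p, text in result.items()}
    print(found)
```
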

Opening all text files in an arbitrary nesting of directories:

import os
import fnmatch

for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # One file is open; handle it here. The next iteration opens the next file.
            contents = f.read()
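
The question also asks how to put the contents of all the files together. One possible approach, sketched under the assumption that the goal is the word-to-positions dictionary from the question's code, built once per file: walk the tree with os.walk(), filter with fnmatch.filter(), and key the result by file path. The name `build_word_index` and the nested-dictionary layout are illustrative choices, not part of the original answer; it also uses split() with no argument instead of the question's split(' '), so newlines and repeated spaces do not produce empty "words".

```python
import fnmatch
import os
import tempfile
from collections import defaultdict

def build_word_index(root, pattern='*.txt'):
    """Map each matching file path to {word: [positions]} for that file."""
    index = {}
    for dirpath, dirs, files in os.walk(root):
        for filename in fnmatch.filter(files, pattern):
            path = os.path.join(dirpath, filename)
            positions = defaultdict(list)
            with open(path, encoding='utf-8') as f:
                # Same idea as the question: lowercase, split into words,
                # and record every position at which each word appears.
                for position, word in enumerate(f.read().lower().split()):
                    positions[word].append(position)
            index[path] = dict(positions)
    return index

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub', 'deeper'))
    samples = {
        'top.txt': 'The cat',
        os.path.join('sub', 'deeper', 'nested.txt'): 'cat and cat',
        os.path.join('sub', 'ignored.md'): 'not a text file',
    }
    for rel, text in samples.items():
        with open(os.path.join(root, rel), 'w', encoding='utf-8') as f:
            f.write(text)
    index = build_word_index(root)
    # Paths relative to root make the result easier to read
    by_name = {os.path.relpath(p, root): words for p, words in index.items()}
    print(by_name)
```

Keeping one dictionary per file preserves which document each word came from, which is what the question's `wordlist = (thedictionary, Document)` tuple seems to be aiming at.
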
Author by

Relative0

Updated on June 09, 2022

Comments

  • Relative0
    Relative0 almost 2 years

    I have code that reads the values from a single text file, but I'm having trouble reading all files in all directories and combining their contents.

    Here is what I have:

    import os

    filename = '*'
    filesuffix = '*'
    location = os.path.join('Test', filename + "." + filesuffix)
    Document = filename
    thedictionary = {}
    with open(location) as f:
        file_contents = f.read().lower().split(' ')  # split line on spaces to make a list
        for position, item in enumerate(file_contents):
            if item in thedictionary:
                thedictionary[item].append(position)
            else:
                thedictionary[item] = [position]
    wordlist = (thedictionary, Document)
    #print wordlist
    #print thedictionary
    

    Note that I am trying to use the wildcard * for both the filename and the file suffix. I get the following error:

    "IOError: [Errno 2] No such file or directory: 'Test/*.*'"

    I am not sure this is even the right approach, but it seems that if I can get the wildcards working, it should work.

    I have gotten this example to work: Python - reading files from directory file not found in subdirectory (which is there)

    It is a little different, but I don't know how to adapt it to read all files. I am thinking that in this initial code:

    import os

    previous_dir = os.getcwd()
    os.chdir('testfilefolder')
    # add something here?
    for filename in os.listdir('.'):
        ...  # handle each file here
    

    I would need to add an outer for loop where the comment is, but I don't quite know what to put in it.

    Any thoughts?

  • Relative0
    Relative0 about 11 years
    Thank you, Martijn. I will try it out and see what happens. I am curious, though, as to why there are two different functions, glob and os.walk. From a little reading, I see that glob lets you use wildcards but os.walk does not; instead, you need to filter the results. I don't understand what is really going on, because I thought filtering the results was exactly what wildcard expressions did. I found this post: stackoverflow.com/questions/8931099/quicker-to-os-walk-or-glob If you have any insight and time, any thoughts are appreciated.
  • Martijn Pieters
    Martijn Pieters about 11 years
    glob() does not support arbitrarily nested subdirectories (yet). That's the only difference here. os.walk() does, but requires more filtering. Note that glob() already uses the same filtering method (the fnmatch module) in its own implementation.
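
For readers on Python 3.5 or later: glob has since gained arbitrary nesting through the `**` pattern combined with `recursive=True`. The sketch below compares it with the os.walk() plus fnmatch.filter() approach on a small made-up tree; the helper names `with_glob` and `with_walk` are hypothetical. One difference worth knowing: glob skips names that start with a dot unless the pattern also starts with one, while fnmatch lets `*` match a leading dot, so the two can disagree on hidden files (the example avoids them).

```python
import fnmatch
import glob
import os
import tempfile

def with_glob(root):
    # '**' with recursive=True matches root itself and every directory below it
    paths = glob.glob(os.path.join(root, '**', '*.txt'), recursive=True)
    return sorted(os.path.relpath(p, root) for p in paths)

def with_walk(root):
    # os.walk visits every directory; fnmatch.filter applies the wildcard to the names
    paths = []
    for dirpath, dirs, files in os.walk(root):
        for filename in fnmatch.filter(files, '*.txt'):
            paths.append(os.path.join(dirpath, filename))
    return sorted(os.path.relpath(p, root) for p in paths)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b', 'c'))
    for rel in ('top.txt', os.path.join('a', 'mid.txt'), os.path.join('a', 'b', 'c', 'deep.txt')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('x')
    globbed = with_glob(root)
    walked = with_walk(root)
    print(globbed == walked)  # True
```
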