How can I traverse a file system with a generator?
Solution 1
Why reinvent the wheel when you can use os.walk?
import os

path = '.'  # assumed starting directory; replace with your own
for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))
os.walk is a generator that yields a (dirpath, dirnames, filenames) tuple for each directory in the tree, walking it either top-down or bottom-up.
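If you want a generator that hands back one full path at a time, a thin wrapper around os.walk should be enough. This is only a minimal sketch; the name iter_files is a made-up helper, not something the os module provides:

```python
import os

def iter_files(top):
    """Lazily yield the full path of every file below `top`.

    `iter_files` is a hypothetical helper name, not a standard-library function.
    """
    for root, dirs, files in os.walk(top):
        for name in files:
            yield os.path.join(root, name)
```

Nothing is read from disk until you start iterating, for example with `for path in iter_files('.')`.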
Solution 2
I agree with the os.walk solution.
Purely for the sake of pedantry: iterate over the nested generator object and yield each of its items, instead of yielding the generator itself:
import os

def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            # Re-yield every entry produced by the recursive call.
            for entry in grab_files(full_path):
                yield entry
        elif os.path.isfile(full_path):
            yield full_path
        else:
            print('Unidentified name %s. It could be a symbolic link' % full_path)
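On Python 3.3 or later, the inner for loop can be collapsed into a single `yield from`. A sketch of the same function under that assumption (it silently skips anything that is neither a directory nor a regular file, rather than printing a message):

```python
import os

def grab_files(directory):
    """Recursively yield the path of every regular file below `directory`."""
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            # Delegate to the sub-generator (requires Python 3.3+).
            yield from grab_files(full_path)
        elif os.path.isfile(full_path):
            yield full_path
```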
Solution 3
As of Python 3.4, you can use the glob() method from the built-in pathlib module:
import pathlib
p = pathlib.Path('.')
list(p.glob('**/*'))  # lists all files and directories recursively
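Wrapping the call in list() builds the whole result up front. Since glob() and rglob() already return generators, you can stay lazy and filter as you go. A small sketch, with iter_file_paths being an invented name for illustration:

```python
import pathlib

def iter_file_paths(top):
    """Lazily yield every regular file below `top` as a pathlib.Path.

    `iter_file_paths` is a hypothetical helper, not part of pathlib.
    """
    # rglob('*') behaves like glob('**/*') and yields files and directories.
    for path in pathlib.Path(top).rglob('*'):
        if path.is_file():
            yield path
```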
Solution 4
Starting with Python 3.4, you can use the pathlib module:
In [48]: def alliter(p):
   ....:     yield p
   ....:     for sub in p.iterdir():
   ....:         if sub.is_dir():
   ....:             yield from alliter(sub)
   ....:         else:
   ....:             yield sub
   ....:
In [49]: g = alliter(pathlib.Path("."))
In [50]: [next(g) for _ in range(10)]
Out[50]:
[PosixPath('.'),
PosixPath('.pypirc'),
PosixPath('.python_history'),
PosixPath('lshw'),
PosixPath('.gstreamer-0.10'),
PosixPath('.gstreamer-0.10/registry.x86_64.bin'),
PosixPath('.gconf'),
PosixPath('.gconf/apps'),
PosixPath('.gconf/apps/gnome-terminal'),
PosixPath('.gconf/apps/gnome-terminal/%gconf.xml')]
This is essentially the object-oriented version of sjthebat's answer.
Note that a Path.glob pattern consisting of ** alone returns only directories! (Python 3.13 changed this so that such a pattern also returns files.)
Solution 5
os.scandir() is a function that "returns directory entries along with file attribute information, giving better performance [than os.listdir()] for many common use cases." It returns an iterator and does not use os.listdir() internally.
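A recursive generator built on os.scandir() might look like the sketch below. It assumes Python 3.6+ (for using scandir as a context manager) and does not follow symbolic links; scan_files is a made-up name:

```python
import os

def scan_files(directory):
    """Recursively yield the path of every regular file below `directory`.

    `scan_files` is a hypothetical helper, not part of the os module.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False keeps the walk from descending into links.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path
```

Each entry is an os.DirEntry, so the is_dir() and is_file() checks can often be answered from information the directory scan already returned, without an extra system call per name.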
Comments
Evan Kroske over 3 years
I'm trying to create a utility class for traversing all the files in a directory, including those within subdirectories and sub-subdirectories. I tried to use a generator because generators are cool; however, I hit a snag.
def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            yield grab_files(full_path)
        elif os.path.isfile(full_path):
            yield full_path
        else:
            print('Unidentified name %s. It could be a symbolic link' % full_path)
When the generator reaches a directory, it simply yields the memory location of the new generator; it doesn't give me the contents of the directory.
How can I make the generator yield the contents of the directory instead of a new generator?
If there's already a simple library function to recursively list all the files in a directory structure, tell me about it. I don't intend to replicate a library function.