glob exclude pattern

Solution 1

The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two different wild-cards, and character ranges are supported [from pymotw: glob – Filename pattern matching].

So you can exclude some files with a negated character class.
For example, to exclude manifest files (files starting with _) with glob, you can use:

files = glob.glob('files_path/[!_]*')
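
To see what [!_]* keeps without touching the disk, you can run the same pattern through fnmatch, which uses the same wildcard rules. This is only a sketch; the file names are made up for illustration.

```python
import fnmatch

# Hypothetical directory listing, used only for illustration.
names = ["_manifest", "_SUCCESS", "part-0001", "data.csv"]

# [!_] matches exactly one character that is not an underscore,
# so every name starting with "_" is left out of the result.
kept = fnmatch.filter(names, "[!_]*")
print(kept)
```

Note that this only works because the exclusion is a single leading character; it does not generalize to a multi-character prefix such as eph.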

Solution 2

You can subtract sets (this assumes from glob import glob):

set(glob("*")) - set(glob("eph*"))
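
A self-contained sketch of the set difference, assuming a throwaway directory with made-up file names. A set has no order, so the result is sorted here to get a stable list back.

```python
import os
import tempfile
from glob import glob

# Throwaway directory with made-up file names, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("eee2314", "asd3442", "eph", "eph_thing"):
        open(os.path.join(tmp, name), "w").close()

    everything = set(glob(os.path.join(tmp, "*")))
    excluded = set(glob(os.path.join(tmp, "eph*")))

    # Set difference drops the excluded paths; sorted() gives a stable order.
    remaining = sorted(os.path.basename(p) for p in everything - excluded)

print(remaining)
```

This reads the directory twice, which may matter for large or network-mounted directories.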

Solution 3

You can't exclude patterns with the glob function; globs only allow for inclusion patterns. Globbing syntax is very limited (even a [!..] character class must match a character, so it is an inclusion pattern for every character that is not in the class).

You'll have to do your own filtering; a list comprehension usually works nicely here:

import os
from glob import glob
files = [fn for fn in glob('somepath/*.txt')
         if not os.path.basename(fn).startswith('eph')]
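
To illustrate why a negated class can't express "does not start with eph", here is a small sketch using fnmatch (same wildcard rules as glob) on made-up names. The pattern [!e][!p][!h]* is just one attempt someone might try, not a recommended approach.

```python
import fnmatch

# Made-up file names, for illustration only.
names = ["eee2314", "asd3442", "eph", "eph_thing", "ab"]

# Each [!x] must consume one character, so this attempt also rejects
# "eee2314" (first letter is "e") and "ab" (shorter than three characters).
attempt = fnmatch.filter(names, "[!e][!p][!h]*")

# Filtering in Python states the actual requirement directly.
wanted = [n for n in names if not n.startswith("eph")]

print(attempt)
print(wanted)
```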

Solution 4

Late to the game, but you could alternatively just apply a Python filter to the result of a glob:

files = glob.iglob('your_path_here')
files_i_care_about = filter(lambda x: not x.startswith("eph"), files)

or replace the lambda with an appropriate regex search, etc.

EDIT: I just realized that if you're using full paths, startswith won't work, so you'd need a regex:

In [10]: a
Out[10]: ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing']
In [11]: list(filter(lambda x: not re.search('/eph', x), a))
Out[11]: ['/some/path/foo', 'some/path/bar']
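
One caveat with searching for '/eph' in the whole path: it also drops files that sit under a directory whose name starts with eph. A possible alternative, sketched here with made-up paths, is to test only the basename.

```python
import os
import re

# Made-up paths; the last one lives under a directory starting with "eph".
paths = [
    "/some/path/foo",
    "some/path/bar",
    "some/path/eph_thing",
    "/ephemeral/dir/keep_me",
]

# Regex on the full path: also removes "/ephemeral/dir/keep_me".
regex_kept = [p for p in paths if not re.search("/eph", p)]

# Check on the file name only: keeps it.
basename_kept = [p for p in paths
                 if not os.path.basename(p).startswith("eph")]

print(regex_kept)
print(basename_kept)
```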

Solution 5

I recommend pathlib over glob. Filtering out one pattern is very simple.

from pathlib import Path
p = Path(YOUR_PATH)
filtered = [x for x in p.glob("**/*") if not x.name.startswith("eph")]

If you want to filter out a more complex pattern, you can define a function for it:

def not_in_pattern(x):
    return (not x.name.startswith("eph")) and not x.name.startswith("epi")
filtered = [x for x in p.glob("**/*") if not_in_pattern(x)]

Using that code, you can filter out all files whose names start with eph or epi.
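
As a variation, str.startswith accepts a tuple of prefixes, so the helper can be folded into one condition. The sketch below assumes a throwaway directory with made-up names; EXCLUDED_PREFIXES is a hypothetical name chosen for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical list of prefixes to exclude.
EXCLUDED_PREFIXES = ("eph", "epi")

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    # Made-up files, for illustration only.
    for rel in ("eee2314", "eph", "sub/epilog", "sub/asd3442"):
        (root / rel).touch()

    # "**/*" also yields directories, so keep regular files only,
    # then drop any file whose name starts with an excluded prefix.
    filtered = sorted(
        p.relative_to(root).as_posix()
        for p in root.glob("**/*")
        if p.is_file() and not p.name.startswith(EXCLUDED_PREFIXES)
    )

print(filtered)
```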

Author: Anastasios Andronidis

Updated on November 16, 2021

Comments

  • Anastasios Andronidis
    Anastasios Andronidis about 1 year

    I have a directory with a bunch of files inside: eee2314, asd3442 ... and eph.

    I want to exclude all files that start with eph with the glob function.

    How can I do it?

  • Anastasios Andronidis
    Anastasios Andronidis almost 9 years
    Really interesting solution! But in my case, reading the directory twice is going to be extremely slow. Also, if the folder is large and sits on a network directory, it is going to be slow again. But in any case, really handy.
  • Jaszczur
    Jaszczur over 8 years
    @TomBusby Try converting them to sets: set(glob("*")) - set(glob("eph*")) (and notice * at the end of "eph*")
  • Eugene Pankov
    Eugene Pankov about 8 years
    Use iglob here to avoid storing the full list in memory
  • Martijn Pieters
    Martijn Pieters about 8 years
    @Hardex: internally, iglob produces lists anyway; all you do is lazily evaluate the filter. It won't help to reduce the memory footprint.
  • Martijn Pieters
    Martijn Pieters about 8 years
    @Hardex: if you use a glob in the directory name then you'd have a point, then at most one os.listdir() result is kept in memory as you iterate. But somepath/*.txt has to read all filenames in one directory in memory, then reduce that list down to only those that match.
  • Eugene Pankov
    Eugene Pankov about 8 years
    you're right, it's not that important, but in stock CPython, glob.glob(x) = list(glob.iglob(x)). Not much of an overhead but still good to know.
  • Vitaly Zdanevich
    Vitaly Zdanevich over 6 years
    This should be in the official documentation; please, somebody add this to docs.python.org/3.5/library/glob.html#glob.glob
  • Nathan Smith
    Nathan Smith over 5 years
    Just as a side note, glob returns lists and not sets, but this kind of operation only works on sets, which is why neutrinus converted them. If you need the result to remain a list, simply wrap the entire operation in a conversion: list(set(glob("*")) - set(glob("eph*")))
  • Ridhuvarshan
    Ridhuvarshan about 4 years
    Doesn't this iterate twice? Once through the files to get the list, and a second time through the list itself? If so, is it not possible to do it in one iteration?
  • Martijn Pieters
    Martijn Pieters about 4 years
    @Ridhuvarshan: No, the list comprehension does just the one iteration. But if all you are going to do with the files list is iterate, then you could just as well make it a generator expression.
  • Martijn Pieters
    Martijn Pieters almost 4 years
    Note that glob patterns can't directly fulfill the requirement set out by the OP: to exclude only files that start with eph but allow files that start with anything else. [!e][!p][!h] will filter out files that start with eee, for example.
  • Felix Phl almost 3 years
    Correct me if I am wrong, but shouldn't this be glob.glob()? At least that's how I got it working.
  • Martijn Pieters
    Martijn Pieters almost 3 years
    @FelixPhl: depends on how you import it. If you use from glob import glob then the global name glob in your module is the function. If you use import glob then the global name is the module, and you need to use glob.glob().
  • SpinUp __ A Davis
    SpinUp __ A Davis almost 2 years
    Note that if you're used to specifying your shell glob exclusions as [^_], this won't work in Python's glob. You must use ! instead.
  • Wasi Master
    Wasi Master over 1 year
    @VitalyZdanevich it is in the documentation for fnmatch: docs.python.org/3/library/fnmatch.html#module-fnmatch
