How can I make GetFiles() exclude files with extensions that start with the search extension?

Solution 1

Try this, which fetches every file and then filters on the file extension:

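  // Compare the extension case-insensitively, since Windows file names are case-insensitive.
  FileInfo[] files = nodeDirInfo.GetFiles("*", SearchOption.TopDirectoryOnly)
            .Where(f => f.Extension.Equals(".sbs", StringComparison.InvariantCultureIgnoreCase))
            .ToArray();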

Solution 2

The issue you're experiencing is a limitation of the search pattern in the Win32 API. The documentation for searchPattern describes it like this:

A searchPattern with a file extension (for example *.txt) of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern.
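To see what your own runtime does with such a pattern, a small probe can help. The sketch below is only an illustration: the class name SearchPatternProbe and the file names material.sbs and material.sbsar are made up for this example. It creates a scratch directory, runs GetFiles("*.sbs") against it and returns the names that come back. No expected output is shown, because the result may differ between runtimes and file systems.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original question or answers.
public static class SearchPatternProbe
{
    // Creates a scratch directory holding one .sbs and one .sbsar file and
    // returns the file names that GetFiles("*.sbs") reports for it.
    public static string[] Probe()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            File.WriteAllText(Path.Combine(dir, "material.sbs"), "");
            File.WriteAllText(Path.Combine(dir, "material.sbsar"), "");

            return new DirectoryInfo(dir)
                .GetFiles("*.sbs", SearchOption.TopDirectoryOnly)
                .Select(f => f.Name)
                .OrderBy(n => n, StringComparer.Ordinal)
                .ToArray();
        }
        finally
        {
            // Remove the scratch directory again, even if the search throws.
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

Calling Console.WriteLine(string.Join(", ", SearchPatternProbe.Probe())) prints whatever the pattern matched. If material.sbsar shows up in the list, you are seeing the behaviour described above.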

My solution is to manually filter the results, using LINQ:

// GetFiles on a DirectoryInfo returns FileInfo objects, so test the Name property.
nodeDirInfo.GetFiles("*.sbs", option).Where(f => f.Name.EndsWith(".sbs",
    StringComparison.InvariantCultureIgnoreCase));

Solution 3

That's the behaviour of the Win32 API (FindFirstFile) that sits underneath GetFiles() showing through.

You'll need to do your own filtering if you must use GetFiles(). For instance:

GetFiles("*", searchOption).Where(s => s.EndsWith(".sbs", 
    StringComparison.InvariantCultureIgnoreCase));

Or more efficiently:

EnumerateFiles("*", searchOption).Where(s => s.EndsWith(".sbs", 
    StringComparison.InvariantCultureIgnoreCase));

Note that I use StringComparison.InvariantCultureIgnoreCase to deal with the fact that Windows file names are case-insensitive.
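As a minimal sketch of the same case-insensitive test, the check can also be made against the extension itself rather than the end of the whole name. The class ExtensionMatch and the method HasExtension are hypothetical names introduced for this example; Path.GetExtension and string.Equals are standard .NET calls.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class ExtensionMatch
{
    // True only when the extension of `path` is exactly `extension`
    // (leading dot included), ignoring letter case.
    public static bool HasExtension(string path, string extension)
    {
        return string.Equals(Path.GetExtension(path), extension,
            StringComparison.InvariantCultureIgnoreCase);
    }
}
```

With this, HasExtension("WOOD.SBS", ".sbs") is true while HasExtension("wood.sbsar", ".sbs") is false, so the predicate can be passed to Where in place of the EndsWith lambda.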

If performance is an issue, that is, if the search has to process directories with large numbers of files, then it is more efficient to filter twice: once in the call to GetFiles or EnumerateFiles, and once more to discard the unwanted file names. For example, either of these:

GetFiles("*.sbs", searchOption).Where(s => s.EndsWith(".sbs", 
    StringComparison.InvariantCultureIgnoreCase));
EnumerateFiles("*.sbs", searchOption).Where(s => s.EndsWith(".sbs", 
    StringComparison.InvariantCultureIgnoreCase));
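Put together, the two-stage filter might look like the sketch below. It assumes the fragments above refer to the static Directory class, which returns paths as strings; the class ExactExtensionSearch and the method Find are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class ExactExtensionSearch
{
    // Stage 1: let the file system narrow the candidates with "*" + extension.
    // Stage 2: drop names that merely start with the extension (e.g. .sbsar).
    // EnumerateFiles is lazy, so paths are filtered as they are produced.
    public static IEnumerable<string> Find(string root, string extension, SearchOption option)
    {
        return Directory.EnumerateFiles(root, "*" + extension, option)
            .Where(p => p.EndsWith(extension, StringComparison.InvariantCultureIgnoreCase));
    }
}
```

A call such as ExactExtensionSearch.Find(path, ".sbs", SearchOption.AllDirectories) yields only the .sbs files under path, whether or not the first stage over-matches on a given runtime.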
Author by

topofsteel

Updated on June 03, 2022

Comments

  • topofsteel
    topofsteel almost 2 years

    I am using the following line to return specific files...

    foreach (FileInfo file in nodeDirInfo.GetFiles("*.sbs", option))
    

    But there are other files in the directory with the extension .sbsar, and it is getting them, too. How can I differentiate between .sbs and .sbsar in the search pattern?

  • David Heffernan
    David Heffernan over 10 years
    @Joey That just feels a little dirty to me, duplicating the filter. But perhaps it would have a perf implication. If not then I'd rather have just the one filter.
  • varocarbas
    varocarbas over 10 years
    It would be marvellous if this were true. Unfortunately, it is not, and here comes a new episode of the poor descriptions of searchPattern in MSDN :) I was curious and ran some tests, and here are my conclusions...
  • Anirudha
    Anirudha over 10 years
    @varocarbas Indeed. I wonder where ? is meant to be used. The OP can use *a?.sbs, though that would require an a to be somewhere in the file name.
  • Rich
    Rich over 10 years
    It's faster, though ;-) In my small test here (running over our complete source folder, searching for *.cpp) it's about 10–25 % faster to specify the filter in GetFiles too. EnumerateFiles is slightly slower, but probably uses much less memory, especially for large result sets.
  • varocarbas
    varocarbas over 10 years
    nodeDirInfo.GetFiles("5?.txt"); returns any file with just .txt (not .txtwhatever) containing two characters in the name, one of them being a 5. nodeDirInfo.GetFiles("?.txt"); returns any .txt file with just one character in its name (not including .txtwhatever). You can get only *.txt by using a ????.txt approach if you know the maximum length of the file names you are looking for (??.txt returns all the files with 1 or 2 characters in their names; ???.txt all the ones with 1, 2 or 3, etc.).
  • David Heffernan
    David Heffernan over 10 years
    @Joey Yes, I think that's reasonable. I guess it comes down to a balance between perf and purity! I've covered this in the answer now.
  • David Heffernan
    David Heffernan over 10 years
    You don't account for letter case here.
  • Rich
    Rich over 10 years
    Oh, and I guess .EndsWith(".sbs", StringComparison.InvariantCultureIgnoreCase) would be a better option that's also resistant to culture, as the file system ignores the culture for its case-insensitivity.
  • David Heffernan
    David Heffernan over 10 years
    @Joey Thanks. Showing my ignorance with ToLower()!
  • topofsteel
    topofsteel over 10 years
    This was the answer I was hoping would work. But '?.sbs' returned nothing and '*?.sbs' returned all files with 'sbs' in the extension. The only thing these file names have in common is the extension. I imagine that would be the case with many such searches. I agree with varocarbas, the docs are not clear.