Can I use a mask to iterate files in a directory with Boost?


Solution 1

EDIT: As noted in the comments, the code below is valid for versions of boost::filesystem prior to v3. For v3, refer to the suggestions in the comments.


boost::filesystem has no built-in wildcard search; you have to filter the files yourself.

The following sample lists a directory with boost::filesystem's directory_iterator and filters the entries with boost::regex:

const std::string target_path( "/my/directory/" );
const boost::regex my_filter( "somefiles.*\\.txt" ); // the backslash must be escaped in the string literal

std::vector< std::string > all_matching_files;

boost::filesystem::directory_iterator end_itr; // Default ctor yields past-the-end
for( boost::filesystem::directory_iterator i( target_path ); i != end_itr; ++i )
{
    // Skip if not a file
    if( !boost::filesystem::is_regular_file( i->status() ) ) continue;

    boost::smatch what;

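    // Skip if no match for V2:
    if( !boost::regex_match( i->leaf(), what, my_filter ) ) continue;
    // For V3 (keep the filename in a named string, since regex_match
    // may reject a temporary when it has to fill an smatch):
    //const std::string name = i->path().filename().string();
    //if( !boost::regex_match( name, what, my_filter ) ) continue;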

    // File matches, store it
    all_matching_files.push_back( i->leaf() );
}

(If you are looking for a ready-to-use class with builtin directory filtering, have a look at Qt's QDir.)
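Since the question asks for a mask such as somefiles*.txt rather than a regex, one option is to translate the mask into a pattern first. The following is a minimal sketch of that idea, not part of the answer above: mask_to_regex and matches_mask are hypothetical helper names, and std::regex from the standard library is used here instead of boost::regex so that the block is self-contained. The generated pattern should also be usable with boost::regex, but that is an assumption to verify against your Boost version.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translate a shell-style mask into a regex pattern.
// '*' matches any run of characters, '?' matches exactly one character,
// and every other character is matched literally.
std::string mask_to_regex(const std::string& mask)
{
    static const std::string specials = "\\^$.|+()[]{}";
    std::string pattern;
    for (char c : mask) {
        if (c == '*') {
            pattern += ".*";
        } else if (c == '?') {
            pattern += '.';
        } else {
            // Escape characters that have a special meaning in a regex.
            if (specials.find(c) != std::string::npos) pattern += '\\';
            pattern += c;
        }
    }
    return pattern;
}

// Hypothetical helper: true when the whole file name matches the mask.
bool matches_mask(const std::string& filename, const std::string& mask)
{
    const std::regex re(mask_to_regex(mask));
    return std::regex_match(filename, re);
}
```

Inside the loop above, a call such as matches_mask(name, "somefiles*.txt") could then stand in for the hand-written regex.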

Solution 2

There is a Boost Range Adaptors way:

#define BOOST_RANGE_ENABLE_CONCEPT_ASSERT 0
#include <boost/filesystem.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/regex.hpp>
#include <iostream>
#include <string>

namespace bfs = boost::filesystem;
namespace ba = boost::adaptors;

const std::string target_path( "/my/directory/" );
const boost::regex my_filter( "somefiles.*\\.txt" ); // the backslash must be escaped in the string literal

for (auto &entry: boost::make_iterator_range(bfs::directory_iterator(target_path), {})
| ba::filtered(static_cast<bool (*)(const bfs::path &)>(&bfs::is_regular_file))
| ba::filtered([&](const bfs::path &path){
    // Match against a named string rather than a temporary
    const std::string name = path.filename().string();
    return boost::regex_match(name, my_filter);
  })
)
{
  // Only regular files whose names match the pattern reach this point.
  std::cout << entry.path().filename() << std::endl;
}

Solution 3

My solution is essentially the same as Julien-L's, but it is encapsulated in an include file, which makes it nicer to use. It is implemented with boost::filesystem v3. I guess that something like this is not included in boost::filesystem directly because it would introduce a dependency on boost::regex.

#include "FilteredDirectoryIterator.h"
std::vector< std::string > all_matching_files;
std::for_each(
        FilteredDirectoryIterator("/my/directory","somefiles.*\\.txt"),
        FilteredDirectoryIterator(),
        [&all_matching_files](const FilteredDirectoryIterator::value_type &dirEntry){
                // Convert the path explicitly before storing it as a string
                all_matching_files.push_back(dirEntry.path().string());
            }
        );

Alternatively, use FilteredRecursiveDirectoryIterator to search subdirectories recursively:

#include "FilteredDirectoryIterator.h"
std::vector< std::string > all_matching_files;
std::for_each(
        FilteredRecursiveDirectoryIterator("/my/directory","somefiles.*\\.txt"),
        FilteredRecursiveDirectoryIterator(),
        [&all_matching_files](const FilteredRecursiveDirectoryIterator::value_type &dirEntry){
                // Convert the path explicitly before storing it as a string
                all_matching_files.push_back(dirEntry.path().string());
            }
        );

FilteredDirectoryIterator.h

#ifndef TOOLS_BOOST_FILESYSTEM_FILTEREDDIRECTORYITERATOR_H_
#define TOOLS_BOOST_FILESYSTEM_FILTEREDDIRECTORYITERATOR_H_

#include "boost/filesystem.hpp"
#include "boost/regex.hpp"
#include <functional>
#include <iterator>
#include <string>

template <class NonFilteredIterator = boost::filesystem::directory_iterator>
class FilteredDirectoryIteratorTmpl
:   public std::iterator<
    std::input_iterator_tag, typename NonFilteredIterator::value_type
    >
{
private:
    typedef std::string string;
    typedef boost::filesystem::path path;
    typedef
        std::function<
            bool(const typename NonFilteredIterator::value_type &dirEntry)
            >
        FilterFunction;

    NonFilteredIterator it;

    NonFilteredIterator end;

    const FilterFunction filter;

public:

    FilteredDirectoryIteratorTmpl();

    FilteredDirectoryIteratorTmpl(
        const path &iteratedDir, const string &regexMask
        );

    FilteredDirectoryIteratorTmpl(
        const path &iteratedDir, const boost::regex &mask
        );

    FilteredDirectoryIteratorTmpl(
        const path &iteratedDir,
        const FilterFunction &filter
        );

    //preincrement
    FilteredDirectoryIteratorTmpl<NonFilteredIterator>& operator++() {
        for(++it;it!=end && !filter(*it);++it);
        return *this;
    };

    //postincrement
    FilteredDirectoryIteratorTmpl<NonFilteredIterator> operator++(int) {
        // Copy first, then advance; there is no (iterator, filter) constructor
        FilteredDirectoryIteratorTmpl<NonFilteredIterator> previous(*this);
        ++(*this);
        return previous;
    };
    const boost::filesystem::directory_entry &operator*() {return *it;};
    bool operator!=(const FilteredDirectoryIteratorTmpl<NonFilteredIterator>& other)
    {
        return it!=other.it;
    };
    bool operator==(const FilteredDirectoryIteratorTmpl<NonFilteredIterator>& other)
    {
        return it==other.it;
    };
};

typedef
    FilteredDirectoryIteratorTmpl<boost::filesystem::directory_iterator>
    FilteredDirectoryIterator;

typedef
    FilteredDirectoryIteratorTmpl<boost::filesystem::recursive_directory_iterator>
    FilteredRecursiveDirectoryIterator;

template <class NonFilteredIterator>
FilteredDirectoryIteratorTmpl<NonFilteredIterator>::FilteredDirectoryIteratorTmpl()
:   it(),
    filter(
        [](const boost::filesystem::directory_entry& /*dirEntry*/){return true;}
        )
{

}

template <class NonFilteredIterator>
FilteredDirectoryIteratorTmpl<NonFilteredIterator>::FilteredDirectoryIteratorTmpl(
    const path &iteratedDir,const string &regexMask
    )
:   FilteredDirectoryIteratorTmpl(iteratedDir, boost::regex(regexMask))
{
}

template <class NonFilteredIterator>
FilteredDirectoryIteratorTmpl<NonFilteredIterator>::FilteredDirectoryIteratorTmpl(
    const path &iteratedDir,const boost::regex &regexMask
    )
:   it(NonFilteredIterator(iteratedDir)),
    filter(
        [regexMask](const boost::filesystem::directory_entry& dirEntry){
            // return false to skip dirEntry if no match
            // string() rather than native(): native() is a wide string on Windows
            const string filename = dirEntry.path().filename().string();
            return boost::regex_match(filename, regexMask);
        }
        )
{
    if (it!=end && !filter(*it)) ++(*this);
}

template <class NonFilteredIterator>
FilteredDirectoryIteratorTmpl<NonFilteredIterator>::FilteredDirectoryIteratorTmpl(
    const path &iteratedDir, const FilterFunction &filter
    )
:   it(NonFilteredIterator(iteratedDir)),
    filter(filter)
{
    if (it!=end && !filter(*it)) ++(*this);
}

#endif

Solution 4

I believe directory_iterator only provides all the entries in a directory. It is up to you to filter them as necessary.

Solution 5

The accepted answer did not compile for me even when I used i->path().extension() instead of leaf(). What did work for me was an example from this website. Here's the code, modified, to apply a filter:

vector<string> results;
filesystem::path filepath(fullpath_to_file);
filesystem::directory_iterator it(filepath);
filesystem::directory_iterator end;
const boost::regex filter("myfilter(capturing group)");
BOOST_FOREACH(filesystem::path const &p, make_pair(it, end))
{
     if(is_regular_file(p))
     {
          // Keep the filename in a named string so the match results stay valid
          const string name = p.filename().string();
          match_results<string::const_iterator> what;
          if (regex_search(name, what, filter, match_default))
          {
               string res = what[1];
               results.push_back(res);
          }
     }
}

I'm using Boost version: 1.53.0.

Why we don't all just use glob() and some regex is beyond me.

Author: scottm

Updated on July 08, 2022

Comments

  • scottm
    scottm almost 2 years

    I want to iterate over all files in a directory matching something like somefiles*.txt.

    Does boost::filesystem have something built in to do that, or do I need a regex or something against each leaf()?

  • alfC
    alfC over 12 years
    thanks for the very complete answer. two notes for others: 1) leaf is deprecated in filesystem v3 (current default), use path().filename() instead 2) if the filter criterion is the extension (very common) it is easier to use i->path().extension() == ".txt" [for example] than regex
  • Fuzz
    Fuzz almost 12 years
    leaf() is now deprecated. i->leaf() can be replaced by i->path().string() or i->path().filename().string() if you just want the filenames
  • David L.
    David L. about 10 years
    The backslash in regex has to be escaped "somefiles.*\\.txt"
  • berkus
    berkus over 8 years
    Good one! Range Adaptors to the rescue.
  • David Doria
    David Doria over 8 years
    @Julien-L I actually had to change the filter to const boost::regex my_filter( target_path + "somefiles.*\\.txt" );
  • gilgamash
    gilgamash over 7 years
    For those who run into this nowadays: std::regex_match and boost::regex_match no longer accept a temporary string. For the example above, this means that the lambda body needs to be adjusted to something like auto oFile = path.filename().string(); return xxx::regex_match(oFile, ...).