How do I "normalize" a pathname using boost::filesystem?


Solution 1

Boost v1.48 and above

You can use boost::filesystem::canonical:

path canonical(const path& p, const path& base = current_path());
path canonical(const path& p, system::error_code& ec);
path canonical(const path& p, const path& base, system::error_code& ec);

http://www.boost.org/doc/libs/1_48_0/libs/filesystem/v3/doc/reference.html#canonical

v1.48 and above also provide the boost::filesystem::read_symlink function for resolving symbolic links.
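The error_code overloads make it possible to probe a path without exceptions. Below is a minimal sketch of that pattern. It is written against C++17 std::filesystem, which was modelled on Boost.Filesystem, purely as a stand-in; the assumption is that with Boost you would alias boost::filesystem and use boost::system::error_code (and boost::optional on pre-C++17 compilers). The helper name try_canonical is made up for illustration.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: returns the canonical form of p, or std::nullopt
// when p cannot be resolved (for example because it does not exist).
std::optional<fs::path> try_canonical(const fs::path& p) {
    std::error_code ec;  // boost::system::error_code when using Boost
    fs::path resolved = fs::canonical(p, ec);
    if (ec) {
        return std::nullopt;
    }
    return resolved;
}
```

A caller can then fall back to a lexical clean-up when the result is empty, instead of catching filesystem_error.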

Boost versions prior to v1.48

As mentioned in other answers, you can't normalise because boost::filesystem can't follow symbolic links. However, you can write a function that normalises "as much as possible" (assuming "." and ".." are treated normally) because boost offers the ability to determine whether or not a file is a symbolic link.

That is to say, if the parent of the ".." is a symbolic link then you have to retain it, otherwise it is probably safe to drop it and it's probably always safe to remove ".".

It's similar to manipulating the actual string, but slightly more elegant.

boost::filesystem::path resolve(
    const boost::filesystem::path& p,
    const boost::filesystem::path& base = boost::filesystem::current_path())
{
    boost::filesystem::path abs_p = boost::filesystem::absolute(p,base);
    boost::filesystem::path result;
    for(boost::filesystem::path::iterator it=abs_p.begin();
        it!=abs_p.end();
        ++it)
    {
        if(*it == "..")
        {
            // /a/b/.. is not necessarily /a if b is a symbolic link
            if(boost::filesystem::is_symlink(result) )
                result /= *it;
            // /a/b/../.. is not /a/b/.. under most circumstances
            // We can end up with ..s in our result because of symbolic links
            else if(result.filename() == "..")
                result /= *it;
            // Otherwise it should be safe to resolve the parent
            else
                result = result.parent_path();
        }
        else if(*it == ".")
        {
            // Ignore
        }
        else
        {
            // Just cat other path entries
            result /= *it;
        }
    }
    return result;
}

Solution 2

With version 3 of boost::filesystem you can also try to remove all the symbolic links with a call to canonical. This can be done only for existing paths so a function that also works for non-existing ones would require two steps (tested on MacOS Lion and updated for Windows thanks to @void.pointer's comment):

boost::filesystem::path normalize(const boost::filesystem::path &path) {
    boost::filesystem::path absPath = absolute(path);
    boost::filesystem::path::iterator it = absPath.begin();
    boost::filesystem::path result = *it++;

    // Get canonical version of the existing part
    for (; it != absPath.end() && exists(result / *it); ++it) {
        result /= *it;
    }
    result = canonical(result);

    // For the rest remove ".." and "." in a path with no symlinks
    for (; it != absPath.end(); ++it) {
        // Just move back on ../
        if (*it == "..") {
            result = result.parent_path();
        }
        // Ignore "."
        else if (*it != ".") {
            // Just cat other path entries
            result /= *it;
        }
    }

    // Make sure the dir separators are correct even on Windows
    return result.make_preferred();
}
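Later Boost releases (1.60 onward, as far as I know) ship weakly_canonical, which appears to do the same two-step job: resolve the part of the path that exists, then clean up the remainder lexically. A minimal sketch follows, again using C++17 std::filesystem as a stand-in for boost::filesystem; the function name normalize_maybe_missing is hypothetical.

```cpp
#include <filesystem>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: resolves symlinks in the leading part of p that
// exists and normalizes the non-existing remainder lexically, so p itself
// does not have to exist.
fs::path normalize_maybe_missing(const fs::path& p) {
    fs::path result = fs::weakly_canonical(fs::absolute(p));
    result.make_preferred();  // native separators, as in the answer above
    return result;
}
```

Note the comments further down about canonical and weakly_canonical misbehaving with Windows links and junctions in some Boost versions; this sketch does not work around that.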

Solution 3

Your complaints and/or wishes about canonical have been addressed by Boost 1.60 [1] with a purely lexical member function of path:

path lexically_normal() const;
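As a rough illustration, here is how lexical normalization could be applied to a path shaped like the one in the question. The sketch uses C++17 std::filesystem as a stand-in for boost::filesystem (an assumption made so the example has no external dependency), and the helper name display_form is invented for this example.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: collapses "." and ".." purely lexically. The
// filesystem is never consulted, so symbolic links are NOT taken into
// account and the path does not need to exist.
std::string display_form(const fs::path& p) {
    return p.lexically_normal().generic_string();
}
```

With forward slashes, display_form("/some/deep/application/folder/../configuration/instance/../instance/myfile.cfg") yields "/some/deep/application/configuration/instance/myfile.cfg". That is fine for showing a path to the user, but the result may name a different file if any component is a symlink.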

Solution 4

the explanation is at http://www.boost.org/doc/libs/1_40_0/libs/filesystem/doc/design.htm :

Work within the realities described below.

Rationale: This isn't a research project. The need is for something that works on today's platforms, including some of the embedded operating systems with limited file systems. Because of the emphasis on portability, such a library would be much more useful if standardized. That means being able to work with a much wider range of platforms than just Unix or Windows and their clones.

where the "reality" applicable to removal of normalize is:

Symbolic links cause canonical and normal form of some paths to represent different files or directories. For example, given the directory hierarchy /a/b/c, with a symbolic link in /a named x pointing to b/c, then under POSIX Pathname Resolution rules a path of "/a/x/.." should resolve to "/a/b". If "/a/x/.." were first normalized to "/a", it would resolve incorrectly. (Case supplied by Walter Landry.)

the library cannot really normalize a path without access to the underlying filesystems, which makes the operation a) unreliable b) unpredictable c) wrong d) all of the above
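The quoted /a/x/.. case can be reproduced on a POSIX system with a few lines. This is only a sketch under assumptions: it uses C++17 std::filesystem as a stand-in for boost::filesystem, the function name resolve_both_ways is hypothetical, and creating symlinks may need extra privileges on Windows.

```cpp
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;  // assumption: stand-in for boost::filesystem

// Hypothetical helper: recreates the quoted hierarchy under `root` (which
// must not already contain a/x): directories a/b/c plus a symlink
// a/x -> b/c. Returns the filesystem-resolved form and the purely lexical
// form of root/a/x/.. as a pair.
std::pair<fs::path, fs::path> resolve_both_ways(const fs::path& root) {
    fs::create_directories(root / "a" / "b" / "c");
    fs::create_directory_symlink("b/c", root / "a" / "x");
    const fs::path p = root / "a" / "x" / "..";
    return {fs::canonical(p), p.lexically_normal()};
}
```

The first element refers to root/a/b (the parent of the symlink's target), while the second refers to root/a, so the two forms name different directories, which is the mismatch the design document describes.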

Solution 5

It's still there. Keep using it.

I imagine they deprecated it because symbolic links mean that the collapsed path isn't necessarily equivalent. If c:\full\path were a symlink to c:\rough, then c:\full\path\.. would be c:\, not c:\full.

Author by

Mike Willekes

Updated on May 04, 2020

Comments

  • Mike Willekes
    Mike Willekes almost 4 years

    We are using boost::filesystem in our application. I have a 'full' path that is constructed by concatenating several paths together:

    #include <boost/filesystem/operations.hpp>
    #include <iostream>
         
    namespace bf = boost::filesystem;
    
    int main()
    {
        bf::path root("c:\\some\\deep\\application\\folder");
        bf::path subdir("..\\configuration\\instance");
        bf::path cfgfile("..\\instance\\myfile.cfg");
    
        bf::path final ( root / subdir / cfgfile);
    
        std::cout << final.file_string();
    }
    

    The final path is printed as:

    c:\some\deep\application\folder\..\configuration\instance\..\instance\myfile.cfg
    

    This is a valid path, but when I display it to the user I'd prefer it to be normalized. (Note: I'm not even sure if "normalized" is the correct word for this). Like this:

    c:\some\deep\application\configuration\instance\myfile.cfg
    

    Earlier versions of Boost had a normalize() function - but it seems to have been deprecated and removed (without any explanation).

    Is there a reason I should not use the BOOST_FILESYSTEM_NO_DEPRECATED macro? Is there an alternative way to do this with the Boost Filesystem library? Or should I write code that directly manipulates the path as a string?

  • Kieveli
    Kieveli over 14 years
    I think wanting to normalize the path is sane, natural, and expected behaviour. Looks like they have over-thought this one and erred on the side of wrong.
  • just somebody
    just somebody over 14 years
    Boost.Filesystem is aiming at inclusion in the C++ standard, which is why they removed the features that are useful on only some of the platforms. There's already a de facto and de jure standard for the feature you're longing for, realpath() in POSIX: "The realpath() function shall derive, from the pathname pointed to by file_name, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.', '..', or symbolic links."

        % cd /home/foo/tmp
        % ln -s foo ..
        % echo $PWD/foo/..
        /home/foo/tmp/foo/..
        % realpath $PWD/foo/..
        /home/foo
  • Matthieu M.
    Matthieu M. over 14 years
    This part of symbolic links always bugged me, that's quite a violation of the Principle of Least Astonishment :/
  • just somebody
    just somebody over 14 years
    which part? AFAICS "this part" is the whole point of symlinks, no?
  • Mike Willekes
    Mike Willekes over 14 years
    At the very least the macro to re-enable this functionality should have been called BOOST_FILESYSTEM_NOT_NECESSARILY_PORTABLE (or something like that). Calling the code 'deprecated' makes one think that it could be dropped from a future release.
  • Krish1992
    Krish1992 about 13 years
    Sucks majorly, interesting to claim "this isn't a research project" and then pretty much directly after come up with an excuse which leads everyone to believe that it is. Surely a better solution would've been to just implement it in terms of for example realpath() on posix, and whatever is needed on windows, and then on unsupported platforms throw an exception?
  • just somebody
    just somebody over 11 years
    not sure why this answer has gotten a downvote as it's a copy/paste straight from the horse's mouth.
  • jarzec
    jarzec over 11 years
    Sorry, a ++ was missing in line 4 above.
  • jarzec
    jarzec about 11 years
    canonical works only for existing files. I needed something that also works for non-existing paths (canonical is used by normalize for the existing bit of the path).
  • zett42
    zett42 almost 5 years
    Note that boost::filesystem::canonical() requires a path that actually exists in the filesystem. So you cannot use it to normalize a path that may point to a filesystem item that currently does not exist, such as a path on a removable medium or disconnected network drive. In these cases the function will report an error. Compare with boost::filesystem::path::lexically_normal.
  • zett42
    zett42 almost 5 years
    This doesn't really answer the question.
  • void.pointer
    void.pointer over 4 years
    This doesn't work right on Windows. If I pass in "E:\\foo\\.\\bar", I get back "E:/foo\\bar". The slashes are inconsistent. Change the return expression to return result.make_preferred() and it fixes the issue. Now I get "E:\\foo\\bar".
  • jarzec
    jarzec over 4 years
    @void.pointer Thanks a lot. I never had a chance to test this on Windows.
  • Evgen
    Evgen about 4 years
    Typo in "make_prefered()" in the example. Also note that canonical has problems with Windows links and junctions, at least as of Boost 1.72. See github.com/boostorg/filesystem/issues
  • Evgen
    Evgen about 4 years
    Note that canonical has problems with Windows links and junctions, at least as of Boost 1.72. See github.com/boostorg/filesystem/issues Same for weakly_canonical and read_symlink
  • jarzec
    jarzec almost 4 years
    @Evgen Thanks. I fixed the typo.