dirname and basename vs parameter expansion


Solution 1

Both have their quirks, unfortunately.

Both are required by POSIX, so the difference between them isn't a portability concern¹.

The plain way to use the utilities is

base=$(basename -- "$filename")
dir=$(dirname -- "$filename")

Note the double quotes around variable substitutions, as always, and also the -- after the command name, in case the file name begins with a dash (otherwise the commands would interpret the file name as an option). This still fails in one edge case, which is rare but might be forced by a malicious user²: command substitution removes trailing newlines. So if a file is called foo/bar␤, then base will be set to bar instead of bar␤. A workaround is to add a non-newline character and, after the command substitution, strip it together with the newline that the utility prints after its result:

base=$(basename -- "$filename"; echo .); base=${base%??}
dir=$(dirname -- "$filename"; echo .); dir=${dir%??}
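The effect is easy to check by building such a name and comparing lengths. This is a sketch that assumes a shell where ${#var} counts characters. After the substitution it strips two characters: the added dot and the newline that basename prints after its result. Removing only the dot would leave that extra newline in base.

```shell
# Build "foo/bar<newline>"; the trailing "." shields the newline from
# command substitution and is then removed.
filename=$(printf 'foo/bar\n.'); filename=${filename%.}

# Plain command substitution: the newline that belongs to the name is lost.
naive=$(basename -- "$filename")

# Sentinel: basename prints "bar<newline><newline>", echo adds ".<newline>";
# command substitution strips that final newline, then ${base%??} drops
# the "." and the newline that basename appended to its output.
base=$(basename -- "$filename"; echo .); base=${base%??}

printf 'naive: %s chars, sentinel: %s chars\n' "${#naive}" "${#base}"
```

This prints `naive: 3 chars, sentinel: 4 chars`: bar versus bar␤.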

With parameter substitution, you don't run into edge cases related to expansion of weird characters, but there are a number of difficulties with the slash character. One thing that is not an edge case at all is that computing the directory part requires different code for the case where there is no /.

base="${filename##*/}"
case "$filename" in
  */*) dir="${filename%/*}"; dir="${dir:-/}";;  # /usr leaves "", i.e. the root
  *) dir=".";;
esac
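To see where this simple version stops agreeing with the utilities, here is a sketch that wraps it in a function and prints both results side by side for a few paths. The name split_simple is made up for illustration. The paths part ways at the trailing slash: for foo/bar/, the expansion gives an empty base and a dir of foo/bar.

```shell
# Simple parameter-expansion split; split_simple is a hypothetical name.
# It deliberately does not handle trailing slashes.
split_simple() {
  base=${1##*/}
  case $1 in
    */*) dir=${1%/*} ;;
    *)   dir=. ;;
  esac
}

for p in foo/bar bar foo/bar/; do
  split_simple "$p"
  printf '%s\texpansion: dir=[%s] base=[%s]\tutilities: dir=[%s] base=[%s]\n' \
    "$p" "$dir" "$base" "$(dirname -- "$p")" "$(basename -- "$p")"
done
```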

The edge case is when there's a trailing slash (including the case of the root directory, which is all slashes). The basename and dirname commands strip off trailing slashes before they do their job. POSIX patterns can't remove a run of trailing slashes with a single pattern, but you can do it in two steps: first extract the trailing slashes with ${filename##*[!/]}, then remove that string from the end. You need to take care of the case when the input consists of nothing but slashes.

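case "$filename" in
  */*[!/]*)
    trail=${filename##*[!/]}; filename=${filename%%"$trail"}
    base=${filename##*/}
    dir=${filename%/*}
    # drop slashes left before the base name (foo//bar); "" means the root
    dir=${dir%"${dir##*[!/]}"}; dir=${dir:-/};;
  *[!/]*)
    trail=${filename##*[!/]}
    base=${filename%%"$trail"}
    dir=".";;
  *) base="/"; dir="/";;
esac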
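One way to gain confidence in a case statement like this is to wrap it in a function and compare it against the utilities on a handful of awkward paths. This is a sketch, and split_path is a hypothetical name. Besides the trailing-slash handling, it strips any slashes left in front of the base name and maps an empty directory to /. That matches what dirname does for /usr and foo//bar. Paths starting with exactly two slashes are left out, since POSIX makes a leading // implementation-defined.

```shell
# split_path is a hypothetical wrapper; it sets the globals base and dir.
split_path() {
  f=$1
  case $f in
    */*[!/]*)
      trail=${f##*[!/]}; f=${f%"$trail"}   # drop trailing slashes
      base=${f##*/}
      dir=${f%/*}
      dir=${dir%"${dir##*[!/]}"}           # drop slashes before the base name
      dir=${dir:-/};;                      # nothing left: the root directory
    *[!/]*)
      trail=${f##*[!/]}
      base=${f%"$trail"}
      dir=.;;
    *) base=/; dir=/;;
  esac
}

# Compare against the external utilities.
status=0
for p in foo/bar /usr /usr/lib /usr/ usr / . .. foo/ foo//bar /// a/b//; do
  split_path "$p"
  if [ "$dir" != "$(dirname -- "$p")" ] || [ "$base" != "$(basename -- "$p")" ]; then
    printf 'mismatch for %s: dir=%s base=%s\n' "$p" "$dir" "$base"
    status=1
  fi
done
echo "status: $status"
```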

If you happen to know that you aren't in an edge case (e.g. a find result other than the starting point always contains a directory part and has no trailing /) then parameter expansion string manipulation is straightforward. If you need to cope with all the edge cases, the utilities are easier to use (but slower).
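As an illustration of the easy case, here is a sketch that applies the plain expansions to find results. It assumes mktemp -d is available; that is widespread but not required by POSIX. It only looks at regular files, so the starting point itself never reaches the expansions.

```shell
# Scratch tree; mktemp -d is an assumption (common, not POSIX).
top=$(mktemp -d) || exit 1
mkdir -p "$top/a/b" && : > "$top/a/b/file.txt"

# Paths that find prints below the starting point always contain a "/"
# and never end in one, so the plain expansions are enough here.
out=$(find "$top" -type f -exec sh -c '
  for f do
    printf "dir=%s base=%s\n" "${f%/*}" "${f##*/}"
  done' sh {} +)
printf '%s\n' "$out"

rm -rf -- "$top"
```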

Sometimes, you may want to treat foo/ like foo/. rather than like foo. If you're acting on a directory entry, then foo/ is supposed to be equivalent to foo/., not foo; this makes a difference when foo is a symbolic link to a directory: foo means the symbolic link, foo/ means the target directory. In that case, it is convenient for the basename of a path with a trailing slash to be ., and for the path to be its own dirname.

case "$filename" in
  */) base="."; dir="$filename";;
  */*) base="${filename##*/}"; dir="${filename%"$base"}";;
  *) base="$filename"; dir=".";;
esac
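A quick check of that convention wraps the case statement in a function. The name split_dot is made up for illustration. In the */* case, dir keeps its trailing slash, so "$dir$base" rebuilds the original path exactly. For foo/, it gives foo/., which names the same directory.

```shell
# split_dot is a hypothetical wrapper; it sets the globals base and dir.
split_dot() {
  case $1 in
    */)  base=.; dir=$1 ;;                 # trailing slash: foo/ acts like foo/.
    */*) base=${1##*/}; dir=${1%"$base"} ;; # dir keeps its trailing slash
    *)   base=$1; dir=. ;;
  esac
}

for p in foo/ foo/bar bar; do
  split_dot "$p"
  printf '%s -> dir=%s base=%s\n' "$p" "$dir" "$base"
done
```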

The fast and reliable method is to use zsh with its history modifiers (this first strips trailing slashes, like the utilities):

dir=$filename:h base=$filename:t

¹ Unless you're using a pre-POSIX shell such as the /bin/sh of Solaris 10 and older, which lacks the parameter expansion string manipulation features (and such machines are still in production). There's always a POSIX shell called sh in the installation, though; it's just /usr/xpg4/bin/sh, not /bin/sh.
² For example: submit a file called foo␤ to a file upload service that doesn't protect against this, then delete it and cause foo to be deleted instead.

Solution 2

Both are in POSIX, so portability "should" be of no concern. The shell substitutions should be presumed to run faster.

However, it depends on what you mean by portable. Some (not necessarily old) systems did not implement those features in their /bin/sh (Solaris 10 and older come to mind), while, on the other hand, developers were cautioned a while back that dirname was not as portable as basename.

For reference:

In considering portability, I would have to take into account all of the systems where I maintain programs. Not all are POSIX, so there are tradeoffs. Your tradeoffs may differ.

Solution 3

There is also:

mkdir '
';    dir=$(basename ./'
');   echo "${#dir}"

0

Weird stuff like that happens because a lot of interpreting and parsing has to happen whenever two processes talk. Command substitutions strip trailing newlines (and some shells also drop NUL bytes, though that's obviously not relevant to file names). So the result of basename or dirname loses its trailing newlines in any case, because how else do you get their output back? I know, trailing newlines in a filename are kind of anathema anyway, but you never know. And it doesn't make sense to take the possibly flawed route when you could do otherwise.
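To see exactly what command substitution removes, here is a small sketch. Every trailing newline goes, while leading and embedded ones survive. Appending a sentinel character and stripping it afterwards is the usual way to keep the trailing ones.

```shell
# Output: newline, "a", two newlines, "b", three newlines (8 characters).
x=$(printf '\na\n\nb\n\n\n')
# Only the trailing run is stripped, leaving 5 characters.
printf 'plain: %s chars\n' "${#x}"

# Sentinel: the "." protects the trailing newlines, then it is removed.
y=$(printf '\na\n\nb\n\n\n'; printf .); y=${y%.}
printf 'sentinel: %s chars\n' "${#y}"
```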

Still... ${pathname##*/} != basename and likewise ${pathname%/*} != dirname. Those commands are specified to carry out a mostly well-defined sequence of steps to arrive at their specified results.

The spec is below, but first here's a terser version:

basename()
    case   $1   in
    (*[!/]*/)     basename         "${1%"${1##*[!/]}"}"   ${2+"$2"}  ;;
    (*/[!/]*)     basename         "${1##*/}"             ${2+"$2"}  ;;
  (${2:+?*}"$2")  printf  %s%b\\n  "${1%"$2"}"       "${1:+\n\c}."   ;;
    (*)           printf  %s%c\\n  "${1##///*}"      "${1#${1#///}}" ;;
    esac

That's a fully POSIX-compliant basename in simple sh. It's not difficult to do. I merged a couple of the branches I use below because I could do so without affecting the results.

Here's the spec:

basename()
    case   $1 in
    ("")            #  1. If  string  is  a null string, it is 
                    #     unspecified whether the resulting string
                    #     is '.' or a null string. In either case,
                    #     skip steps 2 through 6.
                  echo .
     ;;             #     I feel like I should flip a coin or something.
    (//)            #  2. If string is "//", it is implementation-
                    #     defined whether steps 3 to 6 are skipped or
                    #     processed.
                    #     Great. What should I do then?
                  echo //
     ;;             #     I guess it's *my* implementation after all.
    (*[!/]*/)       #  3. If string consists entirely of <slash> 
                    #     characters, string shall be set to a
                    #     single <slash> character. In this case, skip
                    #     steps 4 to 6.
                    #  4. If there are any trailing <slash> characters
                    #     in string, they shall be removed.
                  basename "${1%"${1##*[!/]}"}" ${2+"$2"}  
      ;;            #     Fair enough, I guess.
     (*/)         echo /
      ;;            #     For step three.
     (*/*)          #  5. If there are any <slash> characters remaining
                    #     in string, the prefix of string up to and 
                    #     including the last <slash> character in
                    #     string shall be removed.
                  basename "${1##*/}" ${2+"$2"}
      ;;            #      == ${pathname##*/}
     ("$2"|\
      "${1%"$2"}")  #  6. If  the  suffix operand is present, is not
                    #     identical to the characters remaining
                    #     in string, and is identical to a suffix of
                    #     the characters remaining in string, the
                    #     suffix suffix shall be removed from
                    #     string. Otherwise, string is not
                    #     modified by this step. It shall not be
                    #     considered an error if suffix is not 
                    #     found in string.
                  printf  %s\\n "$1"
     ;;             #     So far so good for parameter substitution.
     (*)          printf  %s\\n "${1%"$2"}"
     esac           #     I probably won't do dirname.

...maybe the comments are distracting....
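The answer stops short of dirname. For completeness, here is a sketch that follows the steps of the POSIX dirname specification in plain sh. The name posix_dirname is hypothetical. For an empty operand, the sketch assumes . as the result. Where the specification leaves a remaining // implementation-defined, it chooses to keep processing, which yields /. Other implementations may differ on both points.

```shell
# posix_dirname is a hypothetical name; the body runs in a subshell so
# the variable s does not leak.
posix_dirname() (
  s=$1
  case $s in
    "") printf '%s\n' .; exit ;;          # assumption: empty operand -> "."
    //) ;;                                # step 1: skip steps 2-5
    *[!/]*)                               # not all slashes
      s=${s%"${s##*[!/]}"}                # step 3: drop trailing slashes
      case $s in
        */*) ;;
        *) printf '%s\n' .; exit ;;       # step 4: no slash left -> "."
      esac
      s=${s%"${s##*/}"} ;;                # step 5: drop trailing non-slashes
    *) printf '%s\n' /; exit ;;           # step 2: all slashes -> "/"
  esac
  # Step 6 is implementation-defined for "//"; this sketch processes 7-8.
  s=${s%"${s##*[!/]}"}                    # step 7: drop trailing slashes
  printf '%s\n' "${s:-/}"                 # step 8: empty -> "/"
)

posix_dirname /usr/lib
posix_dirname a//b
```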

Solution 4

You can get a boost from in-process basename and dirname (I don't understand why these aren't builtins -- if these aren't candidates, I don't know what is) but the implementation needs to handle things like:

path          dirname   basename
"/usr/lib"    "/usr"    "lib"
"/usr/"       "/"       "usr"
"usr"         "."       "usr"
"/"           "/"       "/"
"."           "."       "."
".."          "."       ".."

(From the basename(3) man page.)

and other edge cases.

I've been using:

basename(){ 
  test -n "$1" || return 0
  local x="$1"; while :; do case "$x" in */) x="${x%?}";; *) break;; esac; done
  [ -n "$x" ] || { echo /; return; }
  printf '%s\n' "${x##*/}"; 
}

dirname(){ 
  test -n "$1" || return 0
  local x="$1"; while :; do case "$x" in */) x="${x%?}";; *) break;; esac; done
  [ -n "$x" ] || { echo /; return; }
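  set -- "$x"; x="${1%/*}"
  case "$x" in "$1") x=.;; "") x=/;; esac
  # also drop slashes before the last component, e.g. foo//bar -> foo
  while :; do case "$x" in ?*/) x="${x%?}";; *) break;; esac; done
  printf '%s\n' "$x"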
}

(Recent GNU implementations of basename and dirname add some fancy command-line switches for things such as handling multiple arguments or suffix stripping, but those are super easy to add in the shell.)
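For instance, POSIX-style suffix removal fits in a few lines. This is a sketch, and basename_s is a hypothetical name. It is written as a separate function with a subshell body rather than as a change to the functions above. It removes the suffix only when it is a suffix of what remains and not all of it, as step 6 of the POSIX basename specification describes. For an empty operand, it prints ., which is an arbitrary choice since POSIX allows either . or an empty string.

```shell
# basename_s PATH [SUFFIX]: hypothetical name; subshell body keeps x local.
basename_s() (
  x=$1
  [ -n "$x" ] || { printf '%s\n' .; exit; }     # choice: empty operand -> "."
  x=${x%"${x##*[!/]}"}                           # drop trailing slashes
  [ -n "$x" ] || { printf '%s\n' /; exit; }     # path was all slashes
  x=${x##*/}                                     # keep the last component
  # Remove SUFFIX only if it is a suffix of x and not all of x.
  if [ "$#" -ge 2 ] && [ "$x" != "$2" ]; then
    x=${x%"$2"}
  fi
  printf '%s\n' "$x"
)

basename_s /usr/lib/libc.so .so
basename_s foo.c foo.c
```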

It's not that difficult to make these into bash builtins either (by making use of the underlying system implementation), but the functions above don't need to be compiled, and they provide some boost too.



tomet
Author by

tomet

Updated on September 18, 2022

Comments

  • Wildcard
    Wildcard over 8 years
    The list of edge cases is actually very helpful. Those are all very good points. The list actually seems fairly complete; are there really any other edge cases?
  • PSkocik
    PSkocik over 8 years
    My former implementation didn't handle things like x// correctly, but I fixed it before replying. I hope that's it.
  • PSkocik
    PSkocik over 8 years
    You can run a script to compare what the functions and the executables do on these examples. I'm getting a 100% match.
  • Wildcard
    Wildcard over 8 years
    Wow, good point about trailing newlines in filenames. What a can of worms. I don't think I really understand your script, though. I've never seen [!/] before, is that like [^/]? But your comment alongside that doesn't seem to match it....
  • mikeserv
    mikeserv over 8 years
    @Wildcard - well.. it's not my comment. That's the standard. The POSIX spec for basename is a set of instructions on how to do it with your shell. But [!charclass] is the portable way to do that with globs; [^class] is for regex, and shells aren't spec'd for regex. About matching the comment... case filters, so if I match a string which contains a trailing slash / and a [!/], then if the next case pattern below matches any trailing / slashes at all, they can only be all slashes. And the one below that can't have any trailing /.
  • Wildcard
    Wildcard over 8 years
    Wow. So it sounds like (in any POSIX shell) the most robust way is the second one you mention? base=$(basename -- "$filename"; echo .); base=${base%.}; dir=$(dirname -- "$filename"; echo .); dir=${dir%.}? I was reading carefully and I didn't notice you mentioning any drawbacks.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' over 8 years
    @Wildcard A drawback is that it treats foo/ like foo, not like foo/., which isn't consistent with POSIX-compliant utilities.
  • Wildcard
    Wildcard over 8 years
    Got it, thanks. I think I still prefer that method because I would know if I'm trying to deal with directories and I could just tack on (or "tack back on") a trailing / if I need it.
  • Tavian Barnes
    Tavian Barnes over 5 years
    "e.g. a find result, which always contains a directory part and has no trailing /" Not quite true, find ./ will output ./ as the first result.
  • Sam Thomas
    Sam Thomas over 5 years
    @Gilles The newline character example just blew my mind. Thanks for the answer
  • Stéphane Chazelas
    Stéphane Chazelas about 4 years
    You can remove trailing / in one go POSIXly with ${p%"${p##*[!/]}"}
  • mtraceur
    mtraceur about 3 years
    "Required by POSIX" doesn't guarantee real-world portability, sadly. I've seen systems without basename and dirname commands. Routers or phones, mostly.