What's a safe and portable way to split a string in shell programming?
Solution 1
Just set IFS according to your needs and let the shell perform word splitting:
IFS=':'
for dir in $PATH; do
[ -x "$dir"/"$1" ] && echo $dir
done
This works in bash, dash and ksh, but I tested it only with the latest versions.
Solution 2
The obvious solution would be to use the shell word splitting, but beware of a few gotchas:
IFS=:
set -o noglob
for dir in $PATH''; do
dir=${dir:-.}
[ -x "${dir%/}/$1" ] && printf "%s\n" "$dir"
done
You need set -o noglob because when a variable is left unquoted, both word splitting and filename generation (globbing) are performed on it, and here you only want word splitting. For instance, in the unlikely event that $PATH contains /usr/local/*bin*, you want it to look in the /usr/local/*bin* folder, not in /usr/local/bin and /usr/local/sbin..., and if PATH contains /*/*/*/../../../*/*/*/*/../../../*/*/*/*, you don't want it to bring your machine down.
An empty $PATH component means the current directory (.), not /. $dir/$1 wouldn't be correct in that case. The workaround is either to write $dir${dir:+/}$1 or to change $dir to . in that case (which gives a more useful output when displayed with printf '%s\n' "$dir").
//foo is not necessarily the same as /foo, so if / is in $PATH, you don't want $dir/$1, which would be //$1. Hence the ${dir%/} to strip a trailing slash.
Then, there are a few other problems:
For $PATH, ":" is a field separator, while for $IFS it is a field terminator (yes, I know, S is for Separator; blame ksh, and POSIX for standardizing the ksh behaviour).
So if $PATH is /usr/bin:/bin: (which is bad practice but still commonly found), that means "/usr/bin", "/bin" and "" (that is, the current directory), while the shell word splitting (in all POSIX shells except zsh) will split that into /usr/bin and /bin only.
If $PATH is set but empty, that means "look in the current directory only", whereas shells (including those that treat $IFS as a separator) will expand it to an empty list.
Appending the '' to $PATH above works around both issues.
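To see what the trailing '' does, here is a minimal sketch, assuming a POSIX sh such as bash, dash or ksh (zsh in its native mode does not split unquoted variables). split_colon is a hypothetical name, and the angle brackets are only there to make empty fields visible.

```shell
# Hypothetical helper: print each field of the colon-separated list $1,
# one per line, wrapped in <> so that empty fields show up.
# The subshell body keeps the IFS and noglob changes away from the caller.
split_colon() (
    IFS=:
    set -f                 # same effect as set -o noglob
    for field in $1''; do  # the '' keeps a trailing (or sole) empty field
        printf '<%s>\n' "$field"
    done
)

split_colon '/usr/bin:/bin:'   # prints </usr/bin>, </bin> and <>
split_colon ''                 # prints a single <>
```

Without the '', the first call would print only two fields and the second would print nothing.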
Last but not least, if $PATH is unset, then that has a special meaning, which is: look in the system default search list, which unfortunately means something different depending on who (what command) you ask.
$ env -u PATH bash -c 'type usbipd'
usbipd is /usr/local/sbin/usbipd
$ env -u PATH ksh -c 'type usbipd'
ksh: whence: usbipd: not found
And basically, in your script, you'd have to guess what that default search path is in the context that matters to you.
Note that POSIX leaves the behaviour unspecified when $PATH is unset or empty, so it won't help you there. That also means that what I said above may not apply to some past, current or future POSIX/Unix systems.
In short, parsing $PATH to try and find out where a command would be run from is a tricky business.
There is a standard command for that, which is command:
ls_path=$(command -v ls)
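A slightly fuller sketch of the same idea, with the exit status checked. where_is is a hypothetical wrapper name. Keep in mind that for builtins, functions and aliases, command -v prints something other than a file path.

```shell
# Hypothetical wrapper: print what the shell would run for $1,
# or complain on stderr and return non-zero if it cannot be found.
where_is() {
    if cmd_path=$(command -v "$1"); then
        printf '%s\n' "$cmd_path"
    else
        printf '%s: not found\n' "$1" >&2
        return 1
    fi
}

where_is sh
```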
But what one may ask is: why do you want to know?
Now onto restoring IFS to its default value:
oldIFS=$IFS
IFS=:
...
IFS=$oldIFS
will work in practice in most cases but is not guaranteed to work by POSIX.
The reason is that if $IFS was previously unset, which means the default splitting behaviour (that is, in POSIX shells, split on space, tab or newline), then after those commands it will end up set but empty (which means no splitting).
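The difference can be reproduced with a small sketch. The function names are hypothetical, and each body runs in a subshell so that the caller's IFS is left alone. The second variant uses the save/restore idiom quoted in the comments further down, which assumes the original IFS, if set, does not itself contain a colon.

```shell
# Report whether IFS is currently unset, empty, or non-empty.
ifs_state() {
    if [ -z "${IFS+set}" ]; then
        echo unset
    elif [ -z "$IFS" ]; then
        echo empty
    else
        echo non-empty
    fi
}

# Plain save/restore, starting from an unset IFS.
naive_restore() (
    unset IFS
    oldIFS=$IFS
    IFS=:
    IFS=$oldIFS
    ifs_state        # prints "empty": the unset state was lost
)

# Save/restore that also remembers the unset state.
careful_restore() (
    unset IFS
    oIFS=$IFS; ${IFS+:} unset oIFS   # leave oIFS unset if IFS was unset
    IFS=:
    IFS=$oIFS; ${oIFS+:} unset IFS   # unset IFS again if oIFS is unset
    ifs_state        # prints "unset"
)

naive_restore
careful_restore
```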
Another potential problem arises if you generalise that approach and use it in a lot of different functions: if, in the ... part above, you call a function that does the same thing (makes a copy of $IFS in $oldIFS), then you're going to lose the original $oldIFS and restore the wrong $IFS.
Instead you could use subshells when possible:
(
IFS=:
...
)
# only the subshell's IFS was affected, the parent still has its own IFS
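One way to apply this, sketched under the assumption that the result can be passed back through standard output and the exit status, is to give the whole function a subshell body. first_match is a hypothetical name. It takes the command name as $1 and a colon-separated directory list as $2.

```shell
# Hypothetical helper: print the first "$dir/$1" that is an executable regular
# file, for each dir in the colon-separated list $2. The ( ... ) body is a
# subshell, so IFS, noglob and dir never affect the caller.
first_match() (
    IFS=:
    set -f
    for dir in $2''; do
        dir=${dir:-.}
        if [ -f "${dir%/}/$1" ] && [ -x "${dir%/}/$1" ]; then
            printf '%s\n' "${dir%/}/$1"
            exit 0          # leaves the subshell only
        fi
    done
    exit 1
)

first_match ls "$PATH"
```

The price is that variables set inside the function are lost, so anything the caller needs has to be printed and captured with $(...).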
My approach is to set $IFS (and turn set -o noglob on or off) every time I need word splitting (which is rare) and not bother restoring the previous value. Of course, that doesn't work if your script calls someone else's code that doesn't follow that practice and assumes a default word splitting behaviour.
rahmu
Updated on September 18, 2022
Comments
-
rahmu over 1 year
When writing a shell script, I often want to split a string. Here's a very simple example:
for dir in $(echo $PATH | tr : " "); do
[[ -x "$dir"/"$1" ]] && echo $dir
done
This will search each directory in the $PATH for an executable with the same name as $1. Pretty straightforward, it runs well, but breaks if a directory in my $PATH contains whitespace in its name.
What's the recommended way to split a string at each occurrence of a recurring separator?
Ideally, the solution would be able to run on (fairly) old shells, namely ksh88.
-
Admin about 11 years
See How to iterate through a comma-separated list and execute a command for each entry (which doesn't address the specificities of $PATH).
-
rahmu about 11 years
Thanks! How can I set IFS back to its original default value once the processing is done?
-
rahmu about 11 years
Never mind, I store the default value of IFS in a temp variable, which allows me to restore IFS easily. Thank you for the answer.
-
manatwork about 11 years
Either that, or force the shell to execute the given piece of code in a separate shell instance: (IFS=:; for … done). Of course, this is useful only if you do not need anything later from whatever was set inside the loop.
-
rahmu about 11 years
Parsing $PATH was the simplest short example I could come up with. Splitting a string with non-whitespace delimiters is a common problem I run into. I wanted to know how members here dealt with it in a robust and portable way.
-
Stéphane Chazelas about 11 years
Well, at least, you'll have learnt that if the string ends with a delimiter, you won't get an empty element, and that you need set -f to avoid the other side effect of leaving a variable unquoted. A lot of this applies to other variables of the same form like $MANPATH, $LD_LIBRARY_PATH...
-
rahmu about 11 years
Yes, definitely. Thank you very much for the answer :)
-
Stéphane Chazelas about 11 years
@ruakh, while it is possible and allowed by POSIX, and it would make sense for a shell to have $IFS unset by default, it is not the case in any shell that I know. All the Bourne-like shells I know have IFS=$' \t\n' as their initial IFS (with the exception of zsh, which also has \0 (since it can)).
-
Gilles 'SO- stop being evil' about 11 years
PATH="$PWD/*:/bin:/usr/bin"; mkdir \*; cp /bin/ls \*/foo and try your snippet with ls. You missed set -f, see Stephane Chazelas's answer.
-
chepner about 11 years
There may be a better way, but you can distinguish between a null parameter and an unset parameter by comparing ${FOO:-x} and ${FOO-x}. The two are equivalent for an unset parameter, but not for a null parameter.
-
Stéphane Chazelas about 11 years
@chepner, a common trick to save and restore IFS is to write it oIFS=$IFS; ${IFS+:} unset oIFS and then the same to restore: IFS=$oIFS; ${oIFS+:} unset IFS, but there's still an issue if there are nested functions using that trick.
-
chepner about 11 years
Clever; it took me a moment to parse that.