Is there something like JavaScript's "split()" in the shell?
Solution 1
Bourne/POSIX-like shells have a split+glob operator and it's invoked every time you leave a parameter expansion ($var, $-...), command substitution ($(...)), or arithmetic expansion ($((...))) unquoted in list context.
Actually, you invoked it by mistake when you did for name in ${array[@]} instead of for name in "${array[@]}". (You should also beware that invoking that operator by mistake like that is a source of many bugs and security vulnerabilities.)
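To make the difference between the two loops concrete, here is a minimal bash sketch. The array contents are hypothetical (chosen so that one element contains a space), and it assumes $IFS still has its default value.

```bash
array=('var 1' 'var2')   # hypothetical data; the first element contains a space

unquoted=0
for name in ${array[@]}; do      # split+glob applies: 'var 1' becomes 'var' and '1'
  unquoted=$((unquoted + 1))
done

quoted=0
for name in "${array[@]}"; do    # each element is passed through intact
  quoted=$((quoted + 1))
done

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 3, quoted: 2"
```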
That operator is configured with the $IFS special parameter (to tell what characters to split on, though beware that space, tab and newline receive a special treatment there) and the -f option to disable (set -f) or enable (set +f) the glob part.
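A small bash sketch of what the glob part does to the split words. It assumes a throwaway directory created with mktemp so that the expansion of * is predictable; the file names and variable names are made up for the illustration.

```bash
# Assumption: work in a scratch directory so the glob result is predictable.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch file1 file2

string='*_var2'
IFS=_                    # split on _

set +f                   # glob part enabled (the default)
with_glob=($string)      # '*' is expanded to file names: file1 file2 var2

set -f                   # glob part disabled
without_glob=($string)   # '*' stays literal: * var2

set +f; unset IFS        # back to default globbing and splitting
echo "${#with_glob[@]} ${#without_glob[@]}"   # prints "3 2"

cd - >/dev/null && rm -rf "$workdir"   # leave and remove the scratch directory
```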
Also note that while the S in $IFS was originally (in the Bourne shell where $IFS comes from) for Separator, in POSIX shells, the characters in $IFS should rather be seen as delimiters or terminators (see below for an example).
So to split on _:
string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
array=($string) # invoke the split+glob operator
for i in "${array[@]}"; do # loop over the array elements
  printf '%s\n' "$i" # print each element on its own line
done
To see the distinction between separator and delimiter, try on:
string='var1_var2_'
That will split it into var1 and var2 only (no extra empty element).
So, to make it similar to JavaScript's split(), you'd need an extra step:
string='var1_var2_var3'
IFS=_ # delimit on _
set -f # disable the glob part
temp=${string}_ # add an extra delimiter
array=($temp) # invoke the split+glob operator
(Note that it would split an empty $string into 1 (not 0) element, like JavaScript's split().)
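Since the snippets above change $IFS and the -f option for the rest of the script, one possible way to contain that is to wrap the extra-delimiter trick in a function. This is only a sketch for bash: js_split and the global array parts are hypothetical names, the separator is assumed to be a single non-whitespace character, and globbing is assumed to be enabled when the function is called (it is unconditionally re-enabled on the way out).

```bash
js_split() {
  # js_split SEP STRING: hypothetical helper, fills the global array "parts"
  local IFS="$1"        # split on SEP inside this function only
  local temp="$2$1"     # add an extra delimiter so a trailing empty element survives
  set -f                # disable the glob part
  parts=($temp)         # invoke the split+glob operator
  set +f                # assumption: globbing was enabled before the call
}

js_split _ 'var1_var2_'
echo "${#parts[@]}"     # prints 3: var1, var2 and an empty element

js_split _ ''
echo "${#parts[@]}"     # prints 1: a single empty element
```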
To see the special treatments tab, space and newline receive, compare:
IFS=' '; string=' var1 var2 '
(where you get var1 and var2) with
IFS='_'; string='_var1__var2__'
where you get: '', var1, '', var2, ''.
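The two cases can be compared side by side with a small counting helper. This is a sketch for bash; count_fields is a hypothetical name, and the function assumes globbing is enabled when it is called.

```bash
count_fields() {
  # count_fields IFS_VALUE STRING: print how many fields split+glob yields
  local IFS="$1"
  set -f          # disable the glob part
  set -- $2       # split STRING; the fields become the positional parameters
  set +f          # assumption: globbing was enabled before the call
  echo "$#"
}

count_fields ' ' ' var1 var2 '      # prints 2
count_fields '_' '_var1__var2__'    # prints 5
```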
Note that the zsh shell doesn't invoke that split+glob operator implicitly like that unless in sh or ksh emulation. There, you have to invoke it explicitly: $=string for the split part, $~string for the glob part ($=~string for both). It also has a split operator where you can specify the separator:
array=(${(s:_:)string})
or to preserve the empty elements:
array=("${(@s:_:)string}")
Note that there s is for splitting, not delimiting (also with $IFS, a known POSIX non-conformance of zsh). It's different from JavaScript's split() in that an empty string is split into 0 (not 1) element.
A notable difference with $IFS-splitting is that ${(s:abc:)string} splits on the abc string, while with IFS=abc, that would split on a, b or c.
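For comparison, here is a bash sketch of the same distinction. bash has no equivalent of zsh's (s:...:) flag, so splitting on a whole string is done here with a hand-written loop; split_str and the arrays by_chars and parts are hypothetical names, and the separator is assumed to be non-empty.

```bash
string='1abc2b3'

# $IFS-splitting: every one of a, b and c is a delimiter.
IFS=abc; set -f
by_chars=($string)        # 1, '', '', 2, 3
set +f; unset IFS         # back to default globbing and splitting
echo "${#by_chars[@]}"    # prints 5

split_str() {
  # split_str SEP STRING: split on the whole SEP string, fills "parts"
  local sep="$1" rest="$2"
  parts=()
  while [[ $rest == *"$sep"* ]]; do
    parts+=("${rest%%"$sep"*}")   # text before the first occurrence of SEP
    rest=${rest#*"$sep"}          # drop that text and the separator
  done
  parts+=("$rest")                # whatever follows the last separator
}

split_str abc "$string"
echo "${#parts[@]}"       # prints 2: 1 and 2b3
```

Like JavaScript's split(), this loop treats its argument as a separator, so an empty string yields one empty element.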
With zsh and ksh93, the special treatment that space, tab or newline receive can be removed by doubling them in $IFS.
As a historic note, the Bourne shell (the ancestor of modern POSIX shells) always stripped the empty elements. It also had a number of bugs related to splitting and expansion of $@ with non-default values of $IFS. For instance IFS=_; set -f; set -- $@ would not be equivalent to IFS=_; set -f; set -- $1 $2 $3....
Splitting on regexps
Now for something closer to JavaScript's split() that can split on regular expressions, you'd need to rely on external utilities.
In the POSIX tool-chest, awk has a split function that can split on extended regular expressions (those are more or less a subset of the Perl-like regular expressions supported by JavaScript).
split() {
  awk -v q="'" '
    function quote(s) {
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      n = split(ARGV[1], a, ARGV[2])
      for (i = 1; i <= n; i++) printf " %s", quote(a[i])
      exit
    }' "$@"
}
string=a__b_+c
eval "array=($(split "$string" '[_+]+'))"
The zsh shell has builtin support for Perl-compatible regular expressions (in its zsh/pcre module), but using it to split a string, though possible, is relatively cumbersome.
Solution 2
Yes, use IFS and set it to _. Then use read -a to store into an array (-r turns off backslash expansion). Note that this is specific to bash; ksh and zsh have similar features with slightly different syntax, and plain sh doesn't have array variables at all.
$ r="var1_var2_var3"
$ IFS='_' read -r -a array <<< "$r"
$ for name in "${array[@]}"; do echo "+ $name"; done
+ var1
+ var2
+ var3
From man bash:
read
-a aname
The words are assigned to sequential indices of the array variable aname, starting at 0. aname is unset before any new values are assigned. Other name arguments are ignored.
IFS
The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is ``<space><tab><newline>''.
Note that read stops at the first newline. Pass -d '' to read to avoid that, but in that case, there will be an extra newline at the end due to the <<< operator. You can remove it manually:
IFS='_' read -r -d '' -a array <<< "$r"
array[$((${#array[@]}-1))]=${array[$((${#array[@]}-1))]%?}
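Another way to sidestep the extra newline, sketched here for bash only, is to feed read from printf through process substitution instead of <<<, terminating the data with a NUL byte so that read -d '' finds its delimiter. The multi-line value of $r below is a hypothetical example.

```bash
r=$'var1_var2\nvar3_var4'   # hypothetical input containing a newline

# printf emits the string followed by a NUL; read -d '' stops at that NUL,
# so no newline is appended and nothing has to be trimmed afterwards.
IFS='_' read -r -d '' -a array < <(printf '%s\0' "$r")

for name in "${array[@]}"; do printf '[%s]\n' "$name"; done
```

With that input the array holds three elements, the second one containing the embedded newline.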
Tommy
Updated on September 18, 2022
Comments
-
Tommy almost 2 years
It's very easy to use split() in JavaScript to break a string into an array. What about shell script?
Say I want to do this:
$ script.sh var1_var2_var3
When the user gives such a string var1_var2_var3 to script.sh, inside the script it will convert the string into an array like:
array=( var1 var2 var3 )
for name in ${array[@]}; do
  # some code
done
-
gwillie almost 9 years
What shell are you using? With bash you can do IFS='_' read -a array <<< "${string}"
-
Sobrique almost 9 years
perl can do that too. It's not "pure" shell, but it's quite common.
-
Sobrique almost 9 years
I tend to work on 'is it probably installed on my linux box by default' and don't fret the minutiae :)
-
-
Stéphane Chazelas almost 9 years
That assumes $r doesn't contain newline characters or backslashes. Also note that it will only work in recent versions of the bash shell.
-
cuonglm almost 9 years
Is there any reason for special treatments with tab, space and newline?
-
Stéphane Chazelas almost 9 years
@cuonglm, generally you want to split on words when the delimiters are blanks. In the case of non-blank delimiters (like to split $PATH on :), on the contrary, you generally want to preserve empty elements. Note that in the Bourne shell, all characters were receiving the special treatment; ksh changed that to have only the blank ones (only space, tab and newline though) treated specially.
-
fedorqui almost 9 years
@StéphaneChazelas good point. Yes, this is the "basic" case of a string. For the rest, everyone should go for your comprehensive answer. Regarding the versions of bash, read -a was introduced in bash 4, right?
-
cuonglm almost 9 years
Well, the recently added Bourne shell note surprised me. And for completeness, should you add a note on how zsh treats a string of 2 or more characters in ${(s:string:)var}? If added, I can delete my answer :)
-
Stéphane Chazelas almost 9 years
Sorry, my bad, I thought <<< was added only recently to bash but it seems it's been there since 2.05b (2002). read -a is even older than that. <<< comes from zsh and is supported by ksh93 (and mksh and yash) as well, but read -a is bash-specific (it's -A in ksh93, yash and zsh).
-
fedorqui almost 9 years
@StéphaneChazelas is there any "easy" way to find when these changes happened? By "easy" I mean without digging into the release files, maybe a page showing them all.
-
Stéphane Chazelas almost 9 years
I look at change logs for that. zsh also has a git repository with history as far back as 3.1.5 and its mailing list is used for tracking changes as well.
-
terdon almost 9 years
What do you mean by "Also note that the S in $IFS is for Delimiter, not Separator."? I understand the mechanics and that it ignores trailing separators but the S stands for Separator, not delimiter. At least, that's what my bash's manual says.
-
Stéphane Chazelas almost 9 years
@terdon, $IFS comes from the Bourne shell where it was a separator; ksh changed the behaviour without changing the name. I mention that to stress that split+glob (except in zsh or pdksh) doesn't simply split anymore.
-
Stéphane Chazelas almost 9 years
@Gilles, note that bash now supports ${array[-1]} like zsh (also as lvalue). Older versions also support ${array[@]: -1} like ksh93. Those also work for sparse arrays.
-
fra-san almost 4 years
(I'm looking for a clear explanation of the difference between "delimiter" and "separator", which seems surprisingly hard to find.) Would it be correct to say that the IFS characters in the Bourne shell were separators because in that shell the empty elements were always stripped? And that, conversely, in POSIX shells they are delimiters/terminators because any single instance of them delimits (terminates) a possibly empty element?
-
Stéphane Chazelas almost 4 years
@fra-san, in the Bourne shell, :a::b: with IFS=: was split into a and b. In shells that treat IFS as separator, it's split into "", a, "", b and "". In shells that treat it as delimiter, same without the last "". That also applied to read. See Shell read *sometimes* strips trailing delimiter