Extracting tokens from a line of text
Solution 1
UPDATE: Please note that building an array this way is only suitable when IFS is a single non-whitespace character and the data string contains no runs of consecutive delimiters.
For a way around this issue, and a similar solution, go to this Unix & Linux question ... (it is worth the read just to get more insight into IFS).
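Use the IFS (Internal Field Separator) variable of bash (and of other POSIX shells, e.g. ash, ksh, zsh).
Using IFS avoids an external call, and it copes with embedded spaces in the tokens. Note that the array assignment B=($A) used below is not part of POSIX sh; ash and dash do not support it.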
# ==============
A='token0:token1:token2.y token2.z '
echo normal. $A
# Save IFS; Change IFS to ":"
SFI=$IFS; IFS=: ##### This is the important bit part 1a
set -f ##### ... and part 1b: disable globbing
echo changed $A
B=($A) ### this is now parsed at : (not at the default IFS whitespace)
echo B...... $B
echo B[0]... ${B[0]}
echo B[1]... ${B[1]}
echo B[2]... ${B[2]}
echo B[@]... ${B[@]}
# Reset the original IFS
IFS=$SFI ##### Important bit part 2a
set +f ##### ... and part 2b
echo normal. $A
# Output
normal. token0:token1:token2.y token2.z
changed token0 token1 token2.y token2.z
B...... token0
B[0]... token0
B[1]... token1
B[2]... token2.y token2.z
B[@]... token0 token1 token2.y token2.z
normal. token0:token1:token2.y token2.z
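A variation that avoids saving and restoring IFS altogether: in bash, an IFS assignment placed in front of read applies only to that one command, and read does not perform pathname expansion on the data, so set -f is not needed either. The following is a minimal sketch, assuming bash (read -a and here-strings are not POSIX) and input that fits on a single line; the variable names simply mirror the example above.

```bash
#!/usr/bin/env bash
A='token0:token1:token2.y token2.z '

# IFS is changed only for the duration of this one read command.
# -r keeps backslashes literal; -a stores the fields in the array B.
IFS=: read -r -a B <<< "$A"

printf 'count: %s\n' "${#B[@]}"
printf '[%s]\n' "${B[@]}"
```

This should print a count of 3 followed by the three tokens in brackets, with the embedded and trailing spaces of the last token preserved. The global IFS is never touched, so there is no "part 2" to forget.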
Solution 2
There are two major approaches. One is IFS, demonstrated by fred.bear. This has the advantage of not requiring a separate process, but it can be tricky to get right when your input might contain characters that have special meaning to the shell. The other approach is to use a text processing utility. Field splitting is built into awk.
input="token1;token2;token3;token4"
awk -v input="$input" 'BEGIN {
count = split(input, a, ";");
print "first field: " a[1];
print "second field: " a[2];
print "number of fields: " count;
exit;
}'
Awk is particularly appropriate when processing multiple inputs.
command_producing_semicolon_separated_data |
awk -F ';' '{
print "first field: " $1;
print "second field: " $2;
print "number of fields: " NF;
}'
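To try the pipeline form without a real data source, printf can stand in for command_producing_semicolon_separated_data. This is only a sketch with made-up sample rows; it shows that awk splits each input line independently and recomputes NF per line.

```sh
# printf stands in for the real producer; the two rows are made-up sample data.
out=$(printf '%s\n' 'a;b;c' 'x;y' |
  awk -F ';' '{
    # NR is the current line number, NF the number of fields on that line
    print "line " NR ": " NF " fields, first=" $1 ", last=" $NF
  }')
printf '%s\n' "$out"
```

The first line should be reported with 3 fields and the second with 2, without any loop in the shell itself.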
Jas
Updated on September 17, 2022
Comments
-
Jas over 1 year
Using bash scripting and grep/awk/sed, how can I split a line matching a known pattern with a single-character delimiter into an array, e.g. convert
token1;token2;token3;token4
into
a[0] = token1 … a[3] = token4
?
-
alex about 13 years
You answer yourself with the question tags: sed, awk, regex :)
-
Jas about 13 years
@Patkos - bash scripting + grep/awk/sed, whichever works best...
-
Kusalananda over 5 years
Unclear: It is unclear whether a[0], a[1] etc. refer to an array in the shell or in awk.
-
ddeimeke about 13 years
What do you do if token2 contains whitespace?
-
Smiley about 13 years
In that case you had better take the approach fred.bear has suggested. However, please remember to restore your IFS to its original value afterwards.
-
Admin about 13 years
Do NOT underestimate the importance of "Important bit part 2". I've seen extraordinarily hard-to-debug problems arise from getting Important bit part 2 wrong.
-
Peter.O about 13 years
@Gilles: Your mod to the code (set -f, set +f) puzzles me; I don't see the connection between field separators and globbing, but I'm happy to learn. I am even more puzzled by the fact that when I introduce " * " to the first line, I get globbing of echo normal. $A, which is the normal expectation. However, what has me completely baffled is that I get no globbing in any of the present lines when IFS=; This applies whether globbing is on or off. And with globbing off in the same block, a new line echo * does expand! What's going on here? Globbing and no globbing together.
-
Gilles 'SO- stop being evil' about 13 years
@fred.bear: It's not about separators, it's about unprotected variable substitution ($A). Two things happen to $A: field splitting (on IFS) and pathname expansion (globbing). Compare sh -c 'set -f; echo $0' '/*' with sh -c 'echo $0' '/*'. I don't know what precise command has you confused; post a standalone example if you want me to look at it.
-
Peter.O about 13 years
You really got me thinking this time! ... and I've finally worked it out! The seemingly erratic behaviour I observed comes from "observational habit" (if it looks like a duck, quacks like a duck, and walks like a duck, it's a duck!). However, all bets are off with space (the duck) when IFS=: ... the space may still be used by people as a visual delimiter, but globbing sees it as just another character, and globbing needs a delimiter! So echo * will expand "normally", but A=' *'; echo $A will only expand for a file whose name has a leading space... Mystery unravelled! ;)
-
Peter.O about 13 years
PS... but I still don't see why I need to turn globbing off... (must go now.. I'll think about it as I drive... and read your reference links later...)
-
Peter.O about 13 years
@Gilles: If I'm wrong here, please let me know... (To glob or not to glob; that is the question) ... The answer is "Turn it off!", in all cases, unless you specifically and quite intentionally need it... As I've just found out, IFS can be deceptive because of its unusual/unfamiliar behaviour... I was focusing my question about globbing on the specific data in this example (which won't glob)... but now I'm a convert... globbing off (in the vast majority of cases)...
-
Michael Mrozek about 13 years
There's a suggested code change pending here; I'll let you approve/reject it
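
The set -f point from the comment thread above can be checked with a small experiment. This is a sketch only, assuming a POSIX-like shell with mktemp available; the scratch directory, the file names one and two, and the variable names are all made up for the demonstration.

```sh
# Work in a scratch directory so the glob result is predictable.
orig=$PWD
dir=$(mktemp -d) && cd "$dir" || exit 1
touch one two

A='x:*'
old_ifs=$IFS; IFS=:

with_glob=$(printf '[%s]' $A)     # unquoted: split on ":", then * is expanded
set -f
without_glob=$(printf '[%s]' $A)  # still split on ":", but * stays literal
set +f

IFS=$old_ifs
echo "globbing on:  $with_glob"
echo "globbing off: $without_glob"

# Clean up the scratch directory.
cd "$orig" && rm -rf "$dir"
```

With globbing on, the second field is replaced by the file names in the directory; with set -f it stays a literal *. That is why the answer disables globbing around the unquoted $A: the data, not the script author, would otherwise decide what ends up in the array.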