Using awk to split a line on single spaces, not multiples
Solution 1
The LINE parameter isn't quoted, so word splitting happens when $LINE is expanded in echo $LINE. By the time awk receives any input, you have 7 words (as seen by the shell), all separated by a single space. You want echo to receive the line as one word (again, as seen by the shell) so that the whitespace in your line isn't mangled before awk can process it. Quoting the parameter prevents that mangling.
# How you want it to be given to awk:
$ printf '<%s> ' "$LINE"; echo
<field1 field2 field3 field4 field5 field6   field9>
# Your attempt:
$ printf '<%s> ' $LINE; echo
<field1> <field2> <field3> <field4> <field5> <field6> <field9>
Notice how the extra whitespace between field6 and field9 is gone in the unquoted version.
You should always quote expansions; you are far more likely to break something by leaving an expansion unquoted than by quoting it.
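Putting that together with the bracketed field separator from the question, a minimal sketch of the quoted and unquoted cases side by side might look like this. The sample value of LINE is an assumption modelled on the question (three single spaces between field6 and field9, so fields 7 and 8 are empty), and printf is used instead of echo for the quoted case:

```shell
# Assumed sample line: fields 7 and 8 are empty, so three single
# spaces sit between field6 and field9.
LINE='field1 field2 field3 field4 field5 field6   field9'

# Quoted expansion: awk sees the original spacing, so $9 is field9.
quoted=$(printf '%s\n' "$LINE" | awk -F'[ ]' '{ print $9 }')

# Unquoted expansion: the shell splits and rejoins the words first,
# so awk only receives 7 fields and $9 is empty.
unquoted=$(echo $LINE | awk -F'[ ]' '{ print $9 }')

printf 'quoted=<%s> unquoted=<%s>\n' "$quoted" "$unquoted"
```

This assumes a POSIX-style shell such as sh, bash, or ksh; zsh does not word-split unquoted parameter expansions by default, so the unquoted case would behave differently there.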
Solution 2
A very useful variable in awk when dealing with input of variable length is NF, the number of fields.
lastword=`echo $LINE | awk '{ print $NF }'`
That will always print the last column, irrespective of the missing ones. If some fields in the middle are missing, counting back from the last field works pretty well too.
A sample file where missing/empty columns are filled with spaces, as in your example:
line1 field1 field2 field3 field4 field5 field6 field7 field8 field9
line2 field1 field2 field3 field4 field5 field6 field8 field9
line3 field1 field2 field3 field4 field5 field8 field9
and
awk '{print $1 " " $2 " " $(NF-1) " " $NF}' file
line1 field1 field8 field9
line2 field1 field8 field9
line3 field1 field8 field9
Solution 3
To do it in ksh93:
set -f
IFS='  ' # two spaces
set -- $LINE
printf '%s\n' "$9"
As in zsh, doubling the space in IFS removes the special behaviour by which sequences of spaces are treated as a single delimiter and leading and trailing spaces are ignored.
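For shells without that doubled-space behaviour, a rough portable analogue is sketched below. This is not the ksh93 method above: it swaps in a colon as the delimiter, relying on the POSIX rule that each non-whitespace IFS character delimits a field on its own. It assumes ':' never appears in the input, and the sample LINE and the names old_ifs and ninth are only illustrative:

```shell
# Assumed sample line: three single spaces between field6 and field9.
LINE='field1 field2 field3 field4 field5 field6   field9'

set -f                       # disable globbing while splitting
old_ifs=$IFS
IFS=:                        # non-whitespace IFS: every ':' delimits a field
# Turn each space into ':' and let the shell split the result.
set -- $(printf '%s' "$LINE" | tr ' ' ':')
IFS=$old_ifs
set +f

ninth=$9
printf '%s\n' "$ninth"
```

Because consecutive colons are not merged, the two empty fields survive the split and field9 lands in "$9".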
Solution 4
In my case I decided to just pipe it through tr first. Just map the spaces to a character that's unlikely to appear in the input (in this case the bell code, \a):
❯ echo 'a b  d' | tr ' ' '\a' | awk -F'\a' '{print "1="$1, "2="$2, "3="$3, "4="$4}'
1=a 2=b 3= 4=d
Note how the third field, $3, is now empty.
Hello again an hour later. 👋
Here are two better ways, without any intermediate translation:
❯ echo 'a b  d' | awk -F'[[:space:]]' '{print "1="$1, "2="$2, "3="$3, "4="$4}'
1=a 2=b 3= 4=d
❯ echo 'a b  d' | awk -vFS='[ ]' '{print "1="$1, "2="$2, "3="$3, "4="$4}'
1=a 2=b 3= 4=d
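The brackets matter because awk treats a field separator consisting of exactly one space as its default mode, where runs of blanks count as one delimiter. Any other separator longer than one character, such as '[ ]', is taken as a regular expression and matches each space individually. A small sketch comparing the field counts (the variable names are only illustrative):

```shell
# A single-space FS is the default mode: the run of two spaces is one delimiter.
default_nf=$(printf 'a b  d\n' | awk -F' ' '{ print NF }')

# A bracket expression is a regex: each space delimits, leaving an empty field.
bracket_nf=$(printf 'a b  d\n' | awk -F'[ ]' '{ print NF }')

printf 'default=%s bracket=%s\n' "$default_nf" "$bracket_nf"
```

With the two spaces between b and d, the first command counts 3 fields and the second counts 4.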
dazedandconfused
Updated on September 18, 2022
Comments
-
dazedandconfused over 1 year
I'm trying to split a line whose format I have no control over. If parameters 7 and 8 are missing, which is possible, they will be replaced by a space, so I would end up with:
field1 field2 field3 field4 field5 field6   field9
At the moment, in this situation field 9 is being read as field 7. Much searching has led me to believe that the following should work, but it doesn't. It's probably some minor syntax error on my part, but I can't seem to spot it.
word1=`echo $LINE | awk 'BEGIN { FS="[ ]" } ; { print $9 }'`
-
dazedandconfused over 10 years
Thanks. One day I'll understand awk. Probably about the same time they replace it with something better.
-
Stéphane Chazelas over 10 years
@dazedandconfused, the problem is not with awk but with you not quoting $LINE passed to echo (and using echo, btw)
-
llua over 10 years
@dazedandconfused I went into a little more detail with my answer.