using awk to split a line on single spaces not multiples

Solution 1

The LINE parameter isn't quoted, so word splitting happens when $LINE is expanded in echo $LINE. echo receives 7 separate words (as seen by the shell) and prints them joined by single spaces, so by the time awk receives any input the extra spaces are already gone. You want echo to receive the line as one word (again, as seen by the shell) so that the whitespace in your line isn't mangled before awk can process it. Quoting the parameter is what prevents the splitting.

# How you want it to be given to awk:
$ printf '<%s> ' "$LINE"; echo
<field1 field2 field3 field4 field5 field6   field9> 
# Your attempt:
$ printf '<%s> ' $LINE; echo
<field1> <field2> <field3> <field4> <field5> <field6> <field9> 

Notice how the extra whitespace between field6 and field9 is gone.

You should always quote expansions: you are far more likely to break something by leaving an expansion unquoted than by quoting it.
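To see the effect on the command from the question, here is a minimal sketch that runs the same awk program twice, once with $LINE unquoted and once quoted. The sample value of LINE is taken from the question; the variable names unquoted and quoted are only illustrative, and the quoted variant uses printf instead of echo.

```shell
LINE='field1 field2 field3 field4 field5 field6   field9'

# Unquoted: the shell splits $LINE into 7 words and echo joins them with
# single spaces, so awk only ever sees 7 fields and $9 is empty.
unquoted=$(echo $LINE | awk 'BEGIN { FS = "[ ]" } { print $9 }')

# Quoted: awk receives the line intact and splits on every single space,
# so the two empty fields are kept and field9 really is field 9.
quoted=$(printf '%s\n' "$LINE" | awk 'BEGIN { FS = "[ ]" } { print $9 }')

printf 'unquoted: <%s>\n' "$unquoted"
printf 'quoted:   <%s>\n' "$quoted"
```

With a POSIX sh or bash, the first result should be empty and the second should be field9.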

Solution 2

A very useful awk variable when the number of fields in the input varies is NF, the number of fields.

lastword=`echo "$LINE" | awk '{ print $NF }'`

That will always print the last column, irrespective of the missing ones. If some fields in the middle are missing, counting back from the last field works pretty well too.

A sample file where missing/empty columns are filled with spaces, as in your example:

line1 field1 field2 field3 field4 field5 field6 field7 field8 field9
line2 field1 field2 field3 field4 field5 field6  field8 field9
line3 field1 field2 field3 field4 field5   field8 field9

and

awk '{print $1 " " $2 " " $(NF-1) " " $NF}' file

    line1 field1 field8 field9
    line2 field1 field8 field9
    line3 field1 field8 field9

Solution 3

To do it in ksh93:

set -f
IFS='  ' # two spaces
set -- $LINE
printf '%s\n' "$9"

As in zsh, doubling the space in IFS removes the special behaviour by which sequences of spaces are treated as a single separator and leading and trailing spaces are ignored.

Solution 4

In my case I decided to just pipe it through tr first: map each space to a character that's unlikely to appear in the input (in this case the bell character, \a):

❯ echo 'a b  d' | tr ' ' '\a' | awk -F'\a' '{print "1="$1, "2="$2, "3="$3, "4="$4}'
1=a 2=b 3= 4=d

Note how the third field $3 is now empty.


Hello again an hour later. 👋

Here are two better ways without any intermediate translation:

❯ echo 'a b  d' | awk -F'[[:space:]]' '{print "1="$1, "2="$2, "3="$3, "4="$4}'
1=a 2=b 3= 4=d
❯ echo 'a b  d' | awk -vFS='[ ]' '{print "1="$1, "2="$2, "3="$3, "4="$4}'
1=a 2=b 3= 4=d
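Applied to the line from the question, the difference between awk's default splitting and a single-space separator can be sketched like this. The variable names are only illustrative. Note that '[ ]' matches only the space character, while '[[:space:]]' also matches tabs and other whitespace.

```shell
line='field1 field2 field3 field4 field5 field6   field9'

# Default FS: a run of blanks counts as one separator, so 7 fields.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# Bracket expression: every single space is a separator, so the two
# empty fields are kept and the line has 9 fields.
single_nf=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')
ninth=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print $9 }')

printf 'default NF=%s, single-space NF=%s, $9=%s\n' \
    "$default_nf" "$single_nf" "$ninth"
```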
Author: dazedandconfused

Updated on September 18, 2022

Comments

  • dazedandconfused over 1 year

    I'm trying to split a line whose format I have no control over. Parameters 7 and 8 may be missing, in which case each is replaced by a space, so I would end up with:

    field1 field2 field3 field4 field5 field6   field9
    

    At the moment, in this situation field 9 is being read as field 7. Much searching has led me to believe that the following should work, but it doesn't. It's probably some minor syntax error on my part, but I can't seem to spot it.

    word1=`echo $LINE | awk 'BEGIN { FS="[ ]" } ; { print $9 }'`
    
  • dazedandconfused over 10 years
    Thanks. One day I'll understand awk. Probably about the same time they replace it with something better.
  • Stéphane Chazelas over 10 years
    @dazedandconfused, the problem is not with awk but with you not quoting $LINE passed to echo (and using echo btw)
  • llua over 10 years
    @dazedandconfused I went into a little more detail with my answer.