How do I get awk to NOT use space as a delimiter?


Solution 1

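It's not awk but the shell that's causing the word splitting: the unquoted command substitution in for i in `cat ${INPUT}` is split on the default value of IFS (space, tab and newline), so every space starts a new loop iteration.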

You could fix that by saying:

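# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and each iteration receives one whole line of the file
while IFS= read -r i; do
  # $1 still includes the surrounding double quotes, e.g. "john"
  USERNAME=$(echo "$i" | awk 'BEGIN{FS="[|,:]"} {print $1}')
  echo "username: $USERNAME"
done < "$INPUT"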

To check what the shell actually passes to each iteration, add

echo "This is a line: ${i}"

in the loop.
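A quick way to see the difference is to run both loops on the sample data. This is a minimal bash sketch, not part of the original answer: it writes the question's two rows to a temporary file (a hypothetical user@example.com stands in for the obfuscated address), and the helper names demo_for and demo_while are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: compare shell word splitting (for loop) with line reading (while loop).
# Assumptions: bash; sample rows mirror the question, with a placeholder email.

INPUT=$(mktemp)
trap 'rm -f "$INPUT"' EXIT
cat > "$INPUT" <<'EOF'
"john","beatles.com","arse","user@example.com","1","1","on holiday"
"paul","beatles.com","bung","","0","1","also on holiday"
EOF

# The question's approach: the unquoted $(cat ...) is split on IFS
# (space, tab, newline), so "on holiday" arrives as two separate words.
demo_for() {
  for i in $(cat "$INPUT"); do
    echo "$i" | awk 'BEGIN{FS="[|,:]"} {print $1}'
  done
}

# The fix: read one full line per iteration.
demo_while() {
  while IFS= read -r i; do
    echo "$i" | awk 'BEGIN{FS="[|,:]"} {print $1}'
  done < "$INPUT"
}

demo_for     # five lines: "john"  holiday"  "paul"  on  holiday"
demo_while   # two lines:  "john"  "paul"
```

The for loop hands awk five fragments while the while loop hands it two whole lines. The double quotes around john and paul remain because the [|,:] separator does not strip them.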

Solution 2

You can use any regex as the field separator in awk, e.g. an optional comma followed by a double quote:

awk -F ',?"' '{print $2, $4, $6, $8, $10, $12, "<" $14 ">"}' f1
john beatles.com arse [email protected] 1 1 <on holiday>
paul beatles.com bung  0 1 <also on holiday>

The last field, $14, is enclosed in < and > to show that it ends up in a single awk field, space included.
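Since awk already reads the file line by line, the shell loop can be dropped altogether. Here is a minimal sketch along the lines of this solution, assuming every field is double-quoted and contains no escaped double quotes or line breaks; the helper names usernames and last_column and the placeholder email are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: let awk read the whole file with FS=',?"' instead of looping in the shell.
# Assumptions: bash; every field is double-quoted; placeholder email address.

INPUT=$(mktemp)
trap 'rm -f "$INPUT"' EXIT
cat > "$INPUT" <<'EOF'
"john","beatles.com","arse","user@example.com","1","1","on holiday"
"paul","beatles.com","bung","","0","1","also on holiday"
EOF

# With FS=',?"' the quoted values land in the even-numbered fields
# ($2, $4, ..., $14); the odd-numbered fields are empty gaps between separators.
usernames() {
  awk -F ',?"' '{print "username: " $2}' "$INPUT"
}

# Spaces are not separators here, so the last column stays whole.
last_column() {
  awk -F ',?"' '{print $14}' "$INPUT"
}

usernames     # username: john / username: paul
last_column   # on holiday / also on holiday
```

A field containing an escaped double quote ("") would still be split, since this separator cannot tell it apart from a field boundary.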

Author: vmos

Updated on June 04, 2022

Comments

  • vmos, almost 2 years ago

    I've got a CSV that I'm trying to process, but some of my fields contain commas, line breaks and spaces, and now that I think about it, there are probably some apostrophes in there too.

    For the commas and line breaks, I've converted them to other strings at the output phase and convert them back at the end (yes, it's messy, but I only need to run this once). I realise that I may have to do this with the spaces too, but I've broken the problem down to its basic parts to see if I can work around it.

    Here's input.csv:

    "john","beatles.com","arse","[email protected]","1","1","on holiday"
    "paul","beatles.com","bung","","0","1","also on holiday"
    

    (I've tried with and without quotes)

    Here's the script:

    INPUT="input.csv"
    
    for i in `cat ${INPUT}`
    
    do
    #USERNAME=`echo $i | awk -v  FS=',' '{print $1}'`
    USERNAME=`echo $i | awk 'BEGIN{FS="[|,:]"} ; {print $1}'`
    echo "username: $USERNAME"
    
    done
    

    So that should just output john and paul, but instead I get:

    username: "john"
    username: holiday"
    username: "paul"
    username: on
    username: holiday"
    

    because it sees the spaces and interprets them as new rows.

    Can I get it to stop that?