KSH scripting: how to split on ',' when values have escaped commas?


Solution 1

You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. ksh's built-in pattern substitution, ${var//pattern/replacement}, does this, so there is no need for sed, awk, or any other external tool.

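read -r l                 # -r keeps the backslash in "\," so it can be found below
l=${l//\\,/!!}            # hide every escaped comma behind a placeholder
set -f                    # stop glob characters in the data from expanding
IFS=","
set -A nvls $l            # split on the remaining (unescaped) commas
unset IFS
set +f
echo "${nvls[2]//!!/,}"   # restore all commas in the third field (arrays start at 0)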
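The last comment on this page notes that older ksh (for example on SunOS) lacks the ${var//pattern/replacement} substitution. In that case, the split can be sketched with only the POSIX expansions ${var%%pattern} and ${var#pattern}. ksh88-era shells are generally expected to support these, but that is an assumption worth checking on your system. The function name split_escaped is hypothetical. It prints one field per line and turns \, into a plain comma. It does not handle a literal backslash placed right before a real separator.

```shell
# Hypothetical helper: split_escaped LINE
# Prints each comma-separated field of LINE on its own line; "\," inside a
# field is kept as a literal comma (the backslash is dropped).
# Only POSIX expansions are used, no ${v//...}. Variables are global.
split_escaped() {
  rest=$1
  field=
  while :; do
    case $rest in
      *,*) ;;                                    # at least one comma left
      *) printf '%s\n' "$field$rest"; return ;;  # last field
    esac
    head=${rest%%,*}                             # text before the first comma
    rest=${rest#*,}                              # text after the first comma
    case $head in
      *\\) field=$field${head%\\}, ;;            # escaped: keep comma, drop backslash
      *)   printf '%s\n' "$field$head"
           field= ;;
    esac
  done
}
```

Usage: after `read -r l`, run `split_escaped "$l"` and consume its output with a `while read -r field` loop. Because the fields arrive one per line, nothing depends on IFS or `set -A`.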

Solution 2

As often happens, I devised an answer minutes after asking the question in a public forum :(

I worked around the quoting/unquoting issue by piping the input file through the following sed script:

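# Turn each comma not preceded by a backslash into a newline, then append
# an empty line after every input record so records stay separated.
sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'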

It converted the input into:

NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>

Now, I can parse this input like this:

while read name value ; do
  echo "$name => $value"
done

Because read is used without -r, it strips the backslash from \, so value ends up with plain commas. I can then stuff name and value into an associative array, if I like.
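The sed output marks the end of each record with an empty line. Below is a minimal sketch of a loop that uses that blank line to tell records apart. The function name parse_records and its numbered output format are illustrative assumptions, not part of the original answer. It reads standard input, so it would sit at the end of the sed pipeline.

```shell
# Hypothetical helper: parse_records
# Reads "NAME VALUE" lines (as produced by the sed script) from stdin.
# A blank line ends the current record, so the record counter is bumped there.
# Plain read (no -r) turns "\," back into ",", which this approach relies on.
parse_records() {
  rec=1
  while read name value; do
    if [ -z "$name" ]; then        # blank line: the next record starts
      rec=$((rec + 1))
      continue
    fi
    printf '%s: %s => %s\n' "$rec" "$name" "$value"
  done
}
```

For example, `printf 'A 1\nB 2\\,3\n\n' | parse_records` prints `1: A => 1` and `1: B => 2,3`.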

PS Since I can't accept my own answer, should I delete the question, or ...?

Author: Umar Niazi
Freelance telecom expert, functional programming geek, linux user

Updated on June 04, 2022

Comments

  • Umar Niazi (almost 2 years ago)

    I am trying to write a KSH script that processes a file of name-value pairs, several of them on each line.

    Format is:

    NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc
    

    Suppose I write:

    read l
    IFS=","
    set -A nvls $l
    echo "${nvls[1]}"    # ksh arrays start at 0, so [1] is the second pair
    

    This gives me the second name-value pair, nice and easy. Now suppose the task is extended so that values can include commas. They should be escaped, like this:

    NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc
    

    Obviously, my code no longer works: "read" strips all quoting, so the second element of the array will be just "NAME2 VALUE2_1".

    I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do the unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are on each line.

    Does anyone have a useful trick up their sleeve for me?

    PS I know I could do this in no time in Perl, Python, or even awk. However, I have to do it in ksh (... or die trying ;)

  • Jonathan Leffler (over 15 years ago)
    Does using sed count? You could also use awk or perl or ... to do the munging. The sed regex surprises me slightly; I would have used two backslashes inside the square brackets, but I guess that is not actually necessary.
  • Jonathan Leffler (over 15 years ago)
    As to deleting the question - I don't know what the recommended procedure is, but I doubt that destroying your words of wisdom is really what they want. If the worst comes to the worst, I could copy your answer for you and let you select that - but it is a complete cheat.
  • Umar Niazi (over 15 years ago)
    Oh. I just stumbled upon stackoverflow.com/questions/209329/…. It seems better to leave it as it is. Maybe someone will find this useful and upvote it :)
  • Umar Niazi (over 15 years ago)
    The only caveat here is that older KSH (as still found on SunOS, for example) does not have that nifty substitution function.
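Both the question's failed attempt and Solution 1 depend on how read treats a backslash. The small check below, written in POSIX sh style, makes that visible. The variable names line, plain, raw and v are illustrative only.

```shell
# Compare plain read with read -r on a line containing an escaped comma.
line='NAME2 VALUE2_1\,VALUE2_2'

# Plain read treats "\," as an escaped comma and drops the backslash.
plain=$(printf '%s\n' "$line" | { read v; printf '%s' "$v"; })

# read -r keeps the backslash, so the escape is still there to be detected.
raw=$(printf '%s\n' "$line" | { read -r v; printf '%s' "$v"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

With plain read the escape is gone before any splitting happens. That is why the placeholder substitution in Solution 1 only finds \, when the line was read with read -r.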