How can I parse an ini file whose values may contain certain characters?


Solution 1

The fact that you can do something in bash doesn't mean that you should.

sh (and bash etc.) scripts are best suited to being relatively simple wrappers that launch programs or glue together text-processing commands. For more complicated tasks, including parsing ini files and acting on them, other languages are more appropriate. Have you considered writing your script in Perl or Python? Both have good .ini file parsers; I've used Perl's Config::INI module several times when I've needed to parse an ini file.

But if you insist on doing it in bash, you should use an associative array instead of setting individual variables.

Start with something like this:

#! /bin/bash

inifile='user1074170.ini' 

# declare $config to be an associative array
declare -A config

while IFS='=' read -r key val ; do
    [[ -n "$key" ]] || continue   # skip blank lines: an empty key is a bad array subscript
    config["$key"]="$val"
done <  <(sed -E -e '/^\[/d
                     s/#.*//
                     s/[[:blank:]]+$|^[[:blank:]]+//g' "$inifile" )

# now print out the config array
set | grep '^config='

The sed script deletes the [Section1] line (actually, all lines beginning with an open square bracket [; you will want to handle this differently [1] in an ini file with multiple sections), and removes # comments as well as leading and trailing blanks. The while loop reads each line, using = as the field delimiter, assigns the text before the first = to $key and everything after it to $val, and then adds them to the $config array. Because the values are only ever assigned and never eval'd, characters like `, $, > and ; are stored literally.

Output:

config=([value1]="abc\`def" [value3]="mno\$pqr" [value2]="ghi>jkl" [value4]="stu;vwx" )
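A minimal sketch of how that read loop splits lines. The `query` line is a made-up example (not from the question's file), added to show that a value may itself contain = signs: with two variable names, read puts the first field in key and the whole remainder of the line in val.

```shell
#!/usr/bin/env bash
# With IFS='=' and two names, read puts the first field in key and the
# rest of the line, including any further = signs, in val. Nothing is
# expanded or executed, so $, ;, > and ` survive as plain text.
declare -A config
while IFS='=' read -r key val; do
    config["$key"]="$val"
done <<'EOF'
value3=mno$pqr
query=a=b;c=d
EOF

printf '%s\n' "${config[value3]}" "${config[query]}"
```

The quoted 'EOF' delimiter stops the here-document itself from expanding $pqr, so what reaches read is exactly what is in the file.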

You can use the array entries later in your script like this:

$ echo value1 is "${config[value1]}"
value1 is abc`def

$ [ "${config[value4]}" = 'stu;vwx' ] && echo true
true
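To walk every entry instead of looking keys up one at a time, something like the following should work. It is a sketch that rebuilds $config inline (mirroring the output above) and sorts the keys, because bash associative arrays have no defined order.

```shell
#!/usr/bin/env bash
# Sketch: iterate over every key of an associative array such as $config.
# Keys are piped through sort for stable, predictable output.
declare -A config=([value1]='abc`def' [value2]='ghi>jkl' [value3]='mno$pqr' [value4]='stu;vwx')

while IFS= read -r key; do
    printf '%s is %s\n' "$key" "${config[$key]}"
done < <(printf '%s\n' "${!config[@]}" | sort)
```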

[1] awk and perl both have conveniently easy ways of reading files in "paragraph" mode, where a paragraph is a block of text separated from other blocks by one or more blank lines.

e.g. to work with only [Section1], insert the awk script below immediately before the sed script feeding into the while loop above:

awk -v RS= -v ORS='\n\n' '/\[Section1\]/' "$inifile" | sed ...

(and remove "$inifile" from the end of the sed command line, of course - you don't want to feed the file in again after you've gone to the trouble of extracting only [Section1] from it).

Setting ORS isn't strictly necessary if you're only extracting one section from the ini file, but it is useful for keeping the paragraphs separated if you're extracting two or more sections.
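Putting the awk paragraph-mode filter in front of the sed script and while loop gives roughly the following. This is a sketch: the temporary file and its [Section2] are made up for the demonstration, and it assumes sections are separated by at least one blank line, which is what RS= relies on. Because ORS='\n\n' emits a blank line after each paragraph, the loop skips empty keys.

```shell
#!/usr/bin/env bash
# Sketch: read only [Section1] by feeding awk's paragraph-mode output
# into the sed + while loop. Assumes blank lines separate sections.
inifile=$(mktemp)
cat > "$inifile" <<'EOF'
[Section1]
value1=abc`def
value3=mno$pqr

[Section2]
other=xyz
EOF

declare -A config
while IFS='=' read -r key val; do
    [[ -n $key ]] || continue          # skip the blank lines produced by ORS='\n\n'
    config["$key"]="$val"
done < <(awk -v RS= -v ORS='\n\n' '/\[Section1\]/' "$inifile" |
         sed -E -e '/^\[/d
                    s/#.*//
                    s/[[:blank:]]+$|^[[:blank:]]+//g')

rm -f "$inifile"
declare -p config
```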

Solution 2

I know it's an incomplete answer, but the MySQL.lns lens in Augeas seems to be able to parse most of that. In augtool:

augtool> set /augeas/load/testini/incl "/root/test.ini"
augtool> set /augeas/load/testini/lens "MySQL.lns"
augtool> load
augtool> ls /files/root/
.ssh/      test.ini/
augtool> ls /files/root/test.ini
target/ = Section1
augtool> ls /files/root/test.ini/target/
value1/ = abc`def
value2/ = ghi>jkl
value3/ = mno$pqr
value4/ = stu

The only one it got wrong is the last one, and TBH I don't think that's an error: in .ini files the semicolon marks the beginning of a comment. I'd also like to ask whether your data actually looks like that.

If it does, you could run sed beforehand to replace ; with some unused character and then transform it back after processing. Ultimately, though, you're going to need some standards in order for the file to have any discernible structure.
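A sketch of that swap-out-and-back idea on a single line. Assumptions: the byte \037 (ASCII unit separator) never occurs in the file, and the parameter expansion ${line%%;*} is only a toy stand-in for a parser, such as an Augeas lens, that treats ; as the start of a comment. Note that the swap also hides genuine ;-comments from the parser.

```shell
#!/usr/bin/env bash
# Sketch: hide ; before "parsing", restore it afterwards.
# \037 is assumed unused; ${x%%;*} stands in for a ;-comment-stripping parser.
line='value4=stu;vwx'

naive=${line%%;*}                                   # parser cuts at ;  -> value4=stu
protected=$(printf '%s' "$line" | tr ';' '\037')    # replace ; with \037
parsed=${protected%%;*}                             # no ; left, nothing cut
restored=$(printf '%s' "$parsed" | tr '\037' ';')   # put the ; back

printf 'naive:    %s\nrestored: %s\n' "$naive" "$restored"
```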

EDIT:

I tested it with the PHP lens and got the whole thing, as long as the values were quoted:

[root@vlzoreman ~]# augtool
augtool> set /augeas/load/testini/lens "PHP.lns"
augtool> set /augeas/load/testini/incl "/root/test.ini"
augtool> load
augtool> ls /files/root/test.ini/Section1/
value1 = abc`def
value2 = ghi>jkl
value3 = mno$pqr
value4 = stu;vwx

Otherwise it got only as far as the MySQL lens did.

EDIT #2:

I'm sure there's a cleaner way to write this but this is example usage:

[root@vlp-foreman ~]# bash bash.sh
Values for: Section1:
        :: value1 is abc`def
        :: value2 is ghi>jkl
        :: value3 is mno$pqr
        :: value4 is stu;vwx
Values for: Section2:
        :: value1 is abc`def

Script is:

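#!/bin/bash

# Augeas transform: parse /root/test.ini with the PHP lens
transform="PHP.lns incl /root/test.ini"

# augtool lists sections as e.g. "Section1/"; keep the part before the /
sections=$(augtool -A --transform "$transform" ls /files/root/test.ini | cut -f1 -d/)

for currentSection in $sections; do

  echo "Values for: $currentSection:"

  # The first column of the ls output is the field name
  fields=$(augtool -A --transform "$transform" ls "/files/root/test.ini/$currentSection" | awk '{print $1}')

  for currentField in $fields; do

    # print emits: path = "value"; -f2- keeps any = that is part of the value
    currentValue=$(augtool -A --transform "$transform" print "/files/root/test.ini/$currentSection/$currentField" | cut -f2- -d=)
    # Trim surrounding whitespace, then the surrounding double quotes
    currentValue=$(printf '%s\n' "$currentValue" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^"//' -e 's/"$//')

    printf '\t:: %s is %s\n' "$currentField" "$currentValue"

  done

done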


Alpesh Sorathiya

Updated on September 18, 2022

Comments

Alpesh Sorathiya over 1 year

I have looked at a couple of bash ini-parsing scripts, and I've seen this one used a few times here, so I'm trying to see if it will work for me. It looks like it reads the ini file line by line multiple times, and with each pass it progressively constructs a function that finally gets eval'd. It works fine for some special characters but not others. If a value in the file contains a single quote or a greater-than/less-than symbol, the script returns syntax errors. Other symbols create unexpected results as well. How can I handle these characters as they are encountered?

This is the function that parses the ini.

#!/usr/bin/env bash
cfg_parser ()
{
    ini="$(<$1)"                # read the file
    ini="${ini//[/\[}"          # escape [
    ini="${ini//]/\]}"          # escape ]
    IFS=$'\n' && ini=( ${ini} ) # convert to line-array
    ini=( ${ini[*]//;*/} )      # remove comments with ;
    ini=( ${ini[*]/\    =/=} )  # remove tabs before =
    ini=( ${ini[*]/=\   /=} )   # remove tabs after =
    ini=( ${ini[*]/\ =\ /=} )   # remove anything with a space around =
    ini=( ${ini[*]/#\\[/\}$'\n'cfg.section.} ) # set section prefix
    ini=( ${ini[*]/%\\]/ \(} )    # convert text2function (1)
    ini=( ${ini[*]/=/=\( } )    # convert item to array
    ini=( ${ini[*]/%/ \)} )     # close array parenthesis
    ini=( ${ini[*]/%\\ \)/ \\} ) # the multiline trick
    ini=( ${ini[*]/%\( \)/\(\) \{} ) # convert text2function (2)
    ini=( ${ini[*]/%\} \)/\}} ) # remove extra parenthesis
    ini[0]="" # remove first element
    ini[${#ini[*]} + 1]='}'    # add the last brace
    eval "$(echo "${ini[*]}")" # eval the result
}

ini file

[Section1]
value1=abc`def # unexpected EOF while looking for matching ``'
value2=ghi>jkl # syntax error near unexpected token `>'
value3=mno$pqr # executes ok but outputs "mnoqr"
value4=stu;vwx # executes ok but outputs "stu"
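All four symptoms share one cause: the function turns the file into shell code and evals it, so the shell re-parses each value. A minimal sketch contrasting eval with bash's printf -v, a swapped-in alternative that is not part of the original script and assigns the text verbatim:

```shell
#!/usr/bin/env bash
# Contrast: eval re-parses the line as shell code; printf -v stores it verbatim.
pqr=                           # $pqr exists but is empty
line='value3=mno$pqr'

eval "$line"                   # the shell expands $pqr during eval
evaluated=$value3              # "mno"

key=${line%%=*}                # text before the first =
val=${line#*=}                 # text after the first =
printf -v "$key" '%s' "$val"   # assigns the literal text, no expansion
literal=$value3                # 'mno$pqr'

printf 'eval:      %s\nprintf -v: %s\n' "$evaluated" "$literal"
```

printf -v still needs the key to be a valid variable name, which is one reason an associative array keyed by the raw name is the more robust choice.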

Admin almost 8 years

Have you tried parsing it with sed and running eval on the result? Or just running source on the result, either via a temp file or via process substitution? For line-based search and replace, use sed.

Bratchley almost 8 years

Is it really a good idea to delete the section head? I mean, real-world examples can have identically named fields in different sections, but you're going to want a specific value, not just any value for that field. I'm also not sure we should assume whitespace. I, for one, put whitespace around the section header and sometimes around the fields as well, depending on what they're being set to. It seems a regex that looks for the section head and takes the first instance of the field before the next line starting with [ would be more reliable.
Bratchley almost 8 years

I'll second the notion of using a different language, though. ConfigParser in Python appears able to parse it without issue, complication, or modification.
Alessio almost 8 years

@Bratchley - in that case, extract the section name and use "$section.$key" as the hash key rather than just "$key". The code in my answer was meant to be a starting point to illustrate the basic method, not a complete and perfect solution.
Bratchley almost 8 years

The complete answer would probably be to use something that's already solved this problem rather than re-inventing something that's already been re-invented several times.
Alessio almost 8 years

@Bratchley - I couldn't agree more. As I said, bash is not the best language for a task like this; using one of the existing INI-parsing libs for perl, python, or whatever would be best. But if the OP really insists on doing it in bash, then some approaches to the problem (a hashed array) are better than others (eval'ing arbitrary strings to set individual variables).
Alpesh Sorathiya almost 8 years

This answer is a good starting point. I ended up using a multidimensional array because the data had identical fields, as @Bratchley pointed out. Actually, the section names were all unique, but the value fields were all identical. Thanks; now I have to figure out how to write/delete whole sections in my ini. Maybe that'll be the subject of another question, we'll see. I realize there are other solutions available, but this is letting me learn bash in more depth, which is becoming a hobby for me.
Alessio almost 8 years

By a weird coincidence, the OP just accepted my perl-based answer to an INI-file related question from Nov last year. I highly recommend looking at it and considering the use of perl (or python) instead of bash: unix.stackexchange.com/a/240844/7696
Alessio almost 8 years

BTW, the awk part of my answer here can be used to extract just the section(s) you want. The first line of each paragraph will be the section name: strip off the [ and ], save it in a variable (e.g. $section), and use "$section.$key" or similar as the hash key rather than just "$key". That'll result in $config elements like config[Section1.value1]="abc\`def".
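A sketch of section-qualified keys. As a variation on the awk approach described in the comment, it tracks the current [section] header line by line instead of using paragraph mode, so it does not depend on blank lines between sections. The sample data, including [Section2], is made up for illustration; like the original sed, it strips # comments and surrounding blanks.

```shell
#!/usr/bin/env bash
# Sketch: keep section headers and key the array as "section.key".
declare -A config
section=
while IFS= read -r line; do
    line=${line%%#*}                             # drop # comments
    line=${line#"${line%%[![:blank:]]*}"}        # trim leading blanks
    line=${line%"${line##*[![:blank:]]}"}        # trim trailing blanks
    [[ -n $line ]] || continue                   # skip empty lines
    if [[ $line == \[*\] ]]; then
        section=${line:1:${#line}-2}             # strip [ and ]
        continue
    fi
    config["$section.${line%%=*}"]=${line#*=}    # key before first =, value after
done <<'EOF'
[Section1]
value1=abc`def
value4=stu;vwx

[Section2]
value1=other
EOF

declare -p config
```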
Alessio almost 8 years

Also BTW, the description in the man page for the Config::Tiny perl module says "Config::Tiny - Read/Write .ini style files with as little code as possible". It lives up to that description; it's very simple and easy to use.