How can I parse an ini file whose values may contain certain characters?
Solution 1
The fact that you can do something in bash doesn't mean that you should.
sh (and bash etc) scripts are best suited to be relatively simple wrappers to launch programs or around text-processing commands. For more complicated tasks, including parsing ini files and acting on them, other languages are more appropriate. Have you considered writing your script in perl or python? Both have good .ini file parsers - I've used perl's Config::INI module several times when I've needed to parse an ini file.
But if you insist on doing it in bash, you should use an associative array instead of setting individual variables.
Start with something like this:
#! /bin/bash
inifile='user1074170.ini'
# declare $config to be an associative array
declare -A config
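while IFS='=' read -r key val ; do
  # blank and comment-only lines leave $key empty, and an empty
  # subscript is an error for an associative array, so skip them
  [ -n "$key" ] || continue
  config["$key"]="$val"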
done < <(sed -E -e '/^\[/d
s/#.*//
s/[[:blank:]]+$|^[[:blank:]]+//g' "$inifile" )
# now print out the config array
set | grep '^config='
The sed script deletes the [Section1] line (actually, all lines beginning with an open-square-bracket [ - you will want to handle this differently[1] in an ini file with multiple sections), and removes comments as well as leading and trailing blanks. The while loop reads in each line, using = as a field delimiter, and assigns the contents to variables $key and $val, which are then added to the $config array.
Output:
config=([value1]="abc\`def" [value3]="mno\$pqr" [value2]="ghi>jkl" [value4]="stu;vwx" )
You can use the array entries later in your script like this:
$ echo value1 is "${config[value1]}"
value1 is abc`def
$ [ "${config[value4]}" = 'stu;vwx' ] && echo true
true
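To walk over every entry rather than naming each key, something like the following should work. This is only a sketch: the array is filled in by hand here (with the same values as the output above) instead of being read from the ini file, and the `none` default is just an illustrative placeholder.

```shell
#!/usr/bin/env bash
# Sample data standing in for the parsed file (an assumption for this demo)
declare -A config=(
  [value1]='abc`def'
  [value2]='ghi>jkl'
  [value3]='mno$pqr'
  [value4]='stu;vwx'
)

# "${!config[@]}" expands to the keys. Associative arrays have no defined
# order, so the keys are sorted to get stable output.
while IFS= read -r key; do
  printf '%s = %s\n' "$key" "${config[$key]}"
done < <(printf '%s\n' "${!config[@]}" | sort)

# A key that was never set expands to nothing; supply a default if needed
printf 'value5 = %s\n' "${config[value5]:-none}"
```

Because the values are only ever expanded inside double quotes, the backtick, `>`, `$` and `;` characters are never interpreted by the shell.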
[1] awk or perl have conveniently easy ways of reading files in "paragraph" mode. A paragraph being defined as a block of text separated from other text blocks by one or more blank lines.
e.g. to work with only [Section1], insert the awk script below immediately before the sed script feeding into the while loop above:
awk -v RS= -v ORS='\n\n' '/\[Section1\]/' "$inifile" | sed ...
(and remove "$inifile" from the end of the sed command line, of course - you don't want to feed the file in again after you've gone to the trouble of extracting only [Section1] from it).
Setting ORS isn't strictly necessary if you're only extracting one section from the ini file - but will be useful to maintain paragraph separation if you're extracting two or more sections.
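Here is a small self-contained demonstration of the paragraph-mode filter. The two-section sample text is made up for illustration, and it assumes the sections are separated by a blank line, which is what paragraph mode depends on.

```shell
#!/usr/bin/env bash
# Hypothetical two-section sample; the blank line between the sections
# is what makes each section a separate awk "paragraph".
sample='[Section1]
value1=abc`def
value2=ghi>jkl

[Section2]
value1=xyz'

# RS= (empty) switches awk to paragraph mode; the pattern prints only the
# paragraph that contains the [Section1] header.
section1=$(printf '%s\n' "$sample" | awk -v RS= -v ORS='\n\n' '/\[Section1\]/')

printf '%s\n' "$section1"
```

If the sections in your file are not separated by blank lines, this filter sees the whole file as one paragraph and you would need a different approach.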
Solution 2
I know it's an incomplete answer but the MySQL.lns in augeas seems to be able to parse most of that. In augtool:
augtool> set /augeas/load/testini/incl "/root/test.ini"
augtool> set /augeas/load/testini/lens "MySQL.lns"
augtool> load
augtool> ls /files/root/
.ssh/ test.ini/
augtool> ls /files/root/test.ini
target/ = Section1
augtool> ls /files/root/test.ini/target/
value1/ = abc`def
value2/ = ghi>jkl
value3/ = mno$pqr
value4/ = stu
The only one it messed up on is the last one and TBH I don't think that's an error. In .ini files the semi-colon marks the beginning of a comment. I'd also like to ask if your data actually looks like that.
If it does, you may just do some sed prior to it that sets ; to some unused character value and then transform it back post-processing. Ultimately, you're going to need some standards, though, in order for the file to be capable of having any discernible structure.
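A rough sketch of that placeholder idea, without augeas involved: the `__SEMI__` marker is a made-up string assumed never to occur in the file, and the middle step is only a stand-in for any parser that treats ; as the start of a comment.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder, assumed not to appear anywhere in the real data
ph='__SEMI__'

line='value4=stu;vwx'

# 1. Protect the semicolon before the text reaches the parser
protected=$(printf '%s\n' "$line" | sed "s/;/$ph/g")

# 2. Stand-in for a parser that drops everything from ';' onwards
parsed=$(printf '%s\n' "$protected" | sed 's/;.*//' | cut -d= -f2-)

# 3. Transform the placeholder back after parsing
value=${parsed//"$ph"/;}

printf '%s\n' "$value"
```

Note that this protects every semicolon, so genuine ; comments in the file would be preserved as data too. That is part of why some agreed standard for the file is needed.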
EDIT:
I tested it out with the PHP lens and got the whole thing as long as the values were quoted:
[root@vlzoreman ~]# augtool
augtool> set /augeas/load/testini/lens "PHP.lns"
augtool> set /augeas/load/testini/incl "/root/test.ini"
augtool> load
augtool> ls /files/root/test.ini/Section1/
value1 = abc`def
value2 = ghi>jkl
value3 = mno$pqr
value4 = stu;vwx
Otherwise it got as far as the MySQL lens did.
EDIT #2:
I'm sure there's a cleaner way to write this but this is example usage:
[root@vlp-foreman ~]# bash bash.sh
Values for: Section1:
:: value1 is abc`def
:: value2 is ghi>jkl
:: value3 is mno$pqr
:: value4 is stu;vwx
Values for: Section2:
:: value1 is abc`def
Script is:
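#!/bin/bash
# list the sections (top-level nodes) of the parsed file
sections=$(augtool -A --transform "PHP.lns incl /root/test.ini" ls /files/root/test.ini | cut -f1 -d/)
for currentSection in $sections; do
  echo "Values for: $currentSection:"
  # first column of the ls output is the field name
  fields=$(augtool -A --transform "PHP.lns incl /root/test.ini" ls "/files/root/test.ini/$currentSection" | awk '{print $1}')
  for currentField in $fields; do
    # -f2- keeps the whole value even if it contains an = itself
    currentValue=$(augtool -A --transform "PHP.lns incl /root/test.ini" print "/files/root/test.ini/$currentSection/$currentField" | cut -f2- -d=)
    # trim surrounding whitespace, then the surrounding double quotes
    currentValue=$(printf '%s\n' "$currentValue" | sed -e 's/^[ \t]*//' -e 's/[ \t]*$//' | sed -e 's/^"//' -e 's/"$//')
    # printf avoids echo -e interpreting backslashes inside the value
    printf '\t:: %s is %s\n' "$currentField" "$currentValue"
  done
done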
Alpesh Sorathiya
Updated on September 18, 2022
Comments
-
Alpesh Sorathiya over 1 year
I have looked at a couple of bash ini parsing scripts and I've seen this one used a few times here, so I'm trying to see if it will work for me. It looks like it reads the ini file line by line multiple times and with each pass it progressively constructs a function that finally gets eval'd. It works fine for some special characters but not others. If a value in the file contains a single quote or a greater/less than symbol, the script returns syntax errors. Other symbols create unexpected results as well. How can I handle these characters as they are encountered?
This is the function that parses the ini.
#!/usr/bin/env bash
cfg_parser ()
{
    ini="$(<$1)" # read the file
    ini="${ini//[/\[}" # escape [
    ini="${ini//]/\]}" # escape ]
    IFS=$'\n' && ini=( ${ini} ) # convert to line-array
    ini=( ${ini[*]//;*/} ) # remove comments with ;
    ini=( ${ini[*]/\ =/=} ) # remove tabs before =
    ini=( ${ini[*]/=\ /=} ) # remove tabs be =
    ini=( ${ini[*]/\ =\ /=} ) # remove anything with a space around =
    ini=( ${ini[*]/#\\[/\}$'\n'cfg.section.} ) # set section prefix
    ini=( ${ini[*]/%\\]/ \(} ) # convert text2function (1)
    ini=( ${ini[*]/=/=\( } ) # convert item to array
    ini=( ${ini[*]/%/ \)} ) # close array parenthesis
    ini=( ${ini[*]/%\\ \)/ \\} ) # the multiline trick
    ini=( ${ini[*]/%\( \)/\(\) \{} ) # convert text2function (2)
    ini=( ${ini[*]/%\} \)/\}} ) # remove extra parenthesis
    ini[0]="" # remove first element
    ini[${#ini[*]} + 1]='}' # add the last brace
    eval "$(echo "${ini[*]}")" # eval the result
}
ini file
[Section1]
value1=abc`def # unexpected EOF while looking for matching ``'
value2=ghi>jkl # syntax error near unexpected token `>'
value3=mno$pqr # executes ok but outputs "mnoqr"
value4=stu;vwx # executes ok but outputs "stu"
-
Admin almost 8 years
Have you tried parsing it with sed and running eval on the result? Or just running source on the result, either via a temp file or via process substitution? For line-based search and replace, use sed.
-
-
Bratchley almost 8 years
Is it really a good idea to delete the section head? I mean real world examples can have identically named fields in different sections but you're going to want a specific value, not just any value for that field. I'm also not sure we should assume whitespace. I know I for one put whitespace around the section header and sometimes the fields as well, depending on what they're being set to be. It seems a regex that just looks for the section head and takes the first instance of the field that doesn't appear after a [ on a new line would be more reliable.
-
Bratchley almost 8 years
I will second the notion on using a different language though. ConfigParser in python appears able to parse it without issue, complication, or modification.
-
Alessio almost 8 years
@Bratchley - in that case, extract the section name and use "$section.$key" as the hash key rather than just "$key". The code in my answer was meant to be a starting point to illustrate the basic method, not a complete and perfect solution.
-
Bratchley almost 8 years
The complete answer would probably be to use something that's already solved this problem rather than re-inventing something that's already been re-invented several times.
-
Alessio almost 8 years
@Bratchley - I couldn't agree more. As I said, bash is not the best language for a task like this - using one of the existing INI-parsing libs for perl, python, or whatever would be best. But if the OP really insists on doing it in bash, then some approaches to the problem (a hashed array) are better than others (eval arbitrary strings to set individual variables).
-
Alpesh Sorathiya almost 8 years
This answer is a good starting point. I ended up using a multidimensional array because the data had identical fields as @Bratchley pointed out. Actually the section values were all unique but the value fields were all identical. Thanks, now I have to figure out how to write/delete whole sections in my ini. Maybe that's the subject of another question, we'll see. I realize there are other solutions available but this is allowing me to learn bash in more depth, which is becoming a hobby for me.
-
Alessio almost 8 years
By a weird coincidence, the OP just accepted my perl-based answer to an INI-file related question from Nov last year. I highly recommend looking at it and considering the use of perl (or python) instead of bash: unix.stackexchange.com/a/240844/7696
-
Alessio almost 8 years
BTW, the awk part of my answer here can be used to extract just the section(s) you want. The first line of each paragraph will be the section name. Strip off the [ and ], save it in a variable (e.g. $section) and use "$section.$key" or similar as the hash key rather than just "$key"... that'll result in $config elements like config[Section1.value1]="abc'def"
-
Alessio almost 8 years
Also BTW, the description in the man page for the Config::Tiny perl module says "Config::Tiny - Read/Write .ini style files with as little code as possible". It lives up to that description, it's very simple and easy to use.
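
The "$section.$key" suggestion from the comments above could be sketched roughly like this. It is only an illustration under stated assumptions: the sample text and the two regex variables (section_re, pair_re) are invented for the demo, whole-line comments starting with # or ; are skipped, and nothing is stripped from the values themselves, so a ; or # inside a value is kept as data.

```shell
#!/usr/bin/env bash
# Hypothetical sample with the same key name in two sections
sample='[Section1]
value1=abc`def
value4 = stu;vwx

[Section2]
value1=xyz'

declare -A config
section=''

# Regexes kept in variables so bash does not mis-parse the brackets
section_re='^[[:blank:]]*\[([^]]+)\][[:blank:]]*$'
pair_re='^[[:blank:]]*([^=#;[:blank:]]+)[[:blank:]]*=[[:blank:]]*(.*)$'

while IFS= read -r line; do
  if [[ $line =~ $section_re ]]; then
    # remember the section name without its brackets
    section=${BASH_REMATCH[1]}
  elif [[ $line =~ $pair_re ]]; then
    key=${BASH_REMATCH[1]}
    val=${BASH_REMATCH[2]}
    # trim trailing blanks from the value
    val="${val%"${val##*[![:blank:]]}"}"
    config["$section.$key"]=$val
  fi
done <<< "$sample"

printf '%s\n' "${config[Section1.value1]}"
printf '%s\n' "${config[Section2.value1]}"
```

The values are stored by plain assignment and never passed to eval, so identically named fields in different sections stay separate and the special characters are not interpreted.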