Extract xml tag value using awk command


Solution 1

You can use awk as shown below. However, this is not a robust solution: it assumes each element sits on its own line with its opening and closing tags, and it will fail if the XML is laid out differently, e.g. if there are multiple elements on the same line.

$ dt=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' file)
$ echo "$dt"
1967-08-13
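Since the question asks for each value in its own variable, the one-liner above can be wrapped in a small helper. This is only a sketch under the same one-element-per-line assumption: the function name `get_tag`, the variable names, and the file name `sample.xml` are made up for illustration and do not come from the original answer.

```shell
# Sample input (assumed file name), trimmed from the XML in the question.
cat > sample.xml <<'EOF'
<GrpHdr>
    <MsgId>A</MsgId>
    <CreDtTm>2001-12-17T09:30:47</CreDtTm>
    <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr>
EOF

# get_tag NAME FILE (hypothetical helper): print the text of the first
# <NAME>text</NAME> element that sits on a single line.
# With -F '[<>]' such a line splits into: indent, NAME, text, /NAME.
get_tag() {
    awk -F '[<>]' -v tag="$1" \
        '$2 == tag && $4 == ("/" tag) { print $3; exit }' "$2"
}

msg_id=$(get_tag MsgId sample.xml)
cre_dt=$(get_tag CreDtTm sample.xml)
sttlm_dt=$(get_tag IntrBkSttlmDt sample.xml)

echo "$msg_id $cre_dt $sttlm_dt"
```

Comparing `$2` and `$4` exactly, rather than matching a pattern anywhere on the line, avoids picking up an element whose name merely contains the one requested. The trade-off is that an element carrying attributes, such as `<TtlIntrBkSttlmAmt Ccy="EUR">`, is not matched by this sketch.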

I suggest you use a proper XML processing tool instead, such as xmllint.

$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo "$dt"
1967-08-13

Solution 2

The following gawk command uses a regular expression as the record separator (RS) to match XML tags: anything that starts with a <, contains at least one character other than >, and ends with a > is treated as a tag. Gawk stores the text matched by RS in the RT variable (a gawk extension), and the text that precedes each tag becomes the record, which gawk assigns to $0.

gawk 'BEGIN { RS="<[^>]+>" } { print RT, $0 }' myfile
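RT is specific to gawk, so the command above will not work in other awk implementations. As a rough sketch of the same idea for a plain POSIX awk (this is a substituted technique, not the answer's method), the record separator can be the single character `>` and the field separator `<`. The file name `sample.xml` and the variable `pairs` are assumptions for illustration.

```shell
# Sample input (assumed file name), trimmed from the XML in the question.
cat > sample.xml <<'EOF'
<GrpHdr>
    <MsgId>A</MsgId>
    <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
    <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr>
EOF

# RS=">" ends a record at every ">", so a record looks like "text<tag".
# FS="<" then puts the text in $1 and the tag (without "<") in $2.
# A closing tag preceded by non-blank text marks a leaf element,
# so print it as name=value.
pairs=$(awk 'BEGIN { RS = ">"; FS = "<" }
    $2 ~ /^\// && $1 ~ /[^ \t\r\n]/ { print substr($2, 2) "=" $1 }' sample.xml)

printf '%s\n' "$pairs"
```

Each output line has the form name=value, which a `while IFS='=' read -r name value` loop could then process. Like the other awk approaches here, this does not handle comments, CDATA sections, or entities, so a real XML tool remains the safer choice.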
Author by

user1929905

Updated on July 09, 2022

Comments

  • user1929905
    user1929905 almost 2 years

    I have an XML file like the one below:

    <root>
        <FIToFICstmrDrctDbt>
            <GrpHdr>
                <MsgId>A</MsgId>
                <CreDtTm>2001-12-17T09:30:47</CreDtTm>
                <NbOfTxs>0</NbOfTxs>
                <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
                <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
                <SttlmInf>
                    <SttlmMtd>CLRG</SttlmMtd>
                    <ClrSys>
                        <Prtry>xx</Prtry>
                    </ClrSys>
                </SttlmInf>
                <InstgAgt>
                    <FinInstnId>
                        <BIC>AAAAAAAAAAA</BIC>
                    </FinInstnId>
                </InstgAgt>
            </GrpHdr>
        </FIToFICstmrDrctDbt>
    </root>
    

    I need to extract the value of each tag into a separate variable using awk. How can I do it?