How to grep an xml block in an xml file using a keyword in ksh


Solution 1

With these constraints:

  1. I cannot use any XML parser tool, as I don't have permission (read-only access).

  2. My xmllint version does not support XPath, and I cannot update it (read-only access).

  3. I don't have xmlstarlet and cannot install it.

I resorted to finding other, unconventional solutions. This awk command got me what I needed:

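awk '
  /<service.*name=/ { f=1 ; m=0 ; res="" }         # a <service name=...> line opens a block
  f { res = res $0 ORS }                             # buffer every line of the block, closing tag included
  f && /mqListener queue="ABC\.getme2"/ { m=1 }      # remember whether the wanted queue appears
  /<\/service>/ { f=0 ; if (m) printf "%s", res }    # </service> is already buffered, so print res only
' Sample.xml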

Special thanks to @Janis for helping me with this in "How to implement awk range pattern in fetching an xml block when the input parameter is in the middle of the block".
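The question's input is a queue name, so the same buffering idea can be wrapped in a small function that takes the queue name as a parameter. This is a sketch: get_service_block is a hypothetical name, it assumes one tag per line as in Sample.xml, and it matches the queue with awk's index() instead of a regex so the dot in ABC.getme2 is taken literally.

```shell
#!/bin/sh
# Hypothetical helper: print every <service> block whose mqListener uses queue $1.
# $2 is the XML file ("-" reads standard input). Assumes one tag per line and
# that each block starts with a <service ...> line, as in Sample.xml.
get_service_block() {
  awk -v q="$1" '
    /<service[ >]/ { f = 1; m = 0; res = "" }             # opening tag starts a new block
    f { res = res $0 ORS }                                 # buffer the block, closing tag included
    f && index($0, "queue=\"" q "\"") { m = 1 }            # literal match, so "." is not a wildcard
    f && /<\/service>/ { f = 0; if (m) printf "%s", res }  # block complete: emit it if it matched
  ' "$2"
}

# Usage, with the queue name as the only input:
# QUEUENAME=ABC.getme2
# get_service_block "$QUEUENAME" Sample.xml
```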

Solution 2

As mentioned in the comment above, xmllint can be used like this:

xmllint --xpath '//service[@name="GETME"]' Sample.xml

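The --xpath option is available at least as of libxml version 20903. The version 20626 shown in the question does not have it, which is why it reports "Unknown option --xpath".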

A primer on XPath syntax can be found at http://www.w3schools.com/xpath/xpath_syntax.asp or, more authoritatively, at https://www.w3.org/Consortium/Offices/Presentations/XSLT_XPATH/#(23)
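Because the question's input is the queue name rather than the service name, the XPath predicate can test the queue attribute of the mqListener child instead of @name. A minimal sketch, assuming an xmllint that accepts --xpath (the libxml 20626 in the question does not) and a queue name without single quotes; service_by_queue is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the <service> element whose <mqListener> child
# has queue="$1", using an XPath predicate on the child's attribute.
# Assumes xmllint accepts --xpath and the queue name contains no single quote.
service_by_queue() {
  xmllint --xpath "//service[mqListener/@queue='$1']" "$2"
}

# Usage: service_by_queue ABC.getme2 Sample.xml
```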

Solution 3

If you are using the latest ksh (by which I mean a recent build of ksh93), you can actually just use the shell itself. ksh93 supports compound variable types, which are a little like a C struct or an XML node tree. It doesn't natively support XML at the moment (though I believe that is planned), but it does support JSON right now.

I used a free online converter to get a JSON version of your sample. Even so, I first had to clean up your sample a little (the p in </connectionpools> should be upper-cased, by the way). After that I could do:

print -j queue.services.[@name]

...and was rewarded with...

GETME

I could also do:

print -j queue.services[1].[@name]

...to get instead...

GETME2

At the conversion site I had to select Tab-delimited to keep it from inserting a lot of non-breaking spaces, but other than that the conversion seems to have come out OK. There are surely tools that can do similar conversions locally just as easily.

In any case, ksh can read in a JSON tree. After copying it to the clipboard as I did, run:

read -m json queue <<<"$(xsel -bo)"

After doing that I could view the whole structure like...

print -j queue

...which printed...

{
    "batchServices": [
        {
            "@name": "batch1",
            "executor": {
                "@className": "com.abc.xyz.qwer.qweqwewqe.ffdsdfsdfsdfsdf"
            }
        },
        {
            "@name": "batch2",
            "executor": {
                "@className": "com.abc.xyz.qwer.qweqwewqe.zxcsadsad"
            }
        }
    ],
    "configfile": "sample.xml",
    "connectionPools": [
        {
            "driver": "oracle.jdbc.driver.OracleDriver",
            "maxConnections": "10",
            "minConnections": "0",
            "name": "asdasd",
            "password": "$asdasd_PASSWORD",
            "poolUrl": "jdbc:asdsad:asdasdsad",
            "testSql": "select * from abc",
            "url": "$asdasd_URL",
            "userId": "$asdasd_USER"
        }
    ],
    "exceptionsFilterConfigFile": "asdasd.xml",
    "keyInfoConfigFile": "asdasd.xml",
    "services": [
        {
            "@backend": "ABC",
            "@idleTime": "300",
            "@max": "10",
            "@min": "1",
            "@name": "GETME",
            "handlerContainer": {
                "@className": "com.abc.xyz.wqere.abcqwere",
                "handler": {
                    "@className": "com.abc.xyz.qweqweqwe.werwerwerwer"
                }
            },
            "mqListener": {
                "@copyMessageId": "true",
                "@maxExpiry": "500",
                "@minExpiry": "4",
                "@queue": "ABC.getme",
                "@suggExpiry": "30"
            }
        },
        {
            "@backend": "ABC",
            "@idleTime": "300",
            "@max": "10",
            "@min": "1",
            "@name": "GETME2",
            "handlerContainer": {
                "@className": "com.abc.xyz.wqere.abcqwere",
                "handler": {
                    "@className": "com.abc.xyz.qweqweqwe.werwerwerwer"
                }
            },
            "mqListener": {
                "@copyMessageId": "true",
                "@maxExpiry": "500",
                "@minExpiry": "4",
                "@queue": "ABC.getme2",
                "@suggExpiry": "30"
            }
        }
    ]
}

Author: Philip Morris

Updated on September 18, 2022

Comments

  • Philip Morris
    Philip Morris almost 2 years

    I have a file Sample.xml that contains many services; its structure is shown below.

    Notes:

    1. I cannot use any XML parser tool, as I don't have permission (read-only access).

    2. My xmllint version does not support XPath, and I cannot update it (read-only access).

    3. I don't have xmlstarlet and cannot install it.

    PROBLEM: INPUT: QUEUE NAME

    OUTPUT: SERVICE BLOCK

    sample INPUT: ABC.getme2

    OUTPUT NEEDED:

    <service name="GETME2" min="1" max="10" idleTime="300" backend="ABC">
                                <handlerContainer className="com.abc.xyz.wqere.abcqwere">
                                <handler className="com.abc.xyz.qweqweqwe.werwerwerwer"/>
                                </handlerContainer>
                                <mqListener queue="ABC.getme2" suggExpiry="30" minExpiry="4" maxExpiry="500" copyMessageId="true"/>
                        </service>
    

    XML Structure:

         <?xml version="1.0" encoding="UTF-8"?>
            <deploymentconfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                    <configfile>sample.xml</configfile>
                    <exceptionsFilterConfigFile>asdasd.xml</exceptionsFilterConfigFile>
                    <keyInfoConfigFile>asdasd.xml</keyInfoConfigFile>
                    <services>
    
        <service name="GETME" min="1" max="10" idleTime="300" backend="ABC">
                                <handlerContainer className="com.abc.xyz.wqere.abcqwere">
                                <handler className="com.abc.xyz.qweqweqwe.werwerwerwer"/>
                                </handlerContainer>
                                <mqListener queue="ABC.getme" suggExpiry="30" minExpiry="4" maxExpiry="500" copyMessageId="true"/>
                        </service>
    
        <service name="GETME2" min="1" max="10" idleTime="300" backend="ABC">
                                <handlerContainer className="com.abc.xyz.wqere.abcqwere">
                                <handler className="com.abc.xyz.qweqweqwe.werwerwerwer"/>
                                </handlerContainer>
                                <mqListener queue="ABC.getme2" suggExpiry="30" minExpiry="4" maxExpiry="500" copyMessageId="true"/>
                        </service>
            . . . .a lot of services like this . . . .
            . . . .a lot of services like this . . . .
            . . . .a lot of services like this . . . .
            . . . .a lot of services like this . . . .
            </services>
       <batchServices>
                            <batchService name="batch1">
                                    <executor className="com.abc.xyz.qwer.qweqwewqe.ffdsdfsdfsdfsdf" />
                            </batchService>
                            <batchService name="batch2">
                                    <executor className="com.abc.xyz.qwer.qweqwewqe.zxcsadsad" />
                            </batchService>
    . . . .a lot of batch services like this . . . .
            . . . .a lot of batch services like this . . . .
            . . . .a lot of batch services like this . . . .
            . . . .a lot of batch services like this . . . .
          </batchServices>
    
    <timerservices>
    <timerservice> - a lot of timeservice
    </timerservices>
    
      <connectionPools>
                    <pool>
                            <name>asdasd</name>
                            <driver>oracle.jdbc.driver.OracleDriver</driver>
                            <url>$asdasd_URL</url>
                            <userId>$asdasd_USER</userId>
                            <password>$asdasd_PASSWORD</password>
                            <minConnections>0</minConnections>
                            <maxConnections>10</maxConnections>
                            <poolUrl>jdbc:asdsad:asdasdsad</poolUrl>
                            <testSql>select * from abc</testSql>
                    </pool>
    
     . . a lot of pools. . .
    
    </connectionpools>
    
    </deploymentconfig>
    

    I need to grep an xml block like this:

     <service name="GETME" min="1" max="10" idleTime="300" backend="ABC">
                            <handlerContainer className="com.abc.xyz.wqere.abcqwere">
                            <handler className="com.abc.xyz.qweqweqwe.werwerwerwer"/>
                            </handlerContainer>
                            <mqListener queue="ABC.getme" suggExpiry="30" minExpiry="4" maxExpiry="500" copyMessageId="true"/>
                    </service>
    

    and I only want to provide the queue name as input:

    QUEUENAME=INSERT_HERE
    grep ______________ $QUEUENAME. . . 
    

    Here is the output of xmllint (its usage text lists no --xpath option):

    Usage : xmllint [options] XMLfiles ...
        Parse the XML files and output the result of the parsing
        --version : display the version of the XML library used
        --debug : dump a debug tree of the in-memory document
        --shell : run a navigating shell
        --debugent : debug the entities defined in the document
        --copy : used to test the internal copy implementation
        --recover : output what was parsable on broken XML documents
        --noent : substitute entity references by their value
        --noout : don't output the result tree
        --path 'paths': provide a set of paths for resources
        --load-trace : print trace of all external entites loaded
        --nonet : refuse to fetch DTDs or entities over network
        --nocompact : do not generate compact text nodes
        --htmlout : output results as HTML
        --nowrap : do not put HTML doc wrapper
        --valid : validate the document in addition to std well-formed check
        --postvalid : do a posteriori validation, i.e after parsing
        --dtdvalid URL : do a posteriori validation against a given DTD
        --dtdvalidfpi FPI : same but name the DTD with a Public Identifier
        --timing : print some timings
        --output file or -o file: save to a given file
        --repeat : repeat 100 times, for timing or profiling
        --insert : ad-hoc test for valid insertions
        --compress : turn on gzip compression of output
        --html : use the HTML parser
        --xmlout : force to use the XML serializer when using --html
        --push : use the push mode of the parser
        --memory : parse from memory
        --maxmem nbbytes : limits memory allocation to nbbytes bytes
        --nowarning : do not emit warnings from parser/validator
        --noblanks : drop (ignorable?) blanks spaces
        --nocdata : replace cdata section with text nodes
        --format : reformat/reindent the input
        --encode encoding : output in the given encoding
        --dropdtd : remove the DOCTYPE of the input docs
        --c14n : save in W3C canonical format (with comments)
        --exc-c14n : save in W3C exclusive canonical format (with comments)
        --nsclean : remove redundant namespace declarations
        --testIO : test user I/O support
        --catalogs : use SGML catalogs from $SGML_CATALOG_FILES
                     otherwise XML Catalogs starting from 
                 file:///etc/xml/catalog are activated by default
        --nocatalogs: deactivate all catalogs
        --auto : generate a small doc on the fly
        --xinclude : do XInclude processing
        --noxincludenode : same but do not generate XInclude nodes
        --loaddtd : fetch external DTD
        --dtdattr : loaddtd + populate the tree with inherited attributes 
        --stream : use the streaming interface to process very large files
        --walker : create a reader and walk though the resulting doc
        --pattern pattern_value : test the pattern support
        --chkregister : verify the node registration code
        --relaxng schema : do RelaxNG validation against the schema
        --schema schema : do validation against the WXS schema
        --schematron schema : do validation against a schematron
        --sax1: use the old SAX1 interfaces for processing
        --sax: do not build a tree but work just at the SAX level
    
    Libxml project home page: http://xmlsoft.org/
    To report bugs or get some help check: http://xmlsoft.org/bugs.html
    

    Here is the version:

    xmllint: using libxml version 20626
    
    • minorcaseDev
      minorcaseDev about 9 years
      Take a look at xmllint or xmlstarlet.
    • Joe Sewell
      Joe Sewell about 9 years
      Please explain what you mean by "grep an xml block like this..." Are you looking for a specific block by service name, but want to print the entire block? Or do you want to get the name fields of every service? (awk would probably work better in either case, since grep is line-oriented.)
    • Philip Morris
      Philip Morris about 9 years
      I just want to print the entire block for the service name.
    • shivams
      shivams about 9 years
      Could you provide the exact XML input file? At least the exact structure of your XML file is needed to give you an xmllint command that would work.
    • Philip Morris
      Philip Morris about 9 years
      @shivams The XML file consists of a bunch of blocks like the sample I posted in the question.
    • shivams
      shivams about 9 years
      So, there is no other parent of these blocks in the xml?
    • Philip Morris
      Philip Morris about 9 years
      <services> <service> </service> </services>
    • shivams
      shivams about 9 years
      @PhilipMorris See my answer according to an assumed xml file.
    • Sobrique
      Sobrique about 9 years
      Sample XML is useful, but please resist the temptation to include "more like this" comments: they invalidate the XML. If your XML doesn't have matched start/end tags, it's not valid and can't be parsed as XML. That includes <connectionPools> closing with </connectionPools>.
  • Philip Morris
    Philip Morris about 9 years
    I'm getting "Unknown option --xpath".
  • Philip Morris
    Philip Morris about 9 years
    Neither worked. I posted a more elaborate XML structure. I need the queue name as input and the service block as output.
  • Philip Morris
    Philip Morris about 9 years
    This is the output:
    $ use strict;
    -ksh: use: not found [No such file or directory]
    $ use warnings;
    -ksh: use: not found [No such file or directory]
    $
    $ use XML::Twig;
    -ksh: use: not found [No such file or directory]
  • Sobrique
    Sobrique about 9 years
    ... no, it wouldn't be found in ksh because it's a perl script.
  • Philip Morris
    Philip Morris about 9 years
    Am I doomed? =D
  • Sobrique
    Sobrique about 9 years
    No. You just need to save that first snippet as something.pl. Then use perl to run it. (The first line #!/usr/bin/perl should do that). You will probably find that you need to also install XML::Twig (I'm guessing that because you've got ksh you're on AIX). This can be done by running: perl -MCPAN -e shell then install XML::Twig. (Or look at your package manager)
  • shivams
    shivams about 9 years
    You will need to install xmlstarlet from your package manager. On Ubuntu, you could install it by sudo apt-get install xmlstarlet.
  • Philip Morris
    Philip Morris about 9 years
    I don't have permissions =(
  • Philip Morris
    Philip Morris about 9 years
    I am on read only =(