Extracting value of xml tag in PostgreSQL

Use the xpath() function:

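WITH x(col) AS (SELECT '<?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status></response>'::xml)
SELECT xpath('/response/status/text()', col) AS status
FROM   x;

/text() strips the surrounding <status> tag. The absolute path /response/status works in every PostgreSQL version. A relative path like ./status is resolved against the root element before PostgreSQL 11, but against the document node since then, where it finds nothing.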
Returns an array of xml - with a single element in this case:

status
xml[]
-------
{ERROR_MISSING_DATA}

Applied to your table

In response to your question update, this can simply be:

SELECT id, xpath('/response/status/text()', response::xml) AS status
FROM   tbl;

If you are certain there is only a single status tag per row, you can simply extract the first item from the array:

SELECT id, (xpath('/response/status/text()', response::xml))[1] AS status
FROM   tbl;
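
The extracted array element is still of type xml. If a plain string is more convenient, a cast to text can be appended. A minimal sketch, assuming a stand-in table built inline with VALUES (the ids and documents are made up for illustration) and using the absolute path /response/status, which does not depend on how a given version resolves relative paths:

```sql
-- Hypothetical stand-in for the real table: "response" is stored as text,
-- as in the question, so it still needs the explicit cast to xml.
WITH tbl(id, response) AS (
   VALUES
     (1, '<?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status></response>')
   , (2, '<?xml version="1.0" ?><response><status>SUCCESS</status></response>')
   )
SELECT id
     , (xpath('/response/status/text()', response::xml))[1]::text AS status  -- xml -> text
FROM   tbl;
```

If a row has no status element, the subscript [1] yields NULL rather than an error.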

If there can be multiple status items:

SELECT id, unnest(xpath('/response/status/text()', response::xml)) AS status
FROM   tbl;

Gets you one row per status element. Rows where the array is empty (no status found) drop out of the result.
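
If ids without any status element should still show up, one possible variant moves unnest() into a LEFT JOIN LATERAL. This is only a sketch: it assumes PostgreSQL 9.3 or later (for LATERAL), and the inline table and its contents are invented for illustration.

```sql
-- Hypothetical sample data: id 1 has two status elements, id 3 has none.
WITH tbl(id, response) AS (
   VALUES
     (1, '<response><status>A</status><status>B</status></response>')
   , (2, '<response><status>SUCCESS</status></response>')
   , (3, '<response><other>x</other></response>')
   )
SELECT t.id
     , s.status::text AS status   -- NULL where no status element exists
FROM   tbl t
LEFT   JOIN LATERAL unnest(xpath('/response/status/text()', t.response::xml)) AS s(status) ON true;
```

With this data, id 1 produces two rows, id 2 one row, and id 3 one row with a NULL status instead of disappearing.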

Cast to xml

Since you defined your columns to be of type text (instead of xml), you need to cast to xml explicitly. The function xpath() expects its 2nd parameter to be of type xml. An untyped string constant is coerced to xml automatically, but a text column is not.

This works without explicit cast:

  SELECT xpath('/response/status/text()'
      ,'<?xml version="1.0" ?><response><status>SUCCESS</status></response>');

A CTE like in my first example needs a type for every column in the "common table expression". If I had not cast to a specific type, the type unknown would have been used - which is not the same thing as an untyped string. Obviously, there is no direct conversion implemented between unknown and xml. You'd have to cast to text first: unknown_type_col::text::xml. Better to cast to ::xml right away.

This has been tightened with PostgreSQL 9.1 (I think). Older versions were more permissive.
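
To see which type a CTE column actually ends up with on a given server, pg_typeof() can be used. This is just a quick check, not part of the solution. The first query may report unknown or text depending on the PostgreSQL version, while the second should report xml.

```sql
-- Without a cast: the column type is whatever the server resolves the literal to.
WITH x(col) AS (SELECT '<response><status>SUCCESS</status></response>')
SELECT pg_typeof(col) AS col_type FROM x;

-- With an explicit cast: the column is xml and can be passed to xpath() directly.
WITH x(col) AS (SELECT '<response><status>SUCCESS</status></response>'::xml)
SELECT pg_typeof(col) AS col_type FROM x;
```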

Either way, with any of these methods the string has to be valid xml or the cast (implicit or explicit) will raise an exception.
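
If some rows may hold malformed XML, one way to avoid the exception is to test the string before casting. A sketch, assuming PostgreSQL 9.1 or later (where xml_is_well_formed_document() is available) and an invented inline table:

```sql
-- Hypothetical sample data: the second row is not well-formed XML.
WITH tbl(id, response) AS (
   VALUES
     (1, '<response><status>SUCCESS</status></response>')
   , (2, '<response><status>broken')
   )
SELECT id
     , CASE WHEN xml_is_well_formed_document(response)   -- guard before the cast
            THEN (xpath('/response/status/text()', response::xml))[1]::text
       END AS status                                      -- NULL for malformed rows
FROM   tbl;
```

The CASE expression is used instead of a WHERE filter because the order in which WHERE conditions are evaluated is not guaranteed, so the cast could otherwise run on a malformed row first.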

Author by

ronak

Technology enthusiast, linux and python lover..!

Updated on July 05, 2022

Comments

  • ronak
    ronak almost 2 years

    Below is the column response from my Postgres table. I want to extract the status from all the rows in my Postgres database. The status could be of varying sizes like SUCCESS as well so I do not want to use the substring function. Is there a way to do it?

    <?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status><responseType>COUNTRY_MISSING</responseType><country_info>USA</country_info><phone_country_code>1234</phone_country_code></response>
    

    so my table structure is like this

       Column    |            Type             |                        Modifiers
    -------------+-----------------------------+----------------------------------------------------------
     id          | bigint                      | not null default nextval('events_id_seq'::regclass)
     hostname    | text                        | not null
     time        | timestamp without time zone | not null
     trn_type    | text                        | 
     db_ret_code | text                        | 
     request     | text                        | 
     response    | text                        | 
     wait_time   | text                        | 
    

    And I want to extract status from each and every request. How do I do this?

    Below is a sample row. And assume the table name abc_events

    id          | 1870667
    hostname    | abcd.local
    time        | 2013-04-16 00:00:23.861
    trn_type    | A
    request     | <?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status><responseType>COUNTRY_MISSING</responseType><country_info>USA</country_info><phone_country_code>1234</phone_country_code></response>
    response    | <?xml version="1.0" ?><response><status>ERROR_MISSING_DATA</status><responseType>COUNTRY_MISSING</responseType><country_info>USA</country_info><phone_country_code>1234</phone_country_code></response>
    
  • Phrogz
    Phrogz about 11 years
    Do you need the ::xml? I was just doing SELECT xpath('...', '<raw>xml</raw>'); and it seems to work.
  • ronak
    ronak about 11 years
    I edited my question. Essentially what I want is to extract value of a tag from the column that has the xml request/response.
  • ronak
    ronak about 11 years
    I followed it but I am getting this error LINE 1: select unnest(xpath('./status/text()', request)) from abc_events ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. It is pointing to the xpath function.
  • Erwin Brandstetter
    Erwin Brandstetter about 11 years
    @Phrogz: I added a chapter on the topic of casting, since my initial comment wasn't completely correct. A cast is actually needed with a CTE in this case ...
  • Erwin Brandstetter
    Erwin Brandstetter about 11 years
    @ronak: I added a bit to my answer. Note the addendum about casting to xml. Also note I had the wrong cast at first. Must be ::xml.
  • ronak
    ronak about 11 years
    Thanks for the help Erwin. This helped me a lot.
  • Erwin Brandstetter
    Erwin Brandstetter about 11 years
    @ronak: Cool. :) For more advanced acrobatics with xpath() consider this related answer.
  • Aamir
    Aamir over 10 years
    But what if there are multiple tags in a column? How can I extract them? Suppose the xml data in a column is like - <status>abc</status><response>ERROR_MISSING_DATA</response>
  • Peter Krauss
    Peter Krauss over 6 years
    Hi, simple "cast XML to text" must use //text()... So array_to_string( xpath('path//text()', xcontent)::text[] , '') to obtain all text from, eg., the TXT of an HTML document.
  • Surya
    Surya over 5 years
    <?xml version="1.0" encoding="UTF-8"?><BookList xmlns="azkhaban.com/DEM-ON-TOR-20040511#" xmlns:xsd="hogwarts.org/2001/XMLSchema" xmlns:xsi="p07.org/1989/…> When the xml is something like this how can I get the xpath? For every entry in the table the urls may not be the same so I cannot keep the url as a part of the xpath right?