How to convert PostgreSQL escape bytea to hex bytea?

10,574

Solution 1

If the final purpose is to get the hexadecimal representation of all the bytes that constitute the strings in textColumn, this can be done with:

SELECT encode(convert_to(textColumn, 'UTF-8'), 'hex') from tableXY;

It does not depend on bytea_output. BTW, this setting plays a role only at the final stage of a query, when a result column is of type bytea and has to be returned in text format to the client (which is the most common case, and what pgAdmin does). It's a matter of representation, the actual values represented (the series of bytes) are identical.

In the query above, the result is of type text, so this is irrelevant anyway.

I think that your query with decode(..., 'escape') can't work because the argument is supposed to be encoded in escape format and it's not, per comments it's normal xml strings.

Solution 2

With the great help of Daniel-Vérité I use this general query now to check for all kind of BOM or unicode char problems:

select encode(textColumn::bytea, 'hex'), * from tableXY;

I had problem with pgAdmin and too long columns, as they had no result. I used that query for pgAdmin:

select encode(substr(textColumn,1,100)::bytea, 'hex'), * from tableXY;

Thanks Daniel!

Share:
10,574
jan
Author by

jan

Updated on June 14, 2022

Comments

  • jan
    jan almost 2 years

    I got the answer to check for one certain BOM in a PostgreSQL text column. What I really like to do would be to have something more general, i.e. something like

    select decode(replace(textColumn, '\\', '\\\\'), 'escape') from tableXY;
    

    The result of a UTF8 BOM is:

    \357\273\277
    

    Which is octal bytea and can be converted by switching the output of bytea in pgadmin:

    update pg_settings set setting = 'hex' WHERE name = 'bytea_output';
    select '\357\273\277'::bytea
    

    The result is:

    \xefbbbf
    

    What I would like to have is this result as one query, e.g.

    update pg_settings set setting = 'hex' WHERE name = 'bytea_output';
    select decode(replace(textColumn, '\\', '\\\\'), 'escape') from tableXY;
    

    But that doesn't work. The result is empty, probably because the decode cannot handle hex output.

  • jan
    jan almost 10 years
    Like I saw with the decode(..., 'escape') query, this query is not working always. In two of 14 cases it works and looks exactly like what I want. But in 12 of 14 cases the result of the query is empty. It is not related to the BOM, as two rows with BOM have empty results, the 2 results of no BOM and 10 other lines without BOM have no result too. If I remove the encode part, already the inner convert_to has only 4 results, i.e. 8 results are lost by the convert_to and 2 results by the encode.
  • jan
    jan almost 10 years
    Ain't SELECT encode(uuio_configuration_c::bytea, 'hex') from uic_UIObjectConfiguration; better as your query? It has the same result, i.e. only 2 of 14 rows are not empty.
  • jan
    jan almost 10 years
    I cannot, because the entries are too long for comments. I think that is the problem too, because the two working entries are the shortest (under 1000 chars). All entries not working have more then 2800 chars. I'm only interested in the beginning of the entries, can the query only convert the first 100 chars?
  • Daniel Vérité
    Daniel Vérité almost 10 years
    substr(longtext,1,100)
  • Daniel Vérité
    Daniel Vérité almost 10 years
    Note that the empty results are almost certainly a display problem with pgAdmin. The SQL functions involved here are not known to have a problem with large texts.
  • jan
    jan almost 10 years
    @DanielVérité: You are right, other clients have no problem with select encode(textColumn::bytea, 'hex'), * from tableXY;