Character with encoding UTF8 has no equivalent in WIN1252

86,471

Solution 1

What do you do when you get this message? Do you import a file to Postgres? As devstuff said it is a BOM character. This is a character Windows writes as first to a text file, when it is saved in UTF8 encoding - it is invisible, 0-width character, so you'll not see it when opening it in a text editor.

Try to open this file in for example Notepad, save-as it in ANSI encoding and add (or replace similar) set client_encoding to 'WIN1252' line in your file.

Solution 2

I had a similar issue, and I solved by setting the encoding to UTF8 with \encoding UTF8 in the client before attempting an INSERT INTO foo (SELECT * from bar WHERE x=y);. My client was using WIN1252 encoding but the database was in UTF8, hence the error.

More info is available on the PostgreSQL wiki under Character Set Support (devel docs).

Solution 3

Don't eridicate the characters, they're real and used for good reasons. Instead, eridicate Win1252.

Solution 4

I had a very similar issue. I had a linked server from SQL Server to a PostgreSQL database. Some data I had in the table I was selecting from using an openquery statement had some character that didn't have an equivalent in Win1252. The problem was that the System DSN entry (to be found under the ODBC Data Source Administrator) I had used for the connection was configured to use PostgreSQL ANSI(x64) rather than PostgreSQL Unicode(x64). Creating a new data source with the Unicode support and creating a new modified linked server and refernecing the new linked server in your openquery resolved the issue for me. Happy days.

Solution 5

That looks like the byte sequence 0xBD, 0xBF, 0xEF as a little-endian integer. This is the UTF8-encoded form of the Unicode byte-order-mark (BOM) character 0xFEFF.

I'm not sure what Postgre's normal behaviour is, but the BOM is normally used only for encoding detection at the beginning of an input stream, and is usually not returned as part of the result.

In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.

Can you change the database encoding to be UTF8 instead of 1252? This will allow your columns to contain almost any character.

Share:
86,471
Monis Iqbal
Author by

Monis Iqbal

software enthusiast

Updated on May 20, 2020

Comments

  • Monis Iqbal
    Monis Iqbal almost 4 years

    I am getting the following exception:

    Caused by: org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding "UTF8" has no equivalent in "WIN1252"
    

    Is there a way to eradicate such characters, either via SQL or programmatically?
    (SQL solution should be preferred).

    I was thinking of connecting to the DB using WIN1252, but it will give the same problem.

  • Monis Iqbal
    Monis Iqbal over 14 years
    I wish I could, it's a requirement :)
  • Monis Iqbal
    Monis Iqbal over 14 years
    No, we cannot change the destination database encoding to UTF8.
  • Monis Iqbal
    Monis Iqbal over 14 years
    We are fetching some data from a source DB (UTF-8 encoded Teradata DB) and writing it in a destination DB (WIN1252 encoded Postgres DB). I think there is nothing we can do to convert these characters to WIN1252 as they aren't supported, that is why we wanted to ignore these while fetching from the source DB or saving in the destination DB.
  • Tony Xu
    Tony Xu over 14 years
    You can then import this data first to bytea and then use convert_from(string bytea, type text). You can define your own function doing a conversion using "create conversion". This function can ignore characters that are not in WIN1252. Easier would be to use a program filter for your data "iconv -f utf-8 -t windows-1252//translit" would do.
  • Tony Xu
    Tony Xu over 14 years
    Easier: import data to temporary UTF-8 database in PostgreSQL, export to text file using pg_dump, convert file to WIN1252 using "iconv -f utf-8 -t windows-1252//translit", convert a file back to UTF-8 using "iconv -t windows-1252 -t utf-8" and then import this file to new WIN1252 database.
  • Mooing Duck
    Mooing Duck over 10 years
    The UTF-8 encoded form of 0xFEFF is "0xEFBBBF", not "0xEFBFBD". OP's is the UTF-8 encoded form of the replacement character 0x0fffd: "�"
  • Mooing Duck
    Mooing Duck over 10 years
    The UTF-8 encoded form of the BOM 0xFEFF is "0xEFBBBF", not "0xEFBFBD". OP's is the UTF-8 encoded form of the replacement character 0x0fffd: "�"
  • Harsimranjeet Singh
    Harsimranjeet Singh about 5 years
    This worked for system DSN connections created for PostgreSQL and used in SSIS too.