Is it possible to turn off quote processing in the Postgres COPY command with CSV format?


Solution 1

Workaround (thanks to this comment!)

COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS '';

So basically you specify a quote character that should never appear in the text, which is pretty ugly.

I'd much prefer it if there was in fact a way to turn off quote processing altogether.

Solution 2

(Added as a new answer since I don't have the reputation yet to comment.)

For the record, since I've been struggling with the same issue, you can use tr to remove \b, instead of just hoping it's not in your text anywhere.

tr -d '\010' < filename.csv > newfile.csv

(\010 is the octal representation of \b.)
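As a quick sanity check of what the tr step does, the same backspace-stripping can be sketched in Python (the sample string here is made up for illustration):

```python
# Strip backspace characters (\b, octal 010), mirroring `tr -d '\010'`.
data = "foo\bbar\tbaz"            # a hypothetical line containing a stray \b
cleaned = data.replace("\b", "")  # remove every backspace
print(cleaned)                    # the tab-separated fields survive intact
```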

Since COPY supports reading from STDIN, you can ease the I/O impact by piping tr's output straight into psql (COPY is an SQL command, so it can't sit directly in a shell pipeline):

tr -d '\010' < filename.csv | psql -c "COPY <tablename> FROM STDIN WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS ''"

Solution 3

The mode you want to use for data formatted as you describe is the default text mode. It will pass most characters unhindered into the database. It does not have quote processing, and it's using tabs as delimiters. Using CSV mode will just cause you trouble because you're introducing quoting that you have to work around.

Text mode will pass dollar characters, single and double quotes, pipes, and even backspaces (even though those were not mentioned in the question) right in. The one thing in the example that's not passed through is the backslash. But escaping backslashes is simple, for example with this sed command:

sed -e 's/\\/\\\\/g' < source.txt > processed.txt

Then the processed file should be importable without any additional options:

\copy sometable from processed.txt
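For a quick check that the sed substitution does what's intended, the same backslash-doubling can be sketched in Python against the sample row from the question:

```python
# Double every backslash, mirroring `sed -e 's/\\/\\\\/g'`.
# The row below is the tab-separated sample row from the question.
row = '1\t2\t"ba$aR\\eR\\\t18'
escaped = row.replace("\\", "\\\\")
print(escaped)
```

After this, the backslash that sat directly before the tab is itself escaped, so COPY's text mode reads it as a literal backslash followed by a field delimiter instead of an escaped tab.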


Author: Tom De Leu

Updated on March 15, 2020

Comments

  • Tom De Leu
    Tom De Leu, about 4 years ago

    I have CSV files, tab-separated, fields not wrapped in quotes, where field data can contain characters like single quotes, double quotes, pipes and backslashes.

    Sample data can look like this:

    1       2       "ba$aR\eR\       18
    

    I want to import this data into Postgres using the COPY statement.

    When I try to import this using

    COPY <tablename> FROM  <filename> NULL AS '';
    

    I get an error psql:-:1: ERROR: missing data for column because Postgres is treating the backslash + tab as an "escaped tab" instead of a backslash followed by the field separator.

    So I switched to using the "CSV format" of the COPY operator, like so:

    COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' NULL AS '';
    

    Now there's a new error psql:-:1: ERROR: value too long for type character varying(254)

    Apparently because it's interpreting the double-quote at the start of field 3 as the field wrapping character.

    How can I specify that my data is NOT quoted at all?

  • Yuval
    Yuval, over 4 years ago
    @Kyle Barron's addition is crucial for some badly encoded text. I have a very large (~50 GB) CSV that I needed to copy, and about 3 GB before the end a '\b' appeared. The source for this file did a bad job of sanitizing values, but that doesn't mean I want my COPY to fail. tr to the rescue.