How to import CSV file data into a PostgreSQL table?


Solution 1

Take a look at this short article.

The solution is paraphrased here:

Create your table:

CREATE TABLE zip_codes
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision,
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);

Copy data from your CSV file to the table:

COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' WITH (FORMAT csv);

Solution 2

If you don't have permission to use COPY (which works on the db server), you can use \copy instead (which works in the db client). Using the same example as Bozhidar Batsov:

Create your table:

CREATE TABLE zip_codes
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision,
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);

Copy data from your CSV file to the table:

\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV

Note that \copy ... must be written on a single line and without a ; at the end!

You can also specify the columns to read:

\copy zip_codes(ZIP,CITY,STATE) FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV

See the documentation for COPY:

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

And note:

For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.
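If you need to drive \copy from a script, one option is to pass the metacommand to psql with -c, since psql accepts backslash commands there. A minimal sketch that only builds the command line (the helper name is illustrative; connection settings are assumed to come from the PG* environment variables):

```python
# Build (but don't execute) a psql invocation that runs \copy client-side.
# Assumes psql is on the PATH and connection settings come from PG* env vars.

def build_client_copy(table, csv_path, header=True):
    # The whole \copy metacommand must stay on one line, with no trailing ';'
    meta = f"\\copy {table} FROM '{csv_path}' DELIMITER ',' CSV"
    if header:
        meta += " HEADER"
    return ["psql", "-c", meta]

cmd = build_client_copy("zip_codes", "/path/to/csv/ZIP_CODES.txt")
print(cmd[2])
# -> \copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV HEADER
```

Running it would then be `subprocess.run(cmd, check=True)`, with the file path interpreted on the client machine.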

Solution 3

One quick way of doing this is with the Python pandas library (version 0.15 or above works best). It will handle creating the columns for you - although the choices it makes for data types might not be what you want. If it doesn't quite do what you want, you can always use the 'create table' code it generates as a template.

Here's a simple example:

import pandas as pd
df = pd.read_csv('mypath.csv')
df.columns = [c.lower() for c in df.columns] # PostgreSQL folds unquoted identifiers to lowercase, so avoid capitals and spaces

from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/dbname')

df.to_sql("my_table_name", engine)

And here's some code that shows you how to set various options:

import sqlalchemy

# Set it so the raw SQL output is logged
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

df.to_sql("my_table_name2",
          engine,
          if_exists="append",  # Options are 'fail', 'replace', 'append'; default 'fail'
          index=False,  # Do not output the index of the dataframe
          dtype={'col1': sqlalchemy.types.NUMERIC,
                 'col2': sqlalchemy.types.String})  # Data types should be SQLAlchemy types
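Several comments below report a MemoryError on multi-gigabyte files: `read_csv` loads the whole file by default, but it accepts a `chunksize=` argument that yields DataFrames one batch at a time, and `to_sql` can then be called per chunk with `if_exists="append"`. As a stdlib-only sketch of that same streaming idea (the function name is illustrative, not a pandas API):

```python
# Stdlib-only sketch of streaming a CSV in fixed-size batches, the same
# idea as pandas read_csv(chunksize=...). The helper name is illustrative.
import csv
import io
from itertools import islice

def iter_batches(fileobj, batch_size):
    """Yield (header, rows) pairs; rows holds at most batch_size data rows."""
    reader = csv.reader(fileobj)
    header = next(reader)  # the first row holds the column names
    while True:
        rows = list(islice(reader, batch_size))
        if not rows:
            break
        yield header, rows

data = io.StringIO("zip,city\n00001,Agawam\n00002,Cushman\n00003,Barre\n")
sizes = [len(rows) for _, rows in iter_batches(data, 2)]
print(sizes)  # -> [2, 1]
```

Each batch could then be inserted before reading the next one, so memory use stays bounded by the batch size rather than the file size.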

Solution 4

Most other solutions here require that you create the table in advance/manually. This may not be practical in some cases (e.g., if you have a lot of columns in the destination table), so the approach below may come in handy.

Given the path and column count of your CSV file, you can use the following function to load your CSV into a temp table that will then be renamed to target_table:

The top row is assumed to hold the column names.

create or replace function data.load_csv_file
(
    target_table text,
    csv_path text,
    col_count integer
)

returns void as $$

declare

iter integer; -- dummy integer to iterate columns with
col text; -- variable to keep the column name at each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %I', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %I = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;

end;

$$ language plpgsql;
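The rename loop above takes column names verbatim from the first CSV row, so headers containing spaces or punctuation can produce awkward identifiers. If you would rather preprocess headers outside the database, here is a small sketch (the helper name is illustrative) that normalizes them into safe lowercase identifiers:

```python
# Normalize CSV header names into safe lowercase SQL identifiers.
import re

def sanitize_identifier(name):
    # Collapse runs of non-word characters into '_', then trim and lowercase
    ident = re.sub(r'\W+', '_', name.strip()).lower().strip('_')
    return ident or 'col'  # fall back if nothing usable remains

print([sanitize_identifier(h) for h in ['ZIP Code', ' City!', 'STATE']])
# -> ['zip_code', 'city', 'state']
```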

Solution 5

You could also use pgAdmin, which offers a GUI to do the import, as shown in this SO thread. The advantage of using pgAdmin is that it also works for remote databases.

Much like the previous solutions, though, you would need to have your table in the database already. Everyone has their own approach, but I usually open the CSV file in Excel, copy the headers, paste-special with transposition onto a different worksheet, put the corresponding data type in the next column, and then copy and paste all of that into a text editor together with the appropriate SQL table-creation query, like so:

CREATE TABLE my_table (
    /* Paste data from Excel here for example ... */
    col_1 bigint,
    col_2 bigint,
    /* ... */
    col_n bigint
);
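The spreadsheet steps above can also be scripted. A sketch (the function name is illustrative) that generates a CREATE TABLE skeleton from the CSV header row, defaulting every column to text so you can adjust the types by hand afterwards:

```python
# Generate a CREATE TABLE skeleton from a CSV header row.
import csv
import io

def create_table_sql(table, fileobj, default_type="text"):
    header = next(csv.reader(fileobj))  # first row holds the column names
    cols = ",\n".join(f"    {c.strip().lower()} {default_type}" for c in header)
    return f"CREATE TABLE {table} (\n{cols}\n);"

sample = io.StringIO("ZIP,LATITUDE,LONGITUDE\n00001,42.07,-72.62\n")
print(create_table_sql("zip_codes", sample))
```

The generated statement can then be edited to assign real types before running COPY or \copy.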
Updated on January 03, 2022

Comments

  • Admin
    Admin over 2 years

    How can I write a stored procedure that imports data from a CSV file and populates the table?

  • asksw0rder
    asksw0rder over 11 years
    Actually, using \copy does the same trick if you do not have superuser access; it complains on my Fedora 16 when using COPY with a non-root account.
  • David Pelaez
    David Pelaez over 11 years
    TIP: you can indicate what columns you have in the CSV using zip_codes(col1, col2, col3). The columns must be listed in the same order that they appear in the file.
  • JhovaniC
    JhovaniC almost 11 years
    @asksw0rder does \copy have the same syntax? bcoz I'm getting a syntax error with \copy
  • bernie2436
    bernie2436 over 10 years
    Should I include the header row?
  • Barrett Clark
    Barrett Clark over 10 years
    You can easily include the header row -- just add HEADER in the options: COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV HEADER; postgresql.org/docs/9.1/static/sql-copy.html
  • itsols
    itsols over 10 years
    This is REALLY COOL... I typed the copy statement in a query window using PGAdmin and it works beautifully.
  • user88
    user88 over 9 years
    I have multiple delimiters, like , and ". How can I copy?
  • Peter Krauss
    Peter Krauss about 9 years
    How do I use (at the psql client) FROM ./relativePath/file? It does not work for me.
  • joelostblom
    joelostblom about 9 years
    In addition, the if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql("fhrs", engine, if_exists='replace')
  • JZ.
    JZ. over 8 years
    \copy voters(ZIP,CITY) FROM '/Users/files/Downloads/WOOD.TXT' DELIMITER ',' CSV HEADER; ERROR: extra data after last expected column CONTEXT: COPY voters, line 2: "OH0012781511,87,26953,HOUSEHOLDER,SHERRY,LEIGH,,11/26/1965,08/19/1988,,211 N GARFIELD ST , ,BLOOMD..."
  • Andy Ray
    Andy Ray over 8 years
    Will this overwrite data in an existing table, or append it?
  • Robban1980
    Robban1980 about 8 years
    @AndyRay " while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already)" from the manual on the link above. postgresql.org/docs/current/static/sql-copy.html.
  • alex bennett
    alex bennett almost 8 years
    @JZ. I had a similar error. It was because I had extra blank columns. Check your csv and if you have blank columns, that could be the reason.
  • chbrown
    chbrown almost 8 years
    Link rot is voracious! The article you linked to no longer works, which makes me uncomfortable :(
  • mountainclimber11
    mountainclimber11 almost 8 years
    You might want to mention that this is Python.
  • Noumenon
    Noumenon over 7 years
    DBVisualizer took 50 seconds to import 1400 rows with three fields -- and I had to cast everything back from a String to whatever it was supposed to be.
  • DavidC
    DavidC over 7 years
    For me I get a MemoryError if trying to import a large CSV so it looks like it doesn't stream.
  • sal
    sal over 7 years
    @DavidC Interesting. How big is your file? How much memory do you have? If it doesnt stream as it appears, I suggest chunking the data before insertion
  • sal
    sal over 7 years
    @DavidC Or you use the csvql command without the --insert option and insert the data later via COPY, or you go by RobinL's answer stackoverflow.com/a/29722393/2772305
  • DavidC
    DavidC over 7 years
    The file was 5GBs in size and I have 2GB memory. I gave up on it and use a script to generate CREATE TABLE and COPY commands in the end.
  • user2867432
    user2867432 over 7 years
    Hi Mehmet, thanks for the answer you posted but when I run your code I get the following error message : ERROR: schema "data" does not exist
  • mehmet
    mehmet over 7 years
    user2867432 you need to change schema name that you use accordingly (e.g., public)
  • IMSoP
    IMSoP over 7 years
    This is somewhat misleading: the difference between COPY and \copy is much more than just permissions, and you can't simply add a `\` to make it magically work. See the description (in the context of export) here: stackoverflow.com/a/1517692/157957
  • Han
    Han over 7 years
    @IMSoP: you're right, I added a mention of server and client to clarify
  • Somnath Kadam
    Somnath Kadam about 7 years
    username and password: you need to create a login and assign the DB to that user. If using pgAdmin, create a "Login/Group role" using the GUI.
  • user48956
    user48956 about 7 years
    Pandas is a super slow way of loading to sql (vs csv files). Can be orders of magnitude slower.
  • Geeme
    Geeme almost 7 years
    Hi Mehmet, thanks for the solution, it's perfect, but it works only if the postgres DB user is a superuser. Is there any way to make it work without superuser?
  • mehmet
    mehmet almost 7 years
    Geeme: read "security definer" here, but I have not used it myself.
  • dcorking
    dcorking about 6 years
    Please show a couple of sample rows of your pasted data.
  • Sebastian
    Sebastian almost 6 years
    @bjelli is \copy slower than copy? I have a 1.5MB file and a db.m4.large instance on RDS and it's been hours that this copy command has been running (at least 3).
  • Han
    Han almost 6 years
    @Sebastian: the important difference is that \copy works from the client, so you still have to transmit all the data to the server. With COPY (no slash) you first upload all the data to the server with other means (sftp, scp) and then do the import on the server. But transmitting 1.5 MB does not sound like it should take 3 hours - no matter which way you do it.
  • Ankit Singh
    Ankit Singh almost 6 years
    This could be a way to write data but it is super slow even with batch and good computing power. Using CSVs is a good way to accomplish this.
  • citynorman
    citynorman over 5 years
    df.to_sql() is really slow, you can use d6tstack.utils.pd_to_psql() from d6tstack see performance comparison
  • citynorman
    citynorman over 5 years
    As an alternative, d6tstack streams and also deals with schema changes, see examples
  • nate
    nate about 5 years
    This worked for me, and I use Windows OS - just change the (absolute) path formatting style. It's good to know that this method is easy to learn and implement as I have been trying to do the same procedure with SQL and it does not work as easy as this method.
  • Wes
    Wes about 5 years
    How is this not the accepted answer? Why would I write a python script when the database already has a command to do this?
  • GammaGames
    GammaGames over 4 years
    You made a separate tool for the equivalent of psql -h 192.168.99.100 -U postgres mydatabase -c "COPY users FROM 'users.csv' DELIMITER ';' CSV"? I guess the part where it creates the table is nice, but since every field is text it's not super useful
  • Eduardo Pereira
    Eduardo Pereira over 4 years
    Ops, thanks for the heads up. Yes, I did it, well it took just a few hours and I learned cool stuff in Go and pq and database API in Go.
  • AbstProcDo
    AbstProcDo over 4 years
    I am very confused about the distinction between \copy on the client and COPY on the server, since MySQL and MariaDB do not have such concepts to trouble users.
  • Manohar Reddy Poreddy
    Manohar Reddy Poreddy over 4 years
    Beautiful answer! I am not going to go too generic in my code, though, for the sake of readability for others.
  • Robin Métral
    Robin Métral about 4 years
    If you have NULL values in your CSV, define them using the NULL AS flag: COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV NULL AS '<your null value>';. Replace <your null value> with whatever you have in your CSV, often "NULL" or "".
  • umbe1987
    umbe1987 about 4 years
    It would have been nice to understand how to actually use DBeaver to import a CSV file. Anyway, this might help: dbeaver.com/docs/wiki/Data-transfer
  • Igor
    Igor about 4 years
    I tried to import 16Gb but I got error: ERROR: out of memory DETAIL: Cannot enlarge string buffer containing 1073725476 bytes by 65536 more bytes.
  • Han
    Han about 4 years
    @Igor this is a separate question, answered here: stackoverflow.com/questions/56714274/…
  • Cybernetic
    Cybernetic almost 3 years
    This gives me ERROR: syntax error at or near "\" Position: 1
  • questionto42standswithUkraine
    questionto42standswithUkraine almost 3 years
    Mind that \copy ... must be written in one line and without a ; at the end. This should at best be included in the answer since some comments here show that this is not always clear. My edit got rejected. [Another thing: if you have a header row, add HEADER at the end.]
  • Peter Mortensen
    Peter Mortensen about 2 years
    What do you mean by "your CSV format is a tab delimiter"?
  • Peter Mortensen
    Peter Mortensen about 2 years
    Please review Why not upload images of code/errors when asking a question? (e.g., "Images should only be used to illustrate problems that can't be made clear in any other way, such as to provide screenshots of a user interface.) and take the appropriate action (it covers answers as well). Thanks in advance.
  • Peter Mortensen
    Peter Mortensen about 2 years
    It applies at least to the first image. The last image is unreadable (possibly lost fidelity due to falsely being converted to JPEG (unsuitable for screenshots)).
  • Rich Lysakowski PhD
    Rich Lysakowski PhD about 2 years
    Peter suggested that I move this question to comments: "Can DBeaver power users who are Java developers provide some insight about the steps to create analytics widgets to add into the Community Edition of DBeaver? " I would like to know if the analytics plugins are also open source, and how to create them.
  • Always Sunny
    Always Sunny almost 2 years
    Why do we need datatype on the copy command? I mean on step 7