How to save numpy array of Strings (with commas) to CSV?

14,227

Adding fmt="%s" doesn't put quotes around each field—the quotes are part of the Python string literal for the string %s, and %s just says that any value should be formatted as a string. If you want to force quotes around everything, you need to have quotes in the format string, like fmt='"%s"'.

However, even if you don't do that, the line you showed can't possibly produce the output you showed. There is no way that NumPy is changing your commas into pipe characters, or using pipe characters as delimiters. The only you can get that is by adding delimiter=' |'. And if you add that… it works with no changes, and you get this:

text1, text2 | text3
text4 | text5

So whatever your actual problem is, it can't be the one you described.


Meanwhile, if you're trying to write CSV files for non-numeric data as flexibly as possible, the standard library's csv module is much more powerful than NumPy. The advantage of NumPy—as the name implies—is in dealing with numeric data. Here's how to do it with csv:

with open(filename, 'wb') as f:
    csv.writer(f).writerows(array)

This will default to , as a delimiter. Since some of your strings have , characters in them, by default, it will quote those strings. But you can configure the quoting/escaping behavior, the quote character, the delimiter, and all kinds of other things that NumPy can't.

Share:
14,227
ems
Author by

ems

Updated on July 26, 2022

Comments

  • ems
    ems almost 2 years

    tl;dr ANSWER: Don't use numpy. Use csv.writer instead of numpy.savetxt.

    I'm new to Python and NumPy. It seems like it shouldn't be so difficult to save a 2D array of strings (that contain commas) to a CSV file, but I can't get it to work the way I want.

    Let's say I have an array that looks like this (made from a list of lists):

    [['text1, text2', 'text3'],
    ['text4', 'text5']]
    

    I want a CSV file that looks like this (or without quote characters) in Excel (pipe = cell separator):

    'text1, text2' | 'text3'
    'text4'        | 'text5'
    

    I'm using numpy.savetxt(filename, array, fmt="%s"), and I get the following CSV output (with square brackets):

    ['text1, text2','text3']
    ['text4','text5']
    

    Which displays in Excel like this:

    ['text1  | text2' | 'text3']
    ['text4' | 'text5']
    

    I tried fussing with the savetxt delimiter argument, but no change in output.

    Do I need to do this manually? If so, let me know if there are any shortcuts I should be aware of.

    Ultimately, I need to import the CSV into a Postgresql database. I'm not completely clear on exactly what the CSV formatting needs to be for this to work as expected, but I'm assuming if it looks wrong in Excel, it will probably end up messed up in Postgres. The Postgres documentation says:

    The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character. You can also use FORCE_QUOTE to force quotes when outputting non-NULL values in specific columns.

    Thanks!

    ++++++++++++++++++++++++++++

    Real input and output, in case it's relevantly different:

    array:

    [['8908232', 'Plant Growth Chamber Facility at the Department of Botany, University of Wisconsin-Madison', 'DBI', 'INSTRUMENTAT & INSTRUMENT DEVP', '1/1/90', '12/19/89', 'WI', 'Standard Grant', 'Joann P. Roskoski', '12/31/91', '$94,914.00 ', 'BIO', '1108', '', '$0.00 ']]
    

    CSV output:

    ['8908232', 'Plant Growth Chamber Facility at the Department of Botany, University of Wisconsin-Madison', 'DBI', 'INSTRUMENTAT & INSTRUMENT DEVP', '1/1/90', '12/19/89', 'WI', 'Standard Grant', 'Joann P. Roskoski', '12/31/91', '$94,914.00 ', 'BIO', '1108', '', '$0.00 ']
    

    Excel's version:

    ['8908232'   'Plant Growth Chamber Facility at the Department of Botany  University of Wisconsin-Madison'    'DBI'   'INSTRUMENTAT & INSTRUMENT DEVP'    '1/1/90'    '12/19/89'  'WI'    'Standard Grant'    'Joann P. Roskoski'     '12/31/91'  '$94   914.00 '     'BIO'   '1108'  ''  '$0.00 ']