Dump a NumPy array into a csv file


Solution 1

numpy.savetxt saves an array to a text file.

import numpy
a = numpy.asarray([ [1,2,3], [4,5,6], [7,8,9] ])
numpy.savetxt("foo.csv", a, delimiter=",")

Solution 2

You can use pandas. It does take some extra memory, so it isn't always an option, but it's very fast and easy to use.

import pandas as pd 
pd.DataFrame(np_array).to_csv("path/to/file.csv")

If you don't want a header or index, use to_csv("/path/to/file.csv", header=None, index=None).
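For example, a minimal sketch that writes named columns without the index (the column names here are just placeholders):

import numpy as np
import pandas as pd

np_array = np.array([[1, 2], [3, 4]])
# index=False drops the row index; columns= sets the header names
pd.DataFrame(np_array, columns=["col1", "col2"]).to_csv("out.csv", index=False)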

Solution 3

tofile is a convenient function to do this:

import numpy as np
a = np.asarray([ [1,2,3], [4,5,6], [7,8,9] ])
a.tofile('foo.csv', sep=',', format='%10.5f')

The man page has some useful notes:

This is a convenience function for quick storage of array data. Information on endianness and precision is lost, so this method is not a good choice for files intended to archive data or transport data between machines with different endianness. Some of these problems can be overcome by outputting the data as text files, at the expense of speed and file size.

Note: this function does not produce multi-line CSV files; it saves everything on a single line.
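Because tofile also drops the shape information, reading the file back requires an explicit reshape; a minimal sketch assuming the 3x3 array written above:

import numpy as np
# fromfile returns a flat 1-D array, so the original shape must be restored by hand
b = np.fromfile('foo.csv', sep=',').reshape(3, 3)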

Solution 4

Writing record arrays as CSV files with headers requires a bit more work.

This example reads from a CSV file (example.csv) and writes its contents to another CSV file (out.csv).

import numpy as np

# Write an example CSV file with headers on first line
with open('example.csv', 'w') as fp:
    fp.write('''\
col1,col2,col3
1,100.1,string1
2,222.2,second string
''')

# Read it as a Numpy record array
ar = np.recfromcsv('example.csv', encoding='ascii')
print(repr(ar))
# rec.array([(1, 100.1, 'string1'), (2, 222.2, 'second string')], 
#           dtype=[('col1', '<i8'), ('col2', '<f8'), ('col3', '<U13')])

# Write as a CSV file with headers on first line
with open('out.csv', 'w') as fp:
    fp.write(','.join(ar.dtype.names) + '\n')
    np.savetxt(fp, ar, '%s', ',')

Note that the above example cannot handle values that are strings containing commas. To always enclose non-numeric values in quotes, use the built-in csv module:

import csv

with open('out2.csv', 'w', newline='') as fp:
    writer = csv.writer(fp, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(ar.dtype.names)
    writer.writerows(ar.tolist())

Solution 5

As already discussed, the best way to dump an array into a CSV file is the .savetxt(...) method. However, there are a few things we should know to use it properly.

For example, if you have a NumPy array with dtype=np.int32 such as

narr = np.array([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=np.int32)

and you save it using savetxt as

np.savetxt('values.csv', narr, delimiter=",")

it will store the data in floating-point exponential format:

1.000000000000000000e+00,2.000000000000000000e+00
3.000000000000000000e+00,4.000000000000000000e+00
5.000000000000000000e+00,6.000000000000000000e+00

You will have to change the formatting using the fmt parameter:

np.savetxt('values.csv', narr, fmt="%d", delimiter=",")

to store the data in its original integer format.
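If the columns need different precision, fmt can also take one specifier per column; a minimal sketch (the precisions here are arbitrary):

# a list gives one format specifier per column
np.savetxt('values.csv', narr, fmt=['%.4f', '%.8f'], delimiter=',')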

Saving Data in Compressed gz format

Also, savetxt can be used to store data in .gz compressed format, which might be useful when transferring data over a network.

We just need to change the file extension to .gz and NumPy will compress the output automatically:

np.savetxt('values.gz', narr, fmt="%d", delimiter=",")
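np.loadtxt decompresses .gz files transparently, so the same data can be read back directly; a minimal sketch:

# loadtxt handles the .gz extension automatically
back = np.loadtxt('values.gz', delimiter=',', dtype=np.int32)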

Hope it helps

Comments

  • Dexter
    Dexter almost 2 years

    Is there a way to dump a NumPy array into a CSV file? I have a 2D NumPy array and need to dump it in human-readable format.

  • Ehtesh Choudhury
    Ehtesh Choudhury almost 13 years
    is this preferred over looping through the array by dimension? I'm guessing so.
  • Dexter
    Dexter almost 13 years
    The array is an ndarray. I hope it adds up.
  • Andrea Zonca
    Andrea Zonca almost 13 years
    you can also change the format of each figure with the fmt keyword. default is '%.18e', this can be hard to read, you can use '%.3e' so only 3 decimals are shown.
  • Dexter
    Dexter almost 13 years
    Andrea, Yes I used %10.5f. It was pretty convenient.
  • Peter
    Peter over 8 years
    As far as I can tell, this does not produce a csv file, but puts everything on a single line.
  • Lee
    Lee over 8 years
    @Peter, good point, thanks, I've updated the answer. For me it does save ok in csv format (albeit limited to one line). Also, it's clear that the asker's intent is to "dump it in human-readable format" - so I think the answer is relevant and useful.
  • Ébe Isaac
    Ébe Isaac about 8 years
    Your method works well for numerical data, but it throws an error for numpy.array of strings. Could you prescribe a method to save as csv for an numpy.array object containing strings?
  • Arash Howaida
    Arash Howaida over 7 years
    What does the scipy documentation mean when it says delimiter is the character or string separating columns? When I use savetxt() it throws everything in the same column. Also, how do we go about saving in .tsv format? Do we use 4 spaces? The scipy documentation doesn't touch on .tsv at all, but .tsv is such a common format, there must be a way. Any thoughts?
  • RM-
    RM- about 7 years
    However this will also write a column index in the first row.
  • maxbellec
    maxbellec about 7 years
    @RM- you can use df.to_csv("file_path.csv", header=None)
  • mork
    mork about 7 years
    I find it again and again that the best csv exports are when 'piped' into pandas' to_csv
  • Luis
    Luis about 7 years
    @ÉbeIsaac You can specify the format as string as well: fmt='%s'
  • Tex
    Tex almost 7 years
    Not good. This creates a df and consumes extra memory for nothing
  • Adrian
    Adrian over 6 years
    You can even set different formats for each column, eg. fmt = '%.4f, %.8f' to write 4 and 8 decimals in the first and second column, respectively.
  • remram
    remram over 6 years
    This uses a lot of memory. Prefer looping over each row and format&write it.
  • Greg
    Greg over 6 years
    @remram it depends on your data, but yes if it is big it can use a lot of memory
  • thepunitsingh
    thepunitsingh over 6 years
    worked like charm, it's very fast - tradeoff for extra memory usage. parameters header=None, index=None remove header row and index column.
  • Kevin J. Black
    Kevin J. Black about 6 years
    Since version 1.5.0, np.tofile() takes an optional parameter newline='\n' to allow multi-line output. docs.scipy.org/doc/numpy-1.13.0/reference/generated/…
  • circuitdesigner5172
    circuitdesigner5172 about 6 years
    Works for lists with strings, too.
  • PirateApp
    PirateApp about 6 years
    well that is literally going to destroy all the memory savings for using a numpy array
  • Sohaib Aslam
    Sohaib Aslam almost 6 years
TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e,%.18e')
  • eaydin
    eaydin over 5 years
Actually, np.savetxt() provides the newline argument, not np.tofile()
  • Mr. T
    Mr. T over 5 years
    This has already been suggested: stackoverflow.com/a/41009026/8881141 Please only add new approaches, don't repeat previously published suggestions.
  • Dave C
    Dave C over 5 years
    The numpy.savetxt method is great, but it puts a hash symbol at the start of the header line.
  • payne
    payne over 5 years
    The fmt="%d" was what I was looking for. Thank you!
  • Milind R
    Milind R over 5 years
    @DaveC : You have to set the comments keyword argument to '', the # will be suppressed.
  • Dave C
    Dave C about 5 years
    Should this answer have comments='' to get rid of the weird hash symbol at the start of the column names?
  • HelloGoodbye
    HelloGoodbye over 4 years
    @EhteshChoudhury Usually when there is a function you can call instead of creating a loop that accomplishes the same thing, the function call is preferred since it makes the code simpler. (If calling the function wouldn't be the preferred method, why would the function in that case exist?)
  • smci
    smci about 4 years
    This only works when it's a numerical array. If it's an array of object (string), you need third argument fmt='%s' to avoid failing with TypeError: Mismatch between array dtype ('object') and format specifier ('%.18e'). Can you update your answer?
  • EFreak
    EFreak almost 4 years
    This is where pandas again helps. You can do: pd.DataFrame(out, columns=['col1', 'col2']), etc
  • mins
    mins over 3 years
    No need for a remake, the original is crisp and clear.
  • Abhilash Singh Chauhan
    Abhilash Singh Chauhan about 3 years
    @maxbellec It gives, ValueError: Must pass 2-d input
  • maxbellec
    maxbellec about 3 years
    @AbhilashSinghChauhan well yes, csv data is 2 dimensionnal (row and columns)
  • Abhilash Singh Chauhan
    Abhilash Singh Chauhan about 3 years
    @maxbellec I know that CSV data is 2D data, but while reading raster as numpy dataset, it additionally add the layer count and dataset become 3-D, even if the raster dataset has single layer, it still shows data as (count, height, width) where count = 1, how to export that data to .CSV or .txt file?
  • schade96
    schade96 almost 3 years
    See this answer for why this may be helpful: You can set the decimal separator in pandas on export.
  • Atybzz
    Atybzz over 2 years
Since you are calling np.savetxt(..., you don't need the import call from numpy import savetxt. If you do import it, you can simply call it as savetxt(...