numpy recfromcsv and genfromtxt skip first row of data file

Solution 1

By default, the first line of a CSV file is expected to contain the field names. recfromcsv calls genfromtxt with names=True by default, which means the first line of the file is read as the header rather than as data.

Definition: http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
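A minimal sketch of that behaviour, using genfromtxt directly (the function recfromcsv wraps) on an in-memory copy of the data so no file is needed. The variable names and the use of io.StringIO are illustrative assumptions, not part of the original question.

```python
import io
import numpy as np

# Same three data rows as in the question, with no header line.
raw = "0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2"

# names=True (what recfromcsv passes): the first line is parsed as field names,
# so only two data rows remain.
with_header = np.genfromtxt(
    io.StringIO(raw), delimiter=",", names=True, dtype=None, encoding="utf-8"
)

# genfromtxt's own default (names=None): every line is treated as data.
without_header = np.genfromtxt(io.StringIO(raw), delimiter=",")

print(with_header.shape)     # (2,)
print(without_header.shape)  # (3, 3)
```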

To keep every data row, write a header line before the data:

import numpy as np

filename = 'data.csv'
# Write a header line first so that recfromcsv consumes it instead of a data row.
with open(filename, mode='w') as writer:
    writer.write('first column,second column,third column\n')
    writer.write('0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2')

data = np.recfromcsv(filename)
print(data)

Or use recfromtxt instead of recfromcsv. It does not assume a header row, but you have to pass delimiter=',' yourself.

Or override the default names:

recfromcsv(filename, names=['a','a','a'])

Solution 2

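Adding skiprows=0 (the keyword is called skip_header in current NumPy) is sometimes suggested, but it does not help on its own: 0 is already the default, and with names=True the first remaining row is still consumed as the header. To keep the first row as data, pass the names explicitly as in Solution 1.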
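It is worth checking what the row-skipping option actually does. The sketch below assumes genfromtxt's skip_header keyword and a hypothetical helper, count_rows, that just counts the records returned for a given set of options.

```python
import io
import numpy as np

raw = "0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2"

def count_rows(**kwargs):
    """Hypothetical helper: parse the sample data and return the number of records."""
    data = np.genfromtxt(
        io.StringIO(raw), delimiter=",", dtype=None, encoding="utf-8", **kwargs
    )
    return len(data)

rows_names_true = count_rows(names=True)                 # header consumed
rows_with_skip0 = count_rows(names=True, skip_header=0)  # skipping 0 lines changes nothing
rows_no_names = count_rows(names=None, skip_header=0)    # no header expected

print(rows_names_true, rows_with_skip0, rows_no_names)   # 2 2 3
```

In this sketch the row count only changes when names is no longer True, which suggests the header handling, not the skip count, is what drops the first line.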

Solution 3

The default behavior of recfromcsv is to read a header row, which is why it's skipping the first row. It works for me with genfromtxt (if I pass delimiter=','). Can you provide output showing how genfromtxt fails?

Unfortunately, there seems to be a bug in NumPy that won't let you specify the dtype in recfromcsv (see https://github.com/numpy/numpy/issues/311), so I can't see how to read it in with specified column names, which I think is what you need to do to avoid reading the header line. But you can read the data in with genfromtxt.
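As a rough illustration of the genfromtxt route, the sketch below passes a structured dtype directly. The field names a, b, c and the chosen types are assumptions made for the example. Viewing the result as np.recarray gives attribute access similar to what recfromcsv returns.

```python
import io
import numpy as np

raw = "0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2"

# Field names and types are supplied up front, so no line is read as a header.
data = np.genfromtxt(
    io.StringIO(raw),
    delimiter=",",
    dtype=[("a", "i8"), ("b", "f8"), ("c", "f8")],
)

# Optional: view as a record array for attribute-style access (rec.a, rec.b, ...).
rec = data.view(np.recarray)

print(data.shape)  # (3,)
print(rec.a)       # [0 1 2]
```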

Edit: It looks like you can read it in just by passing in a list of names:

np.recfromcsv(filename, delimiter=',', names=['a', 'b', 'c'])

(The reason it wasn't working for me is I had done from __future__ import unicode_literals and it apparently doesn't like unicode in dtypes.)
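The same idea can be sketched with genfromtxt, which recfromcsv wraps. This assumes an in-memory copy of the data (io.StringIO) and dtype=None so that column types are inferred.

```python
import io
import numpy as np

raw = "0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2"

# An explicit list of names replaces names=True, so the first line stays data.
data = np.genfromtxt(
    io.StringIO(raw),
    delimiter=",",
    names=["a", "b", "c"],
    dtype=None,
    encoding="utf-8",
)

print(data.dtype.names)  # ('a', 'b', 'c')
print(len(data))         # 3
print(data["a"])         # [0 1 2]
```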

Author by

det

Updated on June 05, 2022

Comments

  • det
    det almost 2 years

    numpy's recfromcsv skips the first line of my data. (Same thing for genfromtxt)

    import numpy as np
    
    filename = 'data.csv'
    writer = open(filename,mode='w')
    writer.write('0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2')
    writer.close()
    
    data = np.recfromcsv(filename)
    print(data)
    

    Is this a bug, or how can I load the data without losing the first line?

  • joris
    joris over 11 years
    And if you want to keep the result as a structured array with int for the first column and float for the other columns, as you would get from recfromcsv, you can also specify the keyword dtype=None.
  • det
    det over 11 years
    In my case the data file format is fixed -- but giving names does the trick. (...note that you've got an extra set of quotes in your code: it should be recfromcsv(filename, names=['a','a','a']).)
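To illustrate the dtype=None remark from the comments, here is a small sketch comparing genfromtxt's default dtype with dtype=None on the question's data. The variable names are illustrative, and the exact integer width may vary by platform.

```python
import io
import numpy as np

raw = "0,1.1,1.2\n1,2.1,2.2\n2,3.1,3.2"

# Default dtype: everything is converted to float in a plain 2-D array.
plain = np.genfromtxt(io.StringIO(raw), delimiter=",")

# dtype=None: each column's type is inferred, giving a structured array
# with an integer first column and float columns after it.
structured = np.genfromtxt(
    io.StringIO(raw), delimiter=",", dtype=None, encoding="utf-8"
)

print(plain.dtype, plain.shape)
print(structured.dtype, structured.shape)
```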