Numpy loadtxt: ValueError: Wrong number of columns

32,716

Solution 1

Try np.genfromtxt. It handles missing values; loadtxt does not. Compare their docs.

Missing values can be tricky when the delimiter is white space, but with tabs it should be OK. If there are still problems, test it with a ',' delimiter.

Oops, you still need the extra delimiter on the short lines, e.g.:

a, 34,
b, 43, 34
c, 34,
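A quick way to see why the extra delimiter matters, as a sketch with made-up rows shaped like the question's file: with the default whitespace splitting, a trailing separator simply disappears, so the short rows end up with two fields and genfromtxt raises a ValueError. With an explicit comma delimiter and a trailing comma on the short rows, every row keeps three fields and the empty ones are treated as missing. (The spaces after the commas are left out here to keep the parsing simple.)

```python
import numpy as np

# Whitespace-delimited ragged rows: there is no way to mark the empty 3rd field.
ragged = ["a 34", "b 43 34", "c 34"]
whitespace_failed = False
try:
    np.genfromtxt(ragged, dtype=None, encoding="utf-8")
except ValueError as err:
    # genfromtxt reports something like "Line #2 (got 3 columns instead of 2)"
    whitespace_failed = True
    print(err)

# Comma-delimited rows, with the extra trailing comma on the short rows.
rows = ["a,34,", "b,43,34", "c,34,"]
arr = np.genfromtxt(rows, delimiter=",", dtype=None, encoding="utf-8")
print(arr)  # the empty integer fields come back as -1 (the default fill)
```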

Both loadtxt and genfromtxt accept any iterable that delivers the text line by line. So a simple approach is to read the lines with readlines, tweak the lines that have missing values and delimiters, and pass that list of lines to the loader. Or you can write this as a 'filter' or generator. This approach has been described in a number of previous SO questions.

In [36]: txt=b"""a\t45\t\nb\t45\t55\nc\t66\t""".splitlines()
In [37]: txt
Out[37]: [b'a\t45\t', b'b\t45\t55', b'c\t66\t']
In [38]: np.genfromtxt(txt,delimiter='\t',dtype=str)
Out[38]: 
array([['a', '45', ''],
       ['b', '45', '55'],
       ['c', '66', '']], 
      dtype='<U2')

I'm using Python 3, so the byte strings are marked with a 'b' (for baby and me).

For strings this is overkill, but genfromtxt makes it easy to construct a structured array with different dtypes for each column. Note that such an array is 1d, with named fields, not numbered columns.

In [50]: np.genfromtxt(txt,delimiter='\t',dtype=None)
Out[50]: 
array([(b'a', 45, -1), (b'b', 45, 55), (b'c', 66, -1)], 
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
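Because the structured array is 1d, a "column" is reached by field name rather than by index. A minimal sketch, passing the lines as str together with encoding='utf-8' so the text field comes back as str instead of bytes; the field names 'label', 'x' and 'y' are hypothetical, supplied through the names parameter instead of the default f0, f1, f2:

```python
import numpy as np

txt = ["a\t45\t", "b\t45\t55", "c\t66\t"]

# names= replaces the default field names f0, f1, f2
arr = np.genfromtxt(txt, delimiter="\t", dtype=None,
                    names=["label", "x", "y"], encoding="utf-8")

print(arr.shape)        # 1d: one record per line
print(arr.dtype.names)  # the field names
print(arr["y"])         # the third "column"; missing ints are filled with -1
```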

To pad the lines, I could define a function like:

def foo(astr, delimiter=b',', cnt=3, fill=b' '):
    # split the (bytes) line into its fields
    c = astr.strip().split(delimiter)
    # append cnt fill values, then keep exactly cnt fields
    c.extend([fill]*cnt)
    return delimiter.join(c[:cnt])

and use it as:

In [85]: txt=b"""a\t45\nb\t45\t55\nc\t66""".splitlines()

In [87]: txt1=[foo(t,b'\t',3,b'0') for t in txt]
In [88]: txt1
Out[88]: [b'a\t45\t0', b'b\t45\t55', b'c\t66\t0']
In [89]: np.genfromtxt(txt1,delimiter='\t',dtype=None)
Out[89]: 
array([(b'a', 45, 0), (b'b', 45, 55), (b'c', 66, 0)], 
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
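The 'filter' or generator version mentioned earlier might look like the sketch below. pad_lines is a hypothetical name, and io.StringIO stands in for the open TEST.txt file. Unlike foo, it strips only the line ending, so a line that already ends with a tab keeps that explicit empty field (genfromtxt then fills it with -1) instead of it being replaced by the fill value.

```python
import io
import numpy as np

def pad_lines(lines, delimiter="\t", ncols=3, fill="0"):
    """Hypothetical filter: yield each line padded (or cut) to exactly ncols fields."""
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        fields.extend([fill] * ncols)
        yield delimiter.join(fields[:ncols])

# io.StringIO stands in for open(r'TEST.txt'); iterating it yields one line at a time
data = io.StringIO("a\t45\nb\t45\t55\nc\t66\n")

# genfromtxt consumes the generator lazily, line by line
arr = np.genfromtxt(pad_lines(data), delimiter="\t", dtype=None, encoding="utf-8")
print(arr)
```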

Solution 2

If you have a variable number of columns, you can't define a proper np.array shape. If you still want to store the lines in an np.array, try:

import numpy as np
a = np.loadtxt(r'TEST.txt', delimiter='\n', dtype=str)

Now a is array(['a 45', 'b 45 55', 'c 66']).

But in this case a list is better:

with open(r'TEST.txt') as f:
    a = f.read().splitlines()

Now a is a list: ['a 45', 'b 45 55', 'c 66'].
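If you later need the fields of each line, you can split the list into a ragged list of lists, and pad it yourself if a rectangular array is wanted after all. A sketch assuming tab-separated lines and '' as the fill value; io.StringIO stands in for TEST.txt:

```python
import io
import numpy as np

# io.StringIO stands in for open(r'TEST.txt'); the data is the question's sample
f = io.StringIO("a\t45\nb\t45\t55\nc\t66\n")
rows = [line.split("\t") for line in f.read().splitlines()]  # ragged list of lists

# Pad every row with '' up to the longest row, then build a 2d string array
width = max(len(r) for r in rows)
arr = np.array([r + [""] * (width - len(r)) for r in rows])
print(arr)
```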

Solution 3

If you want all rows to have the same number of columns, with the missing values filled in, you can do it easily with pandas, but you have to know the total number of columns.

import pandas as pd

# give one name per column of the widest row; shorter rows are padded with NaN
df = pd.read_csv('TEST.txt', sep='\t', names=['col_a', 'col_b', 'col_c'])
Author: G M

About me: I am an Italian analytical chemist specializing in the conservation of Cultural Heritage. I have a strong interest in science and IT for problem solving and popularization. I enjoy learning new things and applying my knowledge to create new ones. Why am I here? I've learned Python and GIS on my own (but the S.E. community has really helped me), so I'm trying to share my chemistry knowledge. I like helping people and I constantly try to improve my knowledge and skills (so please correct my English mistakes!).

Updated on August 15, 2022

Comments

  • G M
    G M over 1 year

    I have a file TEST.txt structured as follows:

    a   45
    b   45  55
    c   66
    

    When I try to open it:

    import numpy as np
    a = np.loadtxt(r'TEST.txt', delimiter='\t', dtype=str)
    

    I get the following error:

    ValueError: Wrong number of columns at line 2

    It's clearly because the second line has three columns instead of two, but I can't find a solution in the documentation.

    Is there any way I can fix this while keeping all the data in an array?

    In Matlab I can do something like:

    a=textscan(fopen('TEST.txt'),'%s%s%s');
    

    Something similar in Python would be appreciated.

  • June Wang
    June Wang over 4 years
    Hmm, I just tried genfromtxt() on data in the format 1,2,3;1,2;1,2,3,4 and it gives me the error Line #3 (got 4 columns instead of 6)
  • hpaulj
    hpaulj over 4 years
    Yes, it does that if the number of columns isn't the same. Make sure you are using the right delimiter.
  • June Wang
    June Wang over 4 years
    Yup, adding a delimiter helps, but it reshapes the data. Is there a way to keep the original format?
  • hpaulj
    hpaulj over 4 years
    With a well-formatted csv there is only one possible shape. Well, two if you specify a compound dtype and a structured array. Without seeing part of your file, I have no idea what shape and dtype it should have. Comments are not the place to debug this.