python - using numpy loadtxt reading a csv file with different data types for each column


Solution 1

You are very close. Pass the dtype as a single comma-separated string so that each column gets its own type:

data = np.loadtxt('TS.csv', dtype='str,int', delimiter=',', usecols=(0, 1), unpack=True)
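
As a minimal sketch of the same idea, the snippet below spells the per-column dtype out as a list of (name, type) pairs and unpacks the result into two arrays. The in-memory sample, the field names `time` and `value`, and the string width of 16 are assumptions made for illustration; it also assumes the file really is comma-separated and has no empty fields.

```python
import io

import numpy as np

# Assumed sample: a few comma-separated rows standing in for TS.csv.
csv_text = "2015/1/1 0:00,5\n2015/1/1 0:15,10\n2015/1/1 0:30,10\n"

# One (name, type) pair per column; "U16" reserves room for 16 characters.
# The field names and the width are illustrative choices.
column_types = [("time", "U16"), ("value", "i8")]

# With a structured dtype, unpack=True returns one array per field.
time, data = np.loadtxt(
    io.StringIO(csv_text),
    dtype=column_types,
    delimiter=",",
    usecols=(0, 1),
    unpack=True,
)
```

Here `time` holds the timestamps as strings and `data` holds the measurements as integers.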

Solution 2

I would generally suggest np.genfromtxt when np.loadtxt can't handle a file, but both struggle with space-delimited files that have missing data. Without an explicit separator such as a comma, there is no reliable way to tell how many fields are missing from a row.

A similar function that may work is pd.read_csv or pd.read_table (mostly the same thing), which does handle this case. Just make sure to set delim_whitespace=True for a file formatted like this. Recent pandas releases deprecate delim_whitespace in favor of sep=r'\s+'.

pd.read_table('TS.csv', delim_whitespace=True, header=None)
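
A rough sketch of that approach on a few rows shaped like the ones in the question, using sep=r"\s+" as the whitespace separator. The in-memory sample and the column names `date`, `clock`, and `value` are assumptions for illustration. Because the timestamp itself contains a space, it arrives as two columns and is joined back together afterwards.

```python
import io

import pandas as pd

# Assumed sample: whitespace-separated rows, two of them without a value.
text = (
    "2015/1/1 2:45   10\n"
    "2015/1/1 3:00\n"
    "2015/1/1 3:15\n"
    "2015/1/1 4:30   30\n"
)

# Short rows are padded with NaN, so the value column is read as float.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",
    header=None,
    names=["date", "clock", "value"],  # illustrative column names
)

# Rebuild the timestamp string and pull both columns out as NumPy arrays.
time = (df["date"] + " " + df["clock"]).to_numpy()
data = df["value"].to_numpy()
```

The missing measurements show up as NaN in `data`, which is why that array is floating point rather than integer.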

Author: Superstar

Updated on May 14, 2020

Comments

  • Superstar
    Superstar almost 4 years

    I created a csv file with two columns, the first column is time data, and the second one is some measured data values.

    2015/1/1 0:00   5       
    2015/1/1 0:15   10    
    2015/1/1 0:30   10   
    2015/1/1 0:45   15   
    2015/1/1 1:00   5  
    2015/1/1 1:15   20  
    2015/1/1 1:30   20  
    2015/1/1 1:45   40  
    2015/1/1 2:00   30  
    2015/1/1 2:15   20  
    2015/1/1 2:30   25  
    2015/1/1 2:45   10  
    2015/1/1 3:00   
    2015/1/1 3:15   
    2015/1/1 3:30   
    2015/1/1 3:45   
    2015/1/1 4:00   
    2015/1/1 4:15   
    2015/1/1 4:30   30  
    2015/1/1 4:45   50  
    2015/1/1 5:00   70  
    

    Now I want to use the numpy.loadtxt function to read these two columns into two different numpy arrays, with a string data type for the date column and an integer data type for the value column.

    I tried several different statements, but none of them worked.

    time, data = np.loadtxt('TS.csv',dtype=str,delimiter=',',usecols=(0, 1),unpack=True)
    time, data = np.loadtxt('TS.csv',dtype=(str,int),delimiter=',',usecols=(0, 1),unpack=True)
    time, data = np.loadtxt('TS.csv',dtype=[str,int],delimiter=',',usecols=(0, 1),unpack=True)
    

    Does anyone know how to achieve this? Thanks for your help!

  • Superstar
    Superstar almost 9 years
    In general, your solution works well. That's what I'm looking for! But the particular dataset I posted here has several rows with empty values, so the arguments you suggested don't work in this situation. Anyway, your suggestion is really helpful! Thank you very much!
  • Earlee
    Earlee about 2 years
    This works, but in my case (perhaps because of the numpy version) I have to specify the maximum length. For example: dtype='S10,int', where the 10 after S tells numpy to expect up to 10 characters.
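
To illustrate the point about the maximum length, here is a small sketch of how a fixed-width string dtype treats one of the timestamps from the question. The width of 16 is an assumption chosen to be large enough; note that the 13-character timestamps do not fit in S10.

```python
import numpy as np

stamp = "2015/1/1 0:00"  # 13 characters

# "S10" stores at most 10 bytes, so the value is silently cut short.
as_s10 = np.array([stamp], dtype="S10")

# "U16" stores up to 16 unicode characters, enough for the full timestamp.
as_u16 = np.array([stamp], dtype="U16")

truncated = as_s10[0]  # bytes, cut to 10 characters
complete = as_u16[0]   # str, unchanged
```

The S variant also yields bytes rather than str on Python 3, which is another reason a U dtype may be more convenient here.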