Handling the error "TypeError: Expected tuple, got str" when loading a CSV into a pandas DataFrame with a multi-level index and multi-level columns


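You are getting the error because some of your column labels are strings rather than tuples. They sit at positions 2368 through 2958 of df.columns (the trailing .1 looks like the suffix pandas adds to duplicated column names).
The slice of df.columns where the labels are strings: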

df.columns[2368:2959]
Index(['('z', '1', '1', '00h').1', '('z', '1', '1', '06h').1',
       '('z', '1', '1', '12h').1', '('z', '1', '1', '18h').1',
       '('z', '1', '2', '00h').1', '('z', '1', '2', '06h').1',
       '('z', '1', '2', '12h').1', '('z', '1', '2', '18h').1',
       '('z', '1', '3', '00h').1', '('z', '1', '3', '06h').1',
       ...
       '('z', '1000', '2', '06h').1', '('z', '1000', '2', '12h').1',
       '('z', '1000', '2', '18h').1', '('z', '1000', '3', '00h').1',
       '('z', '1000', '3', '06h').1', '('z', '1000', '3', '12h').1',
       '('z', '1000', '3', '18h').1', '('z', '1000', '4', '00h').1',
       '('z', '1000', '4', '06h').1', '('z', '1000', '4', '12h').1'],
      dtype='object', length=591)
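
One way to find such positions without inspecting the index by eye is to check the type of each label. This is only a minimal sketch: the helper name string_label_positions and the short labels list are hypothetical stand-ins, and in practice you would pass df.columns instead.

```python
def string_label_positions(labels):
    """Return the positions of labels that are plain strings rather than tuples."""
    return [i for i, label in enumerate(labels) if isinstance(label, str)]


# Hypothetical stand-in for df.columns: two proper tuples and one stringified tuple
labels = [
    ('u', '1', '1', '00h'),
    ('u', '1', '1', '06h'),
    "('z', '1', '1', '00h').1",
]

positions = string_label_positions(labels)
print(positions)  # [2]
```

If the returned positions form one contiguous run, the first and last values give the bounds of the slice to inspect.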

Since you want a DataFrame whose columns are a MultiIndex built from the tuples, first clean these strings: extract the tuple-like substring with re.findall and the regex pattern '(\(.*?\)).', then pass it through ast.literal_eval to convert the string into a real tuple. Finally, build the columns with pd.MultiIndex.from_tuples:

import ast
import re

import pandas as pd

df = pd.read_csv('teste.csv', index_col=[0, 1, 2, 3, 4], header=[0, 1, 2, 3], parse_dates=True)

column_list = []
for column in df.columns:
    if isinstance(column, str):
        # Keep only the "(...)" part of the label and parse it into a real tuple
        column_list.append(ast.literal_eval(re.findall(r'(\(.*?\)).', column)[0]))
    else:
        # Already a tuple: keep it unchanged
        column_list.append(column)

df.columns = pd.MultiIndex.from_tuples(column_list,
                                       names=('variables', 'level', 'days', 'times'))

print(df.iloc[:,:6].head())
variables                                                u                    
level                                                    1                    
days                                                     1               2    
times                                                  00h 06h 12h 18h 00h 06h
wsid lat        lon        start               prcp_24                        
329  -43.969397 -19.883945 2007-03-18 10:00:00 72.0      0   0   0   0   0   0
                           2007-03-20 10:00:00 104.4     0   0   0   0   0   0
                           2007-10-18 23:00:00 92.8      0   0   0   0   0   0
                           2007-12-21 00:00:00 60.4      0   0   0   0   0   0
                           2008-01-19 18:00:00 53.0      0   0   0   0   0   0
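
To see what the cleaning step does to a single label, here is a minimal sketch that uses only the standard library. The label value is copied from the Index output above; the variable names are just for illustration.

```python
import ast
import re

# One of the stringified labels from the Index output
label = "('z', '1', '1', '00h').1"

# Non-greedy match of the first "(...)" group that is followed by one more character
matched = re.findall(r'(\(.*?\)).', label)[0]

# Safely evaluate the text of the tuple into an actual tuple of strings
parsed = ast.literal_eval(matched)

print(matched)       # ('z', '1', '1', '00h')
print(type(parsed))  # <class 'tuple'>
```

Note that the pattern requires at least one character after the closing parenthesis. A string label with no suffix would produce an empty match list, and the [0] lookup would then raise an IndexError.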
Author by

Andre Araujo

Software engineer, researcher and father of girl.

Updated on June 14, 2022

Comments

  • Andre Araujo, almost 2 years ago

    I'm trying to load a CSV file (this file) to create a DataFrame with a multi-level index and multi-level columns. It has 5 (five) index levels and 3 (three) levels in the columns.

    How can I do this? Here is the code:

    df = pd.read_csv('./teste.csv'
                      ,index_col=[0,1,2,3,4]
                      ,header=[0,1,2,3]
                      ,skipinitialspace=True
                      ,tupleize_cols=True)
    
    df.columns = pd.MultiIndex.from_tuples(df.columns)
    

    Expected output:

    variables                                                u                  \
    level                                                    1                   
    days                                                     1               2   
    times                                                  00h 06h 12h 18h 00h   
    wsid lat        lon        start               prcp_24                       
    329  -43.969397 -19.883945 2007-03-18 10:00:00 72.0      0   0   0   0   0   
                               2007-03-20 10:00:00 104.4     0   0   0   0   0   
                               2007-10-18 23:00:00 92.8      0   0   0   0   0   
                               2007-12-21 00:00:00 60.4      0   0   0   0   0   
                               2008-01-19 18:00:00 53.0      0   0   0   0   0   
                               2008-04-05 01:00:00 80.8      0   0   0   0   0   
                               2008-10-31 17:00:00 101.8     0   0   0   0   0   
                               2008-11-01 04:00:00 82.0      0   0   0   0   0   
                               2008-12-29 00:00:00 57.8      0   0   0   0   0   
                               2009-03-28 10:00:00 72.4      0   0   0   0   0   
                               2009-10-07 02:00:00 57.8      0   0   0   0   0   
                               2009-10-08 00:00:00 83.8      0   0   0   0   0   
                               2009-11-28 16:00:00 84.4      0   0   0   0   0   
                               2009-12-18 04:00:00 51.8      0   0   0   0   0   
                               2009-12-28 00:00:00 96.4      0   0   0   0   0   
                               2010-01-06 05:00:00 74.2      0   0   0   0   0   
                               2011-12-18 00:00:00 113.6     0   0   0   0   0   
                               2011-12-19 00:00:00 90.6      0   0   0   0   0   
                               2012-11-15 07:00:00 85.8      0   0   0   0   0   
                               2013-10-17 00:00:00 52.4      0   0   0   0   0   
                               2014-04-01 22:00:00 72.0      0   0   0   0   0   
                               2014-10-20 06:00:00 56.6      0   0   0   0   0   
                               2014-12-13 09:00:00 104.4     0   0   0   0   0   
                               2015-02-09 00:00:00 62.0      0   0   0   0   0   
                               2015-02-16 19:00:00 56.8      0   0   0   0   0   
                               2015-05-06 17:00:00 50.8      0   0   0   0   0   
                               2016-02-26 00:00:00 52.2      0   0   0   0   0   
    

    I need to handle the error "TypeError: Expected tuple, got str":

    TypeError: Expected tuple, got str