Pandas for loop over dataframe gives too many values to unpack

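pd.DataFrame.itertuples returns an iterator of namedtuples which, by default, include the index as their first element. A two-column frame therefore yields three values per row, which is why unpacking into two variables fails.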
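The failure can be reproduced in a few lines. The sketch below rebuilds a one-row frame modelled on the output shown in the question (the construction itself is an assumption; only the column names, the index value 94, and the row contents come from the question) and shows that each row carries three fields:

```python
import pandas as pd

# Hypothetical reconstruction of the question's one-row frame.
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

row = next(process_list.itertuples())
print(row)       # namedtuple with fields Index, File_date, File_name
print(len(row))  # three fields, although the frame has two columns

try:
    file_date, file_name = row  # two targets, three values
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The ValueError raised here is the same one that the question's try/except reports as "nothing to process".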

There are two options to account for this.

Option 1

Unpack 3 items instead of 2, the first of which you do not use.

Here is a minimal example:

import pandas as pd

df = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                  columns=['A', 'B'])

for idx, a, b in df.itertuples():
    print(idx, a, b)

0 10 20
1 30 40
2 50 60

In your case, a good convention is to name the unused variable _ (the column names below are the ones shown in your output):

for _, file_date, file_name in process_list[['File_date', 'File_name']].itertuples():
    # do something
    pass

Option 2

Pass the index=False argument and unpack 2 elements:

for file_date, file_name in process_list[['File_date', 'File_name']].itertuples(index=False):
    # do something
    pass
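As a quick check that the two options are interchangeable, here is a minimal sketch on a small frame. The first row mirrors the question's output; the second row and the index value 95 are made up for illustration.

```python
import pandas as pd

# First row taken from the question's output; second row is hypothetical.
process_list = pd.DataFrame(
    {
        "File_date": [20180507, 20180508],
        "File_name": ["mcmhv20180507.csv", "mcmhv20180508.csv"],
    },
    index=[94, 95],
)

# Option 1: unpack three values and discard the index.
option_1 = [
    (file_date, file_name)
    for _, file_date, file_name in process_list.itertuples()
]

# Option 2: ask itertuples not to emit the index at all.
option_2 = [
    (file_date, file_name)
    for file_date, file_name in process_list.itertuples(index=False)
]

print(option_1 == option_2)
```

Either form works; Option 2 is slightly less to type when the index is never needed inside the loop.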

The behaviour is indicated in the documentation:

DataFrame.itertuples(index=True, name='Pandas')

Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
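The two parameters in that signature can be explored directly. The following is a small sketch using the same kind of toy frame as above; passing name=None, which makes itertuples return regular tuples instead of namedtuples, is not used elsewhere in this answer and is shown only to illustrate the signature.

```python
import pandas as pd

df = pd.DataFrame([[10, 20], [30, 40]], columns=["A", "B"])

# Defaults: index=True, name='Pandas'
with_index = list(df.itertuples())
# Index dropped, still namedtuples
without_index = list(df.itertuples(index=False))
# Index dropped, regular tuples
plain = list(df.itertuples(index=False, name=None))

print(with_index[0]._fields)     # field names include Index first
print(without_index[0]._fields)  # only the column names
print(plain[0])                  # a plain tuple of the row values
```

Because the rows are namedtuples by default, fields can also be read by attribute (for example with_index[0].A), which avoids positional unpacking altogether.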

Author: sio2bagger

Updated on June 16, 2022

Comments

  • sio2bagger
    sio2bagger almost 2 years

    I don't see why this code isn't working. I am trying to iterate over a data frame in a for loop; in this case it has only one row. There are only two columns, and I have two loop variables to receive them. What am I missing, please?

      print("process_list =  ",process_list)
    
      for row in process_list.itertuples():
           print("row = ", row)
    
    
      df_to_date = pd.DataFrame()
    
      try:
            print("process_list = {}  and it's type {}  process_list.itertuples() {} ".format(process_list, type(process_list),process_list.itertuples() ) )
    
            for   file_date , file_name  in process_list.itertuples(): # a whole batch of days 
                   file_to_process = dev_env + file_name
                   print("PROCESSING BATCH: ",file_to_process)
                   df  = pd.read_csv(file_to_process, header=None,skiprows=22, sep=',', comment='*', converters = {"Days" : just_number,"Percentile" : just_number,"Date" : just_number} ,names = column_names )
                   df.insert(0,'File_date',file_date)
                   df_to_date = df_to_date.append(df)
    
      except Exception as e: 
               print ("nothing to process exception = ",e)
               sys.exit(0)
    

    When I run it, I get:

    process_list =       File_date          File_name
    94   20180507  mcmhv20180507.csv
    row =  Pandas(Index=94, File_date=20180507, File_name='mcmhv20180507.csv')
    process_list =     File_date          File_name
    94   20180507  mcmhv20180507.csv  and it's type <class 'pandas.core.frame.DataFrame'>  process_list.itertuples() <map object at 0x7f6339371e48> 
    nothing to process exception =  too many values to unpack (expected 2)