Concatenate pandas DataFrames generated with a loop

22,812

Solution 1

Pandas concat takes a list of dataframes. If you can generate a list of dataframes with your looping function, once you are finished you can concatenate the list together:

data_day_list = []
for i, day in enumerate(list_day):
    data_day = df[df.day==day]
    data_day_list.append(data_day)
final_data_day = pd.concat(data_day_list)
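
Here is a minimal, self-contained sketch of the same pattern. The sample df, list_day and the value_doubled feature are made up for illustration and are not from the question. ignore_index=True is optional; it just renumbers the rows of the result.

```python
import pandas as pd

# Made-up sample data; the real df and list_day come from the question.
df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "wed"],
    "value": [1, 2, 3, 4],
})
list_day = ["mon", "tue", "wed"]

data_day_list = []
for day in list_day:
    # .copy() avoids SettingWithCopyWarning when adding columns to the slice
    data_day = df[df["day"] == day].copy()
    data_day["value_doubled"] = data_day["value"] * 2  # hypothetical new feature
    data_day_list.append(data_day)

# Concatenate once, after the loop; ignore_index=True gives a fresh 0..n-1 index
final_data_day = pd.concat(data_day_list, ignore_index=True)
```

The key point is that pd.concat is called a single time, outside the loop.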

Solution 2

Exhausting a generator is more elegant (if not more efficient) than appending to a list. For example:

def yielder(df, list_day):
    for i, day in enumerate(list_day):
        yield df[df['day'] == day]

final_data_day = pd.concat(list(yielder(df, list_day)))
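
As far as I know, pd.concat accepts any iterable of DataFrames, so the generator can be passed in directly without wrapping it in list(). A small sketch with made-up sample data:

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({"day": [1, 1, 2, 3], "value": [10, 20, 30, 40]})
list_day = [1, 2, 3]

def yielder(df, list_day):
    # Yield one sub-DataFrame per day
    for day in list_day:
        yield df[df["day"] == day]

# The generator is consumed by pd.concat itself
final_data_day = pd.concat(yielder(df, list_day))
```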

Solution 3

Repeatedly appending or concatenating pd.DataFrames inside a loop is slow, because each call copies the data. You can collect the rows in a list in the interim and then create the final pd.DataFrame once at the end with pd.DataFrame.from_records(), e.g.:

interim_list = []
for i,(k,g) in enumerate(df.groupby(['*name of your date column here*'])):
    if i % 1000 == 0 and i != 0:
        print('iteration: {}'.format(i)) # just tells you where you are in iteration
    # add your "new features" here...
    for v in g.values:
        interim_list.append(v)

# here you want to specify the resulting df's column list...
df_final = pd.DataFrame.from_records(interim_list,columns=['a','list','of','columns'])
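
A runnable sketch of this approach, assuming a made-up df with day and value columns and a hypothetical value_doubled feature (none of these names come from the question):

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({"day": ["mon", "mon", "tue"], "value": [1, 2, 3]})

interim_list = []
for day, group in df.groupby("day"):
    for row in group.itertuples(index=False):
        # Store plain tuples, including a hypothetical new feature
        interim_list.append((row.day, row.value, row.value * 2))

# Build the DataFrame once, naming the columns explicitly
df_final = pd.DataFrame.from_records(
    interim_list, columns=["day", "value", "value_doubled"]
)
```

Working with plain tuples inside the loop keeps pandas out of the hot path until the very end.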
Author by

Annalix

Updated on December 26, 2021

Comments

  • Annalix
    Annalix over 2 years

    I am creating a new DataFrame named data_day, containing new features, for each day extrapolated from the day-timestamp of a previous DataFrame df.

    My new data_day DataFrames are 30 independent DataFrames that I need to concatenate/append at the end into a single DataFrame (final_data_day).

    The for loop over the days is defined as follows:

    num_days=len(list_day)
    
    #list_day= random.sample(list_day,num_days_to_simulate)
    data_frame = pd.DataFrame()
    
    for i, day in enumerate(list_day):
    
        print('*** ',day,' ***')
    
        data_day=df[df.day==day]
        .....................
        final_data_day = pd.concat()
    

    I hope that was clear. My problem is basically how to append/concatenate DataFrames generated in a non-trivial for loop.

  • Annalix
    Annalix about 6 years
    Lovely! @drinck's solution works amazingly. Thanks so much!
  • Annalix
    Annalix about 6 years
    You are completely right. Thanks! ...can't I give two votes on Stack Overflow??
  • uhoenig
    uhoenig almost 3 years
    I used to do "data_day = df[df.day==day]" as well, but found this to be significantly faster: groups = df.groupby("day") and then data_day = groups.get_group(day)
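
A short sketch of the groupby idea from the last comment, with made-up sample data. Note that get_group receives the loop variable day, not the literal string "day". Whether it is actually faster will depend on your data.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({"day": ["mon", "mon", "tue"], "value": [1, 2, 3]})
list_day = ["mon", "tue"]

# Group once, outside the loop, then look up each day's rows
groups = df.groupby("day")
data_day_list = [groups.get_group(day) for day in list_day]

final_data_day = pd.concat(data_day_list)
```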