pandas to_csv arguments float_format and decimal not working for index column

65,488

I would rewrite your two bottom lines:

with open('foo.csv', 'w') as f:
    data.to_csv(f, index=True, header=True, decimal=',', sep=' ', float_format='%.3f')

Into

data.reset_index().to_csv('foo.csv', index=False, header=True, decimal=',', sep=' ', float_format='%.3f')

This is a bit of a workaround, but as you have noticed, the keyword arguments decimal= and float_format= only work on data columns, not on the index.

What I do instead is to put the index into the dataframe with reset_index and then I tell to_csv(index=False not to save the index to the file (since it is now in the data).

Also, opening a file stream yourself (with open('foo.csv', 'w') as f:) is better left to pandas, which does this by itself when you just give it a string 'foo.csv' as first argument.

Share:
65,488

Related videos on Youtube

albert
Author by

albert

Chemical Engineer / Process Engineer - Research Assistant - Doctoral Candidate - Digital Transformation Enthusiast

Updated on January 16, 2020

Comments

  • albert
    albert over 4 years

    Background

    I am doing some simulations resp. a system analysis by variing parameters (in this case rpm only) and append every last line of a results dataframe results_df to a summarizing dataframe df containing giving the baviour of my system in depencence of the varied rpm.

    In order to get an appropriate index for plotting and data analysis I converted the varied values (here rpm) from the list into a pandas series ser and concat this series with the summarizing dataframe df containing the results I am interested in.

    Since the results of each calculation I am interested in is only last line of each calculation I am extracting this data from the results dataframe results_df by using .tail(1).

    What I have done so far is shown in the following snippet:

    rpm = [0.25, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0]
    
    ser = pd.Series(rpm, name='rpm')
    df = pd.DataFrame()
    df_list = list()
    
    for i, val in enumerate(rpm):
        results_df = get_some_data_from_somwhere()
        df_list.append(results_df.tail(1))
    
    df = df.append(df_list, ignore_index=True)
    df = pd.concat([df, ser], axis=1)
    df.set_index('rpm', inplace=True)
    
    
    with open('foo.csv', 'w') as f:
        data.to_csv(f, index=True, header=True, decimal=',', sep=' ', float_format='%.3f')
    

    Problem

    This csv-file what I get has the follwing format:

    rpm cooling_inner heating_inner cooling_outlet heating_outlet
    0.25 303,317 323,372 302,384 324,332
    

    However, I expected having three decimal digits and a comma as decimal sign on my index column, like shown here:

    rpm cooling_inner heating_inner cooling_outlet heating_outlet
    0,250 303,317 323,372 302,384 324,332
    

    So it seems that the index and decimal sign options are not applied to the index column when exporting dataframes to csv-files using the .to_csv command.

    How could I achieve this behaviour since the index option is set True and all values (with exception to the index column) have the right format and decimal sign?

    Do I have to handle the index column somehow seperate?

    • firelynx
      firelynx almost 9 years
      A workaround is to do data.reset_index().to_csv(index=False ...
    • firelynx
      firelynx almost 9 years
      Also, you can do data.to_csv('foo.csv'... you don't need to give it a stream.
    • albert
      albert almost 9 years
      @firelynx: Is the file closed automatically after exporting data? This is somehow relevant since I exceeded the maximum number of opened files a few months ago...
    • firelynx
      firelynx almost 9 years
      Yes, it closes the file after itself. I think it is generally safer to let pandas deal with the file handling, since then the logic is kept in one place, not in all places you do .to_csv
    • firelynx
      firelynx almost 9 years
      Wrote my two points as a proper answer instead with a bit more elaboration.