Precision lost while using read_csv in pandas


It is only a display problem; see the docs:

#temporarily set display precision
with pd.option_context('display.precision', 10):
    print(df)

     0          1   2      3   4             5            6             7   \
0  895  2015-4-23  19  10000  LA  0.4677978806  0.477346934  0.4089938425   

             8             9            10            11  12  
0  0.8224291972  0.8652525793  0.682994286  0.5139162227 NaN    
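
If you want the higher display precision for the whole session rather than a single `with` block, the same option can be set globally. A minimal sketch using pandas' option API (pd.set_option and pd.reset_option):

#set display precision for the whole session
pd.set_option('display.precision', 10)
print(df)

#restore the pandas default of 6
pd.reset_option('display.precision')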

EDIT (thank you Mark Dickinson):

Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.
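
A minimal sketch of the fix, reusing mockup.txt and the '|' delimiter from the question; the second half illustrates the write-then-read cycle described above, with column 5 standing in for any of the float columns:

import pandas as pd

#the only change to the original read_csv call is the extra keyword
df = pd.read_csv('mockup.txt', header=None, delimiter='|',
                 float_precision='round_trip')

#round-tripping through a CSV: the default fast parser may lose the
#last binary digits, while 'round_trip' preserves the values exactly
df.to_csv('out.csv', header=False, index=False, sep='|')
fast = pd.read_csv('out.csv', header=None, delimiter='|')
exact = pd.read_csv('out.csv', header=None, delimiter='|',
                    float_precision='round_trip')

print((df[5] == fast[5]).all())    #may be False with the default parser
print((df[5] == exact[5]).all())   #True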

Comments

  • jezrael almost 2 years

    I have files in the format below which I am trying to read into a pandas DataFrame.

    895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|
    

    As you can see, there are 10 digits after the decimal point in the input file.

    df = pd.read_csv('mockup.txt',header=None,delimiter='|')
    

    When I try to read it into a DataFrame, I do not get the last 4 digits:

    df[5].head()
    
    0    0.467798
    1    0.258165
    2    0.860384
    3    0.803388
    4    0.249820
    Name: 5, dtype: float64
    

    How can I get the complete precision as present in the input file? I have some matrix operations that need to be performed, so I cannot cast it as a string.

    I figured out that I have to do something with dtype, but I am not sure where I should use it.

  • Admin about 8 years
    Thanks. Had one other rookie question: is there any recommendation, in general, for faster loading into a DataFrame with read_csv() when the data is mostly floating-point values?
  • jezrael about 8 years
    I think you can try setting dtypes (see the sketch after these comments).
  • Mark Dickinson about 8 years
    It may be worth noting that this isn't purely a display problem, in the sense that if you use Pandas to write out a dataframe to a CSV file and then read it back in again, you can end up with small floating-point errors in the result: Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.
  • jezrael about 8 years
    @MarkDickinson Thank you very much for the comment; I added it to the answer.
  • foxwendy about 7 years
    @MarkDickinson my notebook kernel dies once I set float_precision='round_trip'
  • cozyss almost 6 years
    This solves my issue! For some reason float_precision='high' doesn't work, but float_precision='round_trip' does.
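
On the dtype suggestion above, a minimal sketch, assuming the question's mockup.txt layout (float columns 5 through 11 are taken from the sample row; how much this speeds up loading depends on the file):

import numpy as np
import pandas as pd

#declaring dtypes up front lets read_csv skip type inference for
#those columns; with header=None the column labels are integers
dtypes = {i: np.float64 for i in range(5, 12)}
df = pd.read_csv('mockup.txt', header=None, delimiter='|', dtype=dtypes)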