Precision lost while using read_csv in pandas
It is only a display problem; see the docs:
# temporarily set display precision
with pd.option_context('display.precision', 10):
    print(df)
0 1 2 3 4 5 6 7 \
0 895 2015-4-23 19 10000 LA 0.4677978806 0.477346934 0.4089938425
8 9 10 11 12
0 0.8224291972 0.8652525793 0.682994286 0.5139162227 NaN
EDIT (thank you, Mark Dickinson):
Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing
float_precision='round_trip'
to read_csv fixes this. See the documentation for more.
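A minimal sketch of the fix, using io.StringIO in place of the real file and the sample row from the question (column positions are pandas' default integer labels):

```python
import io
import pandas as pd

# sample row from the question (normally this would be the actual file)
data = "895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425\n"

# default fast parser: can be off in the last bits
df_fast = pd.read_csv(io.StringIO(data), header=None, delimiter='|')

# 'round_trip' uses Python's exact decimal-to-binary conversion,
# so each literal parses to the same float as float('...')
df_exact = pd.read_csv(io.StringIO(data), header=None, delimiter='|',
                       float_precision='round_trip')

print(df_exact[5][0] == float('0.4677978806'))  # True
```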
jezrael
Updated on June 11, 2022

Comments
-
jezrael almost 2 years
I have files of the below format in a text file which I am trying to read into a pandas dataframe.
895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|
As you can see, there are 10 digits after the decimal point in the input file.
df = pd.read_csv('mockup.txt',header=None,delimiter='|')
When I try to read it into a dataframe, I lose the last 4 digits:
df[5].head()
0    0.467798
1    0.258165
2    0.860384
3    0.803388
4    0.249820
Name: 5, dtype: float64
How can I get the complete precision as present in the input file? I have some matrix operations that need to be performed, so I cannot cast it as a string.
I figured out that I have to do something with
dtype
but I am not sure where I should use it. -
Admin about 8 years Thanks. I had one other rookie question: is there any general recommendation for faster loading into a dataframe with read_csv() when the data is mostly floating-point values?
-
jezrael about 8 years I think you can try setting
dtypes
, see the docs. -
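A sketch of passing dtype to read_csv so pandas can skip type inference (the column positions and dtypes below are assumptions based on the sample row, not a recommendation from the thread):

```python
import io
import pandas as pd

# sample row; io.StringIO stands in for the real file
data = "895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340\n"

# declare dtypes up front; keys are the default integer column labels
df = pd.read_csv(io.StringIO(data), header=None, delimiter='|',
                 dtype={0: 'int64', 3: 'int64', 4: 'object',
                        5: 'float64', 6: 'float64'})

print(df.dtypes[5])  # float64
```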
Mark Dickinson about 8 years It may be worth noting that this isn't purely a display problem, in the sense that if you use Pandas to write out a dataframe to a CSV file and then read it back in again, you can end up with small floating-point errors in the result: Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing
float_precision='round_trip'
to read_csv
fixes this. See the documentation for more. -
jezrael about 8 years @Mark Dickinson Thank you very much for the comment, I added it to the answer.
-
foxwendy about 7 years @MarkDickinson my notebook kernel dies once I set float_precision='round_trip'
-
cozyss almost 6 years This solves my issue! For some reason
float_precision='high'
doesn't work but
float_precision='round_trip'
works.