pandas read_csv column dtype is set to decimal but converts to string


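I think you need converters. The dtype argument only understands NumPy/pandas dtypes, so decimal.Decimal is treated as a generic object dtype and the values stay as strings. A converter is called on each cell's raw string instead: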

import pandas as pd
import io
import decimal as D

temp = u"""a,b,c,d
1,1,1,1.0"""

# After testing, replace io.StringIO(temp) with the filename.
# converters calls D.Decimal on the raw string of each cell in 'c' and 'd'.
df = pd.read_csv(io.StringIO(temp),
                 dtype={'a': int, 'b': float},
                 converters={'c': D.Decimal, 'd': D.Decimal})

print(df)
       a    b  c    d
    0  1  1.0  1  1.0

for i, v in df.iterrows():
    print(type(v.a), type(v.b), type(v.c), type(v.d))

    <class 'int'> <class 'float'> <class 'decimal.Decimal'> <class 'decimal.Decimal'>
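
If the data has to be converted after import rather than during it, one option is to read the columns as strings and apply decimal.Decimal element-wise. This is only a minimal sketch, assuming the same sample data as above; the name csv_text is made up for illustration.

```python
import decimal as D
import io

import pandas as pd

# Hypothetical inline sample; replace io.StringIO(csv_text) with a filename.
csv_text = "a,b,c,d\n1,1,1,1.0"

# Read 'c' and 'd' as strings so the original text (e.g. "1.0") is preserved.
df = pd.read_csv(io.StringIO(csv_text),
                 dtype={'a': int, 'b': float, 'c': str, 'd': str})

# Call decimal.Decimal on every cell of the two columns.
for col in ['c', 'd']:
    df[col] = df[col].apply(D.Decimal)

print(type(df.loc[0, 'c']), type(df.loc[0, 'd']))
```

Reading the columns as strings first avoids a round trip through float, so the digits in the file are what Decimal receives.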

Author: candleford

Updated on June 05, 2022

Comments

  • candleford
    candleford almost 2 years

    I am using pandas (v0.18.1) to import the following data from a file called 'test.csv':

    a,b,c,d
    1,1,1,1.0
    

    I have set the dtype to 'decimal.Decimal' for columns 'c' and 'd' but instead they return as type 'str'.

    import pandas as pd
    import decimal as D
    
    df = pd.read_csv('test.csv', dtype={'a': int, 'b': float, 'c': D.Decimal, 'd': D.Decimal})
    
    for i, v in df.iterrows():
        print(type(v.a), type(v.b), type(v.c), type(v.d))
    

    Results:

    `<class 'int'> <class 'float'> <class 'str'> <class 'str'>`
    

    I have also tried converting to decimal explicitly after import with no luck (converting to float works but not decimal).

    df.c = df.c.astype(float)
    df.d = df.d.astype(D.Decimal)
    for i, v in df.iterrows():
        print(type(v.a), type(v.b), type(v.c), type(v.d))
    

    Results:

    `<class 'int'> <class 'float'> <class 'float'> <class 'str'>`
    

    The following code converts a 'str' to 'decimal.Decimal' so I don't understand why pandas doesn't behave the same way.

    x = D.Decimal('1.0')
    print(type(x))
    

    Results:

    `<class 'decimal.Decimal'>`
    
  • Jan Christoph Terasa
    Jan Christoph Terasa almost 8 years
    The pandas documentation is hilariously unspecific about what a dtype is, but since I assume the implementation in pandas is based on numpy, we luckily have the numpy docs. Do keep in mind that generic objects can be less efficient, in both performance and memory, than basic int and float.
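
The point about dtypes can be checked directly. The following is a small sketch, assuming numpy is installed alongside pandas; it shows that NumPy maps decimal.Decimal to its generic object dtype, which is why a Decimal column is stored as Python objects rather than as a native numeric type.

```python
import decimal as D

import numpy as np
import pandas as pd

# NumPy has no native decimal type, so an arbitrary Python class
# is mapped to the generic object dtype.
decimal_dtype = np.dtype(D.Decimal)
print(decimal_dtype)

# A Series of Decimal values therefore holds references to Python objects.
s = pd.Series([D.Decimal('1.0'), D.Decimal('2.5')])
print(s.dtype)
```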