Convert comma decimal separators to dots within a DataFrame
Solution 1
pandas.read_csv has a decimal parameter for this (see the documentation).
I.e. try with:
df = pd.read_csv(Input, delimiter=";", decimal=",")
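As a minimal sketch of what this does, the snippet below reads a small in-memory CSV instead of a file. The column names and values are made up for illustration; only the delimiter and decimal arguments come from the answer above.

```python
import io
import pandas as pd

# Hypothetical sample data: semicolon-delimited, comma as decimal separator.
csv_text = "id;value;ratio\n10;-0,59;0,1\n11;-0,57;0,2\n"

# decimal="," tells the parser to treat the comma as the decimal point.
df = pd.read_csv(io.StringIO(csv_text), delimiter=";", decimal=",")

# The numeric columns should now be parsed as floats rather than strings.
print(df.dtypes)
```

If a column still comes back with dtype object, it probably contains values the parser could not read as numbers.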
Solution 2
I think the earlier answer's suggestion of passing decimal="," to pandas' read_csv is the preferred option.
However, I found that it is incompatible with the Python parsing engine. For example, when using skiprows=, read_csv will fall back to this engine, so as far as I know you can't use skiprows= and decimal= in the same read_csv call. Also, I haven't been able to get decimal= to work at all (probably my own mistake).
The long way round I used to achieve the same result is with list comprehensions, .replace and .astype. The major downside to this method is that it needs to be done one column at a time:
df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})
df['a'] = [x.replace(',', '.') for x in df['a']]
df['a'] = df['a'].astype(float)
Now, column a will have float type cells. Column b still contains strings.
Note that the .replace used here is Python's built-in str.replace, not pandas' Series.replace. By default, the pandas version only replaces a cell whose whole value matches exactly, unless you pass regex=True.
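The difference between the two kinds of replace can be seen in a short sketch. The series below is a made-up two-element sample, not data from the question.

```python
import pandas as pd

s = pd.Series(['120,00', '42,00'])

# Series.replace compares whole cell values by default, so no cell
# equals ',' and nothing is changed.
unchanged = s.replace(',', '.')

# With regex=True the pattern is matched inside each string instead.
replaced = s.replace(',', '.', regex=True)

print(unchanged.tolist())
print(replaced.tolist())
```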
Solution 3
stallasia's answer looks like the best one.
However, if you want to change the separator when you already have a DataFrame, you could do:
df['a'] = df['a'].str.replace(',', '.').astype(float)
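To avoid repeating that line for every column, the same conversion can be looped over a list of column names. This is only a sketch, reusing the sample frame from Solution 2; the cols list is an assumption about which columns hold comma-formatted numbers.

```python
import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})

# Assumed list of columns that contain comma-formatted numbers.
cols = ['a', 'b']

for col in cols:
    # regex=False makes the comma a literal string, not a pattern.
    df[col] = df[col].str.replace(',', '.', regex=False).astype(float)

print(df.dtypes)
```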
Solution 4
This answers the question of how to change the decimal comma to the decimal dot with Python pandas.
$ cat test.py
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')
Here the decimal separator is specified as a comma when reading and as a dot when writing. So
$ cat test.csv
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv
,header,header2
0,1,2.1
1,3,4.0
where you see that the separator has changed to dot.
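The extra leading column in test2.csv is the DataFrame index. If it is not wanted, passing index=False to to_csv should leave it out. The sketch below assumes the same contents as test.csv but uses an in-memory string so it runs without any files.

```python
import io
import pandas as pd

# Same contents as test.csv above, held in memory for the example.
csv_text = 'header,header2\n1,"2,1"\n3,"4,0"\n'

df = pd.read_csv(io.StringIO(csv_text), quotechar='"', decimal=",")

# With no path argument, to_csv returns the CSV text as a string.
# index=False omits the row index from the output.
out = df.to_csv(index=False, decimal=".")
print(out)
```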
Solution 5
Thanks for the great answers. I just want to add that in my case using decimal=',' alone did not work, because I had numbers like 1.450,00 (with a thousands separator) that pandas did not recognize. Passing thousands='.' helped to read the file correctly:
df = pd.read_csv(
    Input,
    delimiter=";",
    decimal=",",
    thousands="."
)
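A small self-contained check of that combination, using made-up rows in place of the real file:

```python
import io
import pandas as pd

# Hypothetical data: dot as thousands separator, comma as decimal separator.
csv_text = "item;amount\nA;1.450,00\nB;12,50\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",
    decimal=",",
    thousands=".",
)

# 'amount' should be parsed as a float column.
print(df)
```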
Nautilius
Updated on July 05, 2022
Comments
-
Nautilius almost 2 years
I am importing a CSV file like the one below, using pandas.read_csv:
df = pd.read_csv(Input, delimiter=";")
Example of CSV file:
10;01.02.2015 16:58;01.02.2015 16:58;-0.59;0.1;-4.39;NotApplicable;0.79;0.2
11;01.02.2015 16:58;01.02.2015 16:58;-0.57;0.2;-2.87;NotApplicable;0.79;0.21
The problem is that when I later on in my code try to use these values I get this error:
TypeError: can't multiply sequence by non-int of type 'float'
The error occurs because the number I'm trying to use is written with a comma (,) as the decimal separator rather than a dot (.). After manually changing the commas to dots, my program works. I can't change the format of my input, so I have to replace the commas in my DataFrame for my code to work, and I want Python to do this without my having to do it manually. Do you have any suggestions?
-
Daddy32 over 3 years
This workaround is also needed for reading Excel files, as pd.read_excel doesn't have the decimal parameter.
-
nachouve over 2 years
The function pandas.to_numeric can be useful here: it lets you control how conversion errors are handled, instead of using astype(float).
-
Register Sole over 2 years
This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review
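Following nachouve's comment above, here is a rough sketch of pandas.to_numeric with errors='coerce'. The series is made up for illustration, with a NotApplicable entry like the one in the question's CSV.

```python
import pandas as pd

s = pd.Series(['120,00', '42,00', 'NotApplicable'])

# Swap the decimal comma for a dot, then convert. errors='coerce' turns
# values that cannot be parsed into NaN instead of raising an error.
nums = pd.to_numeric(s.str.replace(',', '.', regex=False), errors='coerce')

print(nums)
```

With astype(float), the same series would raise a ValueError on the non-numeric entry.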