python pandas replacing strings in dataframe with numbers
Solution 1
What about DataFrame.replace?
In [9]: mapping = {'set': 1, 'test': 2}
In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]:
Unnamed: 0 respondent brand engine country aware aware_2 aware_3 age \
0 0 a volvo p swe 1 0 1 23
1 1 b volvo None swe 0 0 1 45
2 2 c bmw p us 0 0 1 56
3 3 d bmw p us 0 1 1 43
4 4 e bmw d germany 1 0 1 34
5 5 f audi d germany 1 0 1 59
6 6 g volvo d swe 1 0 0 65
7 7 h audi d swe 1 0 0 78
8 8 i volvo d us 1 1 1 32
tesst set
0 2 1
1 1 2
2 2 1
3 1 2
4 2 1
5 1 2
6 2 1
7 1 2
8 2 1
As @Jeff pointed out in the comments, in pandas versions < 0.11.1, manually tack .convert_objects() onto the end to properly convert tesst and set to int64 columns, in case that matters in subsequent operations.
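For reference, here is a minimal, self-contained sketch of the same nested-dictionary replace on a small made-up frame (the three rows are illustrative, not the dataset from the question). convert_objects() no longer exists in current pandas releases, so the sketch casts the two columns explicitly with astype; treat that as one reasonable substitute, not the only option.

```python
import pandas as pd

# Small illustrative frame; only the column names follow the question.
df = pd.DataFrame({
    "brand": ["volvo", "bmw", "audi"],
    "tesst": ["set", "test", "set"],
    "set": ["set", "test", "test"],
})

mapping = {"set": 1, "test": 2}

# Outer keys are column names, inner dicts map old value -> new value,
# so only the 'tesst' and 'set' columns are touched.
result = df.replace({"set": mapping, "tesst": mapping})

# Make the integer dtype explicit instead of relying on automatic inference.
result = result.astype({"tesst": "int64", "set": "int64"})

print(result)
print(result.dtypes)
```

Other columns, such as brand here, pass through unchanged because they are not keys of the outer dictionary.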
Solution 2
I know this is old, but I'm adding this for those searching as I was. Assuming you already have a pandas dataframe (df in this code) with a source_ip column:
ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))
That gives you a dictionary mapping each unique IP address to an integer, without having to write the mapping out by hand.
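A short sketch of how the generated dictionary might then be applied. The sample addresses and the source_ip_id column name are made up for illustration; only source_ip comes from the answer. pd.factorize is shown as a one-call alternative that produces the same kind of first-seen integer codes.

```python
import pandas as pd

# Hypothetical sample data; only the column name source_ip comes from the answer.
df = pd.DataFrame({"source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]})

# Build the mapping from the data itself: unique() keeps first-seen order.
ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

# Apply the generated mapping; source_ip_id is an illustrative column name.
df["source_ip_id"] = df["source_ip"].map(ip_dict)

# pd.factorize returns integer codes plus the array of unique values.
codes, uniques = pd.factorize(df["source_ip"])

print(df)
print(codes)
```

Because the mapping is derived at run time, nothing has to be hardcoded, at the cost of the numbers depending on the order in which values first appear.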
Solution 3
You can use the applymap DataFrame function to do this:
In [26]: df = DataFrame({"A": [1,2,3,4,5], "B": ['a','b','c','d','e'],
"C": ['b','a','c','c','d'], "D": ['a','c',7,9,2]})
In [27]: df
Out[27]:
A B C D
0 1 a b a
1 2 b a c
2 3 c c 7
3 4 d c 9
4 5 e d 2
In [28]: mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
In [29]: df.applymap(lambda s: mymap.get(s) if s in mymap else s)
Out[29]:
A B C D
0 1 1 2 1
1 2 2 1 3
2 3 3 3 7
3 4 4 3 9
4 5 5 4 2
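The same idea as a standalone script, for anyone running this outside an IPython session. Two things here are my own adjustments rather than part of the answer: the lookup is written as mymap.get(s, s), and because applymap was deprecated in pandas 2.1 in favor of DataFrame.map, the sketch picks whichever method is available.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", "b", "c", "d", "e"],
                   "C": ["b", "a", "c", "c", "d"], "D": ["a", "c", 7, 9, 2]})
mymap = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

# DataFrame.map exists in pandas 2.1+; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

# dict.get(s, s) returns the mapped value, or s itself when s is not a key.
result = elementwise(lambda s: mymap.get(s, s))

print(result)
```

Note that the call returns a new DataFrame and leaves df untouched, so the result has to be assigned to a name to be used later.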
Solution 4
The simplest way to replace any value in the dataframe:
df = df.replace(to_replace="set", value=1)
df = df.replace(to_replace="test", value=2)
Hope this will help.
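One detail worth checking with replace: the type of the value you pass is the type you get back, so a quoted "1" leaves text in the column while a bare 1 gives a number. A small sketch on made-up data (the frame below is illustrative), using the list form of to_replace and value for the numeric case:

```python
import pandas as pd

# Illustrative data; column names follow the question.
df = pd.DataFrame({"tesst": ["set", "test", "set"], "set": ["set", "set", "test"]})

# Quoted replacement values: the cells still hold strings afterwards.
as_text = df.replace(to_replace="set", value="1").replace(to_replace="test", value="2")

# Integer replacement values, then an explicit cast so the dtype is int64.
as_ints = df.replace(to_replace=["set", "test"], value=[1, 2]).astype("int64")

print(as_text["tesst"].tolist())
print(as_ints.dtypes)
```

Without a column restriction, replace matches the value anywhere in the frame, which is convenient here but worth keeping in mind on wider tables.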
Solution 5
To convert strings like 'volvo' and 'bmw' into numeric indicator (one-hot) columns, first load the data into a dataframe, then pass it to pandas.get_dummies():
import pandas as pd
df = pd.read_csv("myFile.csv")  # DataFrame.from_csv was removed in pandas 1.0
df_transform = pd.get_dummies(df)
print(df_transform)
A better alternative: pass a dictionary to map() on a pandas Series (df.myCol), for example on the brand column:
df.brand = df.brand.map({'volvo': 0, 'bmw': 1, 'audi': 2})
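To make the difference between the two approaches concrete, here is a small sketch using the brand values from the question. The brand_code column name is made up for illustration; get_dummies produces one indicator column per brand, while map produces a single integer column.

```python
import pandas as pd

# Brand values copied from the question's sample rows.
df = pd.DataFrame({"brand": ["volvo", "volvo", "bmw", "bmw", "bmw",
                             "audi", "volvo", "audi", "volvo"]})

# One indicator column per distinct brand (columns: audi, bmw, volvo).
dummies = pd.get_dummies(df["brand"])

# One integer code per brand, stored in a new, illustratively named column.
df["brand_code"] = df["brand"].map({"volvo": 0, "bmw": 1, "audi": 2})

print(dummies)
print(df)
```

Any brand missing from the dictionary would come out of map() as NaN, so the dictionary needs to cover every value present in the column.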
jonas
Updated on July 09, 2022
Comments
-
jonas almost 2 years
Is there any way to use the mapping function or something better to replace values in an entire dataframe?
I only know how to perform the mapping on series.
I would like to replace the strings in the 'tesst' and 'set' columns with numbers, for example set = 1, test = 2.
Here is an example of my dataset (the original dataset is very large):
ds_r  respondent  brand  engine  country  aware  aware_2  aware_3  age  tesst  set
0     a           volvo  p       swe      1      0        1        23   set    set
1     b           volvo  None    swe      0      0        1        45   set    set
2     c           bmw    p       us       0      0        1        56   test   test
3     d           bmw    p       us       0      1        1        43   test   test
4     e           bmw    d       germany  1      0        1        34   set    set
5     f           audi   d       germany  1      0        1        59   set    set
6     g           volvo  d       swe      1      0        0        65   test   set
7     h           audi   d       swe      1      0        0        78   test   set
8     i           volvo  d       us       1      1        1        32   set    set
Final result should be
ds_r  respondent  brand  engine  country  aware  aware_2  aware_3  age  tesst  set
0     a           volvo  p       swe      1      0        1        23   1      1
1     b           volvo  None    swe      0      0        1        45   1      1
2     c           bmw    p       us       0      0        1        56   2      2
3     d           bmw    p       us       0      1        1        43   2      2
4     e           bmw    d       germany  1      0        1        34   1      1
5     f           audi   d       germany  1      0        1        59   1      1
6     g           volvo  d       swe      1      0        0        65   2      1
7     h           audi   d       swe      1      0        0        78   2      1
8     i           volvo  d       us       1      1        1        32   1      1
-
Jeff almost 11 years
Note that you might want to do a df.convert_objects() after the replacement to coerce to proper dtypes.
-
Jeff almost 11 years
@Dan Allan this will be the default in 0.11.1, FYI (to convert_objects)
-
SRS almost 9 years
I am working on a problem like this and followed the exact steps mentioned in your answer, but I am not getting the output. Code: wc = pd.read_csv('PATH', usecols = ['Workclass'])
-
SRS almost 9 years
df = pd.DataFrame(wc)
wcdict = {"?":0,"Federal-gov":1,"Local-gov":2,"Never-worked":3,"Private":4,"Self-emp-inc":5, "Self-emp-n-inc":6,"State-gov":7,"Without-pay":8}
df.applymap(lambda s: wcdict.get(s) if s in wcdict else s)
print(df)
-
bdiamante almost 9 years
df.applymap(lambda s: mymap.get(s) if s in mymap else s) does not change df in place, so your print df statement will not reflect the results of the applymap. You need to do an assignment like df2 = df.applymap(lambda s: mymap.get(s) if s in mymap else s). print df2 will now reflect the changes.
-
SRS almost 9 years
That worked!! Thanks :) I have one more question: I need to work with pyspark rather than normal Python. Does the implementation of this logic differ in pyspark? When I created a data frame, I gave the file path [as shown in the above comments], but I would like to give an RDD as the input to the data frame. I couldn't do that. Do you have any idea about this?
-
bdiamante almost 9 years
Glad it worked. I'm really not sure... perhaps this might be a start?
-
SRS almost 9 years
Thanks for your help :)
-
HerrIvan over 5 years
In general, what is this category type for?
-
tsando over 5 years
@HerrIvan there's plenty of documentation here: pandas.pydata.org/pandas-docs/stable/categorical.html
-
Ishnark about 5 years
This is super old but you can also do this now:
df.replace(to_replace=['set', 'test'], value=[1, 2])
-
H S Rathore about 4 years
I think we shouldn't have to hardcode the names of the values; they should be picked up dynamically at run time and assigned numbers.