Determining when a column value changes in pandas dataframe
39,943
Solution 1
You can create a new column for the difference
> df['C'] = df['B'].diff()
> print df
# A B C
0 1 2 3 NaN
1 2 3 3 0
2 3 4 4 1
3 4 5 4 0
4 5 5 4 0
> df_filtered = df[df['C'] != 0]
> print df_filtered
# A B C
2 3 4 4 1
This will your required rows
Solution 2
You can do the following which also works for non numerical values:
>>> import pandas as pd
>>> df = pd.DataFrame({"Status": ["A","A","B","B","C","C","C"]})
>>> df["isStatusChanged"] = df["Status"].shift(1, fill_value=df["Status"].head(1)) != df["Status"]
>>> df
Status isStatusChanged
0 A False
1 A False
2 B True
3 B False
4 C True
5 C False
6 C False
>>>
Note the fill_value
could be different depending on your application.
Solution 3
you can use this it is much faster, hope it helps!!
my_column_changes = df["MyStringColumn"].shift() != df["MyStringColumn"]
Author by
badrobit
Updated on July 09, 2022Comments
-
badrobit almost 2 years
I am looking to write a quick script that will run through a csv file with two columns and provide me the rows in which the values in column B switch from one value to another:
eg:
dataframe:
# | A | B --+-----+----- 1 | 2 | 3 2 | 3 | 3 3 | 4 | 4 4 | 5 | 4 5 | 5 | 4
would tell me that the change happened between row 2 and row 3. I know how to get these values using for loops but I was hoping there was a more pythonic way of approaching this problem.
-
amc almost 7 yearsHow can you do this with differences for strings? It seems that this diff() only works for numbers. I could convert all the strings to numbers...
-
Kathirmani Sukumar about 4 yearsIf you have numbers which are embedded in strings, you could use something like
df['A'].astype(float).diff()
-
Eamonn Kenny over 3 yearsThis does not work for large strings such as "NL5358383". It can't be converted to a float.
-
Jinhua Wang over 3 years@amc you can map the string to int first.