Pandas add new columns based on splitting another column

python pandas dataframe split multiple-columns

10,302

Solution 1

You can use str.split with the parameter expand=True and put a list of the new column names (note the double []) on the left side:

df[['country','code','com']] = df.A.str.split(',', expand=True)

Then replace each , with . in column A:

df.A = df.A.str.replace(',','.')

print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
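
The steps above can be put together as a self-contained sketch. It assumes the two-row sample frame from the question (the frame construction is added here for illustration) and passes regex=False so the comma is treated as a literal string:

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({"A": ["US,65,AMAZON", "US,65,EBAY"], "B": [2016, 2016]})

# expand=True returns a DataFrame with one column per split piece;
# the list of names on the left assigns those columns in order
df[["country", "code", "com"]] = df["A"].str.split(",", expand=True)

# Replace the separator in the original column (literal match, no regex)
df["A"] = df["A"].str.replace(",", ".", regex=False)

print(df)
```

The split pieces are strings, so code holds '65' rather than an integer; df['code'].astype(int) would convert it if a numeric column is needed.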

Another solution uses the DataFrame constructor, which works only if there are no NaN values:

df[['country','code','com']] = pd.DataFrame([ x.split(',') for x in df['A'].tolist() ])
df.A = df.A.str.replace(',','.')
print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
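
The "no NaN values" condition matters because a missing value is a float, which has no split method. A minimal sketch, assuming a hypothetical frame with one missing entry in A (not part of the question's data), compares the list comprehension with str.split:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in A
df = pd.DataFrame({"A": ["US,65,AMAZON", np.nan], "B": [2016, 2016]})

# The list comprehension fails: NaN is a float and has no .split
error = None
try:
    rows = [x.split(",") for x in df["A"].tolist()]
except AttributeError as exc:
    error = exc
    print("list comprehension failed:", exc)

# str.split propagates the missing value instead of raising
parts = df["A"].str.split(",", expand=True)
print(parts)
```

If the column may contain missing values, the str.split version is the safer choice of the two.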

You can also pass the column names to the constructor, but then concat is necessary:

df1 = pd.DataFrame([x.split(',') for x in df['A'].tolist()], columns=['country','code','com'])
df.A = df.A.str.replace(',','.')
df = pd.concat([df, df1], axis=1)
print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
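
pd.concat with axis=1 lines rows up by index label, so the helper frame should carry the same index as df. A sketch under that assumption, using a hypothetical non-default index (10, 20) to make the point visible:

```python
import pandas as pd

# Hypothetical non-default index; the question's frame uses 0, 1
df = pd.DataFrame(
    {"A": ["US,65,AMAZON", "US,65,EBAY"], "B": [2016, 2016]},
    index=[10, 20],
)

# Reuse df.index so the rows of df1 match the rows of df label by label
df1 = pd.DataFrame(
    [x.split(",") for x in df["A"].tolist()],
    columns=["country", "code", "com"],
    index=df.index,
)

df["A"] = df["A"].str.replace(",", ".", regex=False)
df = pd.concat([df, df1], axis=1)
print(df)
```

With the default 0..n-1 index on both frames, as in the question, passing index=df.index changes nothing.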

Solution 2

To get the new columns, I would prefer the following, splitting each string on , before picking a piece:

df['Country'] = df['A'].apply(lambda x: x.split(',')[0])
df['Code'] = df['A'].apply(lambda x: x.split(',')[1])
df['Com'] = df['A'].apply(lambda x: x.split(',')[2])

To replace , with . you can use the following:

df['A'] = df['A'].str.replace(',','.')
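
A sketch of the same idea that splits each string once and then picks the pieces by position with the .str accessor. It assumes the sample frame from the question and keeps the Country, Code and Com names used in this solution:

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({"A": ["US,65,AMAZON", "US,65,EBAY"], "B": [2016, 2016]})

# Without expand=True the result is a Series of lists
parts = df["A"].str.split(",")

# .str[i] picks the i-th element of each list
df["Country"] = parts.str[0]
df["Code"] = parts.str[1]
df["Com"] = parts.str[2]

df["A"] = df["A"].str.replace(",", ".", regex=False)
print(df)
```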

Solution 3

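Indexing each string directly, as in df['A'].apply(lambda x: x[0]), will not give the expected output: it returns only the first character of each value in df['A'], which is 'U'.

Creating the columns from the provided data with a list comprehension works:

df1 = pd.DataFrame([x.split(',') for x in df['A'].tolist()], columns=['country','code','com'])

A lambda can also be used instead of the for loop, as long as it splits the string first.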
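
A short sketch of the difference between indexing the string and indexing the split result, assuming the two sample values from the question:

```python
import pandas as pd

s = pd.Series(["US,65,AMAZON", "US,65,EBAY"])

# x[0] indexes the string itself, so only the first character comes back
first_char = s.apply(lambda x: x[0])

# Splitting first makes x.split(",")[0] the first comma-separated field
first_field = s.apply(lambda x: x.split(",")[0])

print(first_char.tolist())   # ['U', 'U']
print(first_field.tolist())  # ['US', 'US']
```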


Author by

dagg3r

PhD student interested in Data Science using python and spark.

Updated on June 18, 2022

Comments

dagg3r almost 2 years

I have a pandas dataframe like the following:

A              B
US,65,AMAZON   2016
US,65,EBAY     2016

My goal is to get it to look like this:

A              B      country    code    com
US.65.AMAZON   2016   US         65      AMAZON
US.65.EBAY     2016   US         65      EBAY

I know this question has been asked before here and here, but none of the answers work for me. I have tried:

df['country','code','com'] = df.Field.str.split('.')

and

df2 = pd.DataFrame(df.Field.str.split('.').tolist(),columns = ['country','code','com','A','B'])

Am I missing something? Any help is much appreciated.
