Splitting a list in a Pandas cell into multiple columns
Solution 1
You can use apply() on the column to convert each list to a Series. This automatically expands each list across columns, one element per column:
df[0].apply(pd.Series)
#    0   1   2
# 0  8  10  12
# 1  7   9  11
Update: To keep other columns of the data frame, you can concatenate the result with the columns you want to keep:
pd.concat([df[0].apply(pd.Series), df[1]], axis = 1)
#    0   1   2  1
# 0  8  10  12  A
# 1  7   9  11  B
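Note that the concatenated result above carries the column label 1 twice. A minimal sketch of one way to avoid that, assuming you are free to rename the expanded columns (the "item_" prefix is a hypothetical choice, not part of the original answer):

```python
import pandas as pd

# Sample frame from the question: column 0 holds lists, column 1 holds labels.
df = pd.DataFrame(data=[[[8, 10, 12], 'A'], [[7, 9, 11], 'B']])

# Expand the list column, then prefix the new labels so they cannot
# collide with the existing column 1. "item_" is a hypothetical prefix.
expanded = df[0].apply(pd.Series).add_prefix('item_')

# Drop the original list column and attach the expanded columns by index.
result = df.drop(columns=0).join(expanded)
print(result)
```

Because join() aligns on the index, this also works when the frame does not use the default 0..n-1 index.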
Solution 2
You could use pd.DataFrame(df[col].values.tolist()) instead, which is much faster: roughly 250x in the timings below.
In [820]: pd.DataFrame(df[0].values.tolist())
Out[820]:
   0   1   2
0  8  10  12
1  7   9  11
In [821]: pd.concat([pd.DataFrame(df[0].values.tolist()), df[1]], axis=1)
Out[821]:
   0   1   2  1
0  8  10  12  A
1  7   9  11  B
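One caveat with this approach: pd.DataFrame(...) builds a fresh 0..n-1 index, so the expanded rows only line up with the other columns if the original frame also uses the default index. A minimal sketch of keeping the rows aligned, assuming a frame with a non-default index (the 'x'/'y' labels are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    data=[[[8, 10, 12], 'A'], [[7, 9, 11], 'B']],
    index=['x', 'y'],  # hypothetical non-default index
)

# Passing index=df.index keeps the expanded rows aligned with the
# rows they came from.
expanded = pd.DataFrame(df[0].tolist(), index=df.index)

# concat with axis=1 aligns on the index, so both pieces must share it.
result = pd.concat([expanded, df[1]], axis=1)
print(result)
```

Series.tolist() is used here in place of .values.tolist(); both return a plain Python list of the cell values.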
Timings
Medium
In [828]: df.shape
Out[828]: (20000, 2)
In [829]: %timeit pd.DataFrame(df[0].values.tolist())
100 loops, best of 3: 15 ms per loop
In [830]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 4.06 s per loop
Large
In [832]: df.shape
Out[832]: (200000, 2)
In [833]: %timeit pd.DataFrame(df[0].values.tolist())
10 loops, best of 3: 161 ms per loop
In [834]: %timeit df[0].apply(pd.Series)
1 loop, best of 3: 40.9 s per loop
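To reproduce this kind of comparison outside IPython, something like the following sketch could be used. The frame size, the helper names via_apply and via_tolist, and the repeat count are all assumptions chosen to keep the run short. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark frame: n rows, each holding a three-element list.
n = 2000
df = pd.DataFrame({0: [[i, i + 1, i + 2] for i in range(n)], 1: ['A'] * n})


def via_apply(frame):
    # Solution 1: build one Series per row.
    return frame[0].apply(pd.Series)


def via_tolist(frame):
    # Solution 2: hand the whole list of lists to the DataFrame constructor.
    return pd.DataFrame(frame[0].tolist(), index=frame.index)


# Check that both approaches hold the same values before comparing speed.
assert via_apply(df).values.tolist() == via_tolist(df).values.tolist()

repeats = 3
t_apply = timeit.timeit(lambda: via_apply(df), number=repeats) / repeats
t_tolist = timeit.timeit(lambda: via_tolist(df), number=repeats) / repeats
print(f"apply(pd.Series): {t_apply:.4f} s per call")
print(f"tolist():         {t_tolist:.4f} s per call")
```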
user2242044
Updated on October 12, 2020
user2242044 over 3 years
I have a really simple Pandas dataframe where each cell contains a list. I'd like to split each element of the list into its own column. I can do that by exporting the values and then creating a new dataframe. This doesn't seem like a good way to do it, especially if my dataframe had a column aside from the list column.
import pandas as pd
df = pd.DataFrame(data=[[[8,10,12]], [[7,9,11]]])
df = pd.DataFrame(data=[x[0] for x in df.values])
Desired output:
   0   1   2
0  8  10  12
1  7   9  11
Follow-up based on @Psidom answer:
If I did have a second column:
df = pd.DataFrame(data=[[[8,10,12], 'A'], [[7,9,11], 'B']])
How do I avoid losing the other column?
Desired output:
   0   1   2  3
0  8  10  12  A
1  7   9  11  B
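The desired output numbers the columns 0 through 3, whereas a plain concatenation keeps the original label 1 for the second column. A minimal sketch of one way to get the renumbered layout, assuming it is acceptable to discard the existing column labels:

```python
import pandas as pd

df = pd.DataFrame(data=[[[8, 10, 12], 'A'], [[7, 9, 11], 'B']])

# Expand the list column into three columns, keeping the row index.
expanded = pd.DataFrame(df[0].tolist(), index=df.index)

# With axis=1, ignore_index=True relabels the resulting columns 0..n-1.
result = pd.concat([expanded, df[1]], axis=1, ignore_index=True)
print(result)
#    0   1   2  3
# 0  8  10  12  A
# 1  7   9  11  B
```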