Split Pandas Series into DataFrame by delimiter
Solution 1
You can use str.split:
df = SR_test.str.split('; ', expand=True)
print(df)
    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5
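With expand=True the split pieces become columns labelled 0, 1, 2, and so on. If you want descriptive labels, one option is to chain DataFrame.add_prefix. The prefix "col_" below is an arbitrary, hypothetical choice, and the snippet rebuilds SR_test from the question's example data so it runs on its own.

```python
import pandas as pd

# Example data from the question
SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])

# Split on the full "; " delimiter; expand=True returns a DataFrame
df = SR_test.str.split("; ", expand=True)

# Hypothetical column naming: 0, 1, ... become col_0, col_1, ...
named = df.add_prefix("col_")
print(named)
```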
Another, faster solution, if the Series has no NaN values:
print(pd.DataFrame([x.split('; ') for x in SR_test.tolist()]))
    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5
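The "no NaN values" condition matters because a missing entry is typically a float NaN, which has no .split method, so the list comprehension raises AttributeError, whereas str.split simply carries the missing value through. A small sketch with one hypothetical missing row:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in the middle
s = pd.Series(["a; b; c", np.nan, "x; y; z"])

# Vectorised split: the missing row becomes a row of missing values
vec = s.str.split("; ", expand=True)
print(vec)

# Plain-Python split: NaN is a float, so x.split fails on that row
try:
    pd.DataFrame([x.split("; ") for x in s.tolist()])
    comprehension_failed = False
except AttributeError:
    comprehension_failed = True
print("list comprehension failed:", comprehension_failed)
```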
Timings:
SR_test = pd.concat([SR_test]*1000).reset_index(drop=True)
In [21]: %timeit SR_test.str.split('; ', expand=True)
10 loops, best of 3: 34.5 ms per loop
In [22]: %timeit pd.DataFrame([ x.split('; ') for x in SR_test.tolist() ])
100 loops, best of 3: 9.59 ms per loop
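The timings above were taken with IPython's %timeit. Outside IPython, a rough equivalent using the standard-library timeit module is sketched below. It enlarges the example data the same way, prints per-call times without assuming which method wins on your machine and pandas version, and checks that both methods give the same cell values.

```python
import timeit

import pandas as pd

SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])
# Enlarge the Series as in the timings above (3000 rows)
SR_test = pd.concat([SR_test] * 1000).reset_index(drop=True)


def with_str_split():
    return SR_test.str.split("; ", expand=True)


def with_comprehension():
    return pd.DataFrame([x.split("; ") for x in SR_test.tolist()])


for func in (with_str_split, with_comprehension):
    # Best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(func, repeat=3, number=10)) / 10
    print(f"{func.__name__}: {best * 1000:.2f} ms per call")

# Both approaches should produce the same cell values
same = with_str_split().to_numpy().tolist() == with_comprehension().to_numpy().tolist()
print("same result:", same)
```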
Solution 2
Use the vectorised str.split with param expand=True and pass the result as the data arg to the DataFrame constructor:
In [4]:
df = pd.DataFrame(SR_test.str.split('; ', expand=True))
df
Out[4]:
    0   1   2   3   4
0   a   b   c   d   e
1  aa  bb  cc  dd  ee
2  a1  b2  c3  d4  e5
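Two details about this variant. First, str.split with expand=True already returns a DataFrame, so wrapping it in pd.DataFrame(...) is optional. Second, splitting on ";" alone keeps the space that follows each semicolon, so every piece after the first in a row starts with a leading space; either split on the full "; " delimiter or strip the pieces afterwards. A sketch using the question's example data:

```python
import pandas as pd

SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])

# Splitting on ";" alone leaves a leading space on the later pieces
raw = SR_test.str.split(";", expand=True)
print(repr(raw.iloc[0, 1]))  # ' b'

# Option 1: strip whitespace from every column after splitting
stripped = raw.apply(lambda col: col.str.strip())

# Option 2: split on the exact two-character delimiter
exact = SR_test.str.split("; ", expand=True)

print(isinstance(exact, pd.DataFrame))  # True: no extra pd.DataFrame(...) needed
same = stripped.to_numpy().tolist() == exact.to_numpy().tolist()
print("same values:", same)
```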
O.rka
Updated on July 31, 2022
Comments
-
O.rka almost 2 years
I'm trying to split a pandas Series object by a particular delimiter, "; " in this case. I want to turn it into a DataFrame. There will always be the same number of "columns", or, to be more exact, the same number of "; " separators that indicate columns. I thought the answer to "python, how to convert a pandas series into a pandas DataFrame?" would do the trick, but it didn't. I don't want to iterate through; I'm sure pandas has a shortcut that's more efficient. Does anyone know the most efficient way to split this series into a DataFrame by "; "?

import pandas as pd

# Example data
SR_test = pd.Series(["a; b; c; d; e", "aa; bb; cc; dd; ee", "a1; b2; c3; d4; e5"])
# print(SR_test)
# 0         a; b; c; d; e
# 1    aa; bb; cc; dd; ee
# 2    a1; b2; c3; d4; e5

# Convert each row one at a time (not efficient)
tmp = []
for element in SR_test:
    tmp.append([e.strip() for e in element.split("; ")])
DF_split = pd.DataFrame(tmp)
# print(DF_split)
#     0   1   2   3   4
# 0   a   b   c   d   e
# 1  aa  bb  cc  dd  ee
# 2  a1  b2  c3  d4  e5
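The question assumes every row has the same number of "; " separators. If that assumption ever breaks, both the row-by-row approach and str.split(..., expand=True) still produce a rectangular frame: the width is set by the longest row and shorter rows are padded with missing values (None or NaN, depending on the pandas version and dtype). A sketch with hypothetical ragged data:

```python
import pandas as pd

# Hypothetical data: the second row has one field fewer
ragged = pd.Series(["a; b; c", "d; e", "f; g; h"])

# Vectorised split: missing trailing cells are padded
vec = ragged.str.split("; ", expand=True)

# Row-by-row split, equivalent to the loop above, written as a comprehension
loop = pd.DataFrame([[e.strip() for e in x.split("; ")] for x in ragged])

print(vec)
print(loop)
```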
-
O.rka almost 8 years
Iterating through is quicker?
-
jezrael almost 8 years
Yes, if you use it this way. str.split is a bit slower because it also handles NaN values gracefully.