pandas - convert string into list of strings
Solution 1
You can split the string manually:
>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
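The same idea as a self-contained sketch, assuming every cell is a string such as '[Tag1,Tag2]'. The helper name parse_tags and the sample frame are hypothetical, not part of the original answer. Unlike a bare split(','), the helper also strips whitespace around each tag and returns an empty list for '[]' rather than [''].

```python
import pandas as pd


def parse_tags(cell):
    """Turn a string like '[Tag1, Tag2]' into ['Tag1', 'Tag2'] (hypothetical helper)."""
    inner = cell.strip()[1:-1]      # drop the surrounding '[' and ']'
    if not inner.strip():           # '[]' should give an empty list, not ['']
        return []
    return [tag.strip() for tag in inner.split(',')]


# Sample data modelled on the question (assumed, not read from file.csv)
df = pd.DataFrame({
    'Title': ['T1', 'T1', 'T2'],
    'Tags': ['[Tag1,Tag2]', '[Tag1, Tag2, Tag3]', '[]'],
})
df['Tags'] = df['Tags'].apply(parse_tags)
print(df.Tags[0])
```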
Solution 2
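Or, using the vectorized string accessor:
df.Tags = df.Tags.str[1:-1].str.split(',').tolist()
The trailing .tolist() is optional: str.split(',') already returns a Series whose elements are lists, so the assignment works without it.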
Solution 3
I think you could use the json module.
import json
import pandas as pd
df = pd.read_csv('file.csv', sep='|')
df['Tags'] = df['Tags'].apply(lambda x: json.loads(x))
This loads your dataframe as before, then applies a lambda function to each item in the Tags column. The lambda function calls json.loads(), which converts the string representation of the list to an actual list. Note that json.loads() only accepts valid JSON, so the items must be quoted, as in ["Tag1","Tag2"]; an unquoted string such as [Tag1,Tag2] raises json.JSONDecodeError.
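A quick way to check which strings json.loads() accepts, as a minimal sketch. The two sample strings are assumptions modelled on the question's data, not values taken from file.csv.

```python
import json

quoted = '["Tag1","Tag2"]'   # valid JSON: array of quoted strings
unquoted = '[Tag1,Tag2]'     # not valid JSON: bare words inside the brackets

parsed = json.loads(quoted)
print(parsed)

try:
    json.loads(unquoted)
    unquoted_ok = True
except json.JSONDecodeError:
    # Bare identifiers are not JSON values, so parsing fails here
    unquoted_ok = False
print(unquoted_ok)
```

If the column holds unquoted tags, one of the split-based solutions is the safer choice.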
Solution 4
You could use the built-in ast.literal_eval, which works for tuples as well as lists:
import ast
import pandas as pd
df = pd.DataFrame({"mytuples": ["(1,2,3)"]})
print(df.iloc[0,0])
# >> '(1,2,3)'
df["mytuples"] = df["mytuples"].apply(ast.literal_eval)
print(df.iloc[0,0])
# >> (1,2,3)
EDIT: eval should be avoided! If the string being evaluated is os.system('rm -rf /'), it will start deleting all the files on your computer. For ast.literal_eval, the string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None. Thanks @TrentonMcKinney :)
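To illustrate the restriction to literals, here is a small sketch. The wrapper name safe_parse is hypothetical, and returning None on failure is just one possible convention. ast.literal_eval raises ValueError or SyntaxError for anything that is not a plain literal, including function calls and bare names.

```python
import ast


def safe_parse(text):
    """Return the parsed literal, or None if text is not a plain Python literal.

    Hypothetical helper for illustration only.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


as_list = safe_parse("[1, 2, 3]")                     # a list literal is accepted
as_tuple = safe_parse("('a', 'b')")                   # so is a tuple literal
rejected = safe_parse("__import__('os').getcwd()")    # a function call is refused
bare_names = safe_parse("[Tag1,Tag2]")                # unquoted names are refused too

print(as_list, as_tuple, rejected, bare_names)
```

The last case means literal_eval, like json.loads, needs the tags to be quoted inside the string.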
Solution 5
You can convert the string to a list using strip and split.
df_out = df.assign(Tags=df.Tags.str.strip('[]').str.split(','))
df_out.Tags[0][0]
Output:
'Tag1'
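One detail worth knowing: str.strip('[]') removes every leading and trailing '[' or ']' character, whereas slicing with str[1:-1] removes exactly one character from each end. A small sketch of the difference, using made-up sample strings:

```python
import pandas as pd

# Sample strings are assumptions chosen to show the difference
s = pd.Series(['[Tag1,Tag2]', '[[Tag3],Tag1]'])

stripped = s.str.strip('[]').tolist()   # strips all bracket characters at both ends
sliced = s.str[1:-1].tolist()           # drops exactly one character at each end

print(stripped)
print(sliced)
```

For well-formed input like the question's, both approaches give the same result.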
Fabio Lamanna
I am a freelance civil engineer working on transportation networks, urban mobility and data analysis. I work with data coming from urban mobility, cities, social networks and transportation systems. I play with data organizing the DataBeers and PyData event in Venezia.
Updated on October 01, 2021
Comments
-
Fabio Lamanna over 2 years
I have this 'file.csv' file to read with pandas:
Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"
using
df = pd.read_csv('file.csv', sep='|')
the output is:
  Title              Tags
0    T1       [Tag1,Tag2]
1    T1  [Tag1,Tag2,Tag3]
2    T2       [Tag3,Tag1]
I know that the column Tags is a full string, since:
In [64]: df['Tags'][0][0]
Out[64]: '['
I need to read it as a list of strings like ["Tag1","Tag2"]. I tried the solution provided in this question but had no luck, since the [ and ] characters mess things up.
The expected output should be:
In [64]: df['Tags'][0][0]
Out[64]: 'Tag1'
-
Ahmed almost 7 years
I asked a question similar to this before; you can see the answers here: stackoverflow.com/questions/44529483/…
-
Jon Clements almost 7 years
Or apply it on load...
df = pd.read_csv('file.csv', sep='|', converters={'Tags': lambda x: x[1:-1].split(',')})
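The converters approach can be tried without a file on disk. This is a minimal sketch that feeds the question's sample rows through io.StringIO; the csv_text variable is an assumption standing in for file.csv.

```python
import io

import pandas as pd

# The question's sample data, inlined so the example is self-contained
csv_text = (
    'Title|Tags\n'
    'T1|"[Tag1,Tag2]"\n'
    'T1|"[Tag1,Tag2,Tag3]"\n'
    'T2|"[Tag3,Tag1]"\n'
)

# The converter receives each raw Tags cell as a string while the file is parsed
df = pd.read_csv(
    io.StringIO(csv_text),
    sep='|',
    converters={'Tags': lambda x: x[1:-1].split(',')},
)
print(df['Tags'][0][0])
```

Converting at load time avoids a separate pass over the column after reading.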
-
Brendan about 6 years
@JonClements, converters={'Tags': lambda x: x[1:-1].split(',')} just saved me so much headache. Thanks for this.
-
zerohedge almost 5 years
@WeNToBen - nice solution. Care to expand on it a little bit? Why do we need str[1:-1], and why is it not str[0:-1]? (For me both yield the same result, by the way.) Also, if split() already creates a list, why do we explicitly call tolist()?
-
BENY almost 5 years
@zerohedge because you want to remove the "[" at the beginning and "]" at the end
-
zerohedge almost 5 years
Thanks. And why tolist() after split() (which itself creates a list, no?)
-
BENY almost 5 years
@zerohedge ah, that one I need to remove, you are right
-
Yevhen Kuzmovych over 4 years
I think this is a better solution, less prone to errors! Also, note that you can pass json.loads directly as an apply parameter: df['Tags'].apply(json.loads)
-
nicanz almost 3 years
It does not remove the square brackets at the beginning and at the end