pandas - convert string into list of strings

python string pandas csv

57,287

Solution 1

You can split the string manually:

>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
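
For a version that runs without the file on disk, here is a minimal sketch using the sample rows from the question. It assumes `io.StringIO` as a stand-in for `file.csv`, and `parse_tags` is a hypothetical helper name, not something from the original answer. It also shows the same slicing applied at load time through the `converters` parameter of `read_csv`.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for 'file.csv'.
CSV_TEXT = '''Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"
'''


def parse_tags(text):
    """Drop the surrounding brackets, then split on commas (hypothetical helper)."""
    return text[1:-1].split(',')


# Option 1: read the file first, then convert the column.
df = pd.read_csv(io.StringIO(CSV_TEXT), sep='|')
df['Tags'] = df['Tags'].apply(parse_tags)

# Option 2: convert the column while reading.
df_conv = pd.read_csv(io.StringIO(CSV_TEXT), sep='|',
                      converters={'Tags': parse_tags})

print(df['Tags'][0])     # ['Tag1', 'Tag2']
print(df['Tags'][0][0])  # Tag1
```

Both options should give the same column; which one to use is mostly a matter of whether the raw strings are needed afterwards.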

Solution 2

df.Tags=df.Tags.str[1:-1].str.split(',').tolist()
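
To see what each step of that one-liner does, here is a small sketch on a hand-built Series (the data is an assumption, modeled on the question). The slice `[1:-1]` drops both brackets, while `[0:-1]` would only drop the closing one. The result of `.str.split` is already a Series of lists, so it can be assigned to the column directly.

```python
import pandas as pd

# Hypothetical sample data in the same format as the question's Tags column.
tags = pd.Series(['[Tag1,Tag2]', '[Tag3,Tag1]'])

# [1:-1] removes the leading '[' and the trailing ']'.
both_removed = tags.str[1:-1].str.split(',')

# [0:-1] removes only the trailing ']', so the first element keeps its '['.
only_last_removed = tags.str[0:-1].str.split(',')

print(both_removed[0])       # ['Tag1', 'Tag2']
print(only_last_removed[0])  # ['[Tag1', 'Tag2']
```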

Solution 3

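I think you could use the json module, provided each string is valid JSON (that is, the elements are double-quoted, as in ["Tag1","Tag2"]).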

import json
import pandas as pd

df = pd.read_csv('file.csv', sep='|')
df['Tags'] = df['Tags'].apply(lambda x: json.loads(x))

This loads the dataframe as before, then applies a lambda function to each item in the Tags column. The lambda calls json.loads(), which converts the string representation of the list into an actual list.
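
One caveat worth checking before relying on this: json.loads() only accepts valid JSON, so it depends on how the tags are written in the file. The sketch below uses only the standard library and compares a double-quoted list with the unquoted format shown in the question; the variable names are illustrative.

```python
import json

quoted = '["Tag1","Tag2"]'  # valid JSON: the elements are double-quoted strings
unquoted = '[Tag1,Tag2]'    # the format from the question: not valid JSON

parsed = json.loads(quoted)
print(parsed)  # ['Tag1', 'Tag2']

try:
    json.loads(unquoted)
    unquoted_ok = True
except json.JSONDecodeError as exc:
    # Bare words such as Tag1 are not JSON values, so decoding fails.
    unquoted_ok = False
    print(f"json.loads rejected {unquoted!r}: {exc}")
```

If the file uses the unquoted format, one of the split-based solutions is probably the better fit.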

Solution 4

You could use the built-in ast.literal_eval; it works for tuples as well as lists.

import ast
import pandas as pd

df = pd.DataFrame({"mytuples": ["(1,2,3)"]})

print(df.iloc[0,0])
# >> '(1,2,3)'

df["mytuples"] = df["mytuples"].apply(ast.literal_eval)

print(df.iloc[0,0])
# >> (1,2,3)

EDIT: eval should be avoided! If the string being evaluated is os.system('rm -rf /'), it will start deleting all the files on your computer. For ast.literal_eval, the string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None. Thanks @TrentonMcKinney :)
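
To illustrate the difference, here is a short sketch showing that ast.literal_eval parses literals but refuses anything else, including function calls and bare names. `safe_parse` is a hypothetical helper introduced only for this example, and a harmless call is used in place of a destructive one.

```python
import ast

print(ast.literal_eval("(1,2,3)"))           # (1, 2, 3)
print(ast.literal_eval("['Tag1', 'Tag2']"))  # ['Tag1', 'Tag2']


def safe_parse(text):
    """Return the parsed literal, or None if text is not a pure literal
    (hypothetical helper)."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# A function call is not a literal, so it is rejected instead of executed.
rejected_call = safe_parse("__import__('os').getcwd()")

# Unquoted names are not literals either, so this format is rejected too.
rejected_names = safe_parse("[Tag1,Tag2]")

print(rejected_call, rejected_names)  # None None
```

Note the second case: the elements have to be quoted strings (or other literals) for ast.literal_eval to accept the list.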

Solution 5

You can convert the string to a list using strip and split.

df_out = df.assign(Tags=df.Tags.str.strip('[]').str.split(','))

df_out.Tags[0][0]

Output:

'Tag1'
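
A plain strip and split leaves any spaces after the commas in place, and it turns an empty "[]" into a list holding one empty string. If the real data may contain either case, a slightly more defensive variant could look like the sketch below. The sample values and the `split_tags` helper are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical data with a space after a comma and an empty list.
df = pd.DataFrame({'Tags': ['[Tag1, Tag2]', '[]', '[Tag3]']})


def split_tags(text):
    """Strip the brackets, split on commas, trim whitespace and drop
    empty pieces (hypothetical helper)."""
    inner = text.strip('[]')
    return [tag.strip() for tag in inner.split(',') if tag.strip()]


df_out = df.assign(Tags=df['Tags'].apply(split_tags))

print(df_out['Tags'].tolist())  # [['Tag1', 'Tag2'], [], ['Tag3']]
```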


Fabio Lamanna

I am a freelance civil engineer working on transportation networks, urban mobility and data analysis. I work with data coming from urban mobility, cities, social networks and transportation systems. I play with data organizing the DataBeers and PyData event in Venezia.

Updated on October 01, 2021

Comments

Fabio Lamanna over 2 years
I have this 'file.csv' file to read with pandas:
```
Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"
```
using
```
df = pd.read_csv('file.csv', sep='|')
```
the output is:
```
  Title              Tags
0    T1       [Tag1,Tag2]
1    T1  [Tag1,Tag2,Tag3]
2    T2       [Tag3,Tag1]
```
I know that the column Tags is a full string, since:
```
In [64]: df['Tags'][0][0]
Out[64]: '['
```
I need to read it as a list of strings like ["Tag1","Tag2"]. I tried the solution provided in this question, but with no luck, since the [ and ] characters mess things up.

The expected output should be:
```
In [64]: df['Tags'][0][0]
Out[64]: 'Tag1'
```
Ahmed almost 7 years

I asked a question similar to this before, you can see the answers here: stackoverflow.com/questions/44529483/…
Jon Clements almost 7 years

Or apply it on load...df = pd.read_csv('file.csv', sep='|', converters={'Tags': lambda x: x[1:-1].split(',')})
Brendan about 6 years

@JonClements, converters={'Tags': lambda x: x[1:-1].split(',')} just saved me so much headache. Thanks for this.
zerohedge almost 5 years

@WeNToBen - nice solution. Care to expand on it a little bit? why do we need str[1:-1], why is it not str[0:-1]? (for me both yield the same result by the way). Also, if split() already creates a list, why do we explicitly call tolist()?
BENY almost 5 years

@zerohedge cause you want to remove the "[" at the beginning and "]" at the end
zerohedge almost 5 years

thanks. and why tolist() after split() (which itself creates a list, no?)
BENY almost 5 years

@zerohedge ah, that one I need to remove, you are right
Yevhen Kuzmovych over 4 years

I think this is a better solution, less prone to errors! Also, note that you can pass json.loads directly as an apply parameter: df['Tags'].apply(json.loads)
nicanz almost 3 years

It does not remove the square brackets at the beginning and at the end