pandas - convert string into list of strings


Solution 1

You can split the string manually:

>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
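
A slightly more defensive variant of the manual split is sketched below. The helper name parse_tags is hypothetical, and the whitespace and empty-list handling are assumptions about messier input than the file in the question. Note that ''.split(',') returns [''], so an empty "[]" cell needs explicit filtering.

```python
def parse_tags(raw):
    """Turn a string such as '[Tag1, Tag2]' into ['Tag1', 'Tag2']."""
    inner = raw.strip()[1:-1]  # drop the surrounding '[' and ']'
    # ''.split(',') returns [''], so empty pieces are filtered out explicitly
    return [part.strip() for part in inner.split(',') if part.strip()]


print(parse_tags('[Tag1,Tag2]'))    # ['Tag1', 'Tag2']
print(parse_tags('[Tag1, Tag2 ]'))  # ['Tag1', 'Tag2']
print(parse_tags('[]'))             # []
```

It can be used in place of the lambda, e.g. df.Tags.apply(parse_tags).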

Solution 2

Or

df.Tags = df.Tags.str[1:-1].str.split(',').tolist()

Solution 3

I think you could use the json module.

import json
import pandas as pd

df = pd.read_csv('file.csv', sep='|')
df['Tags'] = df['Tags'].apply(lambda x: json.loads(x))

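This loads the dataframe as before, then applies a lambda function to each item in the Tags column. The lambda calls json.loads(), which converts the string representation of the list into an actual list. Note that json.loads() only accepts valid JSON, so the items must be double-quoted, as in ["Tag1","Tag2"]; an unquoted string such as [Tag1,Tag2] raises json.JSONDecodeError.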
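
As a quick check of what json.loads() accepts, here is a minimal sketch using only the standard library. The two sample strings are assumptions: one is a quoted variant of the data, the other is the unquoted form shown in the question.

```python
import json

quoted = '["Tag1", "Tag2"]'  # valid JSON: items are double-quoted strings
unquoted = '[Tag1,Tag2]'     # the format from the question; not valid JSON

parsed = json.loads(quoted)
print(parsed)  # ['Tag1', 'Tag2']

try:
    json.loads(unquoted)
    unquoted_ok = True
except json.JSONDecodeError:
    # bare words such as Tag1 are not JSON values, so parsing fails here
    unquoted_ok = False

print(unquoted_ok)  # False
```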

Solution 4

You could use the built-in ast.literal_eval; it works for tuples as well as lists.

import ast
import pandas as pd

df = pd.DataFrame({"mytuples": ["(1,2,3)"]})

print(df.iloc[0,0])
# >> (1,2,3)  (still a str)

df["mytuples"] = df["mytuples"].apply(ast.literal_eval)

print(df.iloc[0,0])
# >> (1, 2, 3)  (now a tuple)

EDIT: eval should be avoided! If the string being evaluated is os.system('rm -rf /'), it will start deleting all the files on your computer. With ast.literal_eval, the string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None. Thanks @TrentonMcKinney :)
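
To illustrate the difference, the sketch below wraps ast.literal_eval in a hypothetical helper, safe_literal, that returns None when the text is not a plain literal. Nothing is executed: a function call or a bare name is simply rejected with an exception.

```python
import ast


def safe_literal(text):
    """Return the parsed literal, or None if text is not a plain Python literal.

    Caveat: the literal 'None' itself also yields None.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


print(safe_literal("(1,2,3)"))                # (1, 2, 3)
print(safe_literal("['Tag1', 'Tag2']"))       # ['Tag1', 'Tag2']
print(safe_literal("[Tag1,Tag2]"))            # None (bare names are not literals)
print(safe_literal("os.system('rm -rf /')"))  # None (calls are rejected, not run)
```

The third call shows that the unquoted format from the question is not a Python literal either, so literal_eval only helps when the items are quoted.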

Solution 5

You can convert the string to a list using strip and split.

df_out = df.assign(Tags=df.Tags.str.strip('[]').str.split(','))

df_out.Tags[0][0]

Output:

'Tag1'
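
One difference from slicing with [1:-1] is worth keeping in mind: strip('[]') treats its argument as a set of characters and removes every leading and trailing bracket, not just one. The nested input below is a hypothetical example, not data from the question; for the flat lists in the question both approaches give the same result.

```python
nested = '[[Tag1],[Tag2]]'  # hypothetical input with nested brackets

# strip('[]') removes all leading and trailing '[' or ']' characters
stripped = nested.strip('[]')

# slicing removes exactly one character from each end
sliced = nested[1:-1]

print(stripped)  # Tag1],[Tag2
print(sliced)    # [Tag1],[Tag2]

# for a flat list the two approaches agree
flat = '[Tag1,Tag2]'
print(flat.strip('[]') == flat[1:-1])  # True
```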

Author: Fabio Lamanna

I am a freelance civil engineer working on transportation networks, urban mobility and data analysis. I work with data coming from urban mobility, cities, social networks and transportation systems. I play with data organizing the DataBeers and PyData event in Venezia.

Updated on October 01, 2021

Comments

  • Fabio Lamanna over 2 years

    I have this 'file.csv' file to read with pandas:

    Title|Tags
    T1|"[Tag1,Tag2]"
    T1|"[Tag1,Tag2,Tag3]"
    T2|"[Tag3,Tag1]"
    

    using

    df = pd.read_csv('file.csv', sep='|')
    

    the output is:

      Title              Tags
    0    T1       [Tag1,Tag2]
    1    T1  [Tag1,Tag2,Tag3]
    2    T2       [Tag3,Tag1]
    

    I know that the column Tags is a full string, since:

    In [64]: df['Tags'][0][0]
    Out[64]: '['
    

    I need to read it as a list of strings like ["Tag1","Tag2"]. I tried the solution provided in this question, but with no luck, since the [ and ] characters get in the way.

    The expected output is:

    In [64]: df['Tags'][0][0]
    Out[64]: 'Tag1'
    
  • Jon Clements almost 7 years
    Or apply it on load: df = pd.read_csv('file.csv', sep='|', converters={'Tags': lambda x: x[1:-1].split(',')})
  • Brendan about 6 years
    @JonClements, converters={'Tags': lambda x: x[1:-1].split(',')} just saved me so much headache. Thanks for this.
  • zerohedge almost 5 years
    @WeNToBen - nice solution. Care to expand on it a little bit? why do we need str[1:-1], why is it not str[0:-1]? (for me both yield the same result by the way). Also, if split() already creates a list, why do we explicitly call tolist()?
  • BENY almost 5 years
    @zerohedge because you want to remove the "[" at the beginning and the "]" at the end
  • zerohedge almost 5 years
    thanks. and why tolist() after split() (which itself creates a list, no?)
  • BENY almost 5 years
    @zerohedge ah, that one I need to remove, you are right
  • Yevhen Kuzmovych over 4 years
    I think this is a better solution, less prone to errors! Also, note that you can pass json.loads directly as an apply parameter: df['Tags'].apply(json.loads)
  • nicanz almost 3 years
    It does not remove the square brackets at the beginning and at the end
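
The converters suggestion from the comments can be tried end to end with the sketch below. It assumes pandas is installed and uses io.StringIO as a stand-in for 'file.csv', with the same contents as in the question.

```python
import io

import pandas as pd

# in-memory stand-in for 'file.csv' from the question
csv_text = (
    'Title|Tags\n'
    'T1|"[Tag1,Tag2]"\n'
    'T1|"[Tag1,Tag2,Tag3]"\n'
    'T2|"[Tag3,Tag1]"\n'
)

df = pd.read_csv(
    io.StringIO(csv_text),
    sep='|',
    # the converter receives each raw cell as a str while the file is parsed
    converters={'Tags': lambda x: x[1:-1].split(',')},
)

print(df['Tags'][0][0])  # Tag1
```

Converting on load avoids a second pass over the column, at the cost of tying the parsing logic to the read_csv call.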