Can we use a wildcard within a pandas DataFrame?


You can chain str.startswith and str.endswith masks with &, or use str.contains with a regular expression: ^ matches the start of the string, .* matches any sequence of characters, and $ matches the end:

mask = data['Safe'].str.startswith("CDS") & data['Safe'].str.endswith("DEFAULT-UNIX-ROOT")

Or with a regex:

mask = data['Safe'].str.contains("^CDS-.*DEFAULT-UNIX-ROOT$")
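If you would rather write the pattern as a shell-style wildcard, exactly as in the question ("CDS-*DEFAULT-UNIX-ROOT"), the standard-library function fnmatch.translate can turn it into an equivalent regex for you. This is a swapped-in technique rather than part of the answer above; it is a minimal sketch that assumes the wildcard only uses * (any characters), and it reuses the sample values from this answer.

```python
import fnmatch

import pandas as pd

# Sample values from this answer
data = pd.DataFrame({'Safe': ['CDS-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT',
                              'ACDS-DEFAULT-UNIX-ROOT']})

# Shell-style wildcard as written in the question: * means "any characters"
wildcard = "CDS-*DEFAULT-UNIX-ROOT"

# fnmatch.translate converts the wildcard into a regular expression
# that already ends with an end-of-string anchor
pattern = fnmatch.translate(wildcard)

# str.match anchors at the start of each value, so together with the
# end anchor from translate() the whole value must fit the wildcard
mask = data['Safe'].str.match(pattern)
print(mask.tolist())
```

The result is the same mask as the hand-written "^CDS-.*DEFAULT-UNIX-ROOT$" regex gives on these values; the translated pattern also escapes literal characters for you.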

Sample:

import pandas as pd

data = pd.DataFrame({'Safe':['CDS-DEFAULT-UNIX-ROOT',
                             'CDS-NhjghOI-DEFAULT-UNIX-ROOT',
                             'CDS-NhjghOI-DEFAULT',
                             'ACDS-DEFAULT-UNIX-ROOT']})

print(data)
                            Safe
0          CDS-DEFAULT-UNIX-ROOT
1  CDS-NhjghOI-DEFAULT-UNIX-ROOT
2            CDS-NhjghOI-DEFAULT
3         ACDS-DEFAULT-UNIX-ROOT

mask = data['Safe'].str.contains("^CDS-.*DEFAULT-UNIX-ROOT$")
print(mask)
0     True
1     True
2    False
3    False
Name: Safe, dtype: bool
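The mask on its own only flags rows; to actually keep them you index the DataFrame with it. The sketch below (using the same sample values) also checks that the chained startswith/endswith mask and the regex mask agree on this data. Note they are not identical in general: startswith("CDS") is looser than the regex's "^CDS-", so a value such as "CDSX-DEFAULT-UNIX-ROOT" would pass the chained version but not the regex.

```python
import pandas as pd

data = pd.DataFrame({'Safe': ['CDS-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT',
                              'ACDS-DEFAULT-UNIX-ROOT']})

# Regex version: ^ anchors the start, .* matches anything, $ anchors the end
regex_mask = data['Safe'].str.contains("^CDS-.*DEFAULT-UNIX-ROOT$")

# Chained version: both conditions must hold, combined element-wise with &
chained_mask = (data['Safe'].str.startswith("CDS")
                & data['Safe'].str.endswith("DEFAULT-UNIX-ROOT"))

# On this sample both masks flag the same rows
print(regex_mask.equals(chained_mask))

# Boolean indexing keeps only the rows where the mask is True
filtered = data[regex_mask]
print(filtered)
```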

Author: Karn Kumar


Updated on June 04, 2022

Comments

  • Karn Kumar, over 1 year ago

    I have the code below. It works, but it throws a UserWarning when printing the data.

    import pandas as pd
    
    pd.set_option('display.height', None)
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', None)
    pd.set_option('expand_frame_repr', True)
    
    data = pd.read_csv('/home/karn/plura/Test/Python_Pnada/Cyber_July.csv', usecols=['Platform ID', 'Safe', 'Target system address', 'Failure reason'])
    hostData = data[data['Platform ID'].str.startswith("CS-Unix-")][data['Safe'].str.startswith("CS-NOI-DEFAULT-UNIX-ROOT")] [['Platform ID', 'Safe', 'Target system address','Failure reason']]
    hostData.reset_index(level=0, drop=True)
    print(hostData)
    

    Here is the UserWarning:

    ./CyberCSV.py:12: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
      hostData  = data[data['Platform ID'].str.startswith("CS-Unix-")][data['Safe'].str.startswith("CS-NOI-DEFAULT-UNIX-ROOT")] [['Platform ID', 'Safe', 'Target system address','Failure reason']]
    

    Secondly, is there a way to use a wildcard within a DataFrame? Currently I have

    data['Safe'].str.startswith("CDS-NOI-DEFAULT-UNIX-ROOT"), but I want to use something like data['Safe'].str.startswith("CDS-*DEFAULT-UNIX-ROOT").

    Is this possible?

  • Karn Kumar, over 5 years ago
    Let me try it. Nice solution.
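On the UserWarning in the first comment: it comes from indexing twice, data[mask1][mask2]. The second mask is still built on the full index of data, but it is applied to the already-filtered frame, so pandas has to reindex it. A minimal sketch of one way to avoid that is to combine both conditions with & into a single mask and select rows and columns in one .loc call. The small in-memory frame below is a hypothetical stand-in for the CSV in the question, and its values are made up. Also note that reset_index returns a new DataFrame, so in the question's code the unassigned hostData.reset_index(level=0, drop=True) line has no effect.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the CSV in the question (values are made up)
data = pd.DataFrame({
    'Platform ID': ['CS-Unix-A', 'CS-Unix-B', 'CS-Win-C'],
    'Safe': ['CS-NOI-DEFAULT-UNIX-ROOT', 'CS-OTHER', 'CS-NOI-DEFAULT-UNIX-ROOT'],
    'Target system address': ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
    'Failure reason': ['timeout', 'denied', 'timeout'],
})

cols = ['Platform ID', 'Safe', 'Target system address', 'Failure reason']

# Build both conditions on the same full index and combine them with &
mask = (data['Platform ID'].str.startswith("CS-Unix-")
        & data['Safe'].str.startswith("CS-NOI-DEFAULT-UNIX-ROOT"))

# Turn warnings into errors to show the single indexing step does not warn
with warnings.catch_warnings():
    warnings.simplefilter("error")
    # Select the matching rows and the wanted columns in one .loc call
    hostData = data.loc[mask, cols]

# reset_index returns a new DataFrame, so assign the result back
hostData = hostData.reset_index(drop=True)
print(hostData)
```

Combining the masks first also makes it easy to swap either condition for one of the wildcard or regex masks from the answer.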