Can we use a wildcard within a pandas DataFrame?
You can chain startswith and endswith masks with &, or use contains with a regex, where ^ matches the start of the string, .* matches any run of characters and $ matches the end:
mask = data['Safe'].str.startswith("CDS") & data['Safe'].str.endswith("DEFAULT-UNIX-ROOT")
Or regex:
mask = data['Safe'].str.contains("^CDS-.*DEFAULT-UNIX-ROOT$")
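Series.str.startswith only tests a literal prefix, so the * in "CDS-*DEFAULT-UNIX-ROOT" is not treated as a wildcard there. If you would rather keep the shell-style pattern exactly as written, one option is to swap in Python's standard fnmatch module instead of a pandas string method and test each value with fnmatch.fnmatchcase. A minimal sketch, assuming hypothetical sample values and a column with no missing values:

```python
import fnmatch

import pandas as pd

# Hypothetical sample values for the 'Safe' column (no missing values).
data = pd.DataFrame({'Safe': ['CDS-NOI-DEFAULT-UNIX-ROOT',
                              'CDS-DEFAULT-UNIX-ROOT',
                              'CDS-NOI-DEFAULT',
                              'XCDS-NOI-DEFAULT-UNIX-ROOT']})

# Shell-style pattern from the question: * matches any run of characters,
# including an empty one.
wildcard = "CDS-*DEFAULT-UNIX-ROOT"

# fnmatchcase checks the WHOLE string against the pattern, case-sensitively,
# so here it behaves like the anchored regex ^CDS-.*DEFAULT-UNIX-ROOT$.
mask = data['Safe'].map(lambda s: fnmatch.fnmatchcase(s, wildcard))
print(data[mask])
```

Keep in mind that fnmatch also treats ? and [...] as wildcards, so the regex with str.contains is the more direct choice if you already think in regex terms.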
Sample:
import pandas as pd
data = pd.DataFrame({'Safe': ['CDS-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT',
                              'ACDS-DEFAULT-UNIX-ROOT']})
print(data)
                            Safe
0          CDS-DEFAULT-UNIX-ROOT
1  CDS-NhjghOI-DEFAULT-UNIX-ROOT
2            CDS-NhjghOI-DEFAULT
3         ACDS-DEFAULT-UNIX-ROOT
mask = data['Safe'].str.contains("^CDS-.*DEFAULT-UNIX-ROOT$")
print(mask)
0     True
1     True
2    False
3    False
Name: Safe, dtype: bool
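The mask only flags rows; to get the matching rows you index the DataFrame with it. A minimal sketch, reusing the sample above plus one hypothetical missing value: passing na=False makes missing entries count as "no match" (on an object-dtype column they can otherwise come back as NaN, which boolean indexing rejects), and reset_index returns a new DataFrame, so its result has to be assigned:

```python
import pandas as pd

data = pd.DataFrame({'Safe': ['CDS-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT-UNIX-ROOT',
                              'CDS-NhjghOI-DEFAULT',
                              'ACDS-DEFAULT-UNIX-ROOT',
                              None]})  # hypothetical missing value

# na=False treats the missing value as "no match" so the mask stays boolean.
mask = data['Safe'].str.contains("^CDS-.*DEFAULT-UNIX-ROOT$", na=False)

# Keep only the matching rows; reset_index does not modify data in place,
# so the result is assigned to a new name.
filtered = data[mask].reset_index(drop=True)
print(filtered)
```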
Karn Kumar
Updated on June 04, 2022
Comments
Karn Kumar over 1 year
I have the code below, which works, but it throws a UserWarning while printing the data:
import pandas as pd
pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
data = pd.read_csv('/home/karn/plura/Test/Python_Pnada/Cyber_July.csv', usecols=['Platform ID', 'Safe', 'Target system address', 'Failure reason'])
hostData = data[data['Platform ID'].str.startswith("CS-Unix-")][data['Safe'].str.startswith("CS-NOI-DEFAULT-UNIX-ROOT")] [['Platform ID', 'Safe', 'Target system address','Failure reason']]
hostData.reset_index(level=0, drop=True)
print(hostData)
Below is the UserWarning:
./CyberCSV.py:12: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  hostData = data[data['Platform ID'].str.startswith("CS-Unix-")][data['Safe'].str.startswith("CS-NOI-DEFAULT-UNIX-ROOT")] [['Platform ID', 'Safe', 'Target system address','Failure reason']]
Secondly, is there a way to use a wildcard within a DataFrame? Right now I have
data['Safe'].str.startswith("CDS-NOI-DEFAULT-UNIX-ROOT")
but I want to use something like
data['Safe'].str.startswith("CDS-*DEFAULT-UNIX-ROOT")
Is this possible?
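The UserWarning in the first part comes from the chained indexing data[mask1][mask2]: the first step returns a smaller DataFrame, but the second mask was built on the full frame and still carries its full index, so pandas has to reindex it to line up. A minimal sketch of the usual fix, assuming a small hypothetical frame standing in for the CSV: build both conditions on the same frame, combine them with &, and select rows and columns in a single .loc call:

```python
import pandas as pd

# Hypothetical stand-in for the CSV columns used in the question.
data = pd.DataFrame({
    'Platform ID': ['CS-Unix-A', 'CS-Unix-B', 'CS-Win-C'],
    'Safe': ['CS-NOI-DEFAULT-UNIX-ROOT', 'OTHER-SAFE', 'CS-NOI-DEFAULT-UNIX-ROOT'],
    'Target system address': ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
    'Failure reason': ['timeout', 'auth', 'timeout'],
})

# Both conditions are evaluated against the SAME frame and combined with &,
# so no mask ever has to be realigned to a filtered subset.
mask = (data['Platform ID'].str.startswith("CS-Unix-")
        & data['Safe'].str.startswith("CS-NOI-DEFAULT-UNIX-ROOT"))

cols = ['Platform ID', 'Safe', 'Target system address', 'Failure reason']

# .loc takes the row mask and the column list in one step; reset_index
# returns a new DataFrame, so its result is assigned.
hostData = data.loc[mask, cols].reset_index(drop=True)
print(hostData)
```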
Karn Kumar over 5 years
Let me try, nice solution.