Python - Use a Regex to Filter Data
Solution 1
import re
re.sub(pattern, '', s)
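As a concrete illustration of the one-liner above, here is a minimal sketch using the string and character class from the question. The variable names `newkey` and `first_only` are chosen only for this example. By default `re.sub` replaces every non-overlapping match, which is what Ruby's `gsub` does; the optional `count` argument limits the number of replacements.

```python
import re

key = "cd baz ; ls -l"

# re.sub replaces every non-overlapping match by default (count=0 means "all").
newkey = re.sub(r"[^\w\d]", "", key)
print(newkey)  # cdbazlsl

# Passing count=1 limits the substitution to the first match only.
first_only = re.sub(r"[^\w\d]", "", key, count=1)
print(first_only)  # cdbaz ; ls -l
```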
Solution 2
The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for
"a simple way to remove all characters from a given string that fail to match"
For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:
>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'
Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands).
I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)
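One caveat with the `findall` approach in Solution 2: when the pattern contains capturing groups, `re.findall` returns the group contents rather than the whole matches. A minimal sketch that sidesteps this by using `re.finditer` follows; `keep_matches` is a hypothetical helper name made up for this example, not something from the `re` module or from the answer.

```python
import re

def keep_matches(pattern, s):
    """Return s with everything that does NOT match pattern removed.

    keep_matches is a hypothetical helper name, not part of the re module.
    """
    # m.group(0) is always the whole match, even if pattern has groups.
    return "".join(m.group(0) for m in re.finditer(pattern, s))

text = "123foo7bah45xx9za678"
print(keep_matches(r"\d{2,}", text))   # 12345678
print(keep_matches(r"(\d)\d+", text))  # 12345678

# For comparison, findall with a capturing group yields only the group text.
print("".join(re.findall(r"(\d)\d+", text)))  # 146
```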
Solution 3
re.subn() is your friend:
>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'
It returns a tuple of the new string and the number of substitutions made. Take the first element if you only want the resulting string. Or just call re.sub(), as SilentGhost notes. (Which is to say, his answer is more exact.)
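Where `re.subn` does earn its keep is when the number of removed characters matters. A short sketch, assuming you want to report how much was stripped (the names `cleaned` and `n_removed` are illustrative only):

```python
import re

key = "cd baz ; ls -l"

# subn returns (new_string, number_of_substitutions_made).
cleaned, n_removed = re.subn(r"\W", "", key)

# Only report when something was actually stripped.
if n_removed:
    print(f"removed {n_removed} characters: {cleaned!r}")
```

If the count is not needed, plain `re.sub` is the simpler call.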
Solution 4
import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]"  # which is the same as \W btw
pat = re.compile(regex)
new = pat.sub('', old)
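The comment in Solution 4 about `\W` can be checked directly. This is only a quick sketch: since `\d` is already covered by `\w`, both patterns behave the same on the sample string, and because `\w` includes the underscore, underscores are kept. The second input string is made up for illustration.

```python
import re

old = "cd baz ; ls -l"

# \d is a subset of \w, so [^\w\d] and \W match the same characters.
same = re.sub(r"[^\w\d]", "", old) == re.sub(r"\W", "", old)
print(same)  # True

# \w also matches the underscore, so underscores survive the filter.
kept = re.sub(r"\W", "", "my_file name.txt")
print(kept)  # my_filenametxt
```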
tkokoszka
Updated on June 28, 2022
Comments
-
tkokoszka almost 2 years
Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:
>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"
What would the equivalent function be in Python?
-
tkokoszka over 14 years
Ah, yes, you are correct. I changed the question to match up with what I was actually trying to say. Thanks!
-
Alex Martelli over 14 years
Why call subn and then use [0] rather than just call the simpler sub?
-
hughdbrown over 14 years
I posted my answer when no other was visible. I subsequently found that it was not an ideal answer. I could have deleted my answer or edited it, possibly with attribution to others for the idea. What have you found answerers do when their answers are not quite on -- delete or edit?
-
John Machin over 14 years
Empirical evidence is that it depends on how many up-votes (deserved or not!) have already been acquired :-(
-
ForeverLearner almost 12 years
Doesn't this only remove the first occurrence?