Python - Use a Regex to Filter Data

Solution 1

import re
s = "cd baz ; ls -l"
pattern = r"[^\w\d]"
result = re.sub(pattern, '', s)  # delete every match: 'cdbazlsl'

See the documentation for re.sub in Python's re module.

Solution 2

The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for

a simple way to remove all characters from a given string that fail to match

For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:

>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'
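
To make the contrast concrete, here is a minimal sketch that applies both operations with the same pattern. The helper names remove_matching and keep_matching are illustrative only and do not come from any of the answers.

```python
import re

def remove_matching(pattern, s):
    """Delete every substring of s that matches pattern."""
    return re.sub(pattern, '', s)

def keep_matching(pattern, s):
    """Keep only the substrings of s that match pattern, joined together."""
    # finditer + group(0) always yields the whole match, even if the
    # pattern contains capturing groups (findall would return the groups).
    return ''.join(m.group(0) for m in re.finditer(pattern, s))

text = '123foo7bah45xx9za678'
removed = remove_matching(r'\d{2,}', text)  # 'foo7bahxx9za'
kept = keep_matching(r'\d{2,}', text)       # '12345678'
```

The two results are complementary: every character of the input ends up in exactly one of them.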

Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands). I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)

Solution 3

re.subn() is your friend:

>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'

re.subn() returns a tuple of the new string and the number of substitutions made. Take the first element if you only want the resulting string. Or just call re.sub(), as SilentGhost notes. (Which is to say, his answer is more exact.)
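
As a small sketch of when subn might still be worth calling, the tuple can be unpacked directly so the substitution count is available alongside the result. The variable names cleaned and n_removed are illustrative.

```python
import re

key = "cd baz ; ls -l"

# subn returns (new_string, number_of_substitutions_made)
cleaned, n_removed = re.subn(r'\W', '', key)

# sub returns only the new string, so it is the simpler call
# whenever the count is not needed
same_result = re.sub(r'\W', '', key)
```

If the count is never used, re.sub is the more direct choice.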

Solution 4

import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]"  # which is the same as \W btw, since \w already includes digits
pat = re.compile(regex)
new = pat.sub('', old)
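
The claim in the comment above can be checked with a short sketch. The sample strings are made up for illustration. Two details are worth knowing: \w also matches the underscore, and for str patterns in Python 3 it matches Unicode word characters unless re.ASCII is passed.

```python
import re

# Hypothetical sample inputs; any strings would do
samples = ["cd baz ; ls -l", "a_b-c.d 42", "caf\u00e9!"]

# [^\w\d] and \W remove exactly the same characters
equivalent = all(
    re.sub(r"[^\w\d]", "", s) == re.sub(r"\W", "", s) for s in samples
)

# The underscore counts as a word character, so it survives
underscore_kept = re.sub(r"\W", "", "a_b-c")

# By default \w is Unicode-aware for str patterns; re.ASCII restricts
# it to [a-zA-Z0-9_], so the accented letter is removed as well
unicode_default = re.sub(r"\W", "", "caf\u00e9!")
ascii_only = re.sub(r"\W", "", "caf\u00e9!", flags=re.ASCII)
```

If underscores should be stripped too, an explicit class such as [^a-zA-Z0-9] is one option.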
Author: tkokoszka

Updated on June 28, 2022

Comments

  • tkokoszka
    tkokoszka almost 2 years

    Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:

    >> key = "cd baz ; ls -l"
    => "cd baz ; ls -l"
    >> newkey = key.gsub(/[^\w\d]/, "")
    => "cdbazlsl"
    

    What would the equivalent function be in Python?

  • tkokoszka
    tkokoszka over 14 years
    Ah, yes, you are correct. I changed the question to match up with what I was actually trying to say. Thanks!
  • Alex Martelli
    Alex Martelli over 14 years
    Why call subn and then use [0] rather than just call the simpler sub?
  • hughdbrown
    hughdbrown over 14 years
    I posted my answer when no other was visible. I subsequently found that it was not an ideal answer. I could have deleted my answer or edited it, possibly with attribution to others for the idea. What have you found answerers do when their answers are not quite on -- delete or edit?
  • John Machin
    John Machin over 14 years
    Empirical evidence is that it depends on how many up-votes (deserved or not!) have already been acquired :-(
  • ForeverLearner
    ForeverLearner almost 12 years
    Doesn't this only remove the first occurrence?
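
On the last comment: re.sub replaces every non-overlapping match by default, because its count argument defaults to 0, which means "no limit". A minimal sketch, with illustrative variable names:

```python
import re

key = "cd baz ; ls -l"

# Default count=0 means replace all matches
all_removed = re.sub(r'\W', '', key)

# Passing count=1 limits the substitution to the first match only
first_only = re.sub(r'\W', '', key, count=1)
```

Here all_removed is 'cdbazlsl', while first_only drops just the first space and leaves the rest of the string untouched.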