Replace all non-alphanumeric characters in a string


Solution 1

Regex to the rescue!

import re

s = re.sub('[^0-9a-zA-Z]+', '*', s)

Example:

>>> re.sub('[^0-9a-zA-Z]+', '*', 'h^&ell`.,|o w]{+orld')
'h*ell*o*w*orld'

Solution 2

The pythonic way.

print("".join(c if c.isalnum() else "*" for c in s))

This doesn't collapse multiple consecutive non-matching characters into one, though, i.e.

"h^&i" => "h**i", not "h*i" as in the regex solutions.

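If the grouping behaviour is wanted without a regex, one possible sketch uses itertools.groupby to split the string into runs of alphanumeric and non-alphanumeric characters. The function name replace_non_alnum and its repl parameter are made up for illustration and are not from the answers here.

```python
from itertools import groupby


def replace_non_alnum(s, repl="*"):
    """Replace each run of non-alphanumeric characters with a single repl."""
    parts = []
    # groupby yields runs of consecutive characters that share the same
    # str.isalnum() result (True for letters/digits, False otherwise).
    for is_alnum, run in groupby(s, key=str.isalnum):
        parts.append("".join(run) if is_alnum else repl)
    return "".join(parts)


print(replace_non_alnum("h^&ell`.,|o w]{+orld"))  # h*ell*o*w*orld
```

Note that str.isalnum is Unicode-aware in Python 3, so accented letters are kept here, unlike with the [^0-9a-zA-Z] regex.
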
Solution 3

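Try (in Python 2, where filter on a string returns a string):

s = filter(str.isalnum, s)

In Python 3, filter returns an iterator, so join the result: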

s = ''.join(filter(str.isalnum, s))

Edit: I realized that the OP wants to replace non-alphanumeric characters with '*', whereas this removes them, so my answer does not fit.

Solution 4

Use \W, which is equivalent to [^a-zA-Z0-9_] in Python 2 (in Python 3, only when the re.ASCII flag is used). See the documentation: https://docs.python.org/2/library/re.html

import re
s = 'h^&ell`.,|o w]{+orld'
replaced_string = re.sub(r'\W+', '*', s)
# replaced_string is now 'h*ell*o*w*orld'

Update: This solution leaves underscores untouched as well, because \W does not match them. If you want only letters and digits to be left untouched, the solution by nneonneo is more appropriate.
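To make the underscore and Unicode caveats concrete, here is a small comparison under Python 3. The sample string and the variable names are only illustrative.

```python
import re

s = "snake_case & caf\u00e9"  # i.e. "snake_case & café"

# Python 3 default: \W is Unicode-aware, so '_' and 'é' both count as word characters.
unicode_w = re.sub(r"\W+", "*", s)
# With re.ASCII, \W behaves like [^a-zA-Z0-9_]: '_' is kept, 'é' is replaced.
ascii_w = re.sub(r"\W+", "*", s, flags=re.ASCII)
# Explicit class: only ASCII letters and digits are kept.
explicit = re.sub(r"[^0-9a-zA-Z]+", "*", s)

print(unicode_w)  # snake_case*café
print(ascii_w)    # snake_case*caf*
print(explicit)   # snake*case*caf*
```
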

Author: tchadwik

Updated on July 08, 2022

Comments

  • tchadwik
    tchadwik almost 2 years

    I have a string in which I want to replace any character that isn't a standard letter or digit (a-z or 0-9) with an asterisk. For example, "h^&ell`.,|o w]{+orld" is replaced with "h*ell*o*w*orld". Note that a run of multiple characters such as "^&" gets replaced with one asterisk. How would I go about doing this?

  • zhazha
    zhazha almost 8 years
    If you handle unicode a lot, you may also need to keep all non-ASCII unicode symbols: re.sub("[\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+", " ", ":%# unicode ΣΘΙП@./\n")
  • stackPusher
    stackPusher over 7 years
    If you want to keep spaces in your string, just add a space within the brackets: s = re.sub('[^0-9a-zA-Z ]+', '*', s)
  • Chris
    Chris almost 6 years
    If doing more than one replace, this will perform slightly quicker if you pre-compile the regex, e.g., import re; regex = re.compile('[^0-9a-zA-Z]+'); regex.sub('*', 'h^&ell.,|o w]{+orld')
  • JHS
    JHS over 5 years
    Also note that \W matches non-word characters; it's almost the same, but it treats the underscore as a word character (don't know why): docs.python.org/3.6/library/re.html#index-32
  • Wiktor Stribiżew
    Wiktor Stribiżew about 5 years
    Note that \W is equivalent to [^a-zA-Z0-9_] only in Python 2.x. In Python 3.x, \W+ is equivalent to [^a-zA-Z0-9_] only if re.ASCII / re.A flag is used.
  • Serg
    Serg over 3 years
    You don't need the '+' in the regex
  • nneonneo
    nneonneo over 3 years
    @Serg: The OP wanted to replace multiple consecutive characters with a single * - hence, the + in the regex.
  • Paul Rougieux
    Paul Rougieux over 2 years
    Updated link to the documentation of re; search for \W in the page: "Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched."
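
The two comments above about keeping spaces and pre-compiling the pattern can be combined. A minimal sketch, where the names NON_ALNUM_OR_SPACE and mask are hypothetical:

```python
import re

# Compiled once and reused across calls; the space inside the brackets
# means spaces are kept rather than replaced.
NON_ALNUM_OR_SPACE = re.compile(r"[^0-9a-zA-Z ]+")


def mask(text, repl="*"):
    """Replace each run of characters other than ASCII letters, digits and spaces."""
    return NON_ALNUM_OR_SPACE.sub(repl, text)


print(mask("h^&ell`.,|o w]{+orld"))  # h*ell*o w*orld
```
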