Replace all non-alphanumeric characters in a string

python

137,676

Solution 1

Regex to the rescue!

import re

s = re.sub('[^0-9a-zA-Z]+', '*', s)

Example:

>>> re.sub('[^0-9a-zA-Z]+', '*', 'h^&ell`.,|o w]{+orld')
'h*ell*o*w*orld'
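
If the substitution runs many times, the pattern can be compiled once and reused. A minimal sketch; the names NON_ALNUM_RUN and mask_non_alnum are made up for illustration and are not part of the original answer:

```python
import re

# Compile once; matches one or more characters that are not ASCII letters or digits.
NON_ALNUM_RUN = re.compile(r'[^0-9a-zA-Z]+')

def mask_non_alnum(text, replacement='*'):
    """Replace each run of non-alphanumeric ASCII characters with `replacement`."""
    return NON_ALNUM_RUN.sub(replacement, text)

print(mask_non_alnum('h^&ell`.,|o w]{+orld'))  # h*ell*o*w*orld
```

Dropping the trailing + from the pattern would replace every offending character individually, giving one asterisk per character instead of one per run.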

Solution 2

The pythonic way.

print("".join(c if c.isalnum() else "*" for c in s))

This doesn't collapse multiple consecutive non-matching characters into one, though, i.e.

"h^&i" => "h**i", not "h*i" as in the regex solutions.

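Runs can also be collapsed without a regex. A possible sketch using itertools.groupby, assuming a hypothetical helper name mask_runs (not from the original answer):

```python
from itertools import groupby

def mask_runs(text, replacement='*'):
    """Collapse each run of non-alphanumeric characters into one `replacement`."""
    pieces = []
    for is_alnum, group in groupby(text, key=str.isalnum):
        # Keep alphanumeric runs as-is; emit a single replacement for any other run.
        pieces.append(''.join(group) if is_alnum else replacement)
    return ''.join(pieces)

print(mask_runs('h^&i'))  # h*i
```

Note that str.isalnum is Unicode-aware in Python 3, so accented letters count as alphanumeric here, unlike with the [^0-9a-zA-Z] regex.
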
Solution 3

In Python 2, try:

s = filter(str.isalnum, s)

In Python 3, where filter returns an iterator, join the result:

s = ''.join(filter(str.isalnum, s))

Edit: I realized that the OP wants to replace non-alphanumeric characters with '*'. This answer removes them instead, so it does not fit the question.

Solution 4

Use \W, which is equivalent to [^a-zA-Z0-9_] in Python 2 (in Python 3, only when the re.ASCII flag is set). Check the documentation: https://docs.python.org/2/library/re.html

import re
s = 'h^&ell`.,|o w]{+orld'
replaced_string = re.sub(r'\W+', '*', s)
# replaced_string is now 'h*ell*o*w*orld'

Update: \W does not match the underscore, so this solution leaves underscores in place as well. If you want to keep only letters and digits, the solution by nneonneo is more appropriate.
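
To make the difference concrete, here is a small sketch for Python 3 comparing \W with and without re.ASCII against the explicit character class. The sample string is made up for illustration:

```python
import re

s = 'snake_case caf\u00e9!!'  # 'café' written with a precomposed é

# Python 3 str patterns are Unicode-aware by default: 'é' counts as a word character.
print(re.sub(r'\W+', '*', s))                  # snake_case*café*

# With re.ASCII, \W behaves like [^a-zA-Z0-9_], so 'é' is replaced too.
print(re.sub(r'\W+', '*', s, flags=re.ASCII))  # snake_case*caf*

# The explicit character class also replaces the underscore.
print(re.sub(r'[^0-9a-zA-Z]+', '*', s))        # snake*case*caf*
```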

Author by

tchadwik

Updated on July 08, 2022

Comments

tchadwik almost 2 years

I have a string in which I want to replace any character that isn't a standard letter or digit (a-z or 0-9) with an asterisk. For example, "h^&ell`.,|o w]{+orld" is replaced with "h*ell*o*w*orld". Note that multiple consecutive characters such as "^&" get replaced with a single asterisk. How would I go about doing this?
zhazha almost 8 years

If you handle unicode a lot, you may also need to keep all non-ASCII unicode symbols: re.sub("[\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+", " ", ":%# unicode ΣΘΙП@./\n")
stackPusher over 7 years

If you want to keep spaces in your string, just add a space within the brackets: s = re.sub('[^0-9a-zA-Z ]+', '*', s)
Chris almost 6 years

If doing more than one replace, this will perform slightly quicker if you pre-compile the regex, e.g., import re; regex = re.compile('[^0-9a-zA-Z]+'); regex.sub('*', 'h^&ell.,|o w]{+orld')
JHS over 5 years

Also note \W is for non-word characters, it's almost the same but allows the underscore as a word character (don't know why): docs.python.org/3.6/library/re.html#index-32
Wiktor Stribiżew about 5 years

Note that \W is equivalent to [^a-zA-Z0-9_] only in Python 2.x. In Python 3.x, \W is equivalent to [^a-zA-Z0-9_] only if the re.ASCII / re.A flag is used.
Serg over 3 years

You don't need the '+' in the regex
nneonneo over 3 years

@Serg: The OP wanted to replace multiple consecutive characters with a single * - hence, the + in the regex.
Paul Rougieux over 2 years

Updated link to the documentation of re, search for \W in the page "Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched."