Replace all non-alphanumeric characters in a string
Solution 1
Regex to the rescue!
import re
s = re.sub('[^0-9a-zA-Z]+', '*', s)
Example:
>>> re.sub('[^0-9a-zA-Z]+', '*', 'h^&ell`.,|o w]{+orld')
'h*ell*o*w*orld'
Solution 2
The pythonic way.
print("".join([c if c.isalnum() else "*" for c in s]))
This doesn't collapse runs of consecutive non-matching characters, though, i.e.
"h^&i" => "h**i"
not "h*i"
as in the regex solutions.
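If you want to stay regex-free but still collapse runs, one possible sketch uses itertools.groupby. The function name replace_non_alnum is made up for illustration. Note that in Python 3 str.isalnum also accepts non-ASCII letters and digits, so on such input the result can differ from the [^0-9a-zA-Z] regex.

```python
from itertools import groupby


def replace_non_alnum(s, replacement="*"):
    """Replace each run of non-alphanumeric characters with one replacement.

    Hypothetical helper, not part of the original answer.
    """
    pieces = []
    # groupby splits s into consecutive runs sharing the same isalnum() result
    for is_alnum, run in groupby(s, key=str.isalnum):
        pieces.append("".join(run) if is_alnum else replacement)
    return "".join(pieces)


print(replace_non_alnum("h^&i"))                   # h*i
print(replace_non_alnum("h^&ell`.,|o w]{+orld"))   # h*ell*o*w*orld
```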
Solution 3
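Try (Python 2):
s = filter(str.isalnum, s)
In Python 3, filter returns an iterator, so join the result:
s = ''.join(filter(str.isalnum, s))
Edit: I realized that the OP wants to replace non-alphanumeric characters with '*'. This answer only removes them, so it does not fit.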
Solution 4
Use \W, which is equivalent to [^a-zA-Z0-9_] in Python 2 (in Python 3, only when the re.ASCII flag is used). Check the documentation: https://docs.python.org/2/library/re.html
import re
s = 'h^&ell`.,|o w]{+orld'
replaced_string = re.sub(r'\W+', '*', s)
output: 'h*ell*o*w*orld'
Update: This solution leaves underscores in place as well, because \W does not match them. If you want to keep only letters and digits, the solution by nneonneo is more appropriate.
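To make the difference concrete, here is a small sketch comparing \W with and without re.ASCII against the explicit character class in Python 3. The sample string and the variable names are made up for illustration.

```python
import re

s = "h^&ell_o wörld"  # illustrative input with an underscore and a non-ASCII letter

# Default (Unicode) matching in Python 3: '_' and 'ö' count as word characters
unicode_result = re.sub(r"\W+", "*", s)

# With re.ASCII, \W behaves like [^a-zA-Z0-9_], so 'ö' is replaced too
ascii_result = re.sub(r"\W+", "*", s, flags=re.ASCII)

# Explicit class: the underscore is replaced as well
explicit_result = re.sub(r"[^0-9a-zA-Z]+", "*", s)

print(unicode_result)   # h*ell_o*wörld
print(ascii_result)     # h*ell_o*w*rld
print(explicit_result)  # h*ell*o*w*rld
```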
tchadwik
Updated on July 08, 2022
Comments
-
tchadwik almost 2 years
I have a string in which I want to replace any character that isn't a standard letter or digit (a-z or 0-9) with an asterisk. For example, "h^&ell`.,|o w]{+orld" is replaced with "h*ell*o*w*orld". Note that multiple characters such as "^&" get replaced with one asterisk. How would I go about doing this?
-
zhazha almost 8 years
If you handle unicode a lot, you may also need to keep all non-ASCII unicode symbols:
re.sub("[\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+", " ", ":%# unicode ΣΘΙП@./\n")
-
stackPusher over 7 years
If you want to keep spaces in your string, just add a space within the brackets: s = re.sub('[^0-9a-zA-Z ]+', '*', s)
-
Chris almost 6 years
If doing more than one replace, this will perform slightly quicker if you pre-compile the regex, e.g.,
import re; regex = re.compile('[^0-9a-zA-Z]+'); regex.sub('*', 'h^&ell.,|o w]{+orld')
-
JHS over 5 years
Also note \W is for non-word characters; it's almost the same but allows the underscore as a word character (don't know why): docs.python.org/3.6/library/re.html#index-32
-
Wiktor Stribiżew about 5 years
Note that \W is equivalent to [^a-zA-Z0-9_] only in Python 2.x. In Python 3.x, \W is equivalent to [^a-zA-Z0-9_] only if the re.ASCII / re.A flag is used.
-
Serg over 3 years
You don't need the '+' in the regex
-
nneonneo over 3 years
@Serg: The OP wanted to replace multiple consecutive characters with a single * - hence, the + in the regex.
-
Paul Rougieux over 2 years
Updated link to the documentation of re, search for \W in the page: "Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched."
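Two of the suggestions in the comments above (pre-compiling the pattern and optionally keeping spaces) can be combined in a short sketch. The names NON_ALNUM, NON_ALNUM_KEEP_SPACES, and mask are hypothetical, chosen only for this example.

```python
import re

# Compiled once and reused for every call, as suggested in the comments
NON_ALNUM = re.compile(r"[^0-9a-zA-Z]+")
NON_ALNUM_KEEP_SPACES = re.compile(r"[^0-9a-zA-Z ]+")  # space added to the class


def mask(s, keep_spaces=False):
    """Replace each run of non-alphanumeric characters with a single '*'."""
    pattern = NON_ALNUM_KEEP_SPACES if keep_spaces else NON_ALNUM
    return pattern.sub("*", s)


print(mask("h^&ell`.,|o w]{+orld"))                    # h*ell*o*w*orld
print(mask("h^&ell`.,|o w]{+orld", keep_spaces=True))  # h*ell*o w*orld
```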