Python 2.6+ str.format() and regular expressions
Solution 1
You first need to format the string and then use the result as the regex. It really isn't worth trying to put everything on a single line. Literal curly braces are escaped by doubling them:
>>> pat= '^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName = 'delivery.com')
>>> pat
'^(w{3}\\.)?([0-9A-Za-z-]+\\.){1}delivery.com$'
>>> re.match(pat, str1)
Also, re.match only matches at the beginning of the string, so you don't have to put ^ in the pattern if you use re.match. You do need ^ if you're using re.search, however.
Please note that {1} in a regex is redundant.
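To make the re.match versus re.search point concrete, here is a small sketch. The variable names are made up for illustration, and the pattern is the one from this answer written as a raw string with the dots in the domain escaped:

```python
import re

# Pattern from above without the leading ^ (raw string, domain dots escaped).
unanchored = r'(w{3}\.)?([0-9A-Za-z-]+\.)delivery\.com$'
anchored = '^' + unanchored

text = 'x.a.delivery.com'  # two levels below the domain, should be rejected

# re.match only tries position 0, so leaving out ^ changes nothing here.
match_result = re.match(unanchored, text)

# re.search scans forward, so without ^ it finds a match starting at 'a.'.
search_unanchored = re.search(unanchored, text)

# With ^, re.search rejects the string just like re.match does.
search_anchored = re.search(anchored, text)
```

Here match_result and search_anchored are None, while search_unanchored matches the substring a.delivery.com, which is why the ^ matters once you switch to re.search.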
Solution 2
Per the documentation, if you need a literal { or } to survive the formatting operation, use {{ and }} in the original string.
'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName = 'delivery.com')
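As a rough sketch of how the doubled braces might be used in practice, the snippet below wraps the formatting step in a helper. The name build_pattern is hypothetical, and passing the domain through re.escape is an addition not present in the answers above; it is there so the dot in delivery.com is matched literally instead of as "any character".

```python
import re

def build_pattern(domain_name):
    """Hypothetical helper: fill the domain into the regex template."""
    # {{3}} and {{1}} become the literal quantifiers {3} and {1} after
    # format(); {domainName} is the only real replacement field.
    template = r'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'
    # re.escape is an assumption added here so '.' in the domain is literal.
    return template.format(domainName=re.escape(domain_name))

pat = build_pattern('delivery.com')

candidates = ['www.pizza.delivery.com', 'w.pizza.delivery.com', 'pizza.delivery.com']
matching = [name for name in candidates if re.match(pat, name)]
```

With these inputs, matching ends up holding the first and third names, which is the result the question asks for.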
Comments
-
brildum almost 2 years
Using str.format() is the new standard for formatting strings in Python 2.6 and Python 3. I've run into an issue when using str.format() with regular expressions.
I've written a regular expression to return all domains that are a single level below a specified domain, or any domains that are 2 levels below the specified domain if the 2nd level below is www.
Assuming the specified domain is delivery.com, my regex should return a.delivery.com, b.delivery.com, www.c.delivery.com ... but it should not return x.a.delivery.com.
import re

str1 = "www.pizza.delivery.com"
str2 = "w.pizza.delivery.com"
str3 = "pizza.delivery.com"

# Each check uses the same hard-coded pattern.
if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str1):
    print('String 1 matches!')
if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str2):
    print('String 2 matches!')
if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str3):
    print('String 3 matches!')
Running this should give the result:
String 1 matches!
String 3 matches!
Now, the problem is when I try to replace delivery.com dynamically using str.format...
if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'.format(domainName='delivery.com'), str1):
    print('String 1 matches!')
This seems to fail because str.format() expects the {3} and {1} to be parameters to the function (I'm assuming). I could concatenate the string using the + operator:
'^(w{3}\.)?([0-9A-Za-z-]+\.){1}' + domainName + '$'
The question comes down to: is it possible to use str.format() when the string (usually a regex) has "{n}" within it?
-
Mark Peters over 14 years
Not directly related to the question, but you will save yourself a lot of grief later by getting into the habit of always using raw strings in your regex.
-
brildum over 14 years
@Mark what are the reasons for this? Thanks for the tip.
-
Mark Peters over 14 years
As a rule, anytime you are putting backslashes in string literals you should use raw strings. Otherwise you can end up with unexpected string escapes. This is most evident in Windows file paths, where (non-raw) "c:\names\bob" does not mean what you think it means. In a regex, using a raw string means your regex string is what you type. To match a single backslash in a regex, you need to escape it with another: \\. However, that sequence in a non-raw string produces a single backslash, which is not obvious from looking at your regex. In a raw string, your r'\\' comes through as expected.
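A short sketch of the raw-string point, using the path from the comment above. The variable names and the \bcom\b pattern are illustrative only:

```python
import re

# Non-raw literal: \n becomes a newline and \b becomes a backspace character.
plain_path = "c:\names\bob"
# Raw literal: every backslash is kept exactly as typed.
raw_path = r"c:\names\bob"

# The two strings differ in length because each escape collapsed to one character.
lengths = (len(plain_path), len(raw_path))

# In a regex, \b should mean "word boundary", but in a non-raw literal
# Python turns it into a backspace before the re module ever sees it.
plain_hit = re.search('\bcom\b', 'delivery.com')
raw_hit = re.search(r'\bcom\b', 'delivery.com')

# r'\\' is two characters, which the regex engine reads as one literal backslash.
backslash_hit = re.match(r'\\', '\\')
```

In this sketch lengths is (10, 12), plain_hit is None, and both raw_hit and backslash_hit are match objects.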
-
Don O'Donnell over 14 years
Not only is {1} redundant, but wouldn't www be clearer than w{{3}}? I know it doesn't answer the original general question, but it seems like a better solution for this case.
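A minimal sketch of that simplification: once w{3} is spelled out as www and {1} is dropped, the template contains no literal braces, so nothing needs doubling. The variable names are illustrative, and the re.escape call is an assumption added to keep the dot in the domain literal.

```python
import re

# No quantifier braces remain, so {domainName} is the only brace pair.
template = r'^(www\.)?[0-9A-Za-z-]+\.{domainName}$'
simple_pat = template.format(domainName=re.escape('delivery.com'))

names = ['www.pizza.delivery.com', 'w.pizza.delivery.com',
         'pizza.delivery.com', 'x.a.delivery.com']
accepted = [name for name in names if re.match(simple_pat, name)]
```

For these four names, accepted contains only www.pizza.delivery.com and pizza.delivery.com.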