Python interpreting a Regex from a yaml config file


Solution 1

The problem is the YAML, not the Python.
If you want to store a string value containing literal square brackets in a YAML file, you have to quote it:

regex:
  - '[A-Za-z0-9]'

Single quotes also mean that YAML will not interpret backslash escape sequences in the regex, such as \b; inside double quotes, YAML would decode them.

Also, note that in this YAML the value of regex is a list containing one string, not a simple string.
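As a rough check of the quoting behavior described above, the sketch below parses the same pattern in single and double quotes. It assumes PyYAML is installed, and the \bcat\b pattern is only an illustrative value that does not come from the question.

```python
import re
import yaml  # PyYAML; assumed to be installed

# Single-quoted YAML scalar: the backslash is kept literally.
single = yaml.safe_load(r"'\bcat\b'")
# Double-quoted YAML scalar: \b is decoded as a backspace character.
double = yaml.safe_load(r'"\bcat\b"')

print(repr(single))
print(repr(double))

# Only the single-quoted form still behaves as a word-boundary regex.
print(bool(re.search(single, "a cat sat")))
print(bool(re.search(double, "a cat sat")))
```

If a pattern has to live in double quotes, each backslash would need to be doubled in the YAML (for example "\\bcat\\b").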

Solution 2

I handled this in my YAML parsing "engine" by registering a custom !regexp tag whose constructor compiles the scalar into a regex object:

In [1]: from io import StringIO
In [2]: import re, yaml
In [3]: yaml.add_constructor('!regexp', lambda l, n: re.compile(l.construct_scalar(n)))
In [4]: yaml.load(StringIO("pattern: !regexp '^(Yes|No)$'"), Loader=yaml.Loader)
Out[4]: {'pattern': re.compile('^(Yes|No)$')}

This also works if you want to use safe_load with the !!python/regexp tag (similar to the Ruby and Node.js implementations):

In [5]: yaml.SafeLoader.add_constructor('tag:yaml.org,2002:python/regexp', lambda l, n: re.compile(l.construct_scalar(n)))
In [6]: yaml.safe_load(StringIO("pattern: !!python/regexp '^(Yes|No)$'"))
Out[6]: {'pattern': re.compile('^(Yes|No)$')}
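One possible variation, sketched here under the assumption that PyYAML is installed: register the constructor on a SafeLoader subclass, so the tag is only understood where that loader is passed explicitly and the shared yaml.SafeLoader stays untouched. The names RegexLoader and construct_regexp are made up for this example.

```python
import re
import yaml  # PyYAML; assumed to be installed


class RegexLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that also understands !regexp."""


def construct_regexp(loader, node):
    # Read the tagged scalar as a string and compile it.
    return re.compile(loader.construct_scalar(node))


# Registered on the subclass only; yaml.SafeLoader itself is unchanged.
RegexLoader.add_constructor("!regexp", construct_regexp)

config = yaml.load("pattern: !regexp '^(Yes|No)$'", Loader=RegexLoader)
pattern = config["pattern"]

print(bool(pattern.match("Yes")))
print(bool(pattern.match("Maybe")))
```

The trade-off is that every caller has to pass Loader=RegexLoader, which is more typing than a global registration but keeps the custom tag from leaking into unrelated safe_load calls.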

Solution 3

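You're using two list constructs in your YAML file: the leading dash starts a block sequence, and the unquoted square brackets are read as a flow sequence. When you load the YAML file: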

>>> d = yaml.safe_load(open('config.yaml'))

You get this:

>>> d
{'regex': [['A-Za-z0-9']]}

Note that the square brackets in your regular expression are actually disappearing because they are being recognized as list delimiters. You can quote them:

regex:
  - "[A-Za-z0-9]"

To get this:

>>> yaml.safe_load(open('config.yaml'))
{'regex': ['[A-Za-z0-9]']}
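To compare the two forms side by side, here is a small sketch that parses both from inline strings rather than from config.yaml (PyYAML assumed installed) and passes the nested list to re.match to reproduce the kind of TypeError shown in the question.

```python
import re
import yaml  # PyYAML; assumed to be installed

unquoted = yaml.safe_load("regex:\n  - [A-Za-z0-9]\n")
quoted = yaml.safe_load('regex:\n  - "[A-Za-z0-9]"\n')

print(unquoted)
print(quoted)

# The unquoted form yields a list where re.match expects a string.
error = None
try:
    re.match(unquoted["regex"][0], "abc")
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# The quoted form yields a usable pattern string.
matched = re.match(quoted["regex"][0], "abc")
print(bool(matched))
```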

So the regular expression is d['regex'][0]. But you could also just do this in your yaml file:

regex: "[A-Za-z0-9]"

Which gets you:

>>> d = yaml.safe_load(open('config.yaml'))
>>> d
{'regex': '[A-Za-z0-9]'}

So the regular expression can be retrieved with a similar dictionary lookup:

>>> d['regex']
'[A-Za-z0-9]'

...which is arguably much simpler.
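If the config should accept either shape, a small normalizing step is one way to handle it. The sketch below is only an illustration: compile_patterns is a hypothetical helper, the extra "^_" pattern and the sample names are invented, and PyYAML is assumed to be installed.

```python
import re
import yaml  # PyYAML; assumed to be installed


def compile_patterns(value):
    """Hypothetical helper: accept one pattern string or a list of them."""
    if isinstance(value, str):
        value = [value]
    if not all(isinstance(item, str) for item in value):
        raise TypeError("regex entries must be strings; quote them in the YAML")
    return [re.compile(item) for item in value]


scalar_config = yaml.safe_load('regex: "[A-Za-z0-9]"')
list_config = yaml.safe_load('regex:\n  - "[A-Za-z0-9]"\n  - "^_"\n')

names = ["abc", "_private", "-dash"]
results = []
for config in (scalar_config, list_config):
    patterns = compile_patterns(config["regex"])
    # Keep the names that match at least one configured pattern.
    matched = [n for n in names if any(p.match(n) for p in patterns)]
    results.append(matched)
    print(matched)
```

With this in place, an unquoted entry such as - [A-Za-z0-9] fails early with a message pointing at the YAML instead of surfacing later inside re.match.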

Author: Will

Updated on June 08, 2022

Comments

  • Will, almost 2 years ago

    So I have a yaml file that I'm using as a config file. I'm trying to do some string matching with regular expressions, but I'm having trouble interpreting the regex from yaml into python. The regex in question looks like this:

    regex:
        - [A-Za-z0-9]
    

    And when I try to use the re.match function, I get this error:

    Traceback (most recent call last):
      File "./dirpylint.py", line 132, in <module>
        sys.exit(main())
      File "./dirpylint.py", line 32, in main
        LevelScan(level)
      File "./dirpylint.py", line 50, in LevelScan
        regex_match(level)
      File "./dirpylint.py", line 65, in regex_match
        if re.match(expression, item) == None:
      File "/usr/lib/python2.7/re.py", line 137, in match
        return _compile(pattern, flags).match(string)
      File "/usr/lib/python2.7/re.py", line 229, in _compile
        p = _cache.get(cachekey)
    TypeError: unhashable type: 'list'
    

    I understand that it's interpreting the regex as a list, but how would I use the regex defined in the yaml file to search for a string?