Python: Splitting by certain pattern

19,548

Solution 1

I prefer to use re.findall and specify what I want, instead of trying to describe the delimiter for re.split:

>>> import re
>>> s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
>>> re.findall(r"\[[^\]]*\]", s)
['[5.955894, 45.817792]', '[10.49238, 45.817792]', '[10.49238, 47.808381]', '[5.955894, 47.808381]']
  1. \[ matches a literal [
  2. [^\]]* matches zero or more characters other than ]
  3. \] matches a literal ]
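As a minimal sketch of how the three pieces above fit together, the pattern can be compiled once and wrapped in a small helper. The names BRACKETED and split_bracketed are hypothetical and only used for illustration; the pattern itself is the one from this answer.

```python
import re

# Hypothetical names for illustration; the regex is the one explained above.
BRACKETED = re.compile(r"\[[^\]]*\]")

def split_bracketed(text):
    """Return every [...] group found in text, with the brackets included."""
    return BRACKETED.findall(text)

s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
parts = split_bracketed(s)
print(parts[0])    # [5.955894, 45.817792]
print(len(parts))  # 4
```

Because findall describes the wanted pieces rather than the separator, whatever sits between the groups (commas, spaces, newlines) is simply skipped.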

Solution 2

You can use re.split with a look-ahead:

>>> import re
>>> s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
>>> re.split(r",[ ]*(?=\[)", s)
['[5.955894, 45.817792]', '[10.49238, 45.817792]', '[10.49238, 47.808381]', '[5.955894, 47.808381]']

Also, don't use str as a variable name. It shadows the built-in str type.
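To illustrate why the shadowing matters, here is a small sketch. The function name shadowed_call is hypothetical, and the shadowing is kept inside a function so the built-in is not clobbered for the rest of the script.

```python
text = '[5.955894, 45.817792], [10.49238, 45.817792]'

def shadowed_call():
    # Hypothetical demo: inside this function, str is a string, not the type.
    str = text
    try:
        return str(3.5)  # calling a string instance is an error
    except TypeError as exc:
        return type(exc).__name__

print(shadowed_call())  # TypeError
print(str(3.5))         # 3.5 (the built-in still works outside the function)
```

At module level the same assignment would hide the built-in for all later code in the file, which is why a name such as s or text is safer.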

The below pattern:

,[ ]*(?=\[)

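matches a comma and any spaces after it, but only when a [ follows. The look-ahead checks for the [ without consuming it, so the bracket stays at the start of the next element.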

You can also do it with a look-behind: (?<=\]),[ ]* splits on a comma (and any following spaces) that comes right after a ].
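A quick sketch comparing the two variants on the sample string from the question. The variable names lookahead and lookbehind are only for illustration.

```python
import re

s = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'

# Split on a comma (plus spaces) that is followed by [
lookahead = re.split(r",[ ]*(?=\[)", s)

# Split on a comma (plus spaces) that is preceded by ]
lookbehind = re.split(r"(?<=\]),[ ]*", s)

print(lookbehind[0])            # [5.955894, 45.817792]
print(lookahead == lookbehind)  # True
```

Both assertions are zero-width, so neither bracket is consumed by the split; the commas inside each pair are left alone because they are neither preceded by ] nor followed by [.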

Author by

grssnbchr

Updated on June 15, 2022

Comments

  • grssnbchr
    grssnbchr almost 2 years

    I have the following

    str = '[5.955894, 45.817792], [10.49238, 45.817792], [10.49238, 47.808381], [5.955894, 47.808381]'
    

    I want to split it so that I have an array of strings like

    ['[5.955894, 45.817792]', '[10.49238, 45.817792]', ...]

    So that the [...] objects are elements of the array. It is important that the enclosing [ and ] are included. I've come so far:

    re.split('\D,\s\D', str)
    

    But that gives me:

    ['[5.955894, 45.817792', '10.49238, 45.817792', '10.49238, 47.808381', '5.955894, 47.808381]']
    

    Not really what I want.