How can I split by 1 or more occurrences of a delimiter in Python?

Solution 1

Just do not give any delimiter?

>>> a="test                            result"
>>> a.split()
['test', 'result']

Solution 2

>>> import re
>>> a="test                            result"
>>> re.split(" +",a)
['test', 'result']

>>> a.split()
['test', 'result']

Solution 3

Just this should work:

a.split()

Example:

>>> 'a      b'.split(' ')
['a', '', '', '', '', '', 'b']
>>> 'a      b'.split()
['a', 'b']

From the documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
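A small sketch of the behaviour the documentation describes. The sample strings here are made up for illustration and are not taken from the original log file:

```python
# Illustrative input: leading spaces, a tab in the middle, a trailing newline.
padded = "  test \t  result\n"

# No separator: runs of any whitespace collapse into one separator,
# and no empty strings appear at the start or end.
default_tokens = padded.split()

# Explicit " " separator: every single space is a boundary,
# so empty strings show up and the tab and newline are kept.
explicit_tokens = padded.split(" ")

# A whitespace-only string yields an empty list with the default split.
blank_tokens = "   ".split()

print(default_tokens)   # ['test', 'result']
print(explicit_tokens)
print(blank_tokens)     # []
```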

Solution 4

Any problem with simple a.split()?

Solution 5

If you want to split by 1 or more occurrences of a delimiter and don't want to just count on the default split() with no parameters happening to match your use case, you can use regex to match the delimiter. The following will use one or more occurrences of . as the delimiter:

import re

s = 'a.b....c......d.ef...g'
sp = re.compile(r'\.+').split(s)  # raw string; matches one or more literal dots
print(sp)

which gives:

['a', 'b', 'c', 'd', 'ef', 'g']
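If the delimiter is not known in advance, one possible generalisation is to escape it with re.escape before building the pattern. The helper name split_on_runs is hypothetical, and the sketch assumes the delimiter is a non-empty string:

```python
import re

def split_on_runs(text, delimiter):
    """Split text on one or more consecutive occurrences of delimiter.

    Hypothetical helper; assumes delimiter is a non-empty string.
    """
    # re.escape makes characters such as '.' or '|' match literally;
    # the non-capturing group lets multi-character delimiters repeat as a unit.
    pattern = re.compile("(?:" + re.escape(delimiter) + ")+")
    return pattern.split(text)

print(split_on_runs('a.b....c......d.ef...g', '.'))  # ['a', 'b', 'c', 'd', 'ef', 'g']
print(split_on_runs('x||y||||z', '||'))              # ['x', 'y', 'z']
```

A non-capturing group is used on purpose: with a capturing group, re.split would also return the matched delimiters in the result.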

Author: Adam Matan

Team leader, developer, and public speaker. I build end-to-end apps using modern cloud infrastructure, especially serverless tools. My current position is R&D Manager at Corvid by Wix.com, a serverless platform for rapid web app generation. My CV and contact details are available on my Github README.

Updated on February 26, 2020

Comments

  • Adam Matan
    Adam Matan over 3 years

    I have a formatted string from a log file, which looks like:

    >>> a="test                            result"
    

    That is, the test and the result are split by some spaces - it was probably created using formatted string which gave test some constant spacing.

    Simple splitting won't do the trick:

    >>> a.split(" ")
    ['test', '', '', '', ... '', '', '', '', '', '', '', '', '', '', '', 'result']
    

    split(DELIMITER, COUNT) cleared some unnecessary values:

    >>> a.split(" ",1)
    ['test', '                           result']
    

    This helped - but of course, I really need:

    ['test', 'result']
    

    I can use split() followed by map + strip(), but I wondered if there is a more Pythonic way to do it.

    Thanks,

    Adam

    UPDATE: Such a simple solution! Thank you all.

  • Adam Matan
    Adam Matan over 13 years
    Cool. Might help with other, non-whitespace delimiters.
  • Sakie
    Sakie over 13 years
    As for why this works: a.split(None) is a special case, which in Python means "split on one or more whitespace chars". re.split() is the general case solution.
  • Sakie
    Sakie over 13 years
    re.split(r'\s+', mystring) is closer to mystring.split(None); '\W+' would also split on punctuation.
  • Wowbagger and his liquid lunch
    Wowbagger and his liquid lunch over 10 years
    This is the only answer to the actual request, "split by 1 or more occurrences of a delimiter".
  • tbrittoborges
    tbrittoborges over 8 years
    One needs to use str.split(None, maxsplit) since the function does not accept keyword arguments. I wonder why.
  • Risinek
    Risinek over 7 years
    this should be accepted Answer.... The other ones are not answering the real question...
  • Risinek
    Risinek over 7 years
    The question was how to split with delimiter+ (one or more). Your answer says any whitespace will be taken as the delimiter, which is not a correct answer.
  • BarathVutukuri
    BarathVutukuri over 4 years
    re.split() gives me an extra token if the string ends with a space.
  • theferrit32
    theferrit32 over 3 years
    @BarathVutukuri that is the correct behavior of a split function. If the input sequence ends with a delimiter, there is an empty term after that delimiter. Java's handling of this case is out of the ordinary: its API documentation specifically says it discards trailing empty terms (but not leading ones) when no term limit is applied. Python, JavaScript, and C# do not discard trailing terms.
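The trailing-token behaviour discussed in the last two comments can be reproduced with a short sketch. The input string is illustrative, and the two workarounds shown are just common options, not something the commenters proposed:

```python
import re

line = "test   result "  # illustrative input with a trailing space

# Splitting on the pattern leaves an empty string after the final delimiter.
with_empty = re.split(" +", line)

# Option 1: strip the delimiter from both ends before splitting.
stripped_first = re.split(" +", line.strip(" "))

# Option 2: split first, then drop any empty tokens.
filtered = [token for token in re.split(" +", line) if token]

print(with_empty)      # ['test', 'result', '']
print(stripped_first)  # ['test', 'result']
print(filtered)        # ['test', 'result']
```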
