Split string on whitespace in Python

Solution 1

The str.split() method without an argument splits on whitespace:

>>> "many   fancy word \nhello    \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
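
A minimal sketch of how the no-argument form differs from passing an explicit separator. The sample string and variable names are made up for illustration:

```python
text = "  many   fancy word \nhello    \thi  "

# No argument: any run of whitespace counts as one separator, and
# leading/trailing whitespace is dropped.
words = text.split()

# Explicit " " separator: every single space is a boundary, so empty
# strings show up, and "\n" and "\t" are not treated as separators.
pieces = text.split(" ")

print(words)
print(pieces)
```

If the goal is "a list of words", the no-argument call is usually the one you want.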

Solution 2

import re
s = "many   fancy word \nhello    \thi"
re.split(r'\s+', s)
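
One caveat, sketched below with an illustrative padded string: when the input starts or ends with whitespace, re.split leaves an empty string at that end, which str.split() does not. Stripping first is one way to get the same result.

```python
import re

s = "  many   fancy word \nhello    \thi  "

# Leading/trailing whitespace produces empty strings at the ends.
raw = re.split(r"\s+", s)

# Stripping the string first removes them.
cleaned = re.split(r"\s+", s.strip())

print(raw)      # ['', 'many', 'fancy', 'word', 'hello', 'hi', '']
print(cleaned)  # ['many', 'fancy', 'word', 'hello', 'hi']
```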

Solution 3

Using split() is the most Pythonic way to split a string on whitespace.

It's also useful to remember that if you call split() on a string that contains no whitespace, you get back a list containing that string as its only element.

Example:

>>> "ark".split()
['ark']
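
The related edge cases can be checked the same way. This is a small sketch with made-up sample inputs: an empty string and a whitespace-only string both give an empty list, whereas an explicit separator behaves differently.

```python
samples = ["ark", "", "   \t\n"]

# split() with no argument never yields empty strings.
results = [s.split() for s in samples]
print(results)  # [['ark'], [], []]

# With an explicit separator, an empty input yields one empty string.
explicit = "".split(" ")
print(explicit)  # ['']
```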

Solution 4

Another method uses the re module. It does the reverse operation: instead of splitting the whole sentence on whitespace, it matches all the words.

>>> import re
>>> s = "many   fancy word \nhello    \thi"
>>> re.findall(r'\S+', s)
['many', 'fancy', 'word', 'hello', 'hi']

The regex above matches runs of one or more non-whitespace characters.
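
A hedged sketch of two consequences of matching words rather than splitting: padding around the input produces no empty strings, and re.finditer (a sibling of re.findall) can also report where each word starts. The sample strings are illustrative only.

```python
import re

padded = "  many   fancy word  "

# Matching non-whitespace runs ignores leading/trailing whitespace.
words = re.findall(r"\S+", padded)
print(words)  # ['many', 'fancy', 'word']

# finditer yields match objects, so the start offset of each word is available.
s = "many   fancy word"
offsets = [(m.group(), m.start()) for m in re.finditer(r"\S+", s)]
print(offsets)  # [('many', 0), ('fancy', 7), ('word', 13)]
```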

Author by

siamii

Updated on August 09, 2022

Comments

  • siamii
    siamii almost 2 years

    I'm looking for the Python equivalent of

    String str = "many   fancy word \nhello    \thi";
    String whiteSpaceRegex = "\\s";
    String[] words = str.split(whiteSpaceRegex);
    
    ["many", "fancy", "word", "hello", "hi"]
    
  • yak
    yak over 12 years
    Also good to know is that if you want the first word only (which means passing 1 as second argument), you can use None as the first argument: s.split(None, 1)
  • Raymond Hettinger
    Raymond Hettinger over 12 years
    If you only want the first word, use str.partition.
  • Gulzar
    Gulzar over 8 years
    This gives me a whitespace token at the end of the line. No idea why; the original line doesn't even have that. Maybe this ignores newlines?
  • Óscar López
    Óscar López over 8 years
    @Gulzar do a strip() at the end
  • user3527975
    user3527975 about 8 years
    @yak: Could you please edit your comment? As written, it sounds as if s.split(None, 1) returns the first word only. It actually returns a list of two items: the first word, and the rest of the string. s.split(None, 1)[0] returns the first word only.
  • Mark Jin
    Mark Jin almost 8 years
    Note that this is usually slower than str.split if performance is an issue.
  • lee penkman
    lee penkman over 7 years
    Also the default split trims whitespace from either side so you don't have to call str.strip() e.g. " asdf asdf \t\n ".split() returns ['asdf', 'asdf']
  • galois
    galois over 7 years
    does str.split() do something like re.split('\s+', string) behind the scenes?
  • Sven Marnach
    Sven Marnach over 7 years
    @galois No, it uses a custom implementation (which is faster). Also note that it handles leading and trailing whitespace differently.
  • Kishor Pawar
    Kishor Pawar over 5 years
    Sven, in my case a line could contain words like 'Kishor Pawar' 'Sven Marnach'. What would you suggest?
  • Sven Marnach
    Sven Marnach over 5 years
    @KishorPawar It's rather unclear to me what you are trying to achieve. Do you want to split on whitespace, but disregard whitespace inside single-quoted substrings? If so, you can look into shlex.split(), which may be what you are looking for. Otherwise I suggest asking a new question – you will get a much quicker and more detailed answer.
  • Kishor Pawar
    Kishor Pawar over 5 years
    Thank you @SvenMarnach. You guessed the case correctly. I will take a look at shlex.split().
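
Two points from the comments above, sketched under the assumption that the inputs look like the ones discussed: s.split(None, 1) returns a two-item list, not just the first word, and shlex.split() keeps quoted substrings together. The sample line is invented for illustration.

```python
import shlex

s = "many   fancy word \nhello    \thi"

# maxsplit=1 with sep=None: the first word, then the untouched remainder.
head_and_rest = s.split(None, 1)
first_word = head_and_rest[0]
print(head_and_rest)  # ['many', 'fancy word \nhello    \thi']
print(first_word)     # many

# shlex.split treats quoted substrings as single tokens and drops the quotes.
line = "'Kishor Pawar' 'Sven Marnach' hello"
tokens = shlex.split(line)
print(tokens)  # ['Kishor Pawar', 'Sven Marnach', 'hello']
```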