Split string on whitespace in Python
Solution 1
The str.split() method without an argument splits on whitespace:
>>> "many fancy word \nhello \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
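Here is a small sketch of how the no-argument form behaves. It contrasts split() with an explicit " " separator and shows the maxsplit parameter. The first_word helper is a hypothetical name used only for illustration.

```python
s = "many fancy word \nhello \thi"

# No argument: any run of whitespace (spaces, tabs, newlines) counts as one separator.
words = s.split()                  # ['many', 'fancy', 'word', 'hello', 'hi']

# An explicit " " separator splits on every single space and keeps empty strings.
spaced = "a  b".split(" ")         # ['a', '', 'b']

# maxsplit=1 performs at most one split; the remainder stays in the last item.
head_rest = s.split(maxsplit=1)    # ['many', 'fancy word \nhello \thi']


def first_word(text: str) -> str:
    """Hypothetical helper: first whitespace-delimited word, or '' if there is none."""
    parts = text.split(maxsplit=1)
    # parts is [] for empty or whitespace-only input, so guard before indexing.
    return parts[0] if parts else ""


print(words, spaced, head_rest, first_word(s), repr(first_word("   ")))
```

The guard in first_word matters because s.split(None, 1)[0] raises IndexError on an empty or blank string.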
Solution 2
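import re
s = "many fancy word \nhello \thi"
# Use a raw string: '\s' in a plain literal is an invalid escape sequence (a warning in recent Python).
re.split(r'\s+', s)  # ['many', 'fancy', 'word', 'hello', 'hi']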
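re.split differs from str.split() at the edges. If the string has leading or trailing whitespace, the result starts or ends with an empty string, which is a likely source of unexpected extra tokens. The sketch below shows one workaround, assuming you want str.split()-like output from the regex. WS and split_ws are hypothetical names.

```python
import re

WS = re.compile(r"\s+")  # hypothetical module-level name; compiled once and reused

# Leading and trailing whitespace leave empty strings at the ends.
edges = WS.split("  padded text \n")   # ['', 'padded', 'text', '']


def split_ws(text: str) -> list[str]:
    """Hypothetical helper: regex split without empty edge tokens."""
    stripped = text.strip()
    # re.split on an empty string returns [''], so handle that case explicitly.
    if not stripped:
        return []
    return WS.split(stripped)


print(edges)
print(split_ws("many fancy word \nhello \thi\n"))  # ['many', 'fancy', 'word', 'hello', 'hi']
```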
Solution 3
Using split() is the most Pythonic way of splitting a string on whitespace.
It's also useful to remember that if you call split() on a string that contains no whitespace, you get that string back as the only element of a list.
Example:
>>> "ark".split()
['ark']
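Here are a few more edge cases of the no-argument form. The results follow from the documented rule that splitting an empty or whitespace-only string with no separator returns an empty list.

```python
# No whitespace at all: the whole string comes back as a one-element list.
single = "ark".split()          # ['ark']

# Empty or whitespace-only input: an empty list, not [''].
empty = "".split()              # []
blank = " \t\n".split()         # []

# With an explicit separator, an empty string still gives one (empty) item.
explicit = "".split(" ")        # ['']

print(single, empty, blank, explicit)
```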
Solution 4
Another method uses the re module. It works the other way around: instead of splitting the sentence on whitespace, it matches all the words.
>>> import re
>>> s = "many fancy word \nhello \thi"
>>> re.findall(r'\S+', s)
['many', 'fancy', 'word', 'hello', 'hi']
The regex above matches runs of one or more non-whitespace characters.
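If you also need to know where each word sits in the original string, re.finditer with the same pattern yields match objects that carry offsets. In this sketch, spans is an illustrative name.

```python
import re

s = "many fancy word \nhello \thi"

# Each match object exposes the matched text and its (start, end) offsets.
spans = [(m.group(), m.span()) for m in re.finditer(r"\S+", s)]

for word, (start, end) in spans:
    print(f"{word!r} at {start}:{end}")

# The matched words are the same tokens that str.split() returns for this input.
assert [word for word, _ in spans] == s.split()
```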
Author: siamii
Updated on August 09, 2022

Comments
-
siamii, almost 2 years ago: I'm looking for the Python equivalent of
String str = "many fancy word \nhello \thi";
String whiteSpaceRegex = "\\s";
String[] words = str.split(whiteSpaceRegex);
// ["many", "fancy", "word", "hello", "hi"]
-
yak, over 12 years ago: Also good to know: if you want the first word only (which means passing 1 as the second argument), you can use None as the first argument: s.split(None, 1)
-
Raymond Hettinger, over 12 years ago: If you only want the first word, use str.partition.
-
Gulzar, over 8 years ago: This gives me a whitespace token at the end of the line. No idea why; the original line doesn't even have that. Maybe this ignores the newline?
-
Óscar López, over 8 years ago: @Gulzar do a strip() at the end
-
user3527975, about 8 years ago: @yak Can you please edit your comment? As written, it sounds like s.split(None, 1) returns only the first word. It actually returns a list of size 2: the first item is the first word, the second is the rest of the string. s.split(None, 1)[0] returns the first word only.
-
Mark Jin, almost 8 years ago: Note that this is usually slower than str.split if performance is an issue.
-
lee penkman, over 7 years ago: Also, the default split trims whitespace from either side, so you don't have to call str.strip(). For example, " asdf asdf \t\n ".split() returns ['asdf', 'asdf'].
-
galois, over 7 years ago: Does str.split() do something like re.split('\s+', string) behind the scenes?
-
Sven Marnach, over 7 years ago: @galois No, it uses a custom implementation (which is faster). Also note that it handles leading and trailing whitespace differently.
-
Kishor Pawar, over 5 years ago: Sven, in my case a line could contain quoted words like 'Kishor Pawar' 'Sven Marnach'. What would you suggest?
-
Sven Marnach, over 5 years ago: @KishorPawar It's rather unclear to me what you are trying to achieve. Do you want to split on whitespace, but disregard whitespace inside single-quoted substrings? If so, you can look into shlex.split(), which may be what you are looking for. Otherwise I suggest asking a new question – you will get a much quicker and more detailed answer.
-
Kishor Pawar, over 5 years ago: Thank you @SvenMarnach. You guessed the case correctly. I will take a look at shlex.split().
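For the quoted-words case discussed above, shlex.split() applies shell-like quoting rules, so whitespace inside quotes does not cause a split. This is a short sketch. An unbalanced quote raises ValueError instead of returning a partial result.

```python
import shlex

line = "'Kishor Pawar' 'Sven Marnach' plain"

# Quoted substrings stay together and the quote characters are removed.
tokens = shlex.split(line)   # ['Kishor Pawar', 'Sven Marnach', 'plain']

try:
    shlex.split("'unterminated")
except ValueError as exc:
    # shlex reports an unbalanced quote as a ValueError.
    print("error:", exc)

print(tokens)
```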