Tokenising a String containing empty tokens
Pass -1 to split as the limit argument:
String s = ",abd,def,,ghi,";
String[] tokens = s.split(",", -1);
Then your result array will include any trailing empty strings.
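Here is a minimal, self-contained version of the snippet above. The class name SplitKeepEmpty is only for illustration. It prints the token count and the array contents, so the leading, middle and trailing empty strings are all visible:

```java
import java.util.Arrays;

public class SplitKeepEmpty {
    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";

        // A negative limit keeps every token, including trailing empty strings.
        String[] tokens = s.split(",", -1);

        System.out.println(tokens.length);            // 6
        System.out.println(Arrays.toString(tokens));  // [, abd, def, , ghi, ]
    }
}
```

Arrays.toString prints an empty element as nothing between two separators, so the final ", ]" is the trailing empty token.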
From the javadocs:
If [the limit] is non-positive then the pattern will be applied as many times as possible and the array can have any length. If [the limit] is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
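To see the cases side by side, here is a small sketch. The class name SplitLimitDemo is hypothetical. The positive-limit case is not in the quoted passage: the same Javadoc says a positive limit n applies the pattern at most n - 1 times and leaves the rest of the input in the last element.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";

        // Positive limit (3): at most 2 splits; the last element holds the remaining input.
        System.out.println(Arrays.toString(s.split(",", 3)));   // [, abd, def,,ghi,]

        // Zero limit (what split(regex) uses): trailing empty strings are discarded.
        System.out.println(Arrays.toString(s.split(",", 0)));   // [, abd, def, , ghi]

        // Negative limit: split as often as possible and discard nothing.
        System.out.println(Arrays.toString(s.split(",", -1)));  // [, abd, def, , ghi, ]
    }
}
```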
Calling split(regex) acts as if the limit argument is 0, so trailing empty strings are discarded.
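One caveat for the question's case: split's first parameter is a regular expression, so a fixed delimiter containing characters such as |, . or * has to be escaped. Below is a sketch that assumes the delimiter should be matched literally. splitLiteral is a hypothetical helper name, and Pattern.quote does the escaping:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedDelimiterSplit {
    // Hypothetical helper: splits s on the literal string delim,
    // keeping leading, trailing and consecutive empty tokens.
    static String[] splitLiteral(String s, String delim) {
        // Quote the delimiter so regex metacharacters in it are matched literally.
        return s.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        // Unquoted, "||" is a regex that matches the empty string,
        // so split() would break the input apart between every character.
        String s = "||abd||def||||ghi||";
        System.out.println(Arrays.toString(splitLiteral(s, "||")));  // [, abd, def, , ghi, ]
    }
}
```

Pattern.compile(delim, Pattern.LITERAL).split(s, -1) should behave the same way. It may be worth using when the same delimiter is reused many times, because the pattern is then compiled only once.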
Adamski
Updated on June 29, 2022
Comments
Adamski (almost 2 years ago)
I have a seemingly simple problem: splitting a comma-separated String into tokens, where the output should include empty tokens in the following cases:
- The first character in the String is a comma.
- The last character in the String is a comma.
- Two consecutive commas occur.
For example, the String ",abd,def,,ghi," should yield the output {"", "abd", "def", "", "ghi", ""}.
I have tried using String.split, Scanner and StringTokenizer for this, but each gives a different undesired output (examples below). Can anyone suggest an elegant solution, preferably using JDK classes? Obviously I could code something myself, but I feel like I'm missing something in one of the three approaches mentioned. Note that the delimiter is a fixed String, although not necessarily a comma, nor a single character.
Example Code
import java.util.*;

public class Main12 {
    public static void main(String[] args) {
        String s = ",abd,def,,ghi,";
        String[] tokens = s.split(",");

        System.err.println("--- String.split Output ---");
        System.err.println(String.format("%s -> %s", s, Arrays.asList(tokens)));
        for (int i = 0; i < tokens.length; ++i) {
            System.err.println(String.format("tokens[%d] = %s", i, tokens[i]));
        }

        System.err.println("--- Scanner Output ---");
        Scanner sc = new Scanner(s);
        sc.useDelimiter(",");
        while (sc.hasNext()) {
            System.err.println(sc.next());
        }

        System.err.println("--- StringTokenizer Output ---");
        StringTokenizer tok = new StringTokenizer(s, ",");
        while (tok.hasMoreTokens()) {
            System.err.println(tok.nextToken());
        }
    }
}
Output
$ java Main12
--- String.split Output ---
,abd,def,,ghi, -> [, abd, def, , ghi]
tokens[0] = 
tokens[1] = abd
tokens[2] = def
tokens[3] = 
tokens[4] = ghi
--- Scanner Output ---
abd
def

ghi
--- StringTokenizer Output ---
abd
def
ghi
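For comparison with the "code something myself" option mentioned in the question, a hand-rolled tokenizer for a fixed, non-empty delimiter takes only a few lines with String.indexOf. This is a sketch, not a JDK API. FixedDelimiterTokenizer and tokenize are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class FixedDelimiterTokenizer {
    // Hypothetical helper: splits s on the literal, non-empty delimiter delim.
    // An empty token is produced where s starts or ends with delim,
    // and between two adjacent delimiters.
    static List<String> tokenize(String s, String delim) {
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int idx;
        // Each delimiter occurrence ends the current token, even if that token is empty.
        while ((idx = s.indexOf(delim, start)) != -1) {
            tokens.add(s.substring(start, idx));
            start = idx + delim.length();
        }
        // Whatever follows the last delimiter (possibly "") is the final token.
        tokens.add(s.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(",abd,def,,ghi,", ","));  // [, abd, def, , ghi, ]
    }
}
```

It should produce the same tokens as s.split(Pattern.quote(delim), -1) without going through the regex engine.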