Most efficient way of splitting String in Java
Solution 1
StringTokenizer is typically faster than String.split() for simple, fixed delimiters.
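import java.util.StringTokenizer;

// Class name is illustrative; the original snippet showed only main().
public class StringTokenizerExample {
    public static void main(String[] args) {
        String str = "This is String , split by StringTokenizer, created by me";

        // With no delimiter argument, tokens are split on whitespace.
        StringTokenizer st = new StringTokenizer(str);
        System.out.println("---- Split by space ------");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }

        // Split on commas instead; surrounding spaces stay inside the tokens.
        System.out.println("---- Split by comma ',' ------");
        StringTokenizer st2 = new StringTokenizer(str, ",");
        while (st2.hasMoreTokens()) {
            System.out.println(st2.nextToken());
        }
    }
}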
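For the `;.` separator from the question, one detail is worth a quick sketch: StringTokenizer treats every character of its delimiter string as a separate delimiter and silently skips empty tokens. The class and method names below are made up for illustration. The example suggests this works for `Two;.Three;.Four`, but would also break up any value that itself contains a `.` or `;`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original answer.
class TokenizerDemo {
    // Collects all tokens; each character in `delims` acts as its own delimiter.
    public static List<String> tokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Two;.Three;.Four", ";.")); // [Two, Three, Four]
        System.out.println(tokens("a,,b", ","));              // [a, b] (empty token skipped)
        System.out.println(tokens("1.5;.2.5", ";."));         // [1, 5, 2, 5] ('.' alone also splits)
    }
}
```

If empty fields or values containing the delimiter characters matter, an indexOf-based loop is the safer choice.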
Solution 2
This is the method I use for splitting large (1GB+) tab-separated files. It is limited to a char delimiter to avoid any overhead of additional method invocations (which may be optimized out by the runtime), but it can easily be converted to take a String delimiter. I'd be interested if anyone can come up with a faster method or improvements on this method.
public static String[] split(final String line, final char delimiter)
{
    // Worst case: every character is a delimiter, which yields
    // line.length() + 1 (possibly empty) tokens. Sizing the buffer at
    // length / 2 + 1 overflows on input such as ",,,".
    String[] temp = new String[line.length() + 1];
    int wordCount = 0;
    int i = 0;
    int j = line.indexOf(delimiter, 0); // first substring
    while (j >= 0)
    {
        temp[wordCount++] = line.substring(i, j);
        i = j + 1;
        j = line.indexOf(delimiter, i); // rest of substrings
    }
    temp[wordCount++] = line.substring(i); // last substring
    String[] result = new String[wordCount];
    System.arraycopy(temp, 0, result, 0, wordCount);
    return result;
}
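As a rough sketch of the String-delimited conversion mentioned above, the loop only needs to advance by the delimiter's length instead of by 1. The class name is hypothetical, and an ArrayList is swapped in for the preallocated array purely to keep the example short; this is an illustration, not a benchmarked replacement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a String-delimited variant of the char-based split.
class DelimiterSplit {
    public static String[] split(final String line, final String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        final int step = delimiter.length();
        int i = 0;                              // start of the current token
        int j = line.indexOf(delimiter, i);     // start of the next delimiter
        while (j >= 0) {
            parts.add(line.substring(i, j));
            i = j + step;                       // skip over the whole delimiter
            j = line.indexOf(delimiter, i);
        }
        parts.add(line.substring(i));           // last token (may be empty)
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String s : split("Two;.Three;.Four", ";.")) {
            System.out.println(s);              // Two, Three, Four on separate lines
        }
    }
}
```

Like the char version, this keeps empty tokens, so `a;.;.b` yields three elements with an empty string in the middle.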
Solution 3
If you want the ultimate in efficiency, I wouldn't use Strings at all, let alone split them. I would do what compilers do: process the file a character at a time. Use a BufferedReader with a large buffer size, say 128 KB, and read a char at a time, accumulating them into, say, a StringBuilder until you get a ; or line terminator.
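A minimal sketch of that character-at-a-time approach might look like the following. The class and method names are invented for illustration, and the tokens are collected into a List only so the result can be inspected; for a 1GB file you would more likely handle each token on the spot. It splits on `;` and line terminators as the answer describes, so the two-character `;.` separator from the question would need one extra check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; sketch of reading one char at a time.
class CharStreamSplit {
    public static List<String> readTokens(Reader source) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // 128 KB buffer, as suggested in the answer.
        try (BufferedReader in = new BufferedReader(source, 128 * 1024)) {
            int c;
            while ((c = in.read()) != -1) {
                if (c == ';' || c == '\n' || c == '\r') {
                    // End of token; empty tokens (e.g. between \r and \n) are skipped.
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);   // reuse the same builder
                    }
                } else {
                    current.append((char) c);
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // trailing token without terminator
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader here.
        System.out.println(readTokens(new StringReader("Two;Three\nFour;Five\r\n")));
        // prints [Two, Three, Four, Five]
    }
}
```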
user92038111111
Updated on June 04, 2022
Comments
-
user92038111111 almost 2 years
For the sake of this question, let's assume I have a String which contains the values Two;.Three;.Four (and so on) but the elements are separated by ;.

Now I know there are multiple ways of splitting a string such as split() and StringTokenizer (being the faster one and works well) but my input file is around 1GB and I am looking for something slightly more efficient than StringTokenizer.

After some research, I found that indexOf and substring are quite efficient but the examples only have single delimiters or results are returning only a single word/element.

Sample code using indexOf and substring:
String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);

The above works for printing brown but how can I use indexOf and substring to split a line with multiple delimiters and display all the items as below.

Expected output
Two
Three
Four
....and so on
-
Buhake Sindi about 9 years
What are you trying to achieve? Have you run tests on various test cases to see which is "efficient"?
-
Prashant about 9 years
Just loop; indexOf() takes a start parameter, which should be the position just after the last match.
-
user92038111111 about 9 years
Okay, will give this a try and report back. Thanks.
-
user207421 about 7 years
@AvinashRaj Your comment has nothing to do with my answer. Don't post irrelevant comments here.
-
user207421 about 7 years
@AvinashRaj That doesn't have anything more to do with my answer than your previous comment.
-
Sport about 3 years
You can further improve this by obtaining all the indexes at once, as indexOf loops through the String.
-
Parker about 3 years
@Sport Inside the loop, I start each search after the index of the previous occurrence (line.indexOf(delimiter, i)), so each character is only checked once. I could probably write an inline version of indexOf(char, int) to avoid the overhead of repeated method invocation.
-
Yonathan W'Gebriel almost 3 years
According to the JDK docs, StringTokenizer has been considered a legacy class for a while now. The recommendation is to use String.split or something from the java.util.regex package.
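For comparison, the java.util.regex route mentioned in the last comment could be sketched as below for the `;.` separator. The class name is hypothetical. Because `.` is a regex metacharacter, the delimiter is quoted with Pattern.quote, and the Pattern is compiled once and reused rather than rebuilt for every line. No claim is made here about how it performs against the other approaches.

```java
import java.util.regex.Pattern;

// Hypothetical class name; sketch of splitting with a precompiled Pattern.
class RegexSplit {
    // Quote the delimiter so ';' and '.' are matched literally.
    private static final Pattern DELIM = Pattern.compile(Pattern.quote(";."));

    public static String[] split(String line) {
        return DELIM.split(line);
    }

    public static void main(String[] args) {
        for (String s : split("Two;.Three;.Four")) {
            System.out.println(s);   // Two, Three, Four on separate lines
        }
    }
}
```

Note that Pattern.split(CharSequence) drops trailing empty strings; pass a negative limit to the two-argument overload if they need to be kept.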