String.substring vs String[].split

Solution 1

Since String.split returns a String[], a 60-way split results in about sixty needless allocations per line. split goes through your entire string and creates sixty new objects plus the array object itself. Of these sixty-one objects you keep exactly one and let the garbage collector deal with the remaining sixty.
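To make the allocation cost concrete, here is a minimal sketch of the split-based approach. The class and method names are made up for illustration; only String.split and its two-argument overload with a limit are standard Java. The limited variant is offered as an assumption about what might reduce the work, not as something measured in this answer: with a limit of 3 the string is cut into at most three pieces, so far fewer substrings are created.

```java
// Hypothetical demo class; the names are illustrative only.
class SplitDemo {
    // Splits the whole line and keeps only the second token.
    static String secondViaSplit(String line) {
        String[] parts = line.split(",");    // one String per field, plus the array
        return parts[1];
    }

    // Same result, but stops splitting after the second comma.
    static String secondViaLimitedSplit(String line) {
        String[] parts = line.split(",", 3); // at most 3 elements: field 1, field 2, the rest
        return parts[1];
    }

    public static void main(String[] args) {
        String line = "quick,brown,fox,jumps,over,the,lazy,dog";
        System.out.println(line.split(",").length);       // 8
        System.out.println(secondViaSplit(line));         // brown
        System.out.println(secondViaLimitedSplit(line));  // brown
    }
}
```

Both methods throw ArrayIndexOutOfBoundsException when the line has no comma at all, so real input may need a length check first.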

If you are calling this in a tight loop, substring would definitely be more efficient: it scans only the portion of your string up to the second comma, and then creates one new object that you keep.

String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);

After this runs, the variable brown holds the string "brown".

When you run this multiple times, the substring wins on time hands down: 1,000,000 iterations of split take 3.36s, while 1,000,000 iterations of substring take only 0.05s. And that's with only eight components in the string! The difference for sixty components would be even more drastic.

Solution 2

Of course: why iterate through the whole string? Just use substring() and indexOf().

Solution 3

You are certainly better off doing it by hand for two reasons:

  • .split() takes a string as an argument, but that string is interpreted as a regular expression (a Pattern), and for your use case the regex machinery is overhead you do not need;
  • as you say, you only need the second element: the algorithm to grab that second element is simple enough to do by hand.
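The "by hand" algorithm from the second point could be sketched roughly as follows. The class name, the method name, and the choice to return null when there is no comma are all illustrative assumptions, not part of the original answer. Passing a char to indexOf keeps regular expressions out of the picture entirely.

```java
// Hypothetical helper; name and missing-field behavior are illustrative choices.
class SecondField {
    // Returns the text after the first comma and before the second one.
    // If there is no second comma, returns everything after the first comma.
    // If there is no comma at all, returns null.
    static String secondField(String line) {
        int first = line.indexOf(',');
        if (first < 0) {
            return null;
        }
        int second = line.indexOf(',', first + 1);
        if (second < 0) {
            return line.substring(first + 1);
        }
        return line.substring(first + 1, second);
    }

    public static void main(String[] args) {
        System.out.println(secondField("Q,BAC,233,sdf,sdf,")); // BAC
    }
}
```

The explicit checks for -1 are there because indexOf returns -1 when the character is not found, and passing that straight to substring would throw.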

Solution 4

I would use something like:

final int first = searchString.indexOf(",");
final int second = searchString.indexOf(",", first + 1);
String result = searchString.substring(first + 1, second);

Solution 5

My first inclination would be to find the index of the first and second commas and take the substring.

The only real way to tell for sure, though, is to test each in your particular scenario. Break out the appropriate stopwatch and measure the two.
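A rough stopwatch along those lines might look like the sketch below. Everything in it (class name, iteration counts, warm-up size) is an assumption chosen for illustration, and a hand-rolled loop like this is only a coarse indicator; a dedicated harness such as JMH would give more trustworthy numbers.

```java
// Hypothetical timing harness; a rough sketch, not a rigorous benchmark.
class SplitVsSubstringTiming {
    static String viaSplit(String s) {
        return s.split(",")[1];
    }

    static String viaSubstring(String s) {
        int from = s.indexOf(',');
        int to = s.indexOf(',', from + 1);
        return s.substring(from + 1, to);
    }

    public static void main(String[] args) {
        String s = "quick,brown,fox,jumps,over,the,lazy,dog";
        int iterations = 1_000_000;
        long sink = 0; // accumulate results so the JIT cannot discard the work

        // Warm-up so both paths are compiled before timing starts.
        for (int i = 0; i < 100_000; i++) {
            sink += viaSplit(s).length() + viaSubstring(s).length();
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaSplit(s).length();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += viaSubstring(s).length();
        }
        long t2 = System.nanoTime();

        System.out.println("split:     " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("substring: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("(checksum " + sink + ")");
    }
}
```

The absolute numbers will vary with the JVM version and hardware, so it is the ratio on your own machine and your own input that matters.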

Author: Duncan Krebs

Updated on July 05, 2022

Comments

  • Duncan Krebs
    Duncan Krebs almost 2 years

    I have a comma-delimited string that, when passed to String.split(","), returns an array of about 60 elements. In a specific use case I only need the second value in that array. For example, given "Q,BAC,233,sdf,sdf," all I want is the string after the first ',' and before the second ','. My question is about performance: am I better off parsing it myself using substring, or using the split method and then taking the second value in the array? Any input would be appreciated. This method will get called hundreds of times a second, so it's important that I understand the best approach regarding performance and memory allocation.

    -Duncan

  • Admin
    Admin over 11 years
    The possibility of off-by-one errors? The increased amount of less obvious code?
  • Admin
    Admin over 11 years
    That is, of course, assuming it's actually performance critical and you couldn't achieve more speed up in other ways. Programmers have a tendency to make wildly inaccurate guesses about that.
  • Duncan Krebs
    Duncan Krebs over 11 years
    Thanks, what is the purpose of declaring the index values as final?
  • MrSmith42
    MrSmith42 over 11 years
    It's just a code convention I am used to. I make all variables final which are only assigned once.
  • Duncan Krebs
    Duncan Krebs over 11 years
    I appreciate all the answers. Seems like the theme is substring and you explained it the best.
  • Duncan Krebs
    Duncan Krebs over 11 years
    Thank you, I upvoted! Please upvote my question, I want to break 600!
  • Mr. Polywhirl
    Mr. Polywhirl over 7 years
    I wrote a method to retrieve a token at a desired index, pastebin.com/R9Z6uW6H
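For readers who want the idea from the last comment without following the link, a token-at-index lookup could be sketched as below. This is not the code behind the pastebin link, which is not reproduced here; the class name, method name, and null-on-missing behavior are assumptions made for illustration.

```java
// Hypothetical helper, written for illustration only.
class Tokens {
    // Returns the token at the given zero-based index,
    // or null if the line does not contain that many tokens.
    static String tokenAt(String line, char delimiter, int index) {
        if (index < 0) {
            return null;
        }
        int start = 0;
        // Skip over the delimiters that precede the wanted token.
        for (int i = 0; i < index; i++) {
            int next = line.indexOf(delimiter, start);
            if (next < 0) {
                return null;
            }
            start = next + 1;
        }
        int end = line.indexOf(delimiter, start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(tokenAt("Q,BAC,233,sdf,sdf,", ',', 1)); // BAC
    }
}
```

Like the substring answers above, it scans only as far as the requested token and allocates a single String for the result.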