Split a string, at every nth position

44,032

Solution 1

For a big performance improvement, an alternative would be to use substring() in a loop:

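public String[] splitStringEvery(String s, int interval) {
    // Number of chunks, rounding up so a shorter final chunk gets its own slot.
    // Assumes s is non-empty and interval is positive.
    int arrayLength = (int) Math.ceil(s.length() / (double) interval);
    String[] result = new String[arrayLength];

    int j = 0; // start index of the current chunk
    int lastIndex = result.length - 1;
    for (int i = 0; i < lastIndex; i++) {
        result[i] = s.substring(j, j + interval);
        j += interval;
    }
    // Add the last bit, which may be shorter than interval
    result[lastIndex] = s.substring(j);

    return result;
}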

Example:

Input:  String st = "1231241251341351452342352456"
Output: 123 124 125 134 135 145 234 235 245 6.

It's not as short as stevevls' solution, but it's way more efficient (see below) and I think it would be easier to adjust in the future, of course depending on your situation.


Performance tests (Java 7u45)

2,000-character string, interval of 3.

split("(?<=\\G.{" + count + "})") performance (in milliseconds):

7, 7, 5, 5, 4, 3, 3, 2, 2, 2

splitStringEvery() (substring()) performance (in milliseconds):

2, 0, 0, 0, 0, 1, 0, 1, 0, 0

2,000,000-character string, interval of 3.

split() performance (in milliseconds):

207, 95, 376, 87, 97, 83, 83, 82, 81, 83

splitStringEvery() performance (in milliseconds):

44, 20, 13, 24, 13, 26, 12, 38, 12, 13

2,000,000-character string, interval of 30.

split() performance (in milliseconds):

103, 61, 41, 55, 43, 44, 49, 47, 47, 45

splitStringEvery() performance (in milliseconds):

7, 7, 2, 5, 1, 3, 4, 4, 2, 1

Conclusion:

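The splitStringEvery() method is a lot faster (even after the substring() changes in Java 7u6), and the gap widens as the interval grows: in the runs above it is roughly 4 to 7 times faster at an interval of 3 and often more than 10 times faster at an interval of 30.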
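
For readers who want to reproduce a comparison like this without following the link, here is a minimal timing sketch. It is not the test code linked below; the class name SplitTimingSketch, the helper names, and the generated input are assumptions made for illustration. It uses plain System.nanoTime() with no warm-up control, so the numbers are only rough, and a harness such as JMH would give more trustworthy results.

```java
import java.util.Arrays;

class SplitTimingSketch {

    // Same approach as splitStringEvery() above: cut chunks with substring().
    // Assumes s is non-empty and interval is positive.
    static String[] splitWithSubstring(String s, int interval) {
        int arrayLength = (int) Math.ceil(s.length() / (double) interval);
        String[] result = new String[arrayLength];
        int j = 0;
        int lastIndex = result.length - 1;
        for (int i = 0; i < lastIndex; i++) {
            result[i] = s.substring(j, j + interval);
            j += interval;
        }
        result[lastIndex] = s.substring(j);
        return result;
    }

    // The regex approach: split after every `interval` characters.
    static String[] splitWithRegex(String s, int interval) {
        return s.split("(?<=\\G.{" + interval + "})");
    }

    // Builds a string of repeating digits with the requested length.
    static String buildInput(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + i % 10));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = buildInput(2_000_000);
        int interval = 3;
        for (int run = 1; run <= 10; run++) {
            long t0 = System.nanoTime();
            String[] viaRegex = splitWithRegex(input, interval);
            long t1 = System.nanoTime();
            String[] viaSubstring = splitWithSubstring(input, interval);
            long t2 = System.nanoTime();
            System.out.printf("run %d: split() %d ms, substring() %d ms, same result: %b%n",
                    run, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000,
                    Arrays.equals(viaRegex, viaSubstring));
        }
    }
}
```

The first few runs are usually slower because the JIT compiler has not kicked in yet, which matches the pattern in the numbers above.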

Ready-to-use Test Code:

pastebin.com/QMPgLbG9

Solution 2

You can use the brace quantifier, {n}, to specify how many times the preceding character must occur, and build the pattern from a variable:

String []thisCombo2 = thisCombo.split("(?<=\\G.{" + count + "})");

Braces are handy because they can specify either an exact count, {n}, or a range, {min,max}.
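
As a quick illustration, the pattern can be wrapped in a small helper so the caller controls the chunk size. This is only a sketch: the class name RegexSplitDemo and the helper splitEvery are hypothetical names, not part of any library. Note that "." does not match line terminators by default, so input containing newlines would need the (?s) flag.

```java
import java.util.Arrays;

class RegexSplitDemo {

    // Splits s after every n characters. \G anchors at the end of the
    // previous match, and .{n} requires exactly n characters after it.
    // Assumes n is positive.
    static String[] splitEvery(String s, int n) {
        return s.split("(?<=\\G.{" + n + "})");
    }

    public static void main(String[] args) {
        String st = "1231241251341351452342352456";

        // Exact count: chunks of 3, with the leftover "6" as the last element
        System.out.println(Arrays.toString(splitEvery(st, 3)));
        // prints [123, 124, 125, 134, 135, 145, 234, 235, 245, 6]

        // Range: a{2,4} accepts between 2 and 4 occurrences of 'a'
        System.out.println("aaa".matches("a{2,4}"));   // prints true
        System.out.println("aaaaa".matches("a{2,4}")); // prints false
    }
}
```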

Solution 3

With Google Guava, you can use Splitter.fixedLength():

Returns a splitter that divides strings into pieces of the given length

Splitter.fixedLength(2).split("abcde");
// returns an iterable containing ["ab", "cd", "e"].
Author: Emile Beukes

Updated on April 05, 2020

Comments

  • Emile Beukes
    Emile Beukes about 4 years

    I use this regex to split a string at every say 3rd position:

    String []thisCombo2 = thisCombo.split("(?<=\\G...)");
    

    where the 3 dots after the \\G determine the split position: each dot matches one character, so 3 dots split the string every 3 positions. An example:

    Input: String st = "123124125134135145234235245"
    Output: 123 124 125 134 135 145 234 235 245.
    

    My question is, how do I let the user control the number of positions at which the string is split? In other words, how do I turn those 3 dots into n dots, with n chosen by the user?

  • thedayturns
    thedayturns over 11 years
    Isn't this just premature optimization?
  • Aske B.
    Aske B. over 11 years
    @thedayturns Why are you posting that statement with a question mark? Don't be unsure of your accusations. It's one of those accusations that should be used against people who waste their time with unnecessary performance improvements. Anyway, this is quickly written, ready-to-use code; easier to understand, to me at least; and on the plus side, it runs e.g. 60 times faster in the last case (the advantage grows with the interval). My whole performance research act may be unnecessary, but now it's there for generations to come.
  • thedayturns
    thedayturns over 11 years
    Good response. I thought about it, and I think you're right - the highest voted answer is probably even more confusing than this one. On the other hand, the google guava solution is better than both you're fine with including another library.
  • Aske B.
    Aske B. over 11 years
    @thedayturns If you mean "if you're fine with including another library" then I agree. It's a very elegant solution, but I don't think it's the majority that wants to include an external library just for one functionality.
  • thedayturns
    thedayturns over 11 years
    Yep. Caught my typo after the 5 minute deadline, whoops.
  • Dennis Meng
    Dennis Meng over 10 years
    With the recent changes to substring's performance, I wonder if this is still fastest. Has anyone tried comparing these using Java 7 instead of Java 6?
  • Aske B.
    Aske B. over 10 years
    @DennisMeng I just tested it out, using the test code I provided, and it has slightly different results. I'll update the results to the answer. Regardless, I would be surprised if the substring would ever become bad enough to match using regex.
  • Zout
    Zout about 8 years
    I think you should check that the input string is non-empty; otherwise, for an empty string, you will access result[-1] and get an ArrayIndexOutOfBoundsException on the "add the last bit" line.
  • Aske B.
    Aske B. about 8 years
    @Zout You are right. You could also get a NullPointerException if the string is null. Probably also some weird behavior if the interval is 0 or negative. I think this is far beyond what the OP asked though. Defensive programming can be good in some circumstances, but it's not necessary in most cases. Hopefully people will figure out what they need in their own case. Or seek the knowledge about how to handle this in respective questions.
  • Zout
    Zout about 8 years
    In my use case, the string was coming from user input, but the interval was predefined (so we can guarantee the string is non-null and that the interval is > 0) so I think it would make sense to check for isEmpty. I can imagine other situations where the interval would also be user defined, so I see your point.
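
The checks discussed in these comments could be sketched as follows. This is one possible defensive variant, not the answer author's code: the class name SafeSplit is hypothetical, and returning an empty array for an empty string is an assumption about what callers want.

```java
import java.util.Objects;

class SafeSplit {

    // Defensive variant of splitStringEvery(): rejects null input and
    // non-positive intervals, and returns an empty array for "".
    static String[] splitStringEvery(String s, int interval) {
        Objects.requireNonNull(s, "s must not be null");
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + interval);
        }
        if (s.isEmpty()) {
            return new String[0]; // assumption: no chunks for an empty string
        }

        // Integer ceiling of length / interval; safe because s is non-empty
        int arrayLength = (s.length() - 1) / interval + 1;
        String[] result = new String[arrayLength];

        int j = 0; // start index of the current chunk
        int lastIndex = arrayLength - 1;
        for (int i = 0; i < lastIndex; i++) {
            result[i] = s.substring(j, j + interval);
            j += interval;
        }
        result[lastIndex] = s.substring(j); // last chunk may be shorter
        return result;
    }
}
```

Whether to return an empty array or an array holding one empty string is a design choice. The regex-based split() returns a single-element array containing "" for empty input, so code that switches between the two approaches may want to match that instead.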