java - after splitting a string, what is the first element in the array?

12,756

Solution 1

Consider the split expression ",1,2,3,4".split(",");

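What would you expect? Right: an empty string first, because the comma at index 0 has nothing in front of it. The empty pattern "" behaves the same way: in your case there is a 'nothing' in front of the first 'a' as well as one behind it. (This describes Java 7 and earlier. Since Java 8, a zero-width match at the beginning of the string no longer produces a leading empty string, so "abc".split("") returns [a, b, c].)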
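A small sketch to check the leading-empty-string behavior yourself. The class name LeadingEmptyDemo is just an illustrative choice. The comma case prints the same on every Java version. The last line is left without an expected-output comment because its result depends on the JDK version being used.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    public static void main(String[] args) {
        // A positive-width match (the comma) at index 0 yields a leading empty string
        String[] parts = ",1,2,3,4".split(",");
        System.out.println(parts.length);           // 5
        System.out.println(Arrays.toString(parts)); // [, 1, 2, 3, 4]

        // The empty pattern: Java 7 and earlier keep a leading "", Java 8+ drop it
        System.out.println(Arrays.toString("abc".split("")));
    }
}
```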

Update: the comments suggest this is not enough of an explanation (which may be fair), but it really is this simple: the engine starts at the beginning of the string and checks whether what is in front of the current position matches the pattern. If it does, it assigns what is behind that position (everything since the previous match) to a new element of the split.

At the first position there is "" behind it (nothing has been read yet), and the pattern "" matches in front of it. So it creates a match, and the first element is the empty string.

It then moves forward one character: now 'a' is behind it, and once again "" matches in front of it, so the second element is the string "a". This repeats for every remaining character.
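The walk-through above can be sketched as a simplified re-implementation of split with the default limit of 0, following the Java 7 rules (every match closes the piece behind it, even a zero-width match at position 0). This is not the JDK source; the class and method names (EmptySplitWalkthrough, naiveSplit) are hypothetical, and the sketch ignores positive limits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptySplitWalkthrough {
    // Hypothetical helper: mimics Java 7 split(regex) with limit 0
    static List<String> naiveSplit(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> pieces = new ArrayList<>();
        int index = 0; // start of the piece currently being collected
        while (m.find()) {
            // Everything between the previous match and this one is "behind" us
            String piece = input.substring(index, m.start());
            System.out.println("match at " + m.start() + ", piece behind it: \"" + piece + "\"");
            pieces.add(piece);
            index = m.end();
        }
        if (index == 0) {
            // No match moved past position 0: the whole input is the only element
            return Collections.singletonList(input);
        }
        pieces.add(input.substring(index)); // text after the last match
        // Limit 0 drops trailing empty strings
        int size = pieces.size();
        while (size > 0 && pieces.get(size - 1).isEmpty()) {
            size--;
        }
        return pieces.subList(0, size);
    }

    public static void main(String[] args) {
        List<String> parts = naiveSplit("abc", "");
        System.out.println(parts.size() + " " + parts); // 4 [, a, b, c]
    }
}
```

Note how the empty pattern matches at positions 0, 1, 2 and 3: after an empty match, Matcher.find() advances one character before searching again, which is why the result is not an endless list of empty strings.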

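An interesting observation: if you use split("", -1), you will also get an empty string in the last position of the result array. The final match sits at the very end of the string, and a negative limit keeps the trailing empty string that the default limit of 0 discards.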
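A quick sketch comparing the default limit with -1. The class name NegativeLimitDemo is illustrative. The arrays themselves differ between Java 7 and Java 8+ (the leading ""), so only the version-independent facts carry expected-output comments.

```java
import java.util.Arrays;

class NegativeLimitDemo {
    public static void main(String[] args) {
        String[] dropped = "abc".split("");  // limit 0: trailing "" removed
        String[] kept = "abc".split("", -1); // negative limit: trailing "" kept
        System.out.println(Arrays.toString(dropped));
        System.out.println(Arrays.toString(kept));
        // On any Java version, the -1 result has exactly one extra element,
        // and it is the empty string at the end
        System.out.println(kept.length - dropped.length);    // 1
        System.out.println(kept[kept.length - 1].isEmpty()); // true
    }
}
```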


Edit 2: If I rack my brains further and treat this as an academic exercise (I would not recommend it in real code), I can think of only one good way to do a regex split() of a String into a String[] array with one character in each string (as opposed to a char[], for which other people have given great answers):

String[] chars = str.split("(?<=.)", str.length());

This uses a zero-width lookbehind (which, like any lookaround, does not capture) to split at every position that follows a character, and then limits the array size to the number of characters. You can leave the str.length() out, but if you pass -1 you will get an extra empty string at the end.
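A sketch of the lookbehind split with the three limit choices discussed above. The class name and the sample string "abcd" are illustrative. It assumes the input contains no line terminators, since . does not match them by default.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    public static void main(String[] args) {
        String str = "abcd";
        // (?<=.) matches the empty position right after each character
        String[] limited = str.split("(?<=.)", str.length());
        String[] unlimited = str.split("(?<=.)");       // default limit 0
        String[] negative = str.split("(?<=.)", -1);    // keeps the trailing ""
        System.out.println(Arrays.toString(limited));   // [a, b, c, d]
        System.out.println(Arrays.toString(unlimited)); // [a, b, c, d]
        System.out.println(Arrays.toString(negative));  // [a, b, c, d, ]
        System.out.println(negative.length);            // 5
    }
}
```

Because the lookbehind needs a character before the position, it never matches at index 0, so this gives the same result on Java 7 and Java 8+.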

Borrowing nitro2k01's alternative (in the comments below), which references the start and end of the string, you can split reliably with:

String[] chars = str.split("(?!(^|$))");
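A sketch showing why this pattern is robust to the limit argument. The class name and sample string are illustrative, and it assumes the input has no line terminators, because $ can also match just before a final line terminator.

```java
import java.util.Arrays;

class StartEndLookaheadDemo {
    public static void main(String[] args) {
        String str = "abcd";
        // (?!(^|$)) matches every position that is neither the start nor the end,
        // so there is never a leading or trailing empty piece
        String[] chars = str.split("(?!(^|$))");
        String[] withNegativeLimit = str.split("(?!(^|$))", -1);
        System.out.println(Arrays.toString(chars));             // [a, b, c, d]
        System.out.println(Arrays.toString(withNegativeLimit)); // [a, b, c, d]
    }
}
```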

Solution 2

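You can just use the built-in toCharArray() method of the String class: myString.toCharArray(). With split(""), the empty string is being stored at index 0; toCharArray() returns one char per character of the string, with no extra element.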
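A minimal sketch of the toCharArray() route using the string from the question. The class name is illustrative.

```java
import java.util.Arrays;

class ToCharArrayDemo {
    public static void main(String[] args) {
        String str = "abcddadfad";
        char[] chars = str.toCharArray();           // one char per UTF-16 code unit
        System.out.println(chars.length);           // 10, no extra element
        System.out.println(chars[0]);               // a
        System.out.println(Arrays.toString(chars)); // [a, b, c, d, d, a, d, f, a, d]
    }
}
```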

Author: user2994814

Updated on September 23, 2022

Comments

  • user2994814, over 1 year

    I was trying to split a string into an array of single letters. Here's what I did:

    String str = "abcddadfad"; 
    System.out.println(str.length());    //  output: 10  
    String[] strArr = str.split("");  
    System.out.println(strArr.length);   //  output: 11   
    System.out.println(strArr[0]);       // output is nothing 
    

    The new array did contain all the letters; however, it has nothing at index 0, not even a white space, yet that element still adds to the size of my array. Can anyone explain why this is happening?

    • Bakuriu
      I find it quite counter-intuitive that you can use an empty separator. You could put any number of empty separators wherever you want, making (almost) every array length equally valid. The fact that the implementation somehow chooses the "minimum" length doesn't change the fact that this operation doesn't make much sense. Raising a "NoEmptySeparator" exception would have been more appropriate, in my opinion.
  • justhalf, over 10 years
    You can improve this answer by saying: "If you just want to split a String into an array of characters, you can just do myString.toCharArray(); there will be no empty string at the beginning of the array, and it's also simpler."
  • nitro2k01, over 10 years
    While the answer addresses what the OP wants to achieve, it doesn't answer the question that was asked.
  • mangr3n, over 10 years
    It doesn't explain how "" works as a regular expression, which is the issue here. I've done some regex work, but I have never tried matching with "", so I don't understand exactly how it works. Someone who has, or who knows the Java regex code internally, might be able to explain this better.
  • Floris, over 10 years
    This seems to be a pretty clear explanation. "Split the string when you encounter nothing, then go to the next character." Note: that second part is important. You don't get an infinite array of empty strings; only the first element returned is nothing, and after that the split algorithm advances by at least one character. But not the first time. Still a bit odd...
  • user2994814, over 10 years
    If I want to stick with the split() function, is there any way that I can modify the code to circumvent the problem?
  • mangr3n, over 10 years
    No, because the only regex I can think of that works, "", also matches the empty string on the front end. You have to account for it, not "fix" it. The most efficient (performance) way is toCharArray().
  • Floris, over 10 years
    I wonder if a look around expression works in this context. I can't test that unfortunately.
  • rolfl, over 10 years
    Interesting, yes, it explicitly checks for not-start-of-line but also has the -1 split vulnerability ... 50/50 as to which option is better
  • nitro2k01, over 10 years
    Well, if you want to get into the silly territory, you could use "(?!(^|$))". But yeeeah.
  • rolfl, over 10 years
    Updated my answer again, @nitro2k01's offering will reliably split it as the OP originally intended.
  • ratchet freak, over 10 years
    It's easier (and more efficient) to use strArr[i] = str.substring(i, i + 1);
  • nitro2k01, over 10 years
    Well, the OP is better off using str.toCharArray() for performance reasons, but that's not what the question actually asked.
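ratchet freak's substring suggestion, sketched as a complete loop. The helper name toSingleCharStrings is hypothetical. Like toCharArray(), it works on UTF-16 code units, so characters outside the Basic Multilingual Plane would be split into two halves.

```java
import java.util.Arrays;

class SubstringSplitDemo {
    // Hypothetical helper: one single-character String per index, no regex involved
    static String[] toSingleCharStrings(String str) {
        String[] strArr = new String[str.length()];
        for (int i = 0; i < str.length(); i++) {
            strArr[i] = str.substring(i, i + 1);
        }
        return strArr;
    }

    public static void main(String[] args) {
        String[] strArr = toSingleCharStrings("abcddadfad");
        System.out.println(strArr.length);           // 10
        System.out.println(Arrays.toString(strArr)); // [a, b, c, d, d, a, d, f, a, d]
    }
}
```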