String.split by semicolon


Solution 1

The phrase contains bidirectional control characters such as RIGHT-TO-LEFT EMBEDDING (U+202B). That is why some editors fail to display the string correctly.

This piece of code shows the actual characters in the String (for some readers the phrase will not display correctly here, but it compiles and looks fine in Eclipse). I render left-to-right embedding as ->, right-to-left embedding as <- and pop directional formatting as ^:

public static void main(String[] args) {
    String phrase = "‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
    String[] dateSplit = phrase.split(";");
    for (String d : dateSplit) {
        System.out.println(d);
    }
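    // Walk the UTF-16 chars and make the directional controls visible.
    char[] c = phrase.toCharArray();
    StringBuilder p = new StringBuilder();
    for (int i = 0; i < c.length; i++) {
        int code = Character.codePointAt(c, i);
        switch (code) {
        case 8234: // U+202A LEFT-TO-RIGHT EMBEDDING
            p.append(" -> ");
            break;
        case 8235: // U+202B RIGHT-TO-LEFT EMBEDDING
            p.append(" <- ");
            break;
        case 8236: // U+202C POP DIRECTIONAL FORMATTING
            p.append(" ^ ");
            break;
        default:
            p.append(c[i]);
        }
    }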
    }
    System.out.println(p.toString());
}

Prints:

<- ; -> 14/May/2015 ^ ^ <- -> FC ^ ^ <- -> Barcelona ^ ^ <- -> VS. ^ ^ <- -> Real ^ ^ <- -> Madrid

String#split() operates on the actual characters in the string, not on what the editor displays. The ; is the second character, right after a right-to-left embedding, so dateSplit[0] holds only that invisible character and everything else ends up in dateSplit[1] (beware of the display again: the ; is not part of the string in dateSplit[1]):

dateSplit[0] = "";
dateSplit[1] = "14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
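
To check what each element really contains, one option is to print the code points instead of the characters. The following is only a minimal sketch: the class name SplitInspect and the helper describe are made up for illustration, and the phrase is shortened and written with Unicode escapes so that the invisible characters show up in the source.

```java
public class SplitInspect {

    // Hypothetical helper: renders every code point of s as U+XXXX.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Shortened phrase: RIGHT-TO-LEFT EMBEDDING, ';',
        // LEFT-TO-RIGHT EMBEDDING, then the date.
        String phrase = "\u202B;\u202A14/May/2015";
        String[] dateSplit = phrase.split(";");
        System.out.println("dateSplit.length: " + dateSplit.length);
        System.out.println("dateSplit[0]: " + describe(dateSplit[0]));
    }
}
```

With this shortened input, dateSplit[0] is reported as U+202B, so it is not empty even though it looks empty on screen.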

I guess you are processing data from a right-to-left language, mixed with football team names that are written left-to-right. The solution is most likely to strip the directional characters and put the ; in the right place, i.e. as a separator between the tokens.
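
Stripping the directional characters could look roughly like this. It is a sketch under assumptions: the class name BidiCleaner and the method stripBidiControls are hypothetical, and it only removes the explicit embedding and override controls U+202A to U+202E, which covers the three characters found in the phrase above. Other invisible formatting characters would need their own handling.

```java
public class BidiCleaner {

    // Removes the explicit embedding and override controls U+202A to U+202E.
    // Assumption: no other invisible formatting characters occur in the data.
    static String stripBidiControls(String s) {
        return s.replaceAll("[\\u202A-\\u202E]", "");
    }

    public static void main(String[] args) {
        // Shortened phrase, written with escapes so the controls are visible.
        String phrase = "\u202B;\u202A14/May/2015\u202C\u202C \u202B\u202AFC\u202C\u202C";
        String cleaned = stripBidiControls(phrase);
        System.out.println(cleaned);              // ;14/May/2015 FC
        System.out.println(cleaned.indexOf(';')); // 0
    }
}
```

Note that the ; is still the first character after cleaning, so the data itself has to be corrected (the separator moved between the date and the rest) before split() returns the date and the match as separate tokens.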

Solution 2

I retyped your code instead of copying it from here, and it works perfectly fine.

public static void main(String[] args) {
    String phrase = "14/May/2015; FC Barcelona VS. Real Madrid";
    String[] dateSplit = phrase.split(";");
    System.out.println("dateSplit[0]:" + dateSplit[0]);
    System.out.println("dateSplit[1]:" + dateSplit[1]);
}
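
With this input, dateSplit[1] starts with the space that follows the semicolon. If that is unwanted, one possible variation (a sketch of my own, with hypothetical class and method names) is to let the regex swallow whitespace around the separator and cap the result at two parts:

```java
public class TrimmedSplit {

    // Splits at the first ';' and drops whitespace around it.
    // The limit of 2 keeps any later ';' inside the second part.
    static String[] splitDateAndMatch(String phrase) {
        return phrase.trim().split("\\s*;\\s*", 2);
    }

    public static void main(String[] args) {
        String phrase = "14/May/2015; FC Barcelona VS. Real Madrid";
        String[] dateSplit = splitDateAndMatch(phrase);
        System.out.println("dateSplit[0]:" + dateSplit[0]);
        System.out.println("dateSplit[1]:" + dateSplit[1]);
    }
}
```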


Author: s_puria

Freshman Student in computer science.

Updated on September 16, 2022

Comments

  • s_puria
    s_puria over 1 year

    I want to split a string by semicolon(";"):

    String phrase = "‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
    String[] dateSplit = phrase.split(";");
    System.out.println("dateSplit[0]:" + dateSplit[0]);
    System.out.println("dateSplit[1]:" + dateSplit[1]);
    

    But it removes the ";" from the string and puts the whole string into dateSplit[1], so the output is:

    dateSplit[0]:‫
    dateSplit[1]:‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid`
    


    and on doing

    System.out.println("Real String :"+phrase);
    

    string printed is

    Real String :‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid
    
    • vdwijngaert
      vdwijngaert about 9 years
      Your "phrase" variable is not correctly formatted. Show us the actual code and we might be able to help.
    • Palcente
      Palcente about 9 years
      I smell possible encoding issue here.
    • Maroun
      Maroun about 9 years
      @s_puria No way, this won't even compile.
    • alainlompo
      alainlompo about 9 years
      @s_puria as it appears right now, the String consists only of Madrid, and what follows makes the code non-compilable
    • Naman Gala
      Naman Gala about 9 years
      When I copied your code on my system, it got copied like this ‪String phrase = ";14/May/2015‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
    • Palcente
      Palcente about 9 years
      There are invisible characters at positions 0 and 2 in your phrase String, which are not UTF-8, hence the issue. Depending on the browser/OS they may or may not get copied to the clipboard...
    • singhakash
      singhakash about 9 years
      @MarounMaroun I agree with you that the string declaration should give a compilation error, but it compiles fine; I have checked it. I don't get the reason yet
    • Palcente
      Palcente about 9 years
      in UTF-8 this string looks like this: "?;?14/May/2015?? ??FC?? ??Barcelona?? ??VS.?? ??Real?? ??Madrid"
  • Palcente
    Palcente about 9 years
    this is not the string OP posted
  • Prashant
    Prashant about 9 years
    there are some hidden characters in the string which OP posted
  • Naman Gala
    Naman Gala about 9 years
    @Palcente, Oh I see, i thought OP is trying with this text which is visible in the question.
  • Naman Gala
    Naman Gala about 9 years
    @s_puria, is this what you wanted? Or are there some hidden characters?
  • s_puria
    s_puria about 9 years
    There are some hidden characters.
  • Tom
    Tom about 9 years
    You say: "However, I would recommend using a StringTokenizer instead. You can then iterate over it, which leads to nicer (and safer) code.", JavaDoc says: "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.". What should someone new to Java think? :P
  • Steve Chaloner
    Steve Chaloner about 9 years
    @Tom hmm, hadn't noticed that, probably because it hasn't been annotated as Deprecated (also, I can't remember the last time I actually used a StringTokenizer). Good to know.
  • s_puria
    s_puria about 9 years
    There were some hidden characters in my code, and I think they were RTL characters, but the string was displayed as LTR.