Split string into key-value pairs

66,620

Solution 1

You can do this with a single call to split() and a single pass over the resulting tokens, using the following code. It does, of course, assume that the String is valid in the first place (keys and values strictly alternate):

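    import java.util.HashMap;
    import java.util.Map;

    Map<String, String> map = new HashMap<String, String>();
    String test = "pet:cat::car:honda::location:Japan::food:sushi";

    // split on ':' and on '::'
    String[] parts = test.split("::?");

    // tokens alternate key, value, key, value, ...
    // (an odd number of tokens would throw ArrayIndexOutOfBoundsException here)
    for (int i = 0; i < parts.length; i += 2) {
        map.put(parts[i], parts[i + 1]);
    }

    for (Map.Entry<String, String> e : map.entrySet()) {
        System.out.println(e.getKey() + " is " + e.getValue());
    }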

The above is probably slightly more efficient than your solution, but if you find your own code clearer, keep it: there is almost zero chance that such an optimization has a significant impact on performance unless you run it millions of times. If it really is that important, you should measure and compare.

EDIT:

For those who wonder what ::? means in the above code: String.split() takes a regular expression as its argument, and a separator is any substring that matches that regular expression. ::? is a regular expression that means one colon, optionally followed by a second colon. It therefore treats both :: and : as separators.
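
To make the separator regex concrete, here is a small sketch that lists the tokens split() produces. The class and method names are made up for illustration and are not part of the original answer. It also shows ::|: as an equivalent spelling of ::?. With an alternation, the longer alternative has to come first, otherwise :: is consumed as two single-colon separators with an empty token between them.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper; the class and method names are hypothetical.
class SeparatorDemo {
    // Splits the input with the given regex and returns the tokens as a list.
    static List<String> tokens(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    public static void main(String[] args) {
        // One colon, optionally followed by a second one.
        System.out.println(tokens("pet:cat::car:honda", "::?"));
        // Same tokens when the two-colon alternative is listed first.
        System.out.println(tokens("pet:cat::car:honda", "::|:"));
        // With ':' listed first, "::" is matched as two separate separators.
        System.out.println(tokens("pet:cat::car:honda", ":|::"));
        // Three colons: "::" matches first, then ":", leaving an empty token between.
        System.out.println(tokens("food:sushi:::cool", "::?"));
    }
}
```

The last call is the corner case raised in the comments: the empty token ends up as the key for cool.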

Solution 2

Using the Guava library, it's a one-liner:

import com.google.common.base.Splitter;
import java.util.Map;

String test = "pet:cat::car:honda::location:Japan::food:sushi";
Map<String, String> map = Splitter.on( "::" ).withKeyValueSeparator( ':' ).split( test );
System.out.println(map);

The output:

{pet=cat, car=honda, location=Japan, food=sushi}

This might also be faster than the JDK's String.split, since it does not compile a regular expression for "::".

Update: it even correctly handles the corner case from the comments:

String test = "pet:cat::car:honda::location:Japan::food:sushi:::cool";
Map<String, String> map = Splitter.on( "::" ).withKeyValueSeparator( ':' ).split( test );
System.out.println(map);

The output is:

{pet=cat, car=honda, location=Japan, food=sushi, =cool}
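
For readers without Guava on the classpath, the same two-level split can be sketched with String.indexOf, which also avoids the regular expression. This is not Guava's implementation, only a JDK-only approximation with hypothetical class and method names. It makes no attempt to reproduce Guava's validation, and it assumes that entries without a ':' should simply be skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only sketch; not how Guava implements its Splitter.
class LiteralSplitDemo {
    // Splits on the literal "::" between entries and on the first ':' inside each entry.
    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order
        int start = 0;
        while (start <= input.length()) {
            int end = input.indexOf("::", start);
            if (end < 0) {
                end = input.length(); // last entry runs to the end of the string
            }
            String entry = input.substring(start, end);
            int colon = entry.indexOf(':');
            if (colon >= 0) {
                map.put(entry.substring(0, colon), entry.substring(colon + 1));
            }
            // Assumption: entries without ':' are skipped rather than rejected.
            start = end + 2; // continue after the "::"
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("pet:cat::car:honda::location:Japan::food:sushi:::cool"));
    }
}
```

For the corner-case string this prints the same map as shown above, with an empty key for cool.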

Solution 3

Your solution is indeed somewhat inefficient.

The person who gave you the string to parse is also somewhat of a clown. There are industry-standard serialization formats, like JSON or XML, for which fast, efficient parsers exist. Inventing the square wheel is never a good idea.

First question: do you care? Is it slow enough that it hinders the performance of your application? It probably isn't, but there is only one way to find out: benchmark your code.

That said, more efficient solutions exist. Below is an example:

import java.util.HashMap;
import java.util.Map;

// The class name is illustrative; the original snippet showed only main().
public class StateMachineParser {
    public static void main(String[] args) {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        boolean stateiskey = true; // true while scanning a key, false while scanning a value

        Map<String, String> map = new HashMap<>();
        int keystart = 0;
        int keyend = 0;
        int valuestart = 0;
        int valueend = 0;

        for (int i = 0; i < test.length(); i++) {
            char nextchar = test.charAt(i);
            if (stateiskey) {
                // a single ':' ends the key; the value starts right after it
                if (nextchar == ':') {
                    keyend = i;
                    stateiskey = false;
                    valuestart = i + 1;
                }
            } else {
                // the value ends at "::" or at the last character of the string
                if (i == test.length() - 1 || (nextchar == ':' && test.charAt(i + 1) == ':')) {
                    valueend = i;
                    if (i + 1 == test.length()) valueend += 1; // compensate one for the end of the string
                    String key = test.substring(keystart, keyend);
                    String value = test.substring(valuestart, valueend);
                    keystart = i + 2; // the next key starts after the "::"
                    map.put(key, value);
                    i++; // skip the second ':' of the separator
                    stateiskey = true;
                }
            }
        }

        System.out.println(map);
    }
}

This solution is a finite state machine with only two states. It looks at every character only twice: once when it tests it for a boundary, and once when it copies it into the new string in your map. This is the minimum amount.

It doesn't create objects that are not needed, such as StringBuilders, intermediate strings, or arrays, which keeps garbage-collection pressure low.

It maintains good locality. The next character probably always is in cache, so the lookup is cheap.

It comes at a grave cost that is probably not worth it though:

  • It's far more complicated and less obvious
  • There are all sorts of moving parts
  • It's harder to debug when your string is in an unexpected format
  • Your coworkers will hate you
  • You will hate you when you have to debug something

Worth it? Maybe. How fast do you need that string parsed exactly?

A quick and dirty benchmark at https://ideone.com/8T7twy tells me that for this string, this method is approximately 4 times faster. For longer strings the difference is likely somewhat greater.

But your version still takes only 415 milliseconds for 100,000 repetitions, whereas this one takes 99 milliseconds.
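
A rough timing harness along the lines of the benchmark described above might look like the following. It is a sketch, not the code behind the ideone link: the class and method names, the warm-up pass, and the way the loop is timed are all assumptions. Timings from a hand-rolled loop like this are only indicative; a dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical harness; names and structure are assumptions, not the original benchmark.
class ParseTiming {
    static long sink; // accumulates results so the parsing work is actually used

    // The approach from the question: split on "::", then split each pair on ":".
    static Map<String, String> doubleSplit(String s) {
        Map<String, String> map = new HashMap<>();
        for (String pair : s.split("::")) {
            String[] kv = pair.split(":");
            map.put(kv[0], kv[1]);
        }
        return map;
    }

    // The two-state scanner from this answer, wrapped in a method.
    static Map<String, String> stateMachine(String s) {
        Map<String, String> map = new HashMap<>();
        boolean inKey = true;
        int keyStart = 0, keyEnd = 0, valueStart = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inKey) {
                if (c == ':') {
                    keyEnd = i;
                    valueStart = i + 1;
                    inKey = false;
                }
            } else if (i == s.length() - 1 || (c == ':' && s.charAt(i + 1) == ':')) {
                // At the end of the string the last character belongs to the value.
                int valueEnd = (i == s.length() - 1) ? i + 1 : i;
                map.put(s.substring(keyStart, keyEnd), s.substring(valueStart, valueEnd));
                keyStart = i + 2;
                i++; // skip the second ':' of the separator
                inKey = true;
            }
        }
        return map;
    }

    // Runs the parser reps times and returns the elapsed wall-clock milliseconds.
    static long timeMillis(Function<String, Map<String, String>> parser, String input, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sink += parser.apply(input).size();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        int reps = 100_000;
        // Warm-up pass so the JIT has compiled both parsers before measuring.
        timeMillis(ParseTiming::doubleSplit, test, reps);
        timeMillis(ParseTiming::stateMachine, test, reps);
        System.out.println("double split:  " + timeMillis(ParseTiming::doubleSplit, test, reps) + " ms");
        System.out.println("state machine: " + timeMillis(ParseTiming::stateMachine, test, reps) + " ms");
    }
}
```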

Solution 4

Try this code - see the comments for an explanation:

HashMap<String, String> hmap = new HashMap<>();
String str = "abc:1::xyz:2::jkl:3";
// "::?" matches one colon optionally followed by a second, so both ":" and "::" are separators
String[] strArray = str.split("::?");

// tokens alternate key, value, key, value, ...
for (int i = 0; i + 1 < strArray.length; i += 2) {
    hmap.put(strArray[i], strArray[i + 1]);
}

System.out.println(hmap.values()); // values only
System.out.println(hmap.keySet()); // keys only, if you want it to be clearer
Author by

v1shnu

I have been into application development for quite sometime and I am really in love with what I do. Although my base is Java, I am quite fluent in JavaScript and Python as well. I try to keep learning something new each and every day. Currently, I am exploring more into becoming a better polyglot programmer. Peace!

Updated on July 05, 2022

Comments

  • v1shnu
    v1shnu almost 2 years

    I have a string like this:

    pet:cat::car:honda::location:Japan::food:sushi
    

    Here : separates a key from its value, while :: separates the pairs from each other. I want to add the key-value pairs to a map.

    I can achieve this using:

    Map<String, String> map = new HashMap<String, String>();
    String test = "pet:cat::car:honda::location:Japan::food:sushi";
    String[] test1 = test.split("::");
    
    for (String s : test1) {
        String[] t = s.split(":");
        map.put(t[0], t[1]);
    }
    
    for (String s : map.keySet()) {
        System.out.println(s + " is " + map.get(s));
    }
    

    But is there an efficient way of doing this?


    I feel the code is inefficient because I have used 2 String[] objects and called the split function twice. Also, I am using t[0] and t[1] which might throw an ArrayIndexOutOfBoundsException if there are no values.

  • JB Nizet
    JB Nizet almost 9 years
    Your code forgets the last key/value pair. There is no food/sushi in the result. I really doubt it's more efficient: this code creates a whole lot of temporary string objects that need to be garbage-collected.
  • Uma Kanth
    Uma Kanth almost 9 years
    better than wasting arrays memory which is already in the string.
  • v1shnu
    v1shnu almost 9 years
    Wow! This takes care of everything I thought was inefficient. One more thing, though: I changed the string to "location:Japan::food:sushi:::cool" so that the value cool has no key, but the output is still: is cool location is Japan food is sushi
  • v1shnu
    v1shnu almost 9 years
    I think new_test won't be needed, as you can do it in the test1 statement itself, like String[] test1 = test.replaceAll("::",":").split(":"); Thanks anyway :)
  • v1shnu
    v1shnu almost 9 years
    just found that the key for the value 'cool' is just an empty String.
  • v1shnu
    v1shnu almost 9 years
    Nice! Thanks for this. I don't know if this is the most efficient, but it is something I should keep in mind when dealing with strings like these.
  • Vishnu
    Vishnu almost 9 years
    @ViChU ya that's true :)
  • v1shnu
    v1shnu almost 9 years
    @UmaKanth this snippet calls charAt a number of times. Is that okay ?
  • Uma Kanth
    Uma Kanth almost 9 years
    We can use a char substitute
  • Martijn
    Martijn almost 9 years
    Calling replaceAll is not really more efficient than calling split.
  • Vishnu
    Vishnu almost 9 years
    @Martijn I uploaded this code as an alternative way of doing the same thing, not as a more efficient way of doing it :)
  • Martijn
    Martijn almost 9 years
    @Vishnu the question was about efficiency though.
  • Vishnu
    Vishnu almost 9 years
    @Martijn In terms of efficiency I agree with you, but the question says "I have used 2 String array objects and called the split function twice." I gave my code to resolve that part, not to make the code more efficient.
  • v1shnu
    v1shnu almost 9 years
    Well, to answer your question: there are still many organizations which have not adapted to the latest standards. In my case, this data comes from the POS system. There are a huge number of retail stores in America, each retail store has many POS counters, and each transaction at a counter sends this data. And the data does not consist of just this string. This string is like a single line in a 10000 line xml file. So there is a requirement to keep the code as efficient as possible. And hence this question arose in my mind :)
  • Martijn
    Martijn almost 9 years
    Well, my benchmarks show that on very flimsy infrastructure (ideone) 10,000 lines take 44 milliseconds. How fast is your requirement? What is the rest of the code doing? How much of its time does it spend parsing strings to maps?
  • v1shnu
    v1shnu almost 9 years
    The whole process is really HUGE. I can tell you that this line comes in a 3000 line java class which does a lot of xml reading, decryption, map operations, database access and JMS. Say you buy an iPhone: this string tells me about the type of card you used.
  • Martijn
    Martijn almost 9 years
    My argument is that those other 2990 lines are likely to have a far greater impact on performance than the 10 lines parsing the string do, and that it's far more important to find out what to optimize (by benchmarking and profiling) than optimising parts that stand out to you in the hope that that improves overall performance. Even very experienced programmers are almost always wrong in their guess what's causing performance issues if they don't profile.
  • Bernhard Barker
    Bernhard Barker almost 9 years
    It might be good to briefly explain what ::? means (and mention the obvious alternative :|:: for those with the same problem, but different separators).
  • v1shnu
    v1shnu almost 9 years
    Sure, I will try profiling. There was a change request in this part of the code, and I am supposed to do it as efficiently as possible :)
  • Martijn
    Martijn almost 9 years
    @ViChU the most efficient way possible (in terms of computing resources) is probably to ditch the Java solution and re-do the entire thing in assembly. It's unlikely that anyone wants you to do that. It's much more likely that you are supposed to do it while keeping things running with sufficient performance. There are (almost) always trade-offs.
  • Salman
    Salman about 6 years
    I think this part of the code withKeyValueSeparator( ':' ) should be changed with withKeyValueSeparator( ":" ) since the method withKeyValueSeparator is taking String as an argument
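
On the ArrayIndexOutOfBoundsException concern raised in the question: one possible guard, sketched here under the assumption that a pair without any ':' should simply be skipped, is to split each pair with a limit of 2 and check the array length before using it. The class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical guarded variant of the question's two-step split.
class GuardedSplit {
    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order
        for (String pair : input.split("::")) {
            // A limit of 2 keeps a trailing empty string, so "key:" yields ["key", ""].
            String[] kv = pair.split(":", 2);
            if (kv.length == 2) {
                map.put(kv[0], kv[1]);
            }
            // Assumption: a pair with no ':' at all is ignored rather than rejected.
        }
        return map;
    }

    public static void main(String[] args) {
        // "car" has no value separator and is skipped; "food:" maps to an empty value.
        System.out.println(parse("pet:cat::car::food:"));
    }
}
```

Whether to skip, reject, or default such entries depends on what the data source guarantees, so the skip here is a placeholder choice.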