Split string into key-value pairs
Solution 1
You could do a single call to split() and a single pass on the String using the following code. But it of course assumes the String is valid in the first place:
Map<String, String> map = new HashMap<String, String>();
String test = "pet:cat::car:honda::location:Japan::food:sushi";
// split on ':' and on '::'
String[] parts = test.split("::?");
// even indexes hold keys, odd indexes hold the matching values
for (int i = 0; i < parts.length; i += 2) {
    map.put(parts[i], parts[i + 1]);
}
for (String s : map.keySet()) {
    System.out.println(s + " is " + map.get(s));
}
The above is probably a little bit more efficient than your solution, but if you find your code clearer, then keep it, because there is almost zero chance such an optimization has a significant impact on performance, unless you do that millions of times. Anyway, if it's so important, then you should measure and compare.
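If you do want to measure, a rough timing harness could look like the sketch below. It is only an illustration: the class name `SplitTiming`, the method names, and the repetition count are assumptions made for this example, and a naive `System.nanoTime()` loop is easily distorted by JIT warm-up and garbage collection. A dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SplitTiming {
    // The approach from the question: split on "::", then split each pair on ":".
    static Map<String, String> twoSplits(String s) {
        Map<String, String> map = new HashMap<>();
        for (String pair : s.split("::")) {
            String[] kv = pair.split(":");
            map.put(kv[0], kv[1]);
        }
        return map;
    }

    // The single-split approach from this answer (assumes a well-formed string).
    static Map<String, String> oneSplit(String s) {
        Map<String, String> map = new HashMap<>();
        String[] parts = s.split("::?");
        for (int i = 0; i < parts.length; i += 2) {
            map.put(parts[i], parts[i + 1]);
        }
        return map;
    }

    // Runs the parser `reps` times and returns the elapsed time in nanoseconds.
    static long time(Function<String, Map<String, String>> parser, String input, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            // Use the result so the call cannot be optimized away entirely.
            if (parser.apply(input).isEmpty()) {
                throw new IllegalStateException("unexpected empty map");
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        int reps = 100_000; // arbitrary repetition count for illustration

        // Warm-up runs so the JIT has a chance to compile both methods first.
        time(SplitTiming::twoSplits, test, reps);
        time(SplitTiming::oneSplit, test, reps);

        System.out.println("two splits: " + time(SplitTiming::twoSplits, test, reps) / 1_000_000 + " ms");
        System.out.println("one split:  " + time(SplitTiming::oneSplit, test, reps) / 1_000_000 + " ms");
    }
}
```

The absolute numbers will vary by machine and JVM, so treat them as a relative comparison only.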
EDIT: for those who wonder what `::?` means in the above code: String.split() takes a regular expression as argument. A separator is a substring that matches the regular expression. `::?` is a regular expression which means: 1 colon, followed by 0 or 1 colon. It thus allows considering `::` and `:` as separators.
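A small sketch of how the separator regex behaves may help. It compares `::?` with two alternation patterns; the class name `SeparatorRegexDemo` and the helper `show` are made up for this example. One point worth noting is that Java tries alternatives from left to right, so an alternation has to list the longer separator first.

```java
import java.util.Arrays;

public class SeparatorRegexDemo {
    // Splits a fixed sample string with the given regex and renders the parts.
    static String show(String regex) {
        String test = "pet:cat::car:honda";
        return Arrays.toString(test.split(regex));
    }

    public static void main(String[] args) {
        // One colon, optionally followed by a second one.
        System.out.println(show("::?"));  // [pet, cat, car, honda]
        // Alternation with the longer separator first gives the same result.
        System.out.println(show("::|:")); // [pet, cat, car, honda]
        // With ":" first, "::" is seen as two separators with an empty string between them.
        System.out.println(show(":|::")); // [pet, cat, , car, honda]
    }
}
```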
Solution 2
Using the Guava library, it's a one-liner:
String test = "pet:cat::car:honda::location:Japan::food:sushi";
Map<String, String> map = Splitter.on( "::" ).withKeyValueSeparator( ':' ).split( test );
System.out.println(map);
The output:
{pet=cat, car=honda, location=Japan, food=sushi}
This might also be faster than the JDK's `String.split`, as it does not create a regexp for `"::"`.
Update: it even correctly handles the corner case from the comments:
String test = "pet:cat::car:honda::location:Japan::food:sushi:::cool";
Map<String, String> map = Splitter.on( "::" ).withKeyValueSeparator( ':' ).split( test );
System.out.println(map);
The output is:
{pet=cat, car=honda, location=Japan, food=sushi, =cool}
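If adding Guava is not an option, the same regex-free idea can be approximated with plain `String.indexOf`. The sketch below is not Guava's implementation: the class name `LiteralSplit` and the method `parse` are invented for this example, and its behavior on malformed input (for instance a pair containing several colons, where it simply splits on the first one) may differ from Guava's.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LiteralSplit {
    // Splits on the literal "::" between pairs and on the first ':' inside each pair.
    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order
        int start = 0;
        while (start <= input.length()) {
            int end = input.indexOf("::", start);
            if (end < 0) {
                end = input.length(); // last pair runs to the end of the string
            }
            String pair = input.substring(start, end);
            int colon = pair.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("No ':' in pair: \"" + pair + "\"");
            }
            map.put(pair.substring(0, colon), pair.substring(colon + 1));
            start = end + 2; // skip past the "::" separator
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("pet:cat::car:honda::location:Japan::food:sushi:::cool"));
        // prints {pet=cat, car=honda, location=Japan, food=sushi, =cool}
    }
}
```

As in the Guava output above, the trailing `:::cool` ends up as the value `cool` under an empty key, because the first two colons are consumed as the pair separator.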
Solution 3
Your solution is indeed somewhat inefficient.
The person who gave you the string to parse is also somewhat of a clown. There are industry standard serialization formats, like JSON or XML, for which fast, efficient parsers exist. Inventing the square wheel is never a good idea.
First question: Do you care? Is it slow enough that it hinders performance of your application? It's likely not to, but there is only one way to find out. Benchmark your code.
That said, more efficient solutions exist. Below is an example:
public static void main (String[] args) throws java.lang.Exception
{
    String test = "pet:cat::car:honda::location:Japan::food:sushi";
    boolean stateiskey = true;

    Map<String, String> map = new HashMap<>();

    int keystart = 0;
    int keyend = 0;
    int valuestart = 0;
    int valueend = 0;

    for (int i = 0; i < test.length(); i++) {
        char nextchar = test.charAt(i);
        if (stateiskey) {
            // reading a key: a single ':' ends it
            if (nextchar == ':') {
                keyend = i;
                stateiskey = false;
                valuestart = i + 1;
            }
        } else {
            // reading a value: "::" or the end of the string ends it
            if (i == test.length() - 1 || (nextchar == ':' && test.charAt(i + 1) == ':')) {
                valueend = i;
                if (i + 1 == test.length()) valueend += 1; //compensate one for the end of the string
                String key = test.substring(keystart, keyend);
                String value = test.substring(valuestart, valueend);
                keystart = i + 2;
                map.put(key, value);
                i++;
                stateiskey = true;
            }
        }
    }

    System.out.println(map);
}
This solution is a finite state machine with only two states. It looks at every character only twice, once when it tests it for a boundary, and once when it copies it to the new string in your map. This is the minimum amount.
It doesn't create objects that are not needed, like StringBuilders, strings, or arrays; this keeps garbage-collection pressure low.
It maintains good locality. The next character probably always is in cache, so the lookup is cheap.
It comes at a grave cost that is probably not worth it though:
- It's far more complicated and less obvious
- There are all sorts of moving parts
- It's harder to debug when your string is in an unexpected format
- Your coworkers will hate you
- You will hate you when you have to debug something
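To make the debugging concern concrete, here is a sketch that wraps the same two-state loop in a method and feeds it a string whose last pair has no value. The class name `StateMachineDemo` and the method name `parse` are illustrative choices, not part of the original code. The incomplete pair is dropped without any error, which is the kind of silent failure that is hard to track down.

```java
import java.util.HashMap;
import java.util.Map;

public class StateMachineDemo {
    // Same two-state loop as above, wrapped in a method for experimentation.
    static Map<String, String> parse(String test) {
        Map<String, String> map = new HashMap<>();
        boolean stateIsKey = true;
        int keyStart = 0;
        int keyEnd = 0;
        int valueStart = 0;
        for (int i = 0; i < test.length(); i++) {
            char next = test.charAt(i);
            if (stateIsKey) {
                if (next == ':') { // a single ':' ends the key
                    keyEnd = i;
                    stateIsKey = false;
                    valueStart = i + 1;
                }
            } else if (i == test.length() - 1 || (next == ':' && test.charAt(i + 1) == ':')) {
                // "::" or the end of the string ends the value
                int valueEnd = (i + 1 == test.length()) ? i + 1 : i;
                map.put(test.substring(keyStart, keyEnd), test.substring(valueStart, valueEnd));
                keyStart = i + 2;
                i++; // skip the second ':' of the pair separator
                stateIsKey = true;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Well-formed input: both pairs end up in the map.
        System.out.println(parse("pet:cat::car:honda"));
        // Malformed input: the trailing key "car" has no value and silently disappears.
        System.out.println(parse("pet:cat::car")); // prints {pet=cat}
    }
}
```

If such input can occur, the loop would need an explicit check after it finishes, for example whether it ended while still reading a key. That adds yet another moving part.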
Worth it? Maybe. How fast do you need that string parsed exactly?
A quick and dirty benchmark at https://ideone.com/8T7twy tells me that for this string, this method is approximately 4 times faster. For longer strings the difference is likely somewhat greater.
But your version still takes only 415 milliseconds for 100,000 repetitions, whereas this one takes 99 milliseconds.
Solution 4
Try this code - see the comments for an explanation:
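HashMap<String, String> hmap = new HashMap<>();
String str = "abc:1::xyz:2::jkl:3";
// split on both ':' and '::' in a single call
String[] strArray = str.split("::?");
// even indexes hold keys, odd indexes hold the matching values
for (int i = 0; i < strArray.length; i += 2) {
    hmap.put(strArray[i], strArray[i + 1]);
}
// print once, after the map is filled (the original printed inside a loop over the array)
System.out.println(hmap.values()); // values only
System.out.println(hmap.keySet()); // keys only, if you want it to be clearer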
Updated on July 05, 2022
Comments
-
v1shnu almost 2 years
I have a string like this:
pet:cat::car:honda::location:Japan::food:sushi
Now `:` indicates key-value pairs, while `::` separates the pairs. I want to add the key-value pairs to a map. I can achieve this using:
Map<String, String> map = new HashMap<String, String>();
String test = "pet:cat::car:honda::location:Japan::food:sushi";
String[] test1 = test.split("::");
for (String s : test1) {
    String[] t = s.split(":");
    map.put(t[0], t[1]);
}
for (String s : map.keySet()) {
    System.out.println(s + " is " + map.get(s));
}
But is there an efficient way of doing this?
I feel the code is inefficient because I have used 2 `String[]` objects and called the `split` function twice. Also, I am using `t[0]` and `t[1]`, which might throw an `ArrayIndexOutOfBoundsException` if there are no values.
-
JB Nizet almost 9 years
Your code forgets the last key/value pair. There is no food/sushi in the result. I really doubt it's more efficient: this code creates a whole lot of temporary string objects that need to be garbage-collected.
-
Uma Kanth almost 9 years
Better than wasting array memory on data that is already in the string.
-
v1shnu almost 9 years
Wow! This takes care of everything which I thought was inefficient. But one more thing. I changed the string to "location:Japan::food:sushi:::cool" so that the value cool has no key. But the output is still this: is cool location is Japan food is sushi
-
v1shnu almost 9 years
I think new_test won't be needed, as you can do it in the test1 statement itself, like String[] test1 = test.replaceAll("::",":").split(":"); Thanks anyway :)
-
v1shnu almost 9 years
Just found that the key for the value 'cool' is just an empty String.
-
v1shnu almost 9 years
Nice! Thanks for this. I don't know if this is the most efficient, but this is something that I should keep in mind when dealing with strings like these.
-
Vishnu almost 9 years
@ViChU ya that's true :)
-
v1shnu almost 9 years
@UmaKanth this snippet calls charAt a number of times. Is that okay?
-
Uma Kanth almost 9 years
We can use a char substitute.
-
Martijn almost 9 years
Calling replaceAll is not really more efficient than calling split.
-
Vishnu almost 9 years
@Martijn I uploaded this code as an alternative way of doing the same thing, not as a more efficient way of doing it :)
-
Martijn almost 9 years
@Vishnu the question was about efficiency though.
-
Vishnu almost 9 years
@Martijn In terms of efficiency I agree with you, but the question says "I have used 2 String array objects and called the split function twice." I gave my code to resolve that part, not to make the code efficient.
-
v1shnu almost 9 years
Well, to answer your question: there are still many organizations which have not adapted to the latest standards. In my case, this data comes from the POS system. There are a huge number of retail stores in America, each retail store has many POS counters, and each transaction at a counter sends this data. And the data does not include just this string. This string is like a single line in a 10000 line xml file. So there is a requirement to keep the code as efficient as possible. And hence this question arose in my mind :)
-
Martijn almost 9 years
Well, my benchmarks show that on very flimsy infrastructure (ideone) 10,000 lines take 44 milliseconds. How fast is your requirement? What is the rest of the code doing? How much of its time does it spend parsing strings to maps?
-
v1shnu almost 9 years
The whole process is like really HUGE. I can tell you that this line comes in a 3000 line java class which does a lot of xml reading, decryption, map operations, database access and JMS. Consider you buy an iPhone, this string tells me about the type of card you used.
-
Martijn almost 9 years
My argument is that those other 2990 lines are likely to have a far greater impact on performance than the 10 lines parsing the string do, and that it's far more important to find out what to optimize (by benchmarking and profiling) than optimising parts that stand out to you in the hope that that improves overall performance. Even very experienced programmers are almost always wrong in their guess what's causing performance issues if they don't profile.
-
Bernhard Barker almost 9 years
It might be good to briefly explain what `::?` means (and mention the obvious alternative `:|::` for those with the same problem, but different separators).
-
v1shnu almost 9 years
Sure, I will try profiling. There was a change request in this part of the code, and I am supposed to do it as efficiently as possible :)
-
Martijn almost 9 years
@ViChU the most efficient way possible (in terms of computing resources) is probably to ditch the Java solution and re-do the entire thing in assembly. It's unlikely that anyone wants you to do that. It's much more likely that you are supposed to do it while keeping things running with sufficient performance. There are (almost) always trade-offs.
-
Salman about 6 years
I think this part of the code, `withKeyValueSeparator( ':' )`, should be changed to `withKeyValueSeparator( ":" )`, since the method withKeyValueSeparator takes a String as an argument.