Split hashmap to partitions in java 8

14,696

Solution 1

Using my unorderedBatches() collector from this answer:

Collector<Entry<String, Set<String>>, ?, List<Map<String, Set<String>>>> batchesCollector = 
    unorderedBatches(100, 
        Collectors.toMap(Entry::getKey, Entry::getValue), Collectors.toList());
List<Map<String, Set<String>>> listofMaps = myMap.entrySet().stream()
        .collect(batchesCollector);

Solution 2

Splitting the stream into ordered fixed size chunks (as in Lists.partition) is not possible because in a parallel execution each chunk will have to wait for its left spatial chunk to be fully processed.

If, however you don't care about the order of the keys in the resulting sub maps (as it would have been returned by the Map#iterator's method), then you can roll a custom collector.

private static <K, V> Collector<Map.Entry<K, V>, ?, List<Map<K, V>>> mapSize(int limit) {
    return Collector.of(ArrayList::new,
            (l, e) -> {
                if (l.isEmpty() || l.get(l.size() - 1).size() == limit) {
                    l.add(new HashMap<>());
                }
                l.get(l.size() - 1).put(e.getKey(), e.getValue());
            },
            (l1, l2) -> {
                if (l1.isEmpty()) {
                    return l2;
                }
                if (l2.isEmpty()) {
                    return l1;
                }
                if (l1.get(l1.size() - 1).size() < limit) {
                    Map<K, V> map = l1.get(l1.size() - 1);
                    ListIterator<Map<K, V>> mapsIte = l2.listIterator(l2.size());
                    while (mapsIte.hasPrevious() && map.size() < limit) {
                        Iterator<Map.Entry<K, V>> ite = mapsIte.previous().entrySet().iterator();
                        while (ite.hasNext() && map.size() < limit) {
                            Map.Entry<K, V> entry = ite.next();
                            map.put(entry.getKey(), entry.getValue());
                            ite.remove();
                        }
                        if (!ite.hasNext()) {
                            mapsIte.remove();
                        }
                    }
                }
                l1.addAll(l2);
                return l1;
            }
    );
}

This one takes map entries as values and put them into a List<Map<K,V>>.

The accumulator, check if the current list is empty or if the size of the last map as reached the limit. If it's the case, it add a new map. Then the new mapping from the current entry processed is added to the map.

The combiner needs to combine two lists that have been built in parallel. If one of the list is empty, return the other. If it's not the case, you need to check that the last map of the first list has the number of elements required. If it's not the case we grab the last map of the second list and we add elements to the last map of the first list. It stops if either the limit is reached or if there is no more elements to add from the second list. Don't forget to remove an empty map if all its elements have been consumed.

One usage of such a collector would be:

List<Map<String, Set<String>>> listofMaps =
                myMap.entrySet().stream().collect(mapSize(2));

Some examples (with both parallel and sequential streams) with an initial map consisting of 13 key-value mappings:

Size of maps 2
{11=[11a, 11b], 12=[12a, 12b]}
{13=[13b, 13a], 8=[8a, 8b]}
{1=[1a, 1b], 2=[2b, 2a]}
{3=[3a, 3b], 6=[6a, 6b]}
{4=[4a, 4b], 5=[5a, 5b]}
{7=[7a, 7b], 10=[10a, 10b]}
{9=[9a, 9b]}
=============================
Size of maps 5
{11=[11a, 11b], 12=[12a, 12b], 13=[13b, 13a], 6=[6a, 6b], 7=[7a, 7b]}
{1=[1a, 1b], 2=[2b, 2a], 3=[3a, 3b], 4=[4a, 4b], 5=[5a, 5b]}
{8=[8a, 8b], 9=[9a, 9b], 10=[10a, 10b]}
=============================
Size of maps 12
{11=[11a, 11b], 12=[12a, 12b], 1=[1a, 1b], 13=[13b, 13a], 2=[2b, 2a], 3=[3a, 3b], 4=[4a, 4b], 5=[5a, 5b], 6=[6a, 6b], 7=[7a, 7b], 8=[8a, 8b], 9=[9a, 9b]}
{10=[10a, 10b]}
=============================
Size of maps 15
{11=[11a, 11b], 12=[12a, 12b], 13=[13b, 13a], 1=[1a, 1b], 2=[2b, 2a], 3=[3a, 3b], 4=[4a, 4b], 5=[5a, 5b], 6=[6a, 6b], 7=[7a, 7b], 8=[8a, 8b], 9=[9a, 9b], 10=[10a, 10b]}

I've not extensively tested it. Also I think you can modify it so that it's even more generic.

For instance you might accept arbitrary objects instead and use two functions to produce a key and a value for each instance you're processing.

private static <T, K, V> Collector<T, ?, List<Map<K, V>>> mapSize(Function<T, K> keyFunc, Function<T, V> mapFunc, int limit) {

with

l.get(l.size() - 1).put(keyFunc.apply(e), mapFunc.apply(e));

and call it like:

.collect(mapSize(Map.Entry::getKey, Map.Entry::getValue, size));
Share:
14,696
Tal Eden
Author by

Tal Eden

Updated on June 19, 2022

Comments

  • Tal Eden
    Tal Eden almost 2 years

    I have hashmap: Map<String, Set<String>> myMap

    and I want to split it to list that contains Map:

    List<Map<String,Set<String>>> listofMaps;
    

    , and each map will be max 100 keys. I know how to do it in the regular way..(foreach on the entryset , and every 100 items create new map). Is there any option to do it with java 8 lambda or something? (something like Lists.partitions() ..) ?

  • Alexis C.
    Alexis C. over 8 years
    Largely more reusable than with what I came off, nice work.. again :-).
  • Sachin HR
    Sachin HR over 2 years
    @Tagir Valeev. I am getting some error when i use unorderedlist method. Can you please help