How to map to multiple elements with Java 8 streams?

119,345

Solution 1

It's an interesting question, because it shows that there are a lot of different approaches to achieve the same result. Below I show three different implementations.


Default methods in the Collections Framework: Java 8 added some methods to the collection classes that are not directly related to the Stream API. Using these methods, you can significantly simplify the non-stream implementation:

Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
    Map<String, DataSet> result = new HashMap<>();
    multiDataPoints.forEach(pt ->
        pt.keyToData.forEach((key, value) ->
            result.computeIfAbsent(
                key, k -> new DataSet(k, new ArrayList<>()))
            .dataPoints.add(new DataPoint(pt.timestamp, value))));
    return result.values();
}
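To isolate what computeIfAbsent buys you here, the following is a minimal sketch that compares it with the classic get/null-check/put pattern on a plain Map<String, List<Integer>>. The class and method names (ComputeIfAbsentDemo, addOldStyle, addNewStyle) are made up for illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ComputeIfAbsentDemo {
    // Pre-Java 8 style: look up, null-check, put, then add.
    static void addOldStyle(Map<String, List<Integer>> index, String key, int value) {
        List<Integer> values = index.get(key);
        if (values == null) {
            values = new ArrayList<>();
            index.put(key, values);
        }
        values.add(value);
    }

    // Java 8 style: computeIfAbsent returns the existing list or the newly created one.
    static void addNewStyle(Map<String, List<Integer>> index, String key, int value) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> oldStyle = new HashMap<>();
        Map<String, List<Integer>> newStyle = new HashMap<>();
        String[] keys = {"cpu", "mem", "cpu"};
        for (int i = 0; i < keys.length; i++) {
            addOldStyle(oldStyle, keys[i], i);
            addNewStyle(newStyle, keys[i], i);
        }
        System.out.println(oldStyle.equals(newStyle)); // true
        System.out.println(newStyle.get("cpu"));       // [0, 2]
    }
}
```

Both helpers leave the map in the same state; the second one just collapses the lookup and the lazy initialization into a single call.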

Stream API with flatten and intermediate data structure: The following implementation is almost identical to the solution provided by Stuart Marks. In contrast to his solution, it uses an anonymous inner class as the intermediate data structure, so no named helper class is needed.

// Requires: import static java.util.stream.Collectors.*;
Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
    return multiDataPoints.stream()
        .flatMap(mdp -> mdp.keyToData.entrySet().stream().map(e ->
            new Object() {
                String key = e.getKey();
                DataPoint dataPoint = new DataPoint(mdp.timestamp, e.getValue());
            }))
        .collect(
            collectingAndThen(
                groupingBy(t -> t.key, mapping(t -> t.dataPoint, toList())),
                m -> m.entrySet().stream().map(e -> new DataSet(e.getKey(), e.getValue())).collect(toList())));
}
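The anonymous-class trick works because the compiler infers the anonymous type as the stream's element type, so later lambdas can read its fields. Here is a small sketch of just that mechanism, unrelated to the DataSet classes. The names AnonymousTupleDemo and longWordsUpper are invented for this example, and it assumes compilation with javac.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class AnonymousTupleDemo {
    // Returns the upper-cased form of every word longer than minLength.
    static List<String> longWordsUpper(List<String> words, int minLength) {
        return words.stream()
            // The anonymous subclass of Object acts as an ad-hoc (upper, length) pair.
            .map(w -> new Object() {
                String upper = w.toUpperCase();
                int length = w.length();
            })
            // The fields stay accessible because the inferred element type is the anonymous class.
            .filter(t -> t.length > minLength)
            .map(t -> t.upper)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "stream", "flatMap");
        System.out.println(longWordsUpper(words, 3)); // [STREAM, FLATMAP]
    }
}
```

The anonymous type cannot be named, so it only survives inside one stream pipeline; as soon as a value has to be stored in a declared variable or returned, a named class is needed.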

Stream API with map merging: Instead of flattening the original data structures, you can also create a Map for each MultiDataPoint, and then merge all maps into a single map with a reduce operation. The code is a bit simpler than the above solution:

// Requires: import static java.util.Arrays.asList; import static java.util.stream.Collectors.*;
Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
    return multiDataPoints.stream()
        .map(mdp -> mdp.keyToData.entrySet().stream()
            .collect(toMap(e -> e.getKey(), e -> asList(new DataPoint(mdp.timestamp, e.getValue())))))
        .reduce(new HashMap<>(), mapMerger())
        .entrySet().stream()
        .map(e -> new DataSet(e.getKey(), e.getValue()))
        .collect(toList());
}

The Collectors class contains a map merger internally, but it is not part of the public API, so you cannot call it directly. The following is an alternative implementation of the map merger:

<K, V> BinaryOperator<Map<K, List<V>>> mapMerger() {
    return (lhs, rhs) -> {
        Map<K, List<V>> result = new HashMap<>();
        lhs.forEach((key, value) -> result.computeIfAbsent(key, k -> new ArrayList<>()).addAll(value));
        rhs.forEach((key, value) -> result.computeIfAbsent(key, k -> new ArrayList<>()).addAll(value));
        return result;
    };
}
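To see the merger in action on its own, here is a small sketch that reduces three single-entry maps into one. The mapMerger method is copied from above and made static; MapMergerDemo, single, and mergeAll are hypothetical helper names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class MapMergerDemo {
    // Same merger as above: builds a fresh map and never mutates its arguments.
    static <K, V> BinaryOperator<Map<K, List<V>>> mapMerger() {
        return (lhs, rhs) -> {
            Map<K, List<V>> result = new HashMap<>();
            lhs.forEach((key, value) -> result.computeIfAbsent(key, k -> new ArrayList<>()).addAll(value));
            rhs.forEach((key, value) -> result.computeIfAbsent(key, k -> new ArrayList<>()).addAll(value));
            return result;
        };
    }

    // Hypothetical helper: a map holding one key with a one-element list.
    static Map<String, List<Integer>> single(String key, Integer value) {
        Map<String, List<Integer>> map = new HashMap<>();
        map.put(key, Arrays.asList(value));
        return map;
    }

    // Folds all maps into one, starting from an empty map as the identity.
    static Map<String, List<Integer>> mergeAll(List<Map<String, List<Integer>>> maps) {
        return maps.stream().reduce(new HashMap<>(), mapMerger());
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> merged =
            mergeAll(Arrays.asList(single("a", 1), single("b", 2), single("a", 3)));
        System.out.println(merged.get("a")); // [1, 3]
        System.out.println(merged.get("b")); // [2]
    }
}
```

Because the merger copies into a new map on every step, the shared identity map is never modified. The price is one extra map allocation per element, which is worth keeping in mind for large inputs.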

Solution 2

To do this, I had to come up with an intermediate data structure:

class KeyDataPoint {
    String key;
    DateTime timestamp;
    Number data;
    // obvious constructor and getters
}

With this in place, the approach is to "flatten" each MultiDataPoint into a list of (timestamp, key, data) triples and stream together all such triples from the list of MultiDataPoint.

Then, we apply a groupingBy operation on the string key in order to gather the data for each key together. Note that a simple groupingBy would result in a map from each string key to a list of the corresponding KeyDataPoint triples. We don't want the triples; we want DataPoint instances, which are (timestamp, data) pairs. To do this we apply a "downstream" collector of the groupingBy which is a mapping operation that constructs a new DataPoint by getting the right values from the KeyDataPoint triple. The downstream collector of the mapping operation is simply toList which collects the DataPoint objects of the same group into a list.
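The difference between a plain groupingBy and one with a mapping downstream collector can be sketched on simple strings. This is only an illustration with made-up names (GroupingDemo, byFirstLetter, lengthsByFirstLetter), not code from the answer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    // Plain groupingBy: each group holds the original stream elements.
    static Map<String, List<String>> byFirstLetter(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(w -> w.substring(0, 1)));
    }

    // groupingBy with a mapping downstream: each element is transformed
    // before toList gathers it into its group.
    static Map<String, List<Integer>> lengthsByFirstLetter(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(
                w -> w.substring(0, 1),
                Collectors.mapping(String::length, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        System.out.println(byFirstLetter(words).get("a"));        // [apple, avocado]
        System.out.println(lengthsByFirstLetter(words).get("a")); // [5, 7]
    }
}
```

In the answer's terms, the words play the role of the KeyDataPoint triples and the lengths play the role of the DataPoint instances.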

Now we have a Map<String, List<DataPoint>> and we want to convert it to a collection of DataSet objects. We simply stream out the map entries and construct DataSet objects, collect them into a list, and return it.

The code ends up looking like this:

// Requires: import static java.util.stream.Collectors.*;
Collection<DataSet> convertMultiDataPointToDataSet(List<MultiDataPoint> multiDataPoints) {
    return multiDataPoints.stream()
        .flatMap(mdp -> mdp.getData().entrySet().stream()
                           .map(e -> new KeyDataPoint(e.getKey(), mdp.getTimestamp(), e.getValue())))
        .collect(groupingBy(KeyDataPoint::getKey,
                    mapping(kdp -> new DataPoint(kdp.getTimestamp(), kdp.getData()), toList())))
        .entrySet().stream()
        .map(e -> new DataSet(e.getKey(), e.getValue()))
        .collect(toList());
}

I took some liberties with constructors and getters, but I think they should be obvious.
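For anyone who wants to run this, here is one possible way to fill in the constructors and getters. It is a sketch under assumptions: the timestamp is a plain long instead of DateTime so that no external library is needed, and the wrapper class name FlattenGroupDemo and the sample data are invented.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

class MultiDataPoint {
    private final long timestamp; // assumption: long stands in for DateTime
    private final Map<String, Number> data;
    MultiDataPoint(long timestamp, Map<String, Number> data) { this.timestamp = timestamp; this.data = data; }
    long getTimestamp() { return timestamp; }
    Map<String, Number> getData() { return data; }
}

class DataPoint {
    final long timestamp;
    final Number data;
    DataPoint(long timestamp, Number data) { this.timestamp = timestamp; this.data = data; }
}

class DataSet {
    final String key;
    final List<DataPoint> dataPoints;
    DataSet(String key, List<DataPoint> dataPoints) { this.key = key; this.dataPoints = dataPoints; }
}

// The intermediate (key, timestamp, data) triple.
class KeyDataPoint {
    private final String key;
    private final long timestamp;
    private final Number data;
    KeyDataPoint(String key, long timestamp, Number data) { this.key = key; this.timestamp = timestamp; this.data = data; }
    String getKey() { return key; }
    long getTimestamp() { return timestamp; }
    Number getData() { return data; }
}

class FlattenGroupDemo {
    static Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
        return multiDataPoints.stream()
            // Flatten every MultiDataPoint into one triple per map entry.
            .flatMap(mdp -> mdp.getData().entrySet().stream()
                .map(e -> new KeyDataPoint(e.getKey(), mdp.getTimestamp(), e.getValue())))
            // Group by key, turning each triple into a DataPoint on the way.
            .collect(groupingBy(KeyDataPoint::getKey,
                mapping(kdp -> new DataPoint(kdp.getTimestamp(), kdp.getData()), toList())))
            // Wrap each map entry in a DataSet.
            .entrySet().stream()
            .map(e -> new DataSet(e.getKey(), e.getValue()))
            .collect(toList());
    }

    public static void main(String[] args) {
        Map<String, Number> first = new HashMap<>();
        first.put("cpu", 0.5);
        first.put("mem", 512);
        Map<String, Number> second = new HashMap<>();
        second.put("cpu", 0.7);
        List<MultiDataPoint> input = Arrays.asList(
            new MultiDataPoint(1000L, first), new MultiDataPoint(2000L, second));
        for (DataSet ds : convert(input)) {
            System.out.println(ds.key + ": " + ds.dataPoints.size() + " point(s)");
        }
    }
}
```

The order in which the data sets are printed depends on the map that groupingBy creates, so it should not be relied upon.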

Author by

skinnypinny

Updated on July 09, 2022

Comments

  • skinnypinny
    skinnypinny almost 2 years

    I have a class like this:

    class MultiDataPoint {
      private DateTime timestamp;
      private Map<String, Number> keyToData;
    }
    

    and I want to produce, for each MultiDataPoint:

    class DataSet {
      public String key;
      List<DataPoint> dataPoints;
    }
    
    class DataPoint{
      DateTime timeStamp;
      Number data;
    }
    

    Of course, a 'key' can be the same across multiple MultiDataPoints.

    So given a List<MultiDataPoint>, how do I use Java 8 streams to convert to List<DataSet>?

    This is how I am currently doing the conversion without streams:

    Collection<DataSet> convertMultiDataPointToDataSet(List<MultiDataPoint> multiDataPoints)
    {
    
        Map<String, DataSet> setMap = new HashMap<>();
    
        multiDataPoints.forEach(pt -> {
            Map<String, Number> data = pt.getData();
            data.entrySet().forEach(e -> {
                String seriesKey = e.getKey();
                DataSet dataSet = setMap.get(seriesKey);
                if (dataSet == null)
                {
                    dataSet = new DataSet(seriesKey);
                    setMap.put(seriesKey, dataSet);
                }
                dataSet.dataPoints.add(new DataPoint(pt.getTimestamp(), e.getValue()));
            });
        });
    
        return setMap.values();
    }
    
  • skinnypinny
    skinnypinny about 10 years
    I have now removed the 'Series' constructor from my code to align with the answer.
  • skinnypinny
    skinnypinny about 10 years
    Is it just me, or does the stream version look more complicated than the imperative version, and require another data structure on top?
  • Stuart Marks
    Stuart Marks about 10 years
    There might be a way to remove the intermediate data structure. It will require more thought. Whether the streams version is more complicated is a matter of taste, I think. If you know what groupingBy does, it makes perfect sense, but I had to stare at the original code for a while to realize that it was essentially doing a grouping operation.
  • nosid
    nosid about 10 years
    @pdeva: I agree. In particular, as long as developers are not used to functional programming, I think you should avoid overusing the Stream API.
  • Stuart Marks
    Stuart Marks about 10 years
    Nice set of alternatives.
  • Ian Robertson
    Ian Robertson over 6 years
    I had no idea that you could type-safely refer to fields of an anonymous class later on in a stream; very nice!