Can the MongoDB aggregation framework $group return an array of values?

19,299

Solution 1

Combining two fields into an array of values with the Aggregation Framework is possible, but definitely isn't as straightforward as it could be (at least as of MongoDB 2.2.0).

Here is an example:

db.metrics.aggregate(

    // Find matching documents first (can take advantage of index)
    { $match : {
        'array_serial' : array, 
        'port_name' : { $in : ports},
        'datetime' : { $gte : from, $lte : to}
    }},

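    // Project the needed fields and add an extra "index" field with one element per slot in the output pair
    { $project: {
        port_name: 1,
        datetime: 1,
        metric: 1,
        // $const returns the constant [0,1], but it is an undocumented internal expression
        // (see the comments below); on MongoDB 2.6+ the documented { $literal: [0,1] } is the safer choice
        index: { $const:[0,1] }
    }},

    // Split into a document stream: one copy of each document per element of "index"
    { $unwind: '$index' },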

    // Re-group data using conditional to create array [$datetime, $metric]
    { $group: {
        _id: { id: '$_id', port_name: '$port_name' },
        data: {
            $push: { $cond:[ {$eq:['$index', 0]}, '$datetime', '$metric'] }
        },
    }},

    // Sort by _id (ObjectIds sort roughly by creation time) so pairs are pushed in a predictable order
    { $sort: { _id:1 } },

    // Final group by port_name with data array and count
    { $group: {
        _id: '$_id.port_name',
        data: { $push: '$data' },
        count: { $sum: 1 }
    }}
)
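To see why the $unwind/$cond trick produces [datetime, metric] pairs, here is a plain JavaScript model of the same stages, runnable in Node.js without a MongoDB server. It is only a sketch: the function name `solution1` and the sample documents are hypothetical, `_id` values are assumed to be plain numbers or strings so they can be compared with `<`, and results come out in order of first appearance, whereas MongoDB does not guarantee the order of `$group` output.

```javascript
// Plain-JavaScript model of the Solution 1 pipeline (hypothetical helper, no MongoDB needed).
// Assumes each document has _id (number or string), port_name, datetime and metric.
function solution1(docs) {
  // $project: keep the needed fields and add index: [0, 1]
  const projected = docs.map(d => ({
    _id: d._id,
    port_name: d.port_name,
    datetime: d.datetime,
    metric: d.metric,
    index: [0, 1],
  }));

  // $unwind: '$index' -> one copy of each document per element of index
  const unwound = projected.flatMap(d => d.index.map(i => ({ ...d, index: i })));

  // $group by { id, port_name }: push datetime when index is 0, otherwise metric
  const pairs = new Map();
  for (const d of unwound) {
    const key = JSON.stringify([d._id, d.port_name]);
    if (!pairs.has(key)) {
      pairs.set(key, { _id: { id: d._id, port_name: d.port_name }, data: [] });
    }
    pairs.get(key).data.push(d.index === 0 ? d.datetime : d.metric);
  }

  // $sort: { _id: 1 } -> compares the embedded id field first
  const sorted = [...pairs.values()].sort((a, b) =>
    a._id.id < b._id.id ? -1 : a._id.id > b._id.id ? 1 : 0
  );

  // Final $group by port_name: push each pair and count them
  const out = new Map();
  for (const g of sorted) {
    const port = g._id.port_name;
    if (!out.has(port)) out.set(port, { _id: port, data: [], count: 0 });
    const o = out.get(port);
    o.data.push(g.data);
    o.count += 1;
  }
  return [...out.values()];
}
```

For example, two CL1-A documents come back as a single result whose data holds two [datetime, metric] pairs and whose count is 2.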

Solution 2

MongoDB 2.6 made this a lot easier by introducing $map, which allows a simpler form of array transposition:

db.metrics.aggregate([
    { "$match": {
        "array_serial": array,
        "port_name": { "$in": ports },
        "datetime": { "$gte": from, "$lte": to }
    }},
    { "$group": {
        "_id": "$port_name",
        "data": {
            "$push": {
                "$map": {
                    "input": [0,1],
                    "as": "index",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$index", 0 ] },
                            "$datetime",
                            "$metric"
                        ]
                    }
                }
            }
        },
        "count": { "$sum": 1 }
    }}
])

Much like the $unwind approach, you supply a two-element array as the "input" to $map and then replace each element with the field value you want via $cond.

This removes all the pipeline juggling that earlier releases needed to reshape each document, and leaves the pipeline to do the actual job at hand: accumulating per "port_name" value. Building the array is no longer a problem area.
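The same idea can be modelled in plain JavaScript: `[0, 1].map(...)` plays the role of $map, and the ternary plays the role of $cond. This is a sketch runnable in Node.js without a server; `mapPair`, `solution2` and the sample data are hypothetical, and results come out in order of first appearance of each port_name, which MongoDB does not guarantee.

```javascript
// Models $map: { input: [0, 1], as: "index", in: { $cond: [{ $eq: ["$$index", 0] }, "$datetime", "$metric"] } }
function mapPair(doc) {
  return [0, 1].map(index => (index === 0 ? doc.datetime : doc.metric));
}

// Models the single $group: _id is port_name, $push the mapped pair, $sum: 1 for count
function solution2(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.port_name)) {
      groups.set(doc.port_name, { _id: doc.port_name, data: [], count: 0 });
    }
    const g = groups.get(doc.port_name);
    g.data.push(mapPair(doc));
    g.count += 1;
  }
  return [...groups.values()];
}
```

Unlike the Solution 1 model, no extra $project/$unwind/$sort steps are needed: each document is turned into its pair at the moment it is pushed.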

Solution 3

The aggregation framework seems to lack any way of building arrays other than $push and $addToSet. I've tried to get this to work before, and failed. It would be awesome if you could just do:

data : {$push: ['$datetime', '$metric']}

in the $group, but that doesn't work.

Also, building "literal" objects like this doesn't work:

data : {$push: {literal: ['$datetime', '$metric']}}

or even:

data : {$push: {literal: '$datetime'}}

I hope they eventually come up with some better ways of massaging this sort of data.
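Until arrays can be built directly in the pipeline, one workaround (not taken from the answers above) is to keep the pipeline from the question, which returns parallel `datetime` and `metric` arrays, and zip them on the client. This relies on both arrays being filled by $push in the same $group, so position i in each comes from the same document. The helper name `zipPorts` and the sample input are hypothetical, and how you obtain the plain results array depends on your shell or driver version.

```javascript
// Hypothetical client-side helper: turn { datetime: [...], metric: [...] } results
// from the question's $group into { data: [[datetime, metric], ...] }.
function zipPorts(results) {
  return results.map(r => ({
    _id: r._id,
    // Both arrays come from the same $group, so index i refers to the same document
    data: r.datetime.map((dt, i) => [dt, r.metric[i]]),
    count: r.count,
  }));
}
```

This keeps the fast server-side pipeline unchanged and moves only the reshaping into the front-end, where the chart code consumes it.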


Author: Chris Matta

Updated on September 15, 2022

Comments

  • Chris Matta
    Chris Matta over 1 year

    How flexible is the aggregate function for output formatting in MongoDB?

    Data format:

    {
            "_id" : ObjectId("506ddd1900a47d802702a904"),
            "port_name" : "CL1-A",
            "metric" : "772.0",
            "port_number" : "0",
            "datetime" : ISODate("2012-10-03T14:03:00Z"),
            "array_serial" : "12345"
    }
    

    Right now I'm using this aggregate function to return an array of DateTime, an array of metrics, and a count:

    db.metrics.aggregate(
        { $match : {
            'array_serial' : array,
            'port_name' : { $in : ports },
            'datetime' : { $gte : from, $lte : to }
        }},
        { $project : { port_name : 1, metric : 1, datetime : 1 } },
        { $group : {
            _id : "$port_name",
            datetime : { $push : "$datetime" },
            metric : { $push : "$metric" },
            count : { $sum : 1 }
        }}
    )
    

    Which is nice, and very fast, but is there a way to format the output so there's one array per datetime/metric? Like this:

    [
        {
          "_id" : "portname",
          "data" : [
                    ["2012-10-01T00:00:00.000Z", 1421.01],
                    ["2012-10-01T00:01:00.000Z", 1361.01],
                    ["2012-10-01T00:02:00.000Z", 1221.01]
                   ]
        }
    ]
    

    This would greatly simplify the front-end as that's the format the chart code expects.

  • Chris Matta
    Chris Matta over 11 years
    These are the exact methods I tried, I was just assuming it would work. I guess not :(
  • Chris Matta
    Chris Matta over 11 years
    Ah! I didn't know $group could be called more than once. I'll give this a try, thanks!
  • maxdec
    maxdec about 11 years
    What does '$const' do exactly? It doesn't seem to be documented.
  • Stennie
    Stennie about 11 years
    Turns out that $const is an internal implementation detail for serialization between mongos and mongod and is not meant to be (ab)used by end user queries (see jira.mongodb.org/browse/SERVER-6769). In particular this may not work properly through a mongos. At the time I didn't realize it wasn't a documented expression, and I'd seen it (ab)used to add constants to documents as I've done here. I'll try to revisit this answer after MongoDB 2.4.0 is released, as there may be an alternative (and documented) approach.
  • Stennie
    Stennie about 11 years
    Please vote & watch the MongoDB feature request SERVER-8141, which proposes adding an $array aggregation expression :).