Group by date intervals

10,271

Solution 1

There are two ways to do this. One is to issue a separate query for each of the ranges. This is simple, and if the datetime field is indexed, it will be fast. Keep in mind that count() returns the number of matching documents rather than the total of the count field.
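
A minimal sketch of the per-range approach, assuming each range is totalled with a small aggregate() pipeline ($match on the range, then $group with $sum). The helper name buildRangeSumPipeline and the output field "sum" are made up for illustration; the helper only builds the pipeline and does not touch the database.

```javascript
const HOUR_MS = 1000 * 60 * 60;

// Hypothetical helper: builds a pipeline that totals `count` for
// documents newer than (now - hours). It does not run the query.
function buildRangeSumPipeline(now, hours) {
    return [
        // An index on "datetime" can be used by this stage
        { "$match": {
            "datetime": { "$gt": new Date(now.valueOf() - hours * HOUR_MS) }
        }},
        { "$group": { "_id": null, "sum": { "$sum": "$count" } } }
    ];
}

const now = new Date();
const pipelines = {
    day: buildRangeSumPipeline(now, 24),
    week: buildRangeSumPipeline(now, 24 * 7),
    month: buildRangeSumPipeline(now, 24 * 30)
};
// In the mongo shell, each would be run as, e.g.:
//   db.collection.aggregate(pipelines.day)
```

This costs three round trips instead of one, but each query stays trivial to read.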

The second way is to combine them all into one query, using a method similar to your SQL example. To do this, use the aggregate() method with a pipeline that starts with a $project stage to compute the 0 or 1 values for the new "last_day", "last_week", and "last_month" fields, followed by a $group stage that groups on those fields and sums count.
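
The single-query variant could look roughly like the following sketch. The helper name buildFlagPipeline is hypothetical, and the hour values (24, 168, 720) are taken from the SQL in the question. It assumes the documents have the datetime and count fields shown in the question, and it only builds the pipeline.

```javascript
const HOUR_MS = 1000 * 60 * 60;

// Hypothetical helper: builds the $project + $group pipeline.
function buildFlagPipeline(now) {
    // Expression that yields 1 if the document is newer than (now - hours), else 0
    const flag = (hours) => ({
        "$cond": [
            { "$gt": [ "$datetime", new Date(now.valueOf() - hours * HOUR_MS) ] },
            1,
            0
        ]
    });
    return [
        { "$project": {
            "count": 1,
            "last_day": flag(24),
            "last_week": flag(168),
            "last_month": flag(720)
        }},
        // One output document per distinct combination of the three flags
        { "$group": {
            "_id": {
                "last_day": "$last_day",
                "last_week": "$last_week",
                "last_month": "$last_month"
            },
            "count": { "$sum": "$count" }
        }}
    ];
}

const pipeline = buildFlagPipeline(new Date());
// In the mongo shell: db.collection.aggregate(pipeline)
```

Like the SQL it mirrors, this returns one row per combination of flags, so the total for the last week is the sum of the rows whose last_week flag is 1.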

Solution 2

MongoDB's aggregation framework provides date aggregation operators. For example, the $dayOfYear operator extracts the day of the year from a date so that it can be used as a grouping key:

db.collection.aggregate([
    { "$group": {
        "_id": { "$dayOfYear": "$datetime" },
        "total": { "$sum": "$count" }
    }}
])

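Or you can use a date math approach instead. Subtracting the epoch date from a date object yields a number of milliseconds, to which math can be applied. Here, subtracting the remainder of a division by one day rounds each value down to the start of its day: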

db.collection.aggregate([
    { "$group": {
        "_id": { 
            "$subtract": [
                { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                { "$mod": [
                    { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                    1000 * 60 * 60 * 24
                ]}
            ]
        },
        "total": { "$sum": "$count" }
    }}
])
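
To see what that arithmetic does to a single value, here is the same "value minus (value mod one day)" calculation in plain JavaScript. The function name floorToDay is made up for this illustration.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Round a Date down to the start of its UTC day, using the same
// arithmetic as the $subtract / $mod expression in the pipeline.
function floorToDay(date) {
    const ms = date.valueOf();            // milliseconds since 1970-01-01 UTC
    return new Date(ms - (ms % DAY_MS));  // drop the part-of-day remainder
}

console.log(floorToDay(new Date("2014-06-15T13:45:00Z")).toISOString());
// prints "2014-06-15T00:00:00.000Z"
```

Note that in the pipeline the resulting _id is the number of milliseconds, not a date object. Changing the divisor (for example to 1000 * 60 * 60) groups by a different fixed interval.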

If what you are after is intervals measured back from the current point in time, then what you want is basically the date math approach combined with some conditionals via the $cond operator:

db.collection.aggregate([
    { "$match": {
        "datetime": { 
            "$gte": new Date(new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 365 ))
        }
    }},
    { "$group": {
        "_id": null,
        "24hours": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 )
                    ]},
                    "$count",
                    0
                ]
            }
        },
        "30days": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 30 )
                    ]},
                    "$count",
                    0
                ]
            }
        },
        "OneYear": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 365 )
                    ]},
                    "$count",
                    0
                ]
            }
        }
    }}
])

It's essentially the same approach as the SQL example, where the query conditionally evaluates whether the date value falls within the required range and decides whether or not to add the value to the sum.

The one addition here is the $match stage, which restricts the query to only those items that could possibly fall within the maximum one year range you are asking for. That makes it a bit better than the presented SQL, in that an index can be used to filter out the other values and you don't need to "brute force" through non-matching data in the collection.

It is always a good idea to restrict the input with $match when using an aggregation pipeline.
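
For the 24 hour, 7 day and 30 day intervals from the question, the same $match plus conditional $sum pattern can be generated from a small table. This is only a sketch: the names INTERVALS, buildIntervalPipeline and toIntervalDocs are hypothetical, and the reshaping into { sum, interval } entries is done in client code after the query returns.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Interval name -> length in days (values taken from the question)
const INTERVALS = { day: 1, week: 7, month: 30 };

// Hypothetical helper: builds the pipeline, it does not run it.
function buildIntervalPipeline(now) {
    const nowMs = now.valueOf();
    const maxDays = Math.max(...Object.values(INTERVALS));
    const group = { "_id": null };
    for (const [name, days] of Object.entries(INTERVALS)) {
        // Add `count` to this interval's sum only if the document is recent enough
        group[name] = { "$sum": { "$cond": [
            { "$gt": [ "$datetime", new Date(nowMs - days * DAY_MS) ] },
            "$count",
            0
        ]}};
    }
    return [
        // Restrict the input to the widest interval first so an index can be used
        { "$match": { "datetime": { "$gte": new Date(nowMs - maxDays * DAY_MS) } } },
        { "$group": group }
    ];
}

// Turn the single result document into one entry per interval
function toIntervalDocs(result) {
    return Object.keys(INTERVALS).map((name) => ({ sum: result[name], interval: name }));
}

// In the mongo shell, roughly:
//   var result = db.collection.aggregate(buildIntervalPipeline(new Date())).toArray()[0];
//   toIntervalDocs(result);
```

Computing the cutoffs once from a single "now" value keeps the three conditions and the $match boundary consistent with each other.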

Author: jonasasx

Updated on June 06, 2022

Comments

  • jonasasx
    jonasasx almost 2 years

    I have a collection with documents like this:

    { datetime: new Date(), count: 1234 }
    

    I want to get sums of count over 24-hour, 7-day and 30-day intervals.

    The result should be like:

    { "sum": 100,  "interval": "day" }
    { "sum": 700,  "interval": "week" }
    { "sum": 3000, "interval": "month" }
    

    In more abstract terms, I need to group results by multiple conditions (in this case, multiple time intervals).

    The MySQL equivalent would be:

    SELECT 
        IF (time>CURRENT_TIMESTAMP() - INTERVAL 24 HOUR, 1, 0) last_day,
        IF (time>CURRENT_TIMESTAMP() - INTERVAL 168 HOUR, 1, 0) last_week,
        IF (time>CURRENT_TIMESTAMP() - INTERVAL 720 HOUR, 1, 0) last_month,
        SUM(count) count
    FROM table
    GROUP BY    last_day,
                last_week,
                last_month
    
  • jonasasx
    jonasasx over 9 years
    I don't need sums of every day. I need sums of only these three time intervals.
  • Neil Lunn
    Neil Lunn over 9 years
    @jonasasx Then what do you think you do? What are you expecting? All sums in a single query result or is running separate queries okay? You need to explain what you expect as a result.
  • jonasasx
    jonasasx over 9 years
    I want to get three entries for each interval (day, week, month) as a result. Ideally, in a single query.
  • Neil Lunn
    Neil Lunn over 9 years
    @jonasasx That is very possible once you explain it clearly. The approach with MongoDB is very much the same.