You have a clear syntax error in your $group statement with $last, as that is not valid usage, but I suspect this has something to do with what you are "trying" to do rather than what you are actually using to get your result.
Getting a result with the "best n values" is a bit of a problem for the aggregation framework. There is this recent answer from myself with a longer explanation of the basic case, but it all boils down to the aggregation framework lacking the basic tools to do this "limited" grouping per grouping key that you want.
Doing it badly
The horrible way to approach this is very "iterative" per the number of results you want to return. It basically means pushing everything into an array, then using operators like $first ( after sorting in reverse ) to take the top result off the stack, subsequently "filtering" that result out of the array ( think of an array pop or shift operation ), and then doing it again to get the next one.
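In plain JavaScript terms, the operation being mimicked is roughly the following ( a minimal sketch only; the topTwoPerKey name and the { key, val } input shape are purely illustrative ):

function topTwoPerKey(docs) {
    var grouped = {};

    // "push everything into an array" per grouping key
    docs.forEach(function (doc) {
        grouped[doc.key] = grouped[doc.key] || [];
        grouped[doc.key].push(doc.val);
    });

    return Object.keys(grouped).map(function (key) {
        // sort in reverse, then "shift" the top results off the stack
        var stack = grouped[key].sort(function (a,b) { return b - a; });
        return { "key": key, "best": stack.shift(), "secondBest": stack.shift() };
    });
}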
Basically this, with a 2-iteration example:
UserActivity.aggregate(
[
{ "$match": {
"user_id": { "$in": user_id },
"tracker_id": { "$in": getTrackerId },
"date": {
"$gte": startDate,
"$lt": endDate
}
}},
{ "$unwind": "$duration" },
{ "$group": {
"_id": {
"tracker_id": "$tracker_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"val": { "$sum": "$duration" }
}
}},
{ "$sort": { "_id": 1, "val": -1 } },
{ "$group": {
"_id": "$_id.date",
"resultData": {
"$push": {
"tracker_id": "$_id.tracker_id",
"val": "$val"
}
}
}},
{ "$unwind": "$resultData " },
{ "$group": {
"_id": "$_id",
"last": { "$first": "$resultData" },
"resultData": { "$push": "$resultData" }
}},
{ "$unwind": "$resultData" },
{ "$redact": {
"if": { "$eq": [ "$resultData", "$last" ] },
"then": "$$PRUNE",
"else": "$$KEEP"
}},
{ "$group": {
"_id": "$_id",
"last": { "$first": "$last" },
"secondLast": { "$first": "$resultData" }
}},
{ "$project": {
"resultData": {
"$map": {
"input": [0,1],
"as": "index",
"in": {
"$cond": {
"if": { "$eq": [ "$$index", 0 ] },
"$last",
}
}
}
}
}}
],
function (err,tAData) {
console.log(JSON.stringify(tAData,undefined,2))
}
);
But the principles here show this is not a performant or very scalable approach, mostly due to needing to put all results into an array and then working through that just to get the values. Note also the simplification of your date inputs to startDate and endDate as predetermined Date object values, set up before the pipeline code.
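As a minimal sketch of what those "predetermined" values look like ( the dates and the startOfDay helper are illustrative only ), aligned to whole days the same way as the $mod math in the pipeline:

var OneDay = 1000 * 60 * 60 * 24;

// round a Date down to the start of its UTC day, matching the
// epoch millisecond $mod math used in the pipeline
function startOfDay(date) {
    return new Date( date.valueOf() - ( date.valueOf() % OneDay ) );
}

var startDate = startOfDay( new Date("2015-06-01") ),
    endDate = new Date(
        startOfDay( new Date("2015-06-07") ).valueOf() + OneDay
    );    // the next "whole day" after the last day wanted, used with $lt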
Doing it better
A much better approach is to send an aggregation query to the server for each date in the range, since the date is what you want as the eventual key. As each query returns only a single "key", it is easy to just apply $limit to restrict the response.
The ideal case is to perform these queries in parallel and then combine the results. Fortunately the node async library provides async.map, or more specifically async.mapLimit, which performs exactly this function:
N.B. You don't want async.mapSeries here for the best performance, since the queries are then "serially executed in order", meaning only one operation occurs on the server at a time. The results come back array ordered, but it is going to take longer. A client-side sort makes more sense here.
var dates = [],
returnLimit = 2,
OneDay = 1000 * 60 * 60 * 24;
// Produce an array for each date in the range
for (
var myDate = startDate;
myDate < endDate;
    myDate = new Date( myDate.valueOf() + OneDay )
) {
dates.push(myDate);
}
async.mapLimit(
dates,
    10,    // run at most 10 queries on the server at once
function (date,callback) {
UserActivity.aggregate(
[
{ "$match": {
"user_id": { "$in": user_id },
"tracker_id": { "$in": getTrackerId },
"date": {
"$gte": date,
"$lt": new Date( date.valueOf() + OneDay )
}
}},
{ "$unwind": "$duration" },
{ "$group": {
"_id": {
"tracker_id": "$tracker_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
OneDay
]}
]},
new Date(0)
]
},
"val": { "$sum": "$duration" }
}
}},
{ "$sort": { "_id": 1, "val": -1 } },
{ "$limit": returnLimit },
{ "$group": {
"_id": "$_id.date",
"resultData": {
"$push": {
"tracker_id": "$_id.tracker_id",
"val": "$val"
}
}
}}
],
function (err,result) {
callback(err,result[0]);
}
);
},
function (err,results) {
if (err) throw err;
    results.sort(function (a,b) {
        return a._id.valueOf() - b._id.valueOf();
    });
console.log( JSON.stringify( results, undefined, 2 ) );
}
);
Now that is a much cleaner listing, and a lot more efficient and scalable than the first approach. By issuing one aggregation per single date and then combining the results, the "limit" there allows up to 10 queries to execute on the server at the same time ( tune this to your needs ), ultimately returning a singular response.
Since these are "async" and performed in parallel rather than in series ( the best performance option ), you just need to sort the returned array, as is done in the final block:
results.sort(function (a,b) {
    return a._id.valueOf() - b._id.valueOf();
});
And then everything is ordered as you would expect in the response. Note the comparator uses valueOf(), since each _id is a Date object and a sort comparator needs to return a number rather than a boolean.
Forcing the aggregation pipeline to do this where it really is not necessary is a sure path to code that will fail in the future, if it does not do so already. Parallel query operations and combining the results "just makes sense" for efficient and scalable output.
Also note that you should not use $lte with range selections on dates. Even if you thought it through, the better approach is a "startDate" with an "endDate" that is the start of the next "whole day" after the range you want. This makes a cleaner selection than saying "the last second of the day".
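As a sketch of that selection style ( the dates here are examples only ), picking out a single whole day without ever naming its last second:

{ "$match": {
    "date": {
        "$gte": new Date("2015-06-05"),    // start of the day you want, inclusive
        "$lt": new Date("2015-06-06")      // start of the next day, exclusive
    }
}}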