You have a clear syntax error in your $group statement with $last, as that is not valid usage, but I suspect this has something to do with what you are "trying" to do rather than what you are actually using to get your result.
Getting a result with the "best n values" is a bit of a problem for the aggregation framework. There is this recent answer from myself with a longer explanation of the basic case, but it all boils down to the aggregation framework lacking the basic tools to do this "limited" grouping per grouping key that you want.
Doing it badly
The horrible way to approach this is very "iterative" per the number of results you want to return. It basically means pushing everything into an array, then using operators like $first ( after sorting in reverse ) to take the top result off the stack, subsequently "filtering" that result out of the array ( think of an array pop or shift operation ), and then doing it again to get the next one.
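In plain JavaScript terms, the operation being mimicked is roughly the following ( a minimal sketch only; the topTwoPerKey name and the { key, val } input shape are purely illustrative ):

function topTwoPerKey(docs) {
    var grouped = {};

    // "push everything into an array" per grouping key
    docs.forEach(function (doc) {
        grouped[doc.key] = grouped[doc.key] || [];
        grouped[doc.key].push(doc.val);
    });

    return Object.keys(grouped).map(function (key) {
        // sort in reverse, then "shift" the top results off the stack
        var stack = grouped[key].sort(function (a,b) { return b - a; });
        return { "key": key, "best": stack.shift(), "secondBest": stack.shift() };
    });
}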
Basically this, with a 2-iteration example:
UserActivity.aggregate(
[
{ "$match": {
"user_id": { "$in": user_id },
"tracker_id": { "$in": getTrackerId },
"date": {
"$gte": startDate,
"$lt": endDate
}
}},
{ "$unwind": "$duration" },
{ "$group": {
"_id": {
"tracker_id": "$tracker_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
1000 * 60 * 60 * 24
]}
]},
new Date(0)
]
},
"val": { "$sum": "$duration" }
}
}},
{ "$sort": { "_id": 1, "val": -1 } },
{ "$group": {
"_id": "$_id.date",
"resultData": {
"$push": {
"tracker_id": "$_id.tracker_id",
"val": "$val"
}
}
}},
{ "$unwind": "$resultData " },
{ "$group": {
"_id": "$_id",
"last": { "$first": "$resultData" },
"resultData": { "$push": "$resultData" }
}},
{ "$unwind": "$resultData" },
{ "$redact": {
"if": { "$eq": [ "$resultData", "$last" ] },
"then": "$$PRUNE",
"else": "$$KEEP"
}},
{ "$group": {
"_id": "$_id",
"last": { "$first": "$last" },
"secondLast": { "$first": "$resultData" }
}},
{ "$project": {
"resultData": {
"$map": {
"input": [0,1],
"as": "index",
"in": {
"$cond": {
"if": { "$eq": [ "$$index", 0 ] },
"$last",
}
}
}
}
}}
],
function (err,tAData) {
console.log(JSON.stringify(tAData,undefined,2))
}
);
But the principles here show this is not a performant or very scalable approach, mostly due to needing to put all results into an array and then working through that just to get the values. Note also the simplification of your date inputs to startDate and endDate as predetermined Date object values, set up before the pipeline code.
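As a minimal sketch of what those "predetermined" values look like ( the dates and the startOfDay helper are illustrative only ), aligned to whole days the same way as the $mod math in the pipeline:

var OneDay = 1000 * 60 * 60 * 24;

// round a Date down to the start of its UTC day, matching the
// epoch millisecond $mod math used in the pipeline
function startOfDay(date) {
    return new Date( date.valueOf() - ( date.valueOf() % OneDay ) );
}

var startDate = startOfDay( new Date("2015-06-01") ),
    endDate = new Date(
        startOfDay( new Date("2015-06-07") ).valueOf() + OneDay
    );    // the next "whole day" after the last day wanted, used with $lt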
Doing it better
A much better approach is to send an aggregation query to the server for each date in the range, since the date is what you want as the eventual key. As each query returns only a single "key", it is easy to just apply $limit to restrict the response.
The ideal case is to perform these queries in parallel and then combine the results. Fortunately the node async library provides async.map, or more specifically async.mapLimit, which performs exactly this function:
N.B. You don't want async.mapSeries here for the best performance, since the queries are then "serially executed in order", meaning only one operation occurs on the server at a time. The results come back array ordered, but it is going to take longer. A client-side sort makes more sense here.
var dates = [],
returnLimit = 2,
OneDay = 1000 * 60 * 60 * 24;
// Produce an array for each date in the range
for (
var myDate = startDate;
myDate < endDate;
    myDate = new Date( myDate.valueOf() + OneDay )
) {
dates.push(myDate);
}
async.mapLimit(
dates,
    10,    // run at most 10 queries on the server at once
function (date,callback) {
UserActivity.aggregate(
[
{ "$match": {
"user_id": { "$in": user_id },
"tracker_id": { "$in": getTrackerId },
"date": {
"$gte": date,
"$lt": new Date( date.valueOf() + OneDay )
}
}},
{ "$unwind": "$duration" },
{ "$group": {
"_id": {
"tracker_id": "$tracker_id",
"date": {
"$add": [
{ "$subtract": [
{ "$subtract": [ "$date", new Date(0) ] },
{ "$mod": [
{ "$subtract": [ "$date", new Date(0) ] },
OneDay
]}
]},
new Date(0)
]
},
"val": { "$sum": "$duration" }
}
}},
{ "$sort": { "_id": 1, "val": -1 } },
{ "$limit": returnLimit },
{ "$group": {
"_id": "$_id.date",
"resultData": {
"$push": {
"tracker_id": "$_id.tracker_id",
"val": "$val"
}
}
}}
],
function (err,result) {
callback(err,result[0]);
}
);
},
function (err,results) {
if (err) throw err;
    results.sort(function (a,b) {
        return a._id.valueOf() - b._id.valueOf();
    });
console.log( JSON.stringify( results, undefined, 2 ) );
}
);
Now that is a much cleaner listing, and a lot more efficient and scalable than the first approach. By issuing one aggregation per single date and then combining the results, the "limit" there allows up to 10 queries to execute on the server at the same time ( tune this to your needs ), ultimately returning a singular response.
Since these are "async" and performed in parallel rather than in series ( the best performance option ), you just need to sort the returned array, as is done in the final block:
results.sort(function (a,b) {
    return a._id.valueOf() - b._id.valueOf();
});
And then everything is ordered as you would expect in the response. Note the comparator uses valueOf(), since each _id is a Date object and a sort comparator needs to return a number rather than a boolean.
Forcing the aggregation pipeline to do this where it really is not necessary is a sure path to code that will fail in the future, if it does not do so already. Parallel query operations and combining the results "just makes sense" for efficient and scalable output.
Also note that you should not use $lte with range selections on dates. Even if you thought it through, the better approach is a "startDate" with an "endDate" that is the start of the next "whole day" after the range you want. This makes a cleaner selection than saying "the last second of the day".
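As a sketch of that selection style ( the dates here are examples only ), picking out a single whole day without ever naming its last second:

{ "$match": {
    "date": {
        "$gte": new Date("2015-06-05"),    // start of the day you want, inclusive
        "$lt": new Date("2015-06-06")      // start of the next day, exclusive
    }
}}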