I have a collection whose data resembles the shape below:
[
{
"_id": "67tghy",
"organization": "Organization A",
"marketValue": 500000,
"month": 8,
"year": 2018,
"project": "Project Blue"
},
{
"_id": "h67thgk",
"organization": "Organization B",
"marketValue": 900000,
"month": 1,
"year": 2018,
"project": "Project Red"
}
]
There's >10K documents in this collection. I need to get the latest subset of data within each project
group.
In other words, group the data by project
, get the most recent month/year data for each project
, and then merge all the data back together (for the application I'm working with, it has to result in the same shape it began with).
The aggregation pipeline below is the best I've come up with to accomplish the aforementioned goal.
[
{
$group: {
_id: {
project: '$project',
year: '$year',
month: '$month'
},
data: { $push: '$$ROOT' }
}
},
{
$sort: {
'_id.year': -1,
'_id.month': -1
}
},
{
$group: {
_id: {
project: '$_id.project'
},
data: { $push: '$data'}
}
},
{
$project: {
latestData: { $slice: ['$data', 1] }
}
},
{
$unwind: '$latestData'
},
{
$unwind: '$latestData'
},
{
$replaceRoot: {
newRoot: '$latestData'
}
}
]
I wish there was a way to sort and limit data at different levels of nesting the way one can do it with d3.nest, but haven't seen anything like that in the MongoDB docs.
What would be a more optimal way to write the aggregation pipeline?