What's the best way to sort and limit within subgroups of a collection in MongoDB?

Question

I have a collection whose data resembles the shape below:

[
  {
    "_id": "67tghy",
    "organization": "Organization A",
    "marketValue": 500000,
    "month": 8,
    "year": 2018,
    "project": "Project Blue"
  },
  {
    "_id": "h67thgk",
    "organization": "Organization B",
    "marketValue": 900000,
    "month": 1,
    "year": 2018,
    "project": "Project Red"
  }
]

There's >10K documents in this collection. I need to get the latest subset of data within each project group.

In other words, group the data by project, get the most recent month/year data for each project, and then merge all the data back together (for the application I'm working with, it has to result in the same shape it began with).

The aggregation pipeline below is the best I've come up with to accomplish the aforementioned goal.

[
  {
    $group: {
      _id: {
        project: '$project',
        year: '$year',
        month: '$month'
      },
      data: { $push: '$$ROOT' }
    }
  },
  {
    $sort: {
      '_id.year': -1,
      '_id.month': -1
    }
  },
  {
    $group: {
      _id: {
        project: '$_id.project'
      },
      data: { $push: '$data'}
    }
  },
  {
    $project: {
      latestData: { $slice: ['$data', 1] }
    }
  },
  {
    $unwind: '$latestData'
  },
  {
    $unwind: '$latestData'
  },
  {
    $replaceRoot: {
      newRoot: '$latestData'
    }
  }
]

I wish there was a way to sort and limit data at different levels of nesting the way one can do it with d3.nest, but haven't seen anything like that in the MongoDB docs.

What would be a more optimal way to write the aggregation pipeline?

As for the "best way" it really depends on the version of MongoDB you have. Certainly should *"should not"* be presently running anything less than 3.4 to be within official support, and from 3.6 onward `$lookup` can be used with a "sub-pipeline" which can avoid breaching the BSON 16MB limit as the present code you have runs the risk of doing. Oddly titled duplicate actually demonstrates these approaches of "Top N results per grouping". — Neil Lunn, Mar 16 '19 at 02:53

What's the best way to sort and limit within subgroups of a collection in MongoDB?

0 Answers0