0

My collection is structured like this:

{
    "_id": 1,
    "Trips": [
        {
            "EndID": 5,
            "Tripcount": 12
        },
        {
            "EndID": 6,
            "Tripcount": 19
        }
     ],
     "_id": 2,
     "Trips": [
        {
            "EndID": 4,
            "Tripcount": 12
        },
        {
            "EndID": 5,
            "Tripcount": 19
        }
     ], ...
}

As it can be seen, every document has a Trips array. Now what I want to find, is the top N Tripcounts of all the Trips arrays combined across the documents in the collection. Is that possible?

I already have the following, however this only takes the single greatest Tripcount from each Trips array and then outputs 50 of them. So actually having the top 2 trips in one Trips array results in this query dropping the second one:

var group = db.eplat1.aggregate([
  {   "$unwind": "$Trips"},
  {   "$sort": {
          "Trips.Tripcount": -1
  }
  },
  {   "$limit": 50 },
  {   "$group": {
        "_id": 1,
        "Trips": {
          "$push": {
            "Start": "$_id",
            "Trips": "$Trips"
          }
        }
  }}
  ], {allowDiskUse: true})

Note that I believe this problem is different to this one, as there only one document is given.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
ffritz
  • 2,180
  • 1
  • 28
  • 64
  • So what is the question? You `$unwind` the arrays, so you have all entries. You `$sort` them so the counts are on top. Then you `$limit`, so that's 50 top entries. What more do you think you need? – Neil Lunn Jun 02 '17 at 11:25
  • @NeilLunn As described, having e.g. 2 of the top trips in one of the Trips arrays results in loosing this second top trip, as only the single top trip per `Trips` array gets output here. – ffritz Jun 02 '17 at 11:26
  • So you want the "top 2" trips from "each array" in "every document"? Really your question is not as clear as you think it is so you really should answer. – Neil Lunn Jun 02 '17 at 11:28
  • No, I want the top N trips after you would combine all of the `Trips` arrays. My problem: Assume there are 2 documents and so 2 Trips arrays, first one contains 2 trips (count 5 and 6) and second one contains one trip with count of 4. This query above outputs 6 and 4 with limit of 2, not 6 and 5 as it should. – ffritz Jun 02 '17 at 11:31
  • @NeilLunn Nvm, found it. Should be `"_id": "$_id"` and `"Start": "$_id"` should be `"End": "$EndID"` in the group stage. Yes I have 3.4.4. – ffritz Jun 02 '17 at 11:35

1 Answers1

2

Basically you need to sort the array elements ($unwind/$sort/$group) and then you can do your $sort for the top values and $limit the results.

Finally you $slice for the "top N" in the documents in the array.

db.eplat1.aggregate([
  { "$unwind": "$Trips" },
  { "$sort": { "_id": 1, "Tips.TripCount": -1 } },
  { "$group": {
    "_id": "$_id",
    "Trips": { "$push": "$Trips" },
    "maxTrip": { "$max": "$Trips.TripCount" }
  }},
  { "$sort": { "maxTrip": -1 } },
  { "$limit": 50 },
  { "$addFields": { "Trips": { "$slice": [ "$Trips", 0 , 2 ] } } }
])
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317