27

I would like to use mongoDB's Aggregation Framework to run what in SQL would look a bit like:

SELECT SUM(A), B, C from myTable GROUP BY B, C;

The docs state:

You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

But it's unclear what 'an aggregate key made from several incoming fields' actually is?

My dataset is a bit like this:

[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"},
{ "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"},
{ "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"},
{ "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"},
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"},
{ "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]

My query looks like this:

resultsCollection.aggregate(
    { $match : { testid : testid} },
    { $skip : alreadyRead },
    { $project : {
            timeStamp : 1 ,
            label : 1,
            responseCode : 1 ,
            value : 1,
            success : 1
        }},
    { $group : {
            _id : "$label",
            max_timeStamp : { $timeStamp : 1 },
            count_responseCode : { $sum : 1 },
            avg_value : { $sum : "$value" },
            count_success : { $sum : 1 }
        }},
    { $group : {
            ?
        }}
);

My instinct was to try to pipe the results through to a second group, I know you can do this but it won't work because the first group already reduces the dataset too much and the required level of detail is lost.

What I want to do is group using label, responseCode and success and get the sum of value from the result. It should look a bit like:

label   | code | success | sum_of_values | count
sharon  | 200  |  true   |      10       |   1
sharon  | 200  |  false  |      35       |   1
paul    | 200  |  true   |      100      |   2
paul    | 404  |  true   |      15       |   1
paul    | 404  |  false  |      99       |   1

Where there are five groups:

1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}

2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}

3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}
   { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}

4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}

5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Oliver Lloyd
  • 4,936
  • 7
  • 33
  • 55

1 Answers1

39

OK, so the solution is to specify an aggregate key for the _id value. This is documented here as:

You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields.

But it doesn't actually define the format for an aggregate key. Reading the earlier documentation here I saw that the previous collection.group method could take multiple fields and that the same structure is used in the new framework.

So, to group over multiple fields you could use _id : { success:'$success', responseCode:'$responseCode', label:'$label'}

As in:

resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
        timeStamp : 1 ,
        label : 1,
        responseCode : 1 ,
        value : 1,
        success : 1
    }},
{ $group : {
        _id :  { success:'$success', responseCode:'$responseCode', label:'$label'},
        max_timeStamp : { $timeStamp : 1 },
        count_responseCode : { $sum : 1 },
        avg_value : { $sum : "$value" },
        count_success : { $sum : 1 }
    }}
);
Hertzel Guinness
  • 5,912
  • 3
  • 38
  • 43
Oliver Lloyd
  • 4,936
  • 7
  • 33
  • 55
  • 1
    what is the purpose of your $project? is the original document much larger? also skip doesn't make much sense without a sort before it. and if you need avg_value why not use $avg? if you want sum why not call it sum_value? Also count_success and count_responseCode is going to be the same, not sure why you need both. – Asya Kamsky Jul 11 '12 at 03:02
  • 9
    Note : As of 2.2.0-rc1 slight changes have been made to grouping by multiple fields : _id : { success:1, responseCode:1, label:1} would become _id : { success:"$success",responseCode:"$responseCode",label:"$label"}. https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/1cYch580h0w – Richard Aug 23 '12 at 23:36
  • 1
    answer fixed according to @Richard. – Hertzel Guinness Mar 10 '13 at 20:04