I am trying to group multiple times to create an NxM matrix of actions over a very large data set.
I have operators who can perform actions (set size 10) in locations (set size 1000 per operator, 5 million possible), and I want to produce a report that gives me:
for each operator
  for each action
    a total count of this type of action performed by this operator
    the top N locations where this action was performed by this operator
My input data looks like this:
{ time: 1, operator: 'John', action: 'up', location: 'a' },
{ time: 2, operator: 'Jane', action: 'down', location: 'b' },
{ time: 3, operator: 'John', action: 'down', location: 'a' },
{ time: 4, operator: 'Sean', action: 'charm', location: 'c' },
{ time: 5, operator: 'John', action: 'up', location: 'a' },
{ time: 6, operator: 'Jane', action: 'down', location: 'c' },
...
So for the first stage of the group, I do:
$group: {
  _id: {
    operator: '$operator',
    action: '$action',
    location: '$location'
  },
  count: { $sum: 1 }
}
to create:
{ _id: { operator: 'John', action: 'up', location: 'a' }, count: 2 },
{ _id: { operator: 'John', action: 'down', location: 'a' }, count: 1 },
{ _id: { operator: 'Jane', action: 'down', location: 'b' }, count: 1 },
{ _id: { operator: 'Jane', action: 'down', location: 'c' }, count: 1 },
{ _id: { operator: 'Sean', action: 'charm', location: 'c' }, count: 1 }
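As a sanity check on what this stage should compute, the grouping can be mimicked in plain JavaScript. This is only an in-memory sketch: `groupCounts` and `sampleEvents` are hypothetical names for illustration, not part of MongoDB, and it would not scale to the real data set.

```javascript
// Hypothetical helper, not a MongoDB API: mimics the first $group stage in memory.
function groupCounts(events) {
  const counts = new Map();
  for (const { operator, action, location } of events) {
    // JSON-encode the triple so it can serve as a Map key.
    const key = JSON.stringify([operator, action, location]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Re-expand each key into the { _id, count } shape that $group emits.
  return Array.from(counts, ([key, count]) => {
    const [operator, action, location] = JSON.parse(key);
    return { _id: { operator, action, location }, count };
  });
}

// The six sample events shown above.
const sampleEvents = [
  { time: 1, operator: 'John', action: 'up', location: 'a' },
  { time: 2, operator: 'Jane', action: 'down', location: 'b' },
  { time: 3, operator: 'John', action: 'down', location: 'a' },
  { time: 4, operator: 'Sean', action: 'charm', location: 'c' },
  { time: 5, operator: 'John', action: 'up', location: 'a' },
  { time: 6, operator: 'Jane', action: 'down', location: 'c' },
];

const stageOne = groupCounts(sampleEvents);
```

For the six sample events, `stageOne` holds five rows, with a count of 2 for John/up/a.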
Now I want to count every action performed by each operator (there could be thousands) but retain only the top 5 locations where each operator performed each action. I want my final output to have records that look something like:
{ operator: 'John',
  total_actions: 10576,
  actions: {
    up: { count: 2052, most: [{a: 92}, {b: 91}, {c: 82}, {qqz: 60}, {d: 54}] },
    down: { count: 8482, most: [{loc: count}, {loc: count}...] },
    strange: { count: 39, most: [{loc: count}...] },
    charm: { count: 3, most: ... }
  }
},
{ operator: 'Jane',
  total_actions: 38223,
  actions: {...}
}
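To pin down the target shape, here is a minimal plain-JavaScript sketch that builds such a report from rows shaped like the first stage's output. It is a reference for the intended result, not a pipeline: `buildReport`, `sampleRows`, and the `topN` parameter are hypothetical names, and doing this in application memory is exactly what the real data volume rules out.

```javascript
// Hypothetical helper, not a MongoDB API: builds the target report in memory
// from rows shaped like { _id: { operator, action, location }, count }.
function buildReport(rows, topN) {
  const byOperator = new Map();
  for (const { _id: { operator, action, location }, count } of rows) {
    if (!byOperator.has(operator)) {
      byOperator.set(operator, { operator, total_actions: 0, actions: {} });
    }
    const entry = byOperator.get(operator);
    entry.total_actions += count;
    if (!entry.actions[action]) {
      entry.actions[action] = { count: 0, most: [] };
    }
    // Every location contributes to the action total...
    entry.actions[action].count += count;
    entry.actions[action].most.push({ [location]: count });
  }
  // ...but only the topN locations per operator/action are kept.
  for (const entry of byOperator.values()) {
    for (const stats of Object.values(entry.actions)) {
      stats.most.sort((x, y) => Object.values(y)[0] - Object.values(x)[0]);
      stats.most = stats.most.slice(0, topN);
    }
  }
  return Array.from(byOperator.values());
}

const sampleRows = [
  { _id: { operator: 'John', action: 'up', location: 'a' }, count: 2 },
  { _id: { operator: 'John', action: 'down', location: 'a' }, count: 1 },
  { _id: { operator: 'Jane', action: 'down', location: 'b' }, count: 1 },
  { _id: { operator: 'Jane', action: 'down', location: 'c' }, count: 1 },
  { _id: { operator: 'Sean', action: 'charm', location: 'c' }, count: 1 },
];

const sampleReport = buildReport(sampleRows, 1);
```

With `topN` set to 1, Jane's `down` entry still reports a count of 2 even though only one location survives in `most`, which is the behaviour the report needs.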
I'm not really fixated on "most" being an array, and it certainly doesn't need to be sorted.
I keep getting stuck on pushing...
I originally wrote the second stage to group on operator/action and push each location's count onto an array, but as far as I can tell there is no way to sort or limit that array.
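{ $group: {
    // Both key fields live inside the _id produced by the first stage.
    _id: {
      operator: '$_id.operator',
      action: '$_id.action'
    },
    // Total for this operator/action across all locations.
    action_count: { $sum: '$count' },
    // One entry per location; unbounded, which is the problem.
    locations: {
      $push: {
        location: '$_id.location',
        count: '$count'
      }
    }
  }
}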
This further reduces the set to:
{ _id: { operator: 'John', action: 'up' }, action_count: 2, locations: [{location: 'a', count: 2}] },
{ _id: { operator: 'John', action: 'down' }, action_count: 1, locations: [{location: 'a', count: 1}] },
{ _id: { operator: 'Jane', action: 'down' }, action_count: 2, locations: [{location: 'b', count: 1}, {location: 'c', count: 1}] },
My mongo-sense said that was wrong because locations is an array that could potentially have 1000s of entries in it per operator/action. Additionally, I have no operations for sorting and limiting this mess, but if I $unwind it at this stage, it seems like I've just reversed stage 2.
Question 1: What's the right way to proceed from here?
Thoughts:
So instead my next stage, non-intuitively, groups on locations, because there could be a ton of locations and relatively few actions. If I'm grouping on locations, I might still be able to compute a total action count before sorting/limiting the locations. I just don't know how to proceed to the third stage.
{ $group: {
    _id: {
      operator: '$_id.operator',
      location: '$_id.location'
    },
    // One entry per action seen at this operator/location.
    actions: {
      $push: {
        action: '$_id.action',
        count: '$count'
      }
    }
  }
}
Gives me:
{ _id: { operator: 'John', location: 'a' }, actions: [{action: 'up', count: 2}, {action: 'down', count: 1}] },
{ _id: { operator: 'Jane', location: 'b' }, actions: [{action: 'down', count: 1}] },
{ _id: { operator: 'Jane', location: 'c' }, actions: [{action: 'down', count: 1}] },
...
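The location-grouped shape does still carry enough information to recover the per-action totals, which is the hope behind this stage. A plain-JavaScript sketch of that step, where `actionTotals` and `locationDocs` are hypothetical names and nothing here is a MongoDB API:

```javascript
// Hypothetical helper, not a MongoDB API: sums per-action totals per operator
// from documents grouped on operator/location.
function actionTotals(docs) {
  const totals = {};
  for (const { _id: { operator }, actions } of docs) {
    totals[operator] = totals[operator] || {};
    for (const { action, count } of actions) {
      // Add this location's count to the running total for the action.
      totals[operator][action] = (totals[operator][action] || 0) + count;
    }
  }
  return totals;
}

const locationDocs = [
  { _id: { operator: 'John', location: 'a' }, actions: [{ action: 'up', count: 2 }, { action: 'down', count: 1 }] },
  { _id: { operator: 'Jane', location: 'b' }, actions: [{ action: 'down', count: 1 }] },
  { _id: { operator: 'Jane', location: 'c' }, actions: [{ action: 'down', count: 1 }] },
];

const totalsByOperator = actionTotals(locationDocs);
```

This only shows that the totals are recoverable in principle; it says nothing about whether a third pipeline stage can do the same thing efficiently.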
I'm not sure I should be using $push at all. Is this a case for $addToSet? My brain is melting.
What is the mongo-sane way to do NxM group reports?
Specifically, I want to count all of the actions performed by each operator but only report on the top N locations.