This question may be seen as an extension to this one.

I have two 1D tensors, counts and idx. counts has length 20 and stores how many events have fallen into each of 20 bins. idx is very long; each entry is an integer identifying one of the 20 events, and each event can occur multiple times. I'd like a vectorized (or otherwise very fast) way to add the number of times event i occurs in idx to the i-th bin of counts. Ideally, the solution would also work on batches of counts and idx tensors during a training loop.

My first thought was to simply use this strategy of indexing counts with idx:

import torch

counts = torch.zeros(5)
idx = torch.tensor([1, 1, 1, 2, 3])
counts[idx] += 1

But this did not work; counts ends up as

tensor([0., 1., 1., 1., 0.])

instead of the desired

tensor([0., 3., 1., 1., 0.])

What's the fastest way I can do this? My next best guess is

for i in range(20):
    # count the matches; idx[idx == i].sum() would add up the values i instead
    counts[i] += (idx == i).sum()

1 Answer

Consider using the bincount function, which counts the frequency of each value in a 1D tensor of non-negative integers (the only constraint on the input).

import torch

EVENT_TYPES = 20
counts = torch.zeros(EVENT_TYPES)
events = torch.tensor([1, 1, 1, 2, 3, 9])
batch_counts = torch.bincount(events, minlength=EVENT_TYPES)

print(counts + batch_counts)

Result:

tensor([0., 3., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0.])
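
For comparison with the indexing attempt in the question: counts[idx] += 1 writes each duplicated index only once, so repeats are not accumulated. Below is a minimal sketch of two in-place alternatives that do accumulate, index_add_ and index_put_ with accumulate=True. The names ones and counts_alt are only illustrative.

```python
import torch

counts = torch.zeros(5)
idx = torch.tensor([1, 1, 1, 2, 3])

# One increment per entry of idx, in the same dtype as counts.
ones = torch.ones(idx.shape[0], dtype=counts.dtype)

# index_add_ accumulates repeated indices instead of overwriting them.
counts.index_add_(0, idx, ones)

# Same effect via index_put_ with accumulate=True, on a fresh tensor.
counts_alt = torch.zeros(5)
counts_alt.index_put_((idx,), ones, accumulate=True)

print(counts)
print(counts_alt)
```

Both calls update the tensor in place, which may be convenient when counts is a running total.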

You can run bincount for every batch while staying entirely in torch tensors. The minlength argument guarantees that the output has at least that many bins, so it sets the number of event types: 20 in this case, as described in the question.
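
bincount only accepts 1D input, so if the batches arrive as a 2D tensor of indices, one possible approach is scatter_add_ along the bin dimension. This is a rough sketch, assuming idx has shape (batch, n) and counts has shape (batch, EVENT_TYPES); batch_idx, batch_counts and expected are hypothetical names and the data is made up for illustration.

```python
import torch

EVENT_TYPES = 20

# Hypothetical batch: 2 rows of event indices, each of length 6.
batch_idx = torch.tensor([[1, 1, 1, 2, 3, 9],
                          [0, 0, 5, 5, 5, 19]])
batch_counts = torch.zeros(batch_idx.shape[0], EVENT_TYPES)

# scatter_add_ adds 1 at column batch_idx[b, n] of row b, accumulating repeats.
ones = torch.ones(batch_idx.shape, dtype=batch_counts.dtype)
batch_counts.scatter_add_(1, batch_idx, ones)

# Cross-check against a per-row bincount.
expected = torch.stack(
    [torch.bincount(row, minlength=EVENT_TYPES) for row in batch_idx]
).to(batch_counts.dtype)

print(torch.equal(batch_counts, expected))
```

The per-row bincount loop is kept only as a check; the scatter_add_ call handles the whole batch in one operation.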