This function computes a leading window count, where the window extends a given number of seconds into the future from each event. The count includes the event itself, so it is always at least 1. If three events occur within less than `lead_in_s` seconds, the count for the first of them is 3. The window is half-open: an event exactly `lead_in_s` seconds later is not counted.
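```python
from datetime import timedelta

import numpy as np
import pandas as pd

def lead_count(s, lead_in_s):
    # Sort chronologically so the cumulative sum runs in time order
    ssort = s.sort_index()
    # End of each event's look-ahead window (exclusive)
    lead = ssort.index + timedelta(seconds=lead_in_s)
    # Position of the last event strictly before each window end
    inds = np.searchsorted(ssort.index.values, lead.values) - 1
    cs = ssort.cumsum().values
    # Sum from the current event i through inds: cs[j] - cs[i] + s[i]
    return pd.Series(cs[inds] - cs + ssort.values, index=ssort.index)
```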
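To make the window definition concrete, here is a brute-force version that checks every event against every other. It is O(n²), so it is only meant for sanity-checking the `searchsorted` version on small inputs; the name `lead_count_naive` and the sample timestamps are just for illustration.

```python
import pandas as pd


def lead_count_naive(s, lead_in_s):
    """O(n^2) reference: for each event at time t, sum the values of all
    events with t <= time < t + lead_in_s (the event itself included)."""
    ssort = s.sort_index()
    window = pd.Timedelta(seconds=lead_in_s)
    counts = []
    for t in ssort.index:
        # Half-open window [t, t + window)
        in_window = (ssort.index >= t) & (ssort.index < t + window)
        counts.append(ssort[in_window].sum())
    return pd.Series(counts, index=ssort.index)


# Three events 10 s apart, then one far away
idx = pd.to_datetime(["2015-01-01 00:00:00", "2015-01-01 00:00:10",
                      "2015-01-01 00:00:20", "2015-01-01 00:05:00"])
s = pd.Series([1, 1, 1, 1], index=idx)
print(lead_count_naive(s, 30).tolist())  # [3, 2, 1, 1]
```

With a 20-second window the first event only sees itself and the event at 00:00:10, because the event at exactly 00:00:20 falls on the open end of the window.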
Your example code generates random events, not in chronological order. I give each timestamp a value of 1, so summing the values over a window counts the events in it, and I use the timestamps as the index:
```
>>> s = pd.Series([1]*len(timestamps), index=timestamps)
>>> s
2015-01-01 00:00:26    1
2015-01-01 00:05:15    1
2015-01-01 00:13:57    1
2015-01-01 00:10:45    1
2015-01-01 00:05:46    1
2015-01-01 00:00:01    1
2015-01-01 00:15:00    1
2015-01-01 00:13:12    1
2015-01-01 00:16:23    1
2015-01-01 00:13:18    1
2015-01-01 00:07:56    1
2015-01-01 00:00:47    1
2015-01-01 00:04:23    1
2015-01-01 00:02:58    1
2015-01-01 00:03:24    1
2015-01-01 00:11:34    1
dtype: int64
```
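Since the random generator from the question isn't reproduced here, the following rebuilds the same Series from the 16 timestamps listed above, so the rest of the example can be rerun. It is a convenience sketch, not the question's code.

```python
import pandas as pd

# The 16 timestamps from the listing above, in the same (unsorted) order
timestamps = pd.to_datetime([
    "2015-01-01 00:00:26", "2015-01-01 00:05:15", "2015-01-01 00:13:57",
    "2015-01-01 00:10:45", "2015-01-01 00:05:46", "2015-01-01 00:00:01",
    "2015-01-01 00:15:00", "2015-01-01 00:13:12", "2015-01-01 00:16:23",
    "2015-01-01 00:13:18", "2015-01-01 00:07:56", "2015-01-01 00:00:47",
    "2015-01-01 00:04:23", "2015-01-01 00:02:58", "2015-01-01 00:03:24",
    "2015-01-01 00:11:34",
])
# One count per event, indexed by timestamp
s = pd.Series([1] * len(timestamps), index=timestamps)
print(s.index.is_monotonic_increasing)  # False: lead_count sorts it internally
```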
Then call `lead_count` with a 30-second window:
```
>>> lead_30s = lead_count(s, 30)
>>> df = pd.DataFrame({'s': s, 's_lead30s': lead_30s})
>>> print(df.sort_index())
                     s  s_lead30s
2015-01-01 00:00:01  1          2
2015-01-01 00:00:26  1          2
2015-01-01 00:00:47  1          1
2015-01-01 00:02:58  1          2
2015-01-01 00:03:24  1          1
2015-01-01 00:04:23  1          1
2015-01-01 00:05:15  1          1
2015-01-01 00:05:46  1          1
2015-01-01 00:07:56  1          1
2015-01-01 00:10:45  1          1
2015-01-01 00:11:34  1          1
2015-01-01 00:13:12  1          2
2015-01-01 00:13:18  1          1
2015-01-01 00:13:57  1          1
2015-01-01 00:15:00  1          1
2015-01-01 00:16:23  1          1
```
This is adapted from this answer, which uses the same binary search for insertion points (`np.searchsorted`) to compute a rolling cumulative sum, except that it looks into the past (lag) rather than the future (lead).
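For comparison, here is a sketch of the lag counterpart built on the same cumulative-sum-plus-`searchsorted` idea. It is my own illustration rather than the code from the linked answer, and `lag_count` is a hypothetical name. It uses non-unit values, and because the window looks backwards it can be cross-checked against pandas' time-based `rolling`, whose offset windows are closed on the right, (t - w, t], by default.

```python
import numpy as np
import pandas as pd


def lag_count(s, lag_in_s):
    """Sum of the values in the window (t - lag_in_s, t] ending at each event."""
    ssort = s.sort_index()
    # Start of each event's look-back window (exclusive)
    start = ssort.index - pd.Timedelta(seconds=lag_in_s)
    # Position of the first event strictly after each window start
    first = np.searchsorted(ssort.index.values, start.values, side="right")
    cs = ssort.cumsum().values
    vals = ssort.values
    # Sum from position `first` through the current event i: cs[i] - cs[first] + s[first]
    return pd.Series(cs - cs[first] + vals[first], index=ssort.index)


idx = pd.to_datetime(["2015-01-01 00:00:00", "2015-01-01 00:00:10",
                      "2015-01-01 00:00:20", "2015-01-01 00:05:00"])
s = pd.Series([1, 2, 3, 4], index=idx)
print(lag_count(s, 15).tolist())                     # [1, 3, 5, 4]
# Built-in offset-based rolling window, (t - 15s, t] by default
print(s.sort_index().rolling("15s").sum().tolist())  # [1.0, 3.0, 5.0, 4.0]
```

The only differences from `lead_count` are the direction of the offset, `side="right"` so the start of the window is excluded, and which end of the range the cumulative sums are anchored to.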