Background
I have two dataframes. The first one, df_player_snapshot_master
, has snapshot data for individual poker players at different times in a tournament and looks like this:
snapshotDateTime playerID deactivationRecord
0 2021-05-06 09:28:24.987995 1679613 False
1 2021-05-06 09:28:24.987995 2660567 False
2 2021-05-06 09:28:24.987995 2668394 False
3 2021-05-06 09:28:24.987995 2280604 False
4 2021-05-06 09:28:24.987995 2018271 False
The second dataframe, df_tournament_summary_master
, summarizes this data at a tournament level by snapshot and looks like this:
intervalStartDateTime intervalEndDateTime playersDeactivatedThisInterval
0 2021-05-06 09:28:24.987995 2021-05-06 09:28:38.605930 NaN
1 2021-05-06 09:28:38.605930 2021-05-06 09:28:47.860595 NaN
2 2021-05-06 09:28:47.860595 2021-05-06 09:28:57.187734 NaN
3 2021-05-06 09:28:57.187734 2021-05-06 09:29:07.187734 NaN
My goal is to populate df_tournament_summary_master['playersDeactivatedThisInterval']
by counting the number of records in df_player_snapshot_master
where df_player_snapshot_master['snapshotDateTime']
equals df_tournament_summary_master['intervalStartDateTime']
and df_player_snapshot_master['deactivationRecord']
is True
. So essentially a COUNTIFS exercise between two dataframes with multiple conditions.
What I've Tried
I translated the solution in this post to my situation which resulted in this code:
tournament_info = df_tournament_summary_master[['intervalStartDateTime']].values
deacs = []
for iSDT in tournament_info:
deacs.append(len(df_player_snapshot_master[(df_player_snapshot_master['snapshotDateTime']==iSDT) &
(df_player_snapshot_master['deactivationRecord']==True)])
)
df_tournament_summary_master['playersDeactivatedThisInterval'] = deacs
but it keeps throwing an error:
ValueError: ('Lengths must match to compare', (332,), (1,))
The post I modeled my code after does mention the need to convert to datetimes and that seems like it might be the issue, but when I check my types for both dataframes/series they all seem to be datetimes already:
In:
display(tournament_info)
Out:
array([['2021-05-06T09:28:24.987995000'],
['2021-05-06T09:28:38.605930000'],
['2021-05-06T09:28:47.860595000'],
['2021-05-06T09:28:57.187734000']], dtype='datetime64[ns]')
In:
df_player_snapshot_master.dtypes
Out:
snapshotDateTime datetime64[ns]
playerID object
deactivationRecord bool
dtype: object
Of course, I've looked at other posts that deal with filtering dataframes by comparing dates and they all seem to use different solutions than the for
loop I'd like to use, so I'm not sure how to re-purpose and adapt their advice.
Questions
- What is causing the ValueError I'm getting?
- How do I populate
df_tournament_summary_master['playersDeactivatedThisInterval']
with the conditional count using a date comparison condition?