I have time series data with a user id column:

df
timestamp           id    var
2020-10-10 12:00    4     10
2020-10-10 12:52    8     20
2020-10-10 15:32    2     30
2020-10-11 10:21    5     15
2020-10-11 11:32    8     50
2020-10-11 16:32    2     5
...

I want to calculate the standard deviation of column 'var' across all user ids for each day, then write it to a new column like this:

df
timestamp           id    var   std
2020-10-10 12:00    4     10    8.2
2020-10-10 12:52    8     20    8.2 
2020-10-10 15:32    2     30    8.2 
2020-10-11 10:21    5     15    19.3
2020-10-11 11:32    8     50    19.3
2020-10-11 16:32    2     5     19.3
...

How can I achieve this?

prof32
  • `df.groupby(df['timestamp'].dt.normalize())['var'].transform('std')`; your `timestamp` column needs to be of datetime type. – Quang Hoang Jan 31 '23 at 17:14
  • `df.groupby([pd.to_datetime(df['timestamp']).dt.normalize(), 'id'])['var'].transform('std')`, then you don't need to worry about it ;) – mozway Jan 31 '23 at 17:16
  • `df.groupby(['id', pd.Grouper(key='timestamp', freq='D')])['var'].transform('std')`, then you can easily try out other frequencies :) – Harry Haller Jan 31 '23 at 17:17
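The suggestions in the comments can be put together into a small runnable sketch using the six sample rows from the question. Two points are assumptions on my part rather than anything the question states. First, the expected values 8.2 and 19.3 match the population standard deviation (`ddof=0`); the pandas default for `'std'` is the sample standard deviation (`ddof=1`), which would give 10.0 and about 23.6 for the same rows. Second, the grouping here is by day only, since grouping by `id` as well would compute a per-user, per-day value instead of one value per day.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "timestamp": ["2020-10-10 12:00", "2020-10-10 12:52", "2020-10-10 15:32",
                  "2020-10-11 10:21", "2020-10-11 11:32", "2020-10-11 16:32"],
    "id": [4, 8, 2, 5, 8, 2],
    "var": [10, 20, 30, 15, 50, 5],
})

# Parse the strings into datetimes so the .dt accessor is available.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# normalize() sets the time part to midnight, giving one key per calendar day.
day = df["timestamp"].dt.normalize()

# Assumption: ddof=0 (population std), chosen only because it reproduces
# the 8.2 / 19.3 shown in the question. Use s.std() for the sample std.
df["std"] = df.groupby(day)["var"].transform(lambda s: s.std(ddof=0))

print(df.round({"std": 1}))
```

`transform` returns a result aligned to the original rows, so every row of a given day receives that day's value without a separate merge step.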
