2

I have a crime dataset where every row is one recorded offence that is to be used in an ARIMA time series model.

    Date

0   2015-09-05
1   2015-09-05
2   2015-07-08
3   2017-09-05
4   2018-09-05
4   2018-09-05

I would like to group by data, so that offences that occurred on the same day are aggregated.

    Date          Count

0   2015-09-05    2
1   2015-07-08    1
2   2017-09-05    1
3   2018-09-05    2

I'm struggling because I'm trying to both group by weeks per year, and because I'm not aggregating the contents of a column, I'm trying to count how many rows are grouped into it.

Thank you.

AMGU
  • 23
  • 3
  • Does this answer your question? [Python Pandas Group by date using datetime data](https://stackoverflow.com/questions/39400115/python-pandas-group-by-date-using-datetime-data) – Paul Brennan Mar 23 '22 at 18:55

2 Answers2

1

If your dataset is a dataframe, you can use:

df.assign(Count=1).groupby('Date')['Count'].count()

If it's a series:

series.to_frame().assign(Count=1).groupby('Date')['Count'].count()

For example:

df = pd.DataFrame({'Date':['2015-09-05',
                           '2015-09-05',
                           '2015-07-08',
                           '2017-09-05',
                           '2018-09-05',
                           '2018-09-05']})
df.assign(Count=1).groupby('Date')['Count'].count().reset_index()

Returns:

         Date  Count
0  2015-07-08      1
1  2015-09-05      2
2  2017-09-05      1
3  2018-09-05      2
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
  • This is great. Would there be any way to name the new count column with a 'Count' header? – AMGU Mar 23 '22 at 19:54
  • Sure, add `.reset_index()` at the end! Or check my edit. If it helps you solve the issue, accepting the answer will mark it as solved and if you feel like it, upvoting it will also reward me with some score rep! – Celius Stingher Mar 23 '22 at 19:58
1

One way to do it is to use Python rather than pandas for the heavy lifting:

import pandas as pd
import datetime
df = pd.DataFrame([datetime.datetime.strptime(x, "%Y-%m-%d").date() for x in ['2015-09-05', '2015-09-05', '2015-07-08', '2017-09-05', '2018-09-05', '2018-09-05']], columns=['Date'])

from collections import Counter
c = Counter(list(df['Date']))
df2 = pd.DataFrame(zip(list(c.keys()), list(c.values())), columns=['Date', 'Count'])
print(df2)

Output:

         Date  Count
0  2015-09-05      2
1  2015-07-08      1
2  2017-09-05      1
3  2018-09-05      2
constantstranger
  • 9,176
  • 2
  • 5
  • 19