Get last date in each month of a time series pandas

Question

Currently I'm generating a DatetimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily, but with some gaps.

My goal is to get the last date in the DatetimeIndex for each month.

.to_period('M') and .to_timestamp('M') don't work, since they give the last day of the month rather than the last value of the variable in each month.

As an example, if this is my time series I would want to select '2015-05-29' while the last day of the month is '2015-05-31'.

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']

Sorry won't `df.groupby([df.index.year,df.index.month]).last()` give what you want? — EdChum, Jun 09 '15 at 22:34
No, that doesn't unfortunately. Take a look at my post for an example. — ikemblem, Jun 09 '15 at 22:42
I'm not sure why Ed's comment won't work. Your post first converts to a period, whereas Ed's comment just looks at the last trading day in any given month for any given year. — Alexander, Jun 10 '15 at 07:17
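For reference, the groupby suggested in the comments can be sketched on the sample dates from the question. This is a minimal sketch, not the commenters' exact code: the index is first turned into a Series so that the aggregated values are the dates themselves, and max() is used instead of last() so the result does not depend on the index being sorted. The names idx and dates are illustrative.

```python
import pandas as pd

# Sample dates copied from the question
idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])

# A Series whose values (and index) are the dates themselves
dates = idx.to_series()

# One group per (year, month); the largest date in each group is the
# last date actually present in that month
last_per_month = dates.groupby([idx.year, idx.month]).max()
print(last_per_month.tolist())
```

Note that the result is indexed by (year, month) pairs; the dates are in the values, not in the index.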

score 6 · Accepted Answer · answered Jun 10 '15 at 12:15

Condla's answer came closest to what I needed, except that my time index stretched over more than a year, so I needed to group by both month and year and then select the maximum date. Below is the code I ended up with.

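import pandas as pd

# tempTradeDays is the initial DatetimeIndex
dateRange = []
# Group the dates by year, then by month within each year
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    yearDates = pd.DatetimeIndex(dictYears[yr])
    tempYear = yearDates.groupby(yearDates.month)
    for m in tempYear.keys():
        # The latest date in each (year, month) group
        dateRange.append(max(tempYear[m]))
# Sort chronologically; sort_values() replaces Index.order(), which newer pandas no longer has
dateRange = pd.DatetimeIndex(dateRange).sort_values()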

Like it. But maybe there is a built-in pandas solution for this nowadays... — Pat, Apr 19 '17 at 11:51
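On the question of a more built-in approach, one possibility is to mark duplicated monthly periods and keep only the last occurrence of each. The sketch below assumes a timezone-naive DatetimeIndex; the function name last_date_per_month and the sample dates are made up for illustration.

```python
import pandas as pd

def last_date_per_month(idx: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the last date present in each calendar month of idx."""
    idx = idx.sort_values()
    # Monthly periods keep the year, so 2014-12 and 2015-12 stay distinct
    months = idx.to_period('M')
    # With keep='last', every occurrence except the final one in each
    # month is flagged as a duplicate; negating keeps only the final one
    is_last = ~months.duplicated(keep='last')
    return idx[is_last]

# Hypothetical trading days spanning two years
trade_days = pd.DatetimeIndex(
    ['2014-12-30', '2014-12-31', '2015-05-28', '2015-05-29', '2015-06-01']
)
result = last_date_per_month(trade_days)
print(result)
```

Unlike the nested loop, this keeps the dates in a DatetimeIndex throughout, so no conversion back from a list is needed.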

score 6 · Answer 2 · answered May 24 '19 at 20:56

Suppose your data frame looks like this:

[image: original dataframe]

Then the following code will give you the last available date of each month.

df_monthly = df.reset_index().groupby([df.index.year,df.index.month],as_index=False).last().set_index('index')

[image: transformed dataframe]

This one-liner does the job :)
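Since the images are not available, here is a small runnable sketch of the same idea on the sample dates from the question. The 'price' column and its values are placeholders, not data from the answer. It is a slight variant of the one-liner: as_index=False is left out, because set_index replaces the group index anyway. It also assumes the DatetimeIndex is unnamed, so that reset_index() produces a column called 'index'.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])
# Placeholder data: one made-up value per date
df = pd.DataFrame({'price': range(len(idx))}, index=idx)

df_monthly = (
    df.reset_index()                             # dates move into a column named 'index'
      .groupby([df.index.year, df.index.month])  # one group per (year, month)
      .last()                                    # last row of each group
      .set_index('index')                        # put the real dates back as the index
)
print(df_monthly)
```

Because last() takes the final row of each group, the frame should be sorted by date before grouping.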

Condla · Answer 3 · 2015-06-10T07:17:36.940

My strategy would be to group by month and then select the "maximum" of each group:

If "dt" is your DatetimeIndex object:

last_dates_of_the_month = []
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)

The list "last_dates_of_the_month" contains the last date of each month that occurs in your dataset. You can use this list to create a DatetimeIndex in pandas again (or whatever you want to do with it).
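One caveat, raised in the accepted answer, is that grouping by month alone merges the same month from different years. The sketch below illustrates this on made-up dates and shows one possible workaround: a combined integer key such as 201405. The key construction is an illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical index covering May in two different years
dt = pd.DatetimeIndex(['2014-05-29', '2014-05-30', '2015-05-28', '2015-05-29'])

# Month only: both Mays fall into the same group
by_month = dt.groupby(dt.month)
month_only = sorted(group.max() for group in by_month.values())

# Year and month combined into one integer key, e.g. 201405
by_year_month = dt.groupby(dt.year * 100 + dt.month)
year_and_month = sorted(group.max() for group in by_year_month.values())

print(month_only)      # a single date for the merged May group
print(year_and_month)  # one date per (year, month)
```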

Maxim · Answer 4 · 2018-02-21T18:46:51.923

This is an old question, but none of the existing answers is perfect. This is the solution I came up with (assuming that the date index is sorted); it can even be written in one line, but I split it up for readability:

# apple is a DataFrame (or Series) with a sorted DatetimeIndex
month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
# True where the next row belongs to a different month (and on the last row)
mask = (month1 != month2)
apple[mask.values].head(10)

A few notes:

Shifting a datetime series requires another pd.Series instance (see here)
Boolean mask indexing requires .values (see here)

By the way, when the dates are business days, it'd be easier to use resampling: apple.resample('BM').last() (the alias is spelled 'BME' from pandas 2.2 onward)
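A point worth checking before relying on resampling: the resulting index carries the business-month-end labels of the bins, not necessarily dates that occur in the data. The sketch below uses the sample dates from the question with placeholder values. It passes the offset object pd.offsets.BMonthEnd() rather than a string alias, as an assumption-free way to avoid the alias spelling difference between pandas versions.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])
# Placeholder values standing in for the answer's 'apple' data
apple = pd.Series(range(len(idx)), index=idx)

# Last observed value per business month; the index holds bin labels
monthly = apple.resample(pd.offsets.BMonthEnd()).last()
print(monthly)
```

Here the June row is labelled 2015-06-30 even though the last June date in the data is 2015-06-01, so this gives the last value per month but not the last date.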

score 2 · Answer 5 · answered Aug 21 '15 at 08:04

Maybe the answer is not needed anymore, but while searching for an answer to the same question I found what may be a simpler solution:

import pandas as pd 

sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
month_end_dates = sample_dates[sample_dates.is_month_end]

MMCM_

OP clearly states "if this is my time series I would want to select '2015-05-29' while the last day of the month is '2015-05-31'." Your solution would miss 2015-05-29 since it is not a month-end. – C S Apr 06 '17 at 21:23
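The point made in this comment can be checked directly on the sample dates from the question. This is a small sketch; the variable names are illustrative.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])

# is_month_end is True only for calendar month-ends such as 2015-05-31,
# which does not appear in this index
month_end_dates = idx[idx.is_month_end]
print(len(month_end_dates))
```

The selection comes back empty, so 2015-05-29 is indeed missed.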

score 0 · Answer 6 · answered Aug 06 '20 at 15:21

Try this to create a new diff column, where the value 1 marks a row whose month differs from that of the previous row (the first row is also marked):

df['diff'] = np.where(df['Date'].dt.month.diff() != 0, 1, 0)

Guido Guerra
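To flag the last date of each month rather than the first, the comparison can be made against the following row instead. The sketch below is a variation on the answer above, not the answer's own code; the column name last_of_month is made up, and the data are the sample dates from the question. Like the original, it compares month numbers only, so it assumes sorted dates with no jump between the same month of two different years.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])})
month = df['Date'].dt.month

# diff(-1) subtracts the next row's month from the current one; the final
# row has no successor, yields NaN, and NaN != 0 is True, so it is flagged too
df['last_of_month'] = np.where(month.diff(-1) != 0, 1, 0)

last_dates = df.loc[df['last_of_month'] == 1, 'Date']
print(last_dates.tolist())
```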
