1

I have financial time series for several countries and for each of those time series I want to drop observations that fall on public holidays of that particular country. To do that I'm creating a new column in my time series with boolean values to indicate if the date is a holiday or not.

So I found this code to assign boolean values and it works great for my US time series: Pandas: Checking if a date is a holiday and assigning boolean value

But I'm not able to get it to work for other countries. I tried using workalendar such as the code below but get error messages. I would appreciate any advice on using workalendar or other methods.

from datetime import date
from workalendar.europe import UnitedKingdom
cal = UnitedKingdom()

holidays = cal.holidays(start=uk_daily['Date for PH'].min(), 
                    end=uk_daily['Date for PH'].max()).to_pydatetime()
uk_daily['Holiday'] = uk_daily['Date for PH'].isin(holidays)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-2d820caa4432> in <module>
      4 
      5 holidays = cal.holidays(start=uk_daily['Date for PH'].min(),
----> 6                         end=uk_daily['Date for PH'].max()).to_pydatetime()

TypeError: holidays() got an unexpected keyword argument 'start'

The data is just a Pandas dataframe with a time index and a few columns. I get the same error message using this reproducible example:

import pandas as pd
from datetime import date
from workalendar.europe import UnitedKingdom
cal = UnitedKingdom()

dr = pd.date_range(start='1996-01-01', end='2019-06-28')
df = pd.DataFrame()
df['Date'] = dr

holidays = cal.holidays(start=dr.min(), end=dr.max()).to_pydatetime()
df['Holiday'] = df['Date'].isin(holidays)
df.head(10)
sunshine
  • 23
  • 1
  • 5
  • [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) & [Provide a copy of the data](https://stackoverflow.com/questions/52413246/how-do-i-provide-a-reproducible-copy-of-my-existing-dataframe) – Trenton McKinney Aug 08 '19 at 00:43
  • First you are basing your code on the question rather than the answer, secondly, the answer is not using `workalendar` but a pandas data set. If you want to use the `workalendar` data you will need to convert it for use in Pandas. – Tim Aug 08 '19 at 00:57
  • Thanks for pointing that out. How would I go about converting workalendar data? – sunshine Aug 08 '19 at 01:25

1 Answers1

0

The constructor for holidays only takes the year, e.g. cal.holidays(2019), not a start and end year. The returned values is a list of tuple pairs consisting of the datetime.date value for the holiday and its respective name.

>>> cal.holidays(2019)
[(datetime.date(2019, 1, 1), 'New year'),
 (datetime.date(2019, 4, 19), 'Good Friday'),
 (datetime.date(2019, 4, 21), 'Easter Sunday'),
 (datetime.date(2019, 4, 22), 'Easter Monday'),
 (datetime.date(2019, 5, 6), 'Early May Bank Holiday'),
 (datetime.date(2019, 5, 27), 'Spring Bank Holiday'),
 (datetime.date(2019, 8, 26), 'Late Summer Bank Holiday'),
 (datetime.date(2019, 12, 25), 'Christmas Day'),
 (datetime.date(2019, 12, 26), 'Boxing Day')]

So you need to get the range of years between the start and end dates and then call the constructor on each year using a conditional set comprehension with the condition being that the holiday is between the start and end dates (both inclusive). We take the first element of each tuple pair to get the holiday date (holiday[0]).

start = uk_daily['Date for PH'].min()
start_year = start.year  # Assuming dates are Timestamp objects.
end = uk_daily['Date for PH'].max()
end_year = end.year 

holidays = set(holiday[0] 
               for year in range(start_year, end_year + 1)
               for holiday in cal.holidays(year)
               if start.date() <= holiday[0] <= end.date())

I used a set comprehension instead of a list comprehension because testing for date membership should be faster.

Then just test for membership as you have before:

uk_daily['Holiday'] = uk_daily['Date for PH'].isin(holidays)
Alexander
  • 105,104
  • 32
  • 201
  • 196