0

I'm just getting started playing around with Python and Pandas, with ~10 hours total invested to far. I have a dataframe of daily stock data and I've resampled it weekly. The problem lies in weeks where Friday is a holiday, I get NaN in my dataset. Is there a way to accommodate for this scenario? (Same issue as well when I resample monthly, where the final day is a weekend).

sample = 'W-FRI'
for i in range(tickerCount):
    datalist.append(yf.download(stock_list[i], start, end))
    datalist[i]['High'] = datalist[i]['High'].resample(sample).max()
    datalist[i]['Low'] = datalist[i]['Low'].resample(sample).min()
    datalist[i]['Open'] = datalist[i]['Open'].resample(sample).first()
    datalist[i]['Close'] = datalist[i]['Close'].resample(sample).last()
    datalist[i] = datalist[i].asfreq(sample, method='pad')

n

As you can see the week of Good Friday could not be sampled properly. I know its possible to remove these from the dataframe:

    datalist[i] = datalist[i][datalist[i]['High'].notna()]

But ideally I would like to grab the last day of data for the specified resampled period (In this case, use Thursday's data. I've looked at this answer

Is there a way to accomplish this?

EDIT:

@ElliottCollins had an idea to use .ffill() to backfill the Friday with the previous data (from Thursday). This also backfills every Saturday and Sunday with the previous data. Unfortunately when I do this and then resample W-FRI my Open values are incorrect; They become Previous Friday's open rather than Monday's Open

EDIT 2

I just realized if I set index again after all this, I'm able to resample as desired. I'll post the solution below

splatto
  • 3,159
  • 6
  • 36
  • 69
  • 1
    Fwiw, a much more natural way to remove those missing values would be `datalist=datalist.dropna()` – Elliott Collins Jun 28 '20 at 17:18
  • 1
    Would you want to set Friday's value in the dataset to be the same as Thursday's? If so, you can use `.ffill()` to fill in missing values with the most recent non-missing value. I.e. `datalist = datalist.ffill()`. – Elliott Collins Jun 28 '20 at 17:22
  • @ElliottCollins Great idea but unfortunately doesn't work. This backfills Sunday with Friday values. When I resample W-FRI my intention is to get M-F, but I now get S-F. But since Sunday has Friday data, effectively it gives me F-F which messes up calculates further down the road. I've experimented with first reindexing so I can then remove weekend data, but then I cannot resample because I can't resample an Int64Index. Back to the drawing board unfortunately – splatto Jun 28 '20 at 22:13
  • Actually I can reset the index after that then resample as desired, so this does work – splatto Jun 28 '20 at 22:22

1 Answers1

0

Thanks @ElliottCollins tip about backfilling data.

datalist[i] = datalist[i].ffill()

This also backfills weekends, which I don't want. So I need to create a column from the index

datalist[i] = datalist[i].reset_index()

And then remove weekends

datalist[i] = datalist[i][datalist[i]['Date'].dt.dayofweek < 5]

And I need the Date column to be reset as the index for transformations later on, so

datalist[i] = datalist[i].set_index('Date')

And I was able to effectively get the data I needed

splatto
  • 3,159
  • 6
  • 36
  • 69