1

I have a df that includes high and low stock prices by day in 2 minute increments. I am trying to find the high and low for each day. I am able to do so by using the code below but the output only gives me the date and price data. I need to have the time column available as well. I've tried about 100 different ways but cannot get it to work.

high = df.groupby('Date')['High'].max()
low = df.groupby('Date')['Low'].min()

Below are my columns and dtypes.

 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  

 0   High          4277 non-null   float64
 1   Low           4277 non-null   float64 
 2   Date          4277 non-null   object 
 3   Time          4277 non-null   object 

Any suggestions?

user3088202
  • 2,714
  • 5
  • 22
  • 36
  • Does this answer your question? [Pandas: Creating aggregated column in DataFrame](https://stackoverflow.com/questions/13256917/pandas-creating-aggregated-column-in-dataframe) – mcskinner Apr 16 '20 at 19:05

3 Answers3

2

transform with boolean indexing:

# sample data
np.random.seed(10)
df = pd.DataFrame([pd.date_range('2020-01-01', '2020-01-03', freq='H'),
                   np.random.randint(1,10000, 49), np.random.randint(1,10,49)]).T
df.columns = ['date', 'high', 'low']
df['time'] = df['date'].dt.time
df['date'] = df['date'].dt.date


# transform max and min then assign to a variable
mx = df.groupby('date')['high'].transform(max)
mn = df.groupby('date')['low'].transform(min)

# boolean indexing
high = df[df['high'] == mx]
low = df[df['low'] == mn]

# high
          date  high low      time
4   2020-01-01  9373   9  04:00:00
42  2020-01-02  9647   2  18:00:00
48  2020-01-03    45   5  00:00:00

# low
          date  high low      time
14  2020-01-01  2103   1  14:00:00
15  2020-01-01  3417   1  15:00:00
23  2020-01-01   654   1  23:00:00
27  2020-01-02  2701   1  03:00:00
30  2020-01-02   284   1  06:00:00
36  2020-01-02  6160   1  12:00:00
38  2020-01-02   631   1  14:00:00
40  2020-01-02  3417   1  16:00:00
44  2020-01-02  6860   1  20:00:00
45  2020-01-02  8989   1  21:00:00
47  2020-01-02  2811   1  23:00:00
48  2020-01-03    45   5  00:00:00
It_is_Chris
  • 13,504
  • 2
  • 23
  • 41
  • Thanks! Worked like a charm. I will review to really understand the code after I grab some dinner. Not familiar with boolean indexing so will have to look that up. – user3088202 Apr 16 '20 at 20:56
  • @user3088202 Glad I could help. Here are the pandas docs for [boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing) – It_is_Chris Apr 16 '20 at 21:03
0

Do you wan this:

# should use datetime type:
df['Date'] = pd.to_datetime(df['Date'])

df.groupby(df.Date.dt.normalize()).agg({'High': 'max', 'Low': 'min'})
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

After you apply groupby and min or max function, you can select the columns using loc or iloc:

df.groupby('Date').max().loc[:,['High','Time']]
SonerY
  • 13
  • 2
  • Thank for reply. Looks like you are applying the max function to Date column but I need to find the highest value for High column by for each date. I tried adding ['High'] but got Too many indexes error. – user3088202 Apr 16 '20 at 19:18
  • This code applies the max function to all columns except for the one in groupby. Then you can select the columns you want to display. You can also do it like df[['High','Time', 'Date]].groupby('Date').max() – SonerY Apr 16 '20 at 19:36