Grouping data in DF but keeping all columns in Python

Question

I have a df that includes high and low stock prices by day in 2 minute increments. I am trying to find the high and low for each day. I am able to do so by using the code below but the output only gives me the date and price data. I need to have the time column available as well. I've tried about 100 different ways but cannot get it to work.

high = df.groupby('Date')['High'].max()
low = df.groupby('Date')['Low'].min()

Below are my columns and dtypes.

 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  

 0   High          4277 non-null   float64
 1   Low           4277 non-null   float64 
 2   Date          4277 non-null   object 
 3   Time          4277 non-null   object

Any suggestions?

Does this answer your question? [Pandas: Creating aggregated column in DataFrame](https://stackoverflow.com/questions/13256917/pandas-creating-aggregated-column-in-dataframe) — mcskinner, Apr 16 '20 at 19:05

It_is_Chris · Accepted Answer · 2020-04-16T20:07:26.610

transform with boolean indexing:

# sample data
np.random.seed(10)
df = pd.DataFrame([pd.date_range('2020-01-01', '2020-01-03', freq='H'),
                   np.random.randint(1,10000, 49), np.random.randint(1,10,49)]).T
df.columns = ['date', 'high', 'low']
df['time'] = df['date'].dt.time
df['date'] = df['date'].dt.date


# transform max and min then assign to a variable
mx = df.groupby('date')['high'].transform(max)
mn = df.groupby('date')['low'].transform(min)

# boolean indexing
high = df[df['high'] == mx]
low = df[df['low'] == mn]

# high
          date  high low      time
4   2020-01-01  9373   9  04:00:00
42  2020-01-02  9647   2  18:00:00
48  2020-01-03    45   5  00:00:00

# low
          date  high low      time
14  2020-01-01  2103   1  14:00:00
15  2020-01-01  3417   1  15:00:00
23  2020-01-01   654   1  23:00:00
27  2020-01-02  2701   1  03:00:00
30  2020-01-02   284   1  06:00:00
36  2020-01-02  6160   1  12:00:00
38  2020-01-02   631   1  14:00:00
40  2020-01-02  3417   1  16:00:00
44  2020-01-02  6860   1  20:00:00
45  2020-01-02  8989   1  21:00:00
47  2020-01-02  2811   1  23:00:00
48  2020-01-03    45   5  00:00:00

Thanks! Worked like a charm. I will review to really understand the code after I grab some dinner. Not familiar with boolean indexing so will have to look that up. — user3088202, Apr 16 '20 at 20:56
@user3088202 Glad I could help. Here are the pandas docs for [boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing) — It_is_Chris, Apr 16 '20 at 21:03

score 0 · Answer 2 · answered Apr 16 '20 at 19:04

0

Do you wan this:

# should use datetime type:
df['Date'] = pd.to_datetime(df['Date'])

df.groupby(df.Date.dt.normalize()).agg({'High': 'max', 'Low': 'min'})

answered Apr 16 '20 at 19:04

Quang Hoang

146,074
10
56
74

Thanks for a quick reply. I ran your code but nothing changed. df looks exactly the same. – user3088202 Apr 16 '20 at 19:14

score 0 · Answer 3 · answered Apr 16 '20 at 19:12

0

After you apply groupby and min or max function, you can select the columns using loc or iloc:

df.groupby('Date').max().loc[:,['High','Time']]

answered Apr 16 '20 at 19:12

SonerY

13
2

Thank for reply. Looks like you are applying the max function to Date column but I need to find the highest value for High column by for each date. I tried adding ['High'] but got Too many indexes error. – user3088202 Apr 16 '20 at 19:18
This code applies the max function to all columns except for the one in groupby. Then you can select the columns you want to display. You can also do it like df[['High','Time', 'Date]].groupby('Date').max() – SonerY Apr 16 '20 at 19:36

Grouping data in DF but keeping all columns in Python

3 Answers3