I can't work out what the difference is between the <M8[ns] and datetime dtypes, or how that difference explains why the operations below do or don't work.

import pandas as pd 
import datetime as dt 
import numpy as np 

my_dates = ['2021-02-03','2021-02-05','2020-12-25', '2021-12-27','2021-12-12']
my_numbers = [100,200,0,400,500]
df = pd.DataFrame({'a':my_dates, 'b':my_numbers})

df['a'] = pd.to_datetime(df['a'])

# The ultimate goal is to call df.mean() and see the mean DATE,
# but that doesn't seem to work, so...

df['a'].mean().strftime('%Y-%m-%d')  # OK, this works; I could work around it by concatenating results

# But why won't this work?  

df2 = df.select_dtypes('datetime')
df2.mean()       # won't work
df2['a'].mean()  # works?

Unless I am missing something, what I seem to be running into is a difference between 'datetime' and '<M8[ns]', and how that affects getting the mean date.

runningbirds

2 Answers

You can try passing the numeric_only parameter to the mean() method:

out=df.select_dtypes('datetime').mean(numeric_only=False)

Output of out:

a   2021-06-03 04:48:00
dtype: datetime64[ns]

Note: this will raise an error if the dtype is string.
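
To illustrate the note above, here is a minimal sketch that assumes the sample dates from the question. It shows that the column is only picked up by select_dtypes('datetime') after pd.to_datetime has been applied. For the mean itself it uses apply with Series.mean per column, which is my own substitution rather than the numeric_only=False call, because the default of numeric_only has differed between pandas versions.

```python
import pandas as pd

my_dates = ['2021-02-03', '2021-02-05', '2020-12-25', '2021-12-27', '2021-12-12']
df = pd.DataFrame({'a': my_dates, 'b': [100, 200, 0, 400, 500]})

# Column 'a' still holds strings here, so no datetime columns are selected.
n_before = df.select_dtypes('datetime').shape[1]

# Convert the strings to real datetimes, then select again.
df['a'] = pd.to_datetime(df['a'])
n_after = df.select_dtypes('datetime').shape[1]

# Call Series.mean on each selected column; this avoids depending on
# how DataFrame.mean treats datetime columns in a given pandas version.
col_means = df.select_dtypes('datetime').apply(lambda col: col.mean())

print(n_before, n_after)
print(col_means)
```

The result is a Series indexed by column name, so col_means['a'] gives the same Timestamp as df['a'].mean().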

Anurag Dabas

The mean function you apply is different in each case.

import pandas as pd 
import datetime as dt 
import numpy as np 

my_dates = ['2021-02-03','2021-02-05','2020-12-25', '2021-12-27','2021-12-12']
my_numbers = [100,200,0,400,500]
df = pd.DataFrame({'a':my_dates, 'b':my_numbers})

df['a']=pd.to_datetime(df['a'])
df.mean()

This mean is the DataFrame mean function, and it works on numeric data. To see which columns count as numeric, do:

df._get_numeric_data()

    b
0   100
1   200
2   0
3   400
4   500

But df['a'] is a datetime series.

df['a'].dtype, type(df)
(dtype('<M8[ns]'), pandas.core.frame.DataFrame)

So df['a'].mean() applies a different mean function, one that works on datetime values. That's why df['a'].mean() outputs the mean of the datetime values.

df['a'].mean()
Timestamp('2021-06-03 04:48:00')
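
On the '<M8[ns]' versus datetime64[ns] part of the question: as far as I know these are two spellings of the same NumPy dtype, where 'M8' is the short type code for an 8-byte datetime and '<' marks little-endian byte order. A small sketch to check this, using a two-element series built just for illustration (the variable names are mine):

```python
import numpy as np
import pandas as pd

# Short type code versus long name for the same dtype.
short_code = np.dtype('M8[ns]')
long_name = np.dtype('datetime64[ns]')
same = short_code == long_name

# A datetime series built directly with nanosecond resolution.
s = pd.Series(np.array(['2021-02-03', '2021-02-05'], dtype='datetime64[ns]'))

print(same)          # True
print(s.dtype)       # datetime64[ns]
print(s.dtype.kind)  # M
print(s.dtype.str)   # the short code, with a byte-order prefix that depends on the platform
```

So the different spelling in the repr is not what makes the two mean() calls behave differently; the difference is between the DataFrame and Series versions of mean.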

Read more here: difference-between-data-type-datetime64ns-and-m8ns, and "DataFrame.mean() ignores datetime series" (#28108).

Naomi Fridman
  • Yes, I get this, but this isn't answering the question I was trying to ask. WHY does df['a'].mean() work just fine, but if I do df.select_dtypes('datetime').mean() NOT WORK? After I **select_dtypes**, the format changes from – runningbirds Jul 18 '21 at 04:04
  • What do you need select_dtypes for? – Naomi Fridman Jul 20 '21 at 07:28