1

i have a column with strings which is to convert to datetime (spanish date format)

>>> df['Date'].head()

0    31/10/2019
1    31/10/2019
2    30/10/2019
3    30/10/2019
4    29/10/2019
Name: Date, dtype: object

Convert

>>>pd.to_datetime(df['Date'], dayfirst = True)
>>>df['Date'].head()
0   2019-10-31
1   2019-10-31
2   2019-10-30
3   2019-10-30
4   2019-10-29
Name: Date, dtype: datetime64[ns]

And now I want to sort it by date, and the output converts strangely to:

>>>df['Date'] =df.sort_values(by=['Date'], ascending = True)
>>>df['Date'].head()
0    9443248_19
1    9443205_19
2    9441864_19
3    9441809_19
4    9440310_19
Name: Date, dtype: object

Any clue what happened here? Why the type converts back to object?

  • note that 1) your 'Date' column is of type `object` (should be `datetime` I assume) and 2) you have duplicate dates which means ambiguous results from sorting – FObersteiner Mar 10 '20 at 08:57
  • Does this answer your question? [Sort Pandas Dataframe by Date](https://stackoverflow.com/questions/28161356/sort-pandas-dataframe-by-date) – Raghul Raj Mar 10 '20 at 08:58
  • Yes, first it is object, but after converting, the type is "Name: Date, dtype: datetime64[ns]". What happens when results are ambiguous? I have a time column for a second sort argument. – programmierboy Mar 10 '20 at 09:02
  • @programmierboy: you could combine date and time into one `datetime` type column (and sort by that) to avoid the ambiguity – FObersteiner Mar 10 '20 at 09:03

1 Answers1

0

make sure your 'Date' column is converted to datetime first, then the sorting should work fine:

import pandas as pd

df = pd.DataFrame({'Date': ['31/10/2019', '31/10/2019', '30/10/2019', '30/10/2019', '29/10/2019']})

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df.sort_values(by='Date', ascending=True, inplace=True)
# df['Date']
# 4   2019-10-29
# 2   2019-10-30
# 3   2019-10-30
# 0   2019-10-31
# 1   2019-10-31
# Name: Date, dtype: datetime64[ns
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • Thanks, with the inplace=True argument it works, but do you know what is the difference if I just "overwrite" the df or use the inplace argument? Why my first approach does not work, but this one? – programmierboy Mar 10 '20 at 09:04
  • @programmierboy: did you try `df = df.sort_values(by=['Date'], ascending = True)`, I mean overwrite `df` instead of `df['Date']`? `inplace=True` does the same thing internally. – FObersteiner Mar 10 '20 at 09:09
  • yes, i exactly tried this and the output was like in the my original post some no-making-sense objects. – programmierboy Mar 10 '20 at 09:12
  • @programmierboy: not sure what happend there... in any case, I think it's important to convert to datetime first, using `dayfirst=True`. Otherwise, pandas is just guessing what comes first (e.g. 10/10/2019 would be ambiguous). – FObersteiner Mar 10 '20 at 09:15