I have a pandas DataFrame (df) like the one below:

       value        date    count1  hours   column_name
 0      eps     2021-02-17  127185   0       EPS
 1      eps     2021-02-17  129792   1       EPS
 2      eps     2021-02-17  155645   2       EPS
 3      eps     2021-02-17  160214   4       EPS
 4      eps     2021-02-17  164315   5       EPS
 5      eps     2021-02-16  164987   1       EPS

I want a new dataframe that excludes only the record where date is 2021-02-17 and hours = 1. I tried this:

     df.loc[(df['date'] <= '2021-02-17') & (df['hours'] != 1)]

It gives this output:

       value        date    count1  hours   column_name
 0      eps     2021-02-17  127185   0       EPS
 1      eps     2021-02-17  155645   2       EPS
 2      eps     2021-02-17  160214   4       EPS
 3      eps     2021-02-17  164315   5       EPS

But I want this output:

       value        date    count1  hours   column_name
 0      eps     2021-02-17  127185   0       EPS
 1      eps     2021-02-17  155645   2       EPS
 2      eps     2021-02-17  160214   4       EPS
 3      eps     2021-02-17  164315   5       EPS
 4      eps     2021-02-16  164987   1       EPS
vishwajeet Mane

2 Answers

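You need to use the | (OR) operator instead of &, and test both columns with !=. That way a row is kept unless it matches both conditions at once: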

import pandas as pd

# if needed, convert 'date' to datetime
df['date'] = pd.to_datetime(df['date'])

out = df.loc[(df['date']!='2021-02-17') | (df['hours']!=1)]
print(out)

  value       date  count1  hours column_name
0   eps 2021-02-17  127185      0         EPS
2   eps 2021-02-17  155645      2         EPS
3   eps 2021-02-17  160214      4         EPS
4   eps 2021-02-17  164315      5         EPS
5   eps 2021-02-16  164987      1         EPS
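Another way to write the same filter, sketched here with the question's sample rows rebuilt by hand: build a mask for the one combination you want to drop and invert it with ~. By De Morgan's law, ~(A & B) equals (~A) | (~B), which is why the != / | version above works. The trailing .copy() makes out an independent DataFrame, so later assignments to it should not trigger a SettingWithCopyWarning. The names drop_mask and alt are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    "value": ["eps"] * 6,
    "date": ["2021-02-17"] * 5 + ["2021-02-16"],
    "count1": [127185, 129792, 155645, 160214, 164315, 164987],
    "hours": [0, 1, 2, 4, 5, 1],
    "column_name": ["EPS"] * 6,
})
df["date"] = pd.to_datetime(df["date"])

# Mask for the single combination that should be removed
drop_mask = (df["date"] == "2021-02-17") & (df["hours"] == 1)

# Keep every other row; .copy() gives a standalone frame for later edits
out = df.loc[~drop_mask].copy()

# Same rows via De Morgan's law: not (A and B) == (not A) or (not B)
alt = df.loc[(df["date"] != "2021-02-17") | (df["hours"] != 1)]

print(out)
print(out.index.tolist())  # [0, 2, 3, 4, 5]
print(out.equals(alt))     # True
```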
sophocles
  • Beware: this can lead to a SettingWithCopyWarning if you decide to do other things with `out`. Better to use `loc` or make an explicit `copy()`. – cs95 Feb 17 '21 at 10:01
  • Thanks for the tip. Please continue giving feedback whenever you feel it's needed. It is definitely appreciated. – sophocles Feb 17 '21 at 10:05
  • Note [I have not downvoted in any way](https://i.stack.imgur.com/iax2v.jpg); this is a correct answer. – cs95 Feb 17 '21 at 10:06
  • Even if you did, that's completely fine, as long as it comes with feedback. – sophocles Feb 17 '21 at 10:08
  • Converting to datetime takes too long; I have a dataframe with 2 billion records. – vishwajeet Mane Feb 18 '21 at 07:02
  • As shown [here](https://stackoverflow.com/questions/32034689/why-is-pandas-to-datetime-slow-for-non-standard-time-format-such-as-2014-12-31/32034914#32034914), you can improve performance by passing a format string to `pd.to_datetime`. I compared `pd.to_datetime(large_df['date'], infer_datetime_format=True)` with `pd.to_datetime(large_df['date'])` on my personal machine, and the former ran faster. Try changing your code to that, or pass an explicit `format`. – sophocles Feb 18 '21 at 09:47
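For the large-dataframe concern, here is a minimal sketch of two options, assuming the date column holds strings in a single YYYY-MM-DD form (the 4-row frame is a stand-in, not the asker's data). Option 1 passes an explicit format to pd.to_datetime so pandas does not have to infer it; note that in pandas 2.x the infer_datetime_format argument is deprecated, because strict format inference is now the default. Option 2 skips the conversion entirely: since the filter only tests equality against one date, comparing the raw strings yields the same mask. Whether either is fast enough for 2 billion rows would need to be measured on the real data.

```python
import pandas as pd

# Small stand-in for the question's data; 'date' holds ISO-formatted strings
df = pd.DataFrame({
    "date": ["2021-02-17", "2021-02-17", "2021-02-17", "2021-02-16"],
    "hours": [0, 1, 2, 1],
})

# Option 1: convert with an explicit format instead of letting pandas infer it
dates = pd.to_datetime(df["date"], format="%Y-%m-%d")
keep_converted = (dates != "2021-02-17") | (df["hours"] != 1)

# Option 2 (assumes every value uses the same 'YYYY-MM-DD' string form):
# skip the conversion and compare the strings directly
keep_strings = (df["date"] != "2021-02-17") | (df["hours"] != 1)

# Both masks select the same rows
print(keep_converted.equals(keep_strings))  # True
print(df.loc[keep_strings].index.tolist())  # [0, 2, 3]
```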

Compare both columns for inequality with !=, change & to | (bitwise OR), and convert the date column to datetimes for the correct output:

import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df = df.loc[(df['date'] != '2021-02-17') | (df['hours'] != 1)]
print(df)
  value       date  count1  hours column_name
0   eps 2021-02-17  127185      0         EPS
2   eps 2021-02-17  155645      2         EPS
3   eps 2021-02-17  160214      4         EPS
4   eps 2021-02-17  164315      5         EPS
5   eps 2021-02-16  164987      1         EPS
jezrael