I am following the solution from "Filtering a spark dataframe based on date" with Spark 2.0.0.

Without importing the Column module, I get the following error when I try to use the gt expression:

'DataFrame' object has no attribute 'gt'

I tried importing the Column module so that I could use expressions like lt, gt, geq, etc.:

from pyspark.sql.column import *

But then I get this error:

AttributeError: 'module' object has no attribute 'DataFrame'

Any tips on how I can use the gt expression?

eliasah
user3311147

1 Answer

PySpark doesn't support those methods, but you can still use the operator module as follows:

>>> from operator import ge
>>> from pyspark.sql import functions as F
>>> df = spark.range(1, 50)
>>> df.filter(ge(df.id, F.lit(45))).show()
# +---+
# | id|
# +---+
# | 45|
# | 46|
# | 47|
# | 48|
# | 49|
# +---+

Or you can simply use the >= operator directly:

>>> df.filter(df.id >= F.lit(45)).show()
# +---+
# | id|
# +---+
# | 45|
# | 46|
# | 47|
# | 48|
# | 49|
# +---+
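
Both forms give the same result because PySpark's Column overloads Python's comparison operators, and operator.ge(a, b) is just the function form of a >= b. Here is a minimal sketch of that dispatch in plain Python. ToyColumn is a hypothetical stand-in written for illustration, not PySpark's actual class, and it builds a string where a real column would build an expression:

```python
from operator import ge, gt


class ToyColumn:
    """Hypothetical stand-in for a column expression (not PySpark's Column)."""

    def __init__(self, expr):
        self.expr = expr

    def __ge__(self, other):
        # Called for both `col >= other` and `operator.ge(col, other)`.
        return ToyColumn(f"({self.expr} >= {other})")

    def __gt__(self, other):
        # Called for both `col > other` and `operator.gt(col, other)`.
        return ToyColumn(f"({self.expr} > {other})")


col = ToyColumn("id")
via_module = ge(col, 45)   # function form
via_symbol = col >= 45     # operator form, same __ge__ call
print(via_module.expr == via_symbol.expr)
```

So for a strict "greater than" filter, gt from the operator module or the > operator should work the same way on a real column.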
zero323
eliasah