
I am experiencing the same situation as described in "Spark structured streaming from Kafka - last message processed again after resume from checkpoint". When I restart my Spark job after a failure, the last message gets processed again. One of the answers suggests that the sink has to be idempotent, but I am not sure I understand this well.

Right now I write to an ES sink, and the three methods are implemented as follows (a rough sketch follows the list):

  1. open method returns true
  2. process method does an HTTP POST to ES
  3. close method closes the connection
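
Stripped down, my writer looks roughly like the sketch below (the endpoint, index name, and use of String rows are simplified placeholders, and I use the open(partitionId, version) signature because that is what my Spark version exposes):

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    import org.apache.spark.sql.ForeachWriter

    // Roughly my current writer; endpoint, index name and row type are simplified.
    class EsWriter extends ForeachWriter[String] {

      @transient private var client: HttpClient = _

      // 1. open just returns true, so every (partitionId, version) is processed
      override def open(partitionId: Long, version: Long): Boolean = {
        client = HttpClient.newHttpClient()
        true
      }

      // 2. process does an HTTP POST of the record to ES
      override def process(value: String): Unit = {
        val request = HttpRequest.newBuilder()
          .uri(URI.create("http://localhost:9200/my-index/_doc"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(value))
          .build()
        client.send(request, HttpResponse.BodyHandlers.ofString())
      }

      // 3. close would release the connection (nothing to release for HttpClient)
      override def close(errorOrNull: Throwable): Unit = ()
    }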

I would like to know how to make the ES sink idempotent, and also how to use the two parameters partitionId and version in the open method to return false if the data has already been processed.
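
Conceptually, I imagine it would have to look something like the sketch below, but the wasAlreadyCommitted helper is something I made up, and I don't know what it should actually be backed by, or whether skipping in open is even the right approach:

    import org.apache.spark.sql.ForeachWriter

    // What I am imagining; wasAlreadyCommitted is invented and is essentially my question.
    class IdempotentEsWriter extends ForeachWriter[String] {

      override def open(partitionId: Long, version: Long): Boolean = {
        // Skip the whole batch for this partition if it was already written,
        // e.g. by looking (partitionId, version) up in some external store.
        !wasAlreadyCommitted(partitionId, version)
      }

      override def process(value: String): Unit = {
        // Alternatively (or additionally): index each record with a deterministic
        // document id, so re-sending the same record only overwrites the same
        // ES document instead of creating a duplicate.
      }

      override def close(errorOrNull: Throwable): Unit = ()

      // Placeholder only; returning false means everything is always reprocessed.
      private def wasAlreadyCommitted(partitionId: Long, version: Long): Boolean = false
    }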

Thanks in advance.

