I am running into the same situation as described in "Spark structured streaming from Kafka - last message processed again after resume from checkpoint": when I restart my Spark job after a failure, the last message gets processed again. One of the answers suggests that the sink has to be idempotent, but I am not sure I understand what that means in practice.
Right now I write to an ES sink, and the three writer methods are implemented as follows:
- the open method always returns true
- the process method does an HTTP POST to ES
- the close method closes the connection
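To make the setup concrete, here is roughly what my writer does, with EsClient and all other names below as illustrative stand-ins rather than my real code (the real one posts over HTTP to ES); it mirrors the open / process / close contract described above:

```python
class EsClient:
    """In-memory stand-in for the HTTP client talking to ES."""
    def __init__(self):
        self.posted = []

    def post(self, doc):
        # stand-in for an HTTP POST; ES assigns a fresh _id each time,
        # so posting the same record twice creates two documents
        self.posted.append(doc)

    def shutdown(self):
        pass


class EsSinkWriter:
    def __init__(self, client):
        self.client = client

    def open(self, partition_id, version):
        # always True, so every partition/version is (re)processed
        return True

    def process(self, record):
        # one POST per record -> a replayed record becomes a duplicate
        self.client.post(record)

    def close(self, error):
        self.client.shutdown()
```

With this shape, replaying the last message after a restart simply posts it a second time, which is exactly the duplication I am seeing.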
I would like to know how to make the ES sink idempotent, and also how to use the two parameters of the open method, partitionId and version, to return false when the data has already been processed.
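My current understanding of "idempotent" here (please correct me if I am wrong) is: derive a stable document id from the record, for example from its Kafka topic/partition/offset, and upsert by that id, so a replayed batch overwrites the same documents instead of appending duplicates. A minimal sketch of that idea, where a dict stands in for the ES index and all names are hypothetical:

```python
def doc_id(topic, partition, offset):
    # stable across replays: the same record always maps to the same id
    return f"{topic}-{partition}-{offset}"


class IdempotentEsIndex:
    """Dict-backed stand-in for an ES index written to by id."""
    def __init__(self):
        self.index = {}

    def put(self, _id, doc):
        # upsert by deterministic id: same id -> same slot, no duplicate
        self.index[_id] = doc
```

Is this the right idea, and is keying documents by offset the usual way to do it with an ES sink?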
Thanks in advance.