I want to access an accumulator inside a PySpark UDF:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
accum = spark.sparkContext.accumulator(0)

def prob(g, s):
    if g == 'M':
        accum.add(1)
        return 1
    else:
        accum.add(2)
        return accum.value  # reading .value here raises the exception shown below

convertUDF = udf(lambda g, s: prob(g, s), IntegerType())
The problem I am getting:
raise Exception("Accumulator.value cannot be accessed inside tasks")
Exception: Accumulator.value cannot be accessed inside tasks
Please let me know how to read the accumulator's value, and how to update it, from inside a PySpark UDF.