
I am currently running a real-time Spark Streaming job on a 50-node cluster with Spark 1.3 and Python 2.7. The Spark streaming context reads from a directory in HDFS with a batch interval of 180 seconds. Below is the configuration for the Spark job:

spark-submit --master yarn-client \
    --executor-cores 5 \
    --num-executors 10 \
    --driver-memory 10g \
    --executor-memory 10g \
    --conf spark.yarn.executor.memoryOverhead=2048 \
    --conf spark.yarn.driver.memoryOverhead=2048 \
    --conf spark.network.timeout=300

The job runs fine for the most part. However, after around 15 hours it throws a Py4J exception saying it cannot obtain a communication channel.

I tried reducing the batch interval, but then the processing time becomes greater than the batch interval, which causes the batches to back up.

Below is a screenshot of the error:

[Py4JError screenshot]

I did some research and found that it might be an issue with socket descriptor leakage, as described in SPARK-12617.
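One way to confirm whether descriptors are actually leaking while the job runs is to sample the open file descriptor count of the driver process over time. A minimal sketch (my own diagnostic idea, not part of the job; Linux-only, since it reads /proc):

```python
import os

def open_fd_count(pid="self"):
    """Count open file descriptors for a process via /proc (Linux only)."""
    return len(os.listdir("/proc/%s/fd" % pid))

# Sample this periodically (e.g. once per batch). A count that grows
# steadily over hours, rather than plateauing, points to a descriptor
# leak like the one reported in SPARK-12617.
print(open_fd_count())
```

Running this against the driver's PID (`open_fd_count(12345)`) from a separate watcher script works the same way, provided you have permission to read that process's /proc entry.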

However, I am not able to work around the error and resolve it. Is there a way to manually close the open connections that might be holding on to ports? Or do I have to make specific changes in the code to resolve this?
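For connections opened in your own code (e.g. inside a `foreachRDD` closure), one thing you can rule out is leaks on your side by closing every socket deterministically. A minimal Python 2.7 pattern using `contextlib.closing`, with a hypothetical `send_record` helper (not from the original job):

```python
import socket
from contextlib import closing

def send_record(host, port, payload):
    # closing() guarantees the socket's file descriptor is released
    # even if connect() or sendall() raises, so repeated calls over
    # many batches do not accumulate open descriptors.
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        s.connect((host, port))
        s.sendall(payload)
```

This only helps with sockets your code opens; it does not reach into Py4J's internal callback sockets, which is where SPARK-12617 locates the leak.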

TIA

Nitin Singh
