
I configured Eclipse with PySpark.

I am using the latest versions of Spark and Python.

When I try to write some code and run it, I get the error below:

java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified

The code I have written is below:

'''
Created on 23-Dec-2017

@author: lenovo
'''
from pyspark import SparkContext, SparkConf
from builtins import int
# from org.spark.com.PySparkDemo import data
from pyspark.sql import Row
from pyspark.sql.context import SQLContext

conf = SparkConf().setAppName("FileSystem").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

a = sc.textFile("C:/Users/lenovo/Desktop/file.txt")
b = a.map(lambda x: x.split(",")).map(lambda x: Row(id=int(x[0]), name=x[1], marks=int(x[2])))
c = sqlContext.createDataFrame(b)
c.show()

Please suggest a fix.

2 Answers


Assuming you have installed PyDev:

Go to Window > Preferences > PyDev > Interpreters > Python Interpreter and open the Environment tab.

Under Environment, add a variable named PYSPARK_PYTHON whose value is the path to your python.exe file.
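
If you would rather not rely on the PyDev dialog, the same variable can also be set from the script itself before the SparkContext is created. This is only a sketch of that workaround; sys.executable is used here as a stand-in for an explicit path to your python.exe:

    import os
    import sys

    from pyspark import SparkConf, SparkContext

    # Tell PySpark which Python to launch for the workers (and the driver).
    # sys.executable points at whatever interpreter is running this script;
    # an explicit path such as C:\python\python.exe works the same way.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    conf = SparkConf().setAppName("FileSystem").setMaster("local")
    sc = SparkContext(conf=conf)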

LUZO

I faced the same issue on Windows 10 with:

  • Spark version 3.1.1
  • Python version 3.9.4

Here's what I did:

  • The directory "C:\spark\conf" had a file spark-env.sh.template. I renamed it to spark-env.cmd.

  • Kept all of the existing text commented out. (You'll have to replace # with :: for Windows-style comments.)

  • Added the following line to set the PYSPARK_PYTHON variable:

    set PYSPARK_PYTHON=C:\python\python.exe
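
For reference, the resulting spark-env.cmd can be as small as this (the Python path is only an example; point it at your own python.exe):

    :: spark-env.cmd
    :: any lines kept from the original template must be commented with ::
    set PYSPARK_PYTHON=C:\python\python.exe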

And that resolved the error. I referred to this Stack Overflow thread: "encountered a ERROR that Can't run program on pyspark".