Reading from Elasticsearch v6.2
into Spark using the prescribed Spark connector org.elasticsearch:elasticsearch-spark-20_2.11:6.3.2
is horrendously slow. This is from a 3-node ES cluster with this index:
curl https://server/_cat/indices?v
green open db MmVwAwYfTz4eE_L-tncbwQ 5 1 199983131 9974871 105.1gb 51.8gb
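(For completeness, the per-shard breakdown can be seen the same way; "server" is the same placeholder host as above:)
curl https://server/_cat/shards/db?v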
Reading on a Spark cluster (10 nodes, 1 TB memory, >50 vCPUs):
val query = """{
  "query": {
    "match_all": {}
  }
}"""
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "server")
  .option("es.port", "443")
  .option("es.net.ssl", "true")
  .option("es.nodes.wan.only", "true")
  .option("es.input.use.sliced.partitions", "false")
  .option("es.scroll.size", "1000")
  .option("es.read.field.include", "f1,f2,f3")
  .option("es.query", query)
  .load("db")
df.take(1)
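In case it's relevant, I also checked how many input partitions the connector created (my assumption, which I haven't verified against the docs, is that with sliced partitions disabled it creates one Spark partition per primary shard, which would cap read parallelism at 5 here regardless of cluster size):

val numPartitions = df.rdd.getNumPartitions
println(numPartitions)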
That took 10 minutes to execute.
Is it supposed to be this slow, or am I doing something wrong?