
I have multiple files on HDFS that I want to be queryable through Spark SQL over JDBC. I can start a spark-shell and use the SQLContext, etc. What do I do if I want to keep that SQLContext open so that a separate application can connect to it via JDBC and issue queries against it?

Note: I know I can run spark-shell to open a local instance of Spark and import the SQLContext, but my files are large (100GB) and I have at most 16GB of memory on a single machine, so I want to take advantage of my 50-node cluster (one master and 49 slaves) for performance. Or is Spark SQL only possible on a single node?
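
For reference, here is roughly what I have in mind for the cluster side (a sketch only: the master hostname, HDFS path, and Parquet format below are placeholders for my actual setup). The driver runs the shell, but the 100GB stays partitioned across the 49 workers, so no single machine needs to hold it all:

    # launch the shell against the standalone master instead of local mode
    spark-shell --master spark://<master-host>:7077

    scala> // the DataFrame is distributed across the executors on the workers
    scala> val df = sqlContext.read.parquet("hdfs:///path/to/files")
    scala> df.registerTempTable("mytable")
    scala> sqlContext.sql("SELECT count(*) FROM mytable").show()

What I still don't see is how to keep that context alive as a JDBC endpoint for a second application.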

Rolando
  • I'm not sure I understand what you want here. If you work on files, why do you want a JDBC connection? The SQLContext, same as the SparkContext, is handled by a driver. There is no need, not to mention it is not possible, to create a SQLContext per worker node. That doesn't mean your data will be handled on the driver, though. – zero323 Sep 23 '15 at 10:08
  • I want a separate application to be able to issue database queries against the SQLContext. How do you start up that driver so it uses/takes advantage of the entire cluster? I am used to writing jobs and spark-submitting them to a local cluster or master. – Rolando Sep 23 '15 at 12:55
  • Something like this: http://stackoverflow.com/q/27108863/1560062 (see the sketch below this thread) – zero323 Sep 23 '15 at 12:57
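
Following that link, the relevant piece appears to be Spark's bundled Thrift JDBC/ODBC server, which keeps one long-running HiveContext on the cluster that external HiveServer2-compatible clients can query over JDBC. A minimal sketch, assuming a standalone master at spark://<master-host>:7077 (the hostname is a placeholder) and the default Thrift port 10000:

    # start the long-running JDBC server; it holds a single HiveContext for the cluster
    ./sbin/start-thriftserver.sh --master spark://<master-host>:7077

    # any HiveServer2-compatible JDBC client can now connect, e.g. the bundled beeline
    ./bin/beeline -u jdbc:hive2://<master-host>:10000

Note that tables saved to the Hive metastore (e.g. with saveAsTable) are visible to JDBC clients of the Thrift server, while plain temp tables only exist inside the context that registered them.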

0 Answers