
Add support for Spark (reusing Hive code) #14

Open
karth295 opened this issue Sep 10, 2020 · 1 comment

Comments

@karth295
Contributor

Spark ships a fork of HiveServer2 (the Spark Thrift Server) to support JDBC: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-thrift-server.html. Clients interact with it using Hive's JDBC driver.
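For context, a minimal sketch of talking to a Spark Thrift Server with the plain Hive JDBC driver (assumes `hive-jdbc` is on the classpath; the host is a placeholder, and 10000 is the server's default port):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftServerExample {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift Server speaks the HiveServer2 wire protocol,
        // so a standard Hive JDBC URL works; 10000 is the default port.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```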

This means that the bulk of what we need is already done. Here are the remaining TODOs, afaik:

  1. Create and document an init action to start a Spark thrift server. Part of that init action will be to configure Knox to expose the Spark thrift server.

  2. Change the JDBC connector to accept jdbc:dataproc://spark and translate it to the Component Gateway path for Spark (see the sketch after this list).

  3. Update the README to reflect this.
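For item 2, a rough sketch of what the translation might look like. The class name, regex, and gateway paths here are assumptions for illustration, not the connector's actual code; the real httpPath depends on how Knox is configured:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataprocUrlTranslation {
    // Matches jdbc:dataproc://hive or jdbc:dataproc://spark, with
    // optional ;key=value properties (ignored in this sketch).
    private static final Pattern DATAPROC_URL =
            Pattern.compile("jdbc:dataproc://(hive|spark)(;.*)?");

    static String toHiveJdbcUrl(String url, String gatewayHost) {
        Matcher m = DATAPROC_URL.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a dataproc JDBC URL: " + url);
        }
        // Assumed Knox topology paths -- the actual paths depend on how
        // the init action configures Knox to expose each thrift server.
        String httpPath = m.group(1).equals("spark")
                ? "gateway/default/sparkhive2"
                : "gateway/default/hiveserver2";
        return "jdbc:hive2://" + gatewayHost + ":443/;ssl=true;"
                + "transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        System.out.println(toHiveJdbcUrl("jdbc:dataproc://spark", "example.gateway.host"));
    }
}
```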

@karth295
Contributor Author

Init action: gs://hive-http-mode-init-action/spark-http-config.sh. Note that it disables the regular hive-server2 and runs Spark's hive-server2 in its place.

Now it's just a matter of documenting this init action in the README -- I'll leave this issue open for that.
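Once documented, README usage might look something like this. The connection properties and their names are placeholders (assuming the Spark variant takes the same properties as the existing Hive support), not the connector's confirmed interface:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataprocSparkJdbcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: project, region, and cluster values, and the
        // property names themselves, are hypothetical.
        String url = "jdbc:dataproc://spark;projectId=my-project;"
                + "region=us-central1;clusterName=my-cluster";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```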
