Tableau, Apache Hive and Spark

I've been reading some interesting things from the AMPLab group, especially about the Shark server. After following these easy-to-install steps and running the example, I was inspired to connect to it using the Cloudera ODBC Hive connector. It took a bit of fiddling around, but the example works really well out of the box.

https://github.com/amplab/shark/wiki/Running-Shark-Locally

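-- Create a table and load the sample key/value file that ships with Hive
CREATE TABLE src(key INT, value STRING);
LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
SELECT COUNT(1) FROM src;
-- In Shark, a table whose name ends in _cached is held in memory
CREATE TABLE src_cached AS SELECT * FROM src;
SELECT COUNT(1) FROM src_cached;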

Because the server was installed outside my firewall, I needed a little SSH tunnelling (don’t tell the FW guys) to reach the example server on the public internet.

Image

If you are using a PuTTY SSH tunnel to the port exposed on your Hive/Shark server, the Cloudera Hive ODBC connection has to be set up as below.

Image

Image
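Before testing the DSN, it can help to confirm that the tunnel is actually listening. The following is a minimal Python sketch, not part of the Cloudera driver. It assumes the PuTTY tunnel forwards local port 10000 (HiveServer's default Thrift port) to the Shark server, so change the host and port to match your own tunnel.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a plain TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable/unresolvable
        return False


if __name__ == "__main__":
    # Assumption: the SSH tunnel maps local port 10000 to the Shark server.
    host, port = "127.0.0.1", 10000
    state = "reachable" if port_is_open(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If this reports "NOT reachable", the problem lies with the tunnel (or the Shark server), not with the ODBC settings.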

You can now monitor two log files. The first belongs to the Shark server, running on your Linux box (in this case an Ubuntu server).

Image

You can also monitor what Tableau is doing on the local machine. For this I use a custom Python script that watches the log files.

Image
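The script itself isn't shown, so the following is only a minimal sketch of the same idea, not the original. It follows several log files at once, much like `tail -f`, and prefixes each new line with its file name. The log paths are placeholders you pass on the command line. The sketch also ignores log rotation and truncation.

```python
import os
import time
from typing import Dict, List, TextIO, Tuple


class MultiTail:
    """Follow several log files at once, like `tail -f` on each of them.

    Sketch only: log rotation and truncation are not handled.
    """

    def __init__(self, paths: List[str]) -> None:
        self._files: Dict[str, TextIO] = {}
        self._pending: Dict[str, str] = {}
        for path in paths:
            fh = open(path, "r", encoding="utf-8", errors="replace")
            fh.seek(0, os.SEEK_END)  # skip history: report only new activity
            self._files[path] = fh
            self._pending[path] = ""

    def poll(self) -> List[Tuple[str, str]]:
        """Return (file name, line) pairs for complete lines written since the last poll."""
        out: List[Tuple[str, str]] = []
        for path, fh in self._files.items():
            data = fh.read()  # whatever was appended since the previous read
            if not data:
                continue
            # Hold back a trailing partial line until its newline arrives
            *complete, self._pending[path] = (self._pending[path] + data).split("\n")
            out.extend((os.path.basename(path), line) for line in complete)
        return out

    def close(self) -> None:
        for fh in self._files.values():
            fh.close()


def follow(paths: List[str], interval: float = 0.5) -> None:
    """Print new lines from every file, prefixed with the file name, until Ctrl+C."""
    tail = MultiTail(paths)
    try:
        while True:
            for name, line in tail.poll():
                print(f"[{name}] {line}")
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
    finally:
        tail.close()


if __name__ == "__main__":
    import sys

    follow(sys.argv[1:])
```

Saved under a hypothetical name such as `multitail.py`, it would be run as `python multitail.py <shark-log> <tableau-log>`. Polling with plain `read()` keeps it to the standard library and works the same on Windows and Linux.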

As you move the controls around in the Tableau workbook, you can watch each SQL query travel from the local log to the Shark server and back.

HIVE

The Shark server, running locally, seems to log ‘OK’ a lot?

Image