Tuesday 6 December 2016

Query Oracle Using Spark

Just for Testing
================

Prereqs
========
1. Java JDK 1.7 or later
2. Oracle Database software and a running database
3. Spark software

Note: HDFS/Hadoop is not required; Spark runs standalone here.

High Level Steps
=================
1. Make sure the Oracle database is up and running
2. Unzip the Spark software
3. Launch Spark in standalone mode
4. From the command prompt, invoke the Spark shell
5. Query Oracle data from the Spark shell using JDBC

Oracle:
=======
-- run as the SPARK user (assumes the SPARK schema already exists)
create table spark.test (name varchar2(20));

insert into spark.test values ('WELCOME TO SPARK');

commit;

Spark:
=======
java -version
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin

echo $JAVA_HOME

export SPARK_CLASSPATH=/u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/ojdbc7.jar

(Note: SPARK_CLASSPATH is deprecated on Spark 1.0 and later; passing the Oracle
JDBC driver with spark-shell --jars <path to ojdbc7.jar> is the preferred way.)

Start Spark:
============
cd $SPARK_HOME/sbin

./start-master.sh

./start-slave.sh spark://localhost:7077

Invoke Spark Shell:
===================
cd $SPARK_HOME/bin

./spark-shell --master spark://localhost:7077

val test = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:oracle:thin:spark/spark@//localhost:1521/ORCL",
  "dbtable" -> "test"))
test.count()
test.printSchema
test.show
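The `sqlContext.load` call above is the Spark 1.3-era API and is deprecated in later releases. On Spark 1.4 and later, the equivalent read goes through the DataFrameReader; a sketch, assuming the same spark/spark credentials, the ORCL service, and the Oracle JDBC driver on the classpath:

```scala
// Spark 1.4+ DataFrameReader equivalent of the sqlContext.load() call above.
val test = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:spark/spark@//localhost:1521/ORCL")
  .option("dbtable", "test")                      // resolves to SPARK.TEST for this user
  .option("driver", "oracle.jdbc.OracleDriver")   // explicit driver class, optional if auto-detected
  .load()

test.count()        // row count of SPARK.TEST
test.printSchema()  // NAME: string
test.show()
```

The `option`-based form makes it easy to add settings such as `fetchsize` or a `partitionColumn` later without changing the rest of the code.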

Demo: