Ideally Spark is used on a dedicated cluster or managed with Mesos, but many existing clusters run Sun Grid Engine or similar frameworks and are unlikely to change in the near future (legacy code). Spark can, however, be run very efficiently inside these environments. The setup here consists of:
- A folder with the SGE-style job scripts to run for the workers (set up for the Merlin Cluster at the Paul Scherrer Institute, but easily adaptable to other systems). The jobs request a consumable resource called "ram", which is not available by default on most Sun Grid Engine clusters and is consumed on a per-core basis (smp 6 with ram=8G, i.e. 6 cores, needs a total of 48GB of memory). This avoids many of the issues with JVMs reserving but not allocating all available memory. A sketch of such a worker script is shown below, after the worker submission loop.
- A folder for a script to set up the path locally and create a few aliases to make starting easier
- A modified compute-classpath.sh to include the image processing library separately
- A modified environment file to set up the local storage correctly (clusters usually have a shared home filesystem, which is not ideal for the worker directory); a sketch follows this list
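As a hedged illustration of the local-storage point above, the environment file could point Spark's local and worker directories at node-local scratch space instead of the shared home filesystem; the /scratch paths below are placeholders and will differ per cluster:

```bash
# conf/spark-env.sh (sketch only; the /scratch paths are example placeholders)
# Keep shuffle/spill data and worker application directories on node-local
# scratch space rather than on the shared home filesystem.
export SPARK_LOCAL_DIRS=/scratch/$USER/spark-local
export SPARK_WORKER_DIR=/scratch/$USER/spark-work
```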
- Start the master node on one of the machines (locally)
start-master.sh
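As a hedged sketch (the variable name and ports refer to older standalone Spark defaults), the master address can be pinned before launching, and the resulting URL read from the master log or web UI:

```bash
# Run on the machine chosen as master; SPARK_MASTER_IP pins the address the
# workers will later connect to (older Spark releases use this variable).
export SPARK_MASTER_IP=$(hostname -f)
start-master.sh
# The master URL, e.g. spark://masternode.me:7077, appears in the master log
# and on the master web UI (port 8080 by default).
```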
- Submit the job that runs your code to the queue management system
qsub mysparkjob.sge -master=spark://masternode.me:7077
(replace masternode.me with the actual hostname of your master node)
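The contents of mysparkjob.sge depend entirely on the application being run; the following is only a hypothetical sketch (the class name, resource values and the way the -master flag is consumed are placeholders):

```bash
#!/bin/bash
# mysparkjob.sge -- hypothetical driver job; names and resource values are
# placeholders and must be adapted to the actual application.
#$ -N spark-driver
#$ -cwd
#$ -pe smp 4
#$ -l ram=4G

# qsub forwards everything after the script name, so $1 holds the
# -master=spark://masternode.me:7077 flag for the application to parse.
spark-class my.example.ImageAnalysisJob "$1"
```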
- Submit many worker jobs connecting to the master node
for i in {1..100}; do
    qsub jobs/SparkWorker.sge spark://masternode.me:7077
done
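A minimal sketch of what jobs/SparkWorker.sge might contain is shown below; the resource requests mirror the "ram" consumable described above, and all values (cores, memory, run time) are examples only:

```bash
#!/bin/bash
# jobs/SparkWorker.sge -- sketch of a worker job; resource values and paths
# are examples only and should be adapted to the target cluster.
#$ -N spark-worker
#$ -cwd
#$ -pe smp 6           # 6 cores for this worker
#$ -l ram=8G           # "ram" is consumed per core: 6 x 8G = 48GB in total
#$ -l s_rt=08:00:00    # soft run-time limit; the worker runs until this is hit

MASTER_URL="$1"        # e.g. spark://masternode.me:7077, supplied by qsub

# Start a standalone worker that registers with the master; the memory given
# to Spark is set a little below the requested total to leave JVM headroom.
spark-class org.apache.spark.deploy.worker.Worker \
    --cores "$NSLOTS" --memory 40g "$MASTER_URL"
```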
### Note
As worker jobs are scheduled by Sun Grid Engine, they connect to the master and work on whatever needs to be done. Workers will not quit until they exceed the maximum running time (s_rt).
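If a shorter (or longer) lifetime is wanted for a batch of workers, the limit can be overridden at submission time, for example:

```bash
# Override the soft run-time limit when submitting (the value is an example)
qsub -l s_rt=04:00:00 jobs/SparkWorker.sge spark://masternode.me:7077
```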