
# Analysing billion-objects catalogue interactively: Apache Spark for physicists

This repository contains supplementary material for arXiv:1807.03078.

## How to run the notebook

You must have Apache Spark and Jupyter Notebook installed on your machine or cluster. Other Python dependencies are described in the notebook.

### On a local machine

PACK="com.github.astrolabsoftware:spark-fits_2.11:0.7.2"
PYSPARK_DRIVER_PYTHON_OPTS="jupyter-notebook" pyspark \
     --master local[*] \
     --packages $PACK 
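
Once the notebook is running, the catalogue is read through the spark-fits DataFrame connector loaded by the `--packages` option above. A minimal sketch of what that looks like inside the notebook (the file path and HDU index below are placeholders, not files from this repository):

```python
from pyspark.sql import SparkSession

# Re-use the Spark session started by pyspark; spark-fits is already on the
# classpath thanks to the --packages option above.
spark = SparkSession.builder.getOrCreate()

# Read one HDU of a FITS catalogue as a Spark DataFrame.
# "path/to/catalogue.fits" and hdu=1 are placeholder values.
df = spark.read.format("fits").option("hdu", 1).load("path/to/catalogue.fits")

df.printSchema()
print(df.count())
```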

### On a cluster

Standalone mode:

PACK="com.github.astrolabsoftware:spark-fits_2.11:0.7.2"
PYSPARK_DRIVER_PYTHON_OPTS="jupyter-notebook --debug --no-browser --port=$PORT1" pyspark \
     --master $SPARKURL \
     --packages $PACK \
     --driver-memory $MEMDRIVER --executor-memory $MEMEXEC --executor-cores $EXECCORES --total-executor-cores $TOTALCORES
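
In the command above, `$PORT1` (the port used by the Jupyter server), `$SPARKURL` (the URL of the Spark master, e.g. `spark://host:7077`), and the resource settings (`$MEMDRIVER`, `$MEMEXEC`, `$EXECCORES`, `$TOTALCORES`) are placeholders that must be set to match your cluster configuration.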

### DESC members: working at NERSC

Source your DESC environment, then go to the NERSC JupyterLab web interface and execute the notebook with the `desc-pyspark` kernel.
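
As a quick sanity check that the kernel gives you a working Spark session, you can run something like the following in the first cell (a generic PySpark snippet, not specific to the `desc-pyspark` kernel):

```python
from pyspark.sql import SparkSession

# Create (or re-use) the Spark session provided by the kernel.
spark = SparkSession.builder.getOrCreate()

# Print the Spark version and run a trivial job to confirm executors respond.
print(spark.version)
print(spark.range(1000).count())  # should print 1000
```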