
# Analysing billion-objects catalogue interactively: Apache Spark for physicists

This repository contains supplementary material for arXiv:1807.03078.

## How to run the notebook

You must have Apache Spark and Jupyter Notebook installed on your machine or cluster. Other Python dependencies are described in the notebook.

### On a local machine

PACK="com.github.astrolabsoftware:spark-fits_2.11:0.7.2"
PYSPARK_DRIVER_PYTHON_OPTS="jupyter-notebook" pyspark \
     --master local[*] \
     --packages $PACK 
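
Once the notebook is running, the catalogue is read through the spark-fits DataFrame connector loaded by the `--packages` option above. A minimal sketch of what that looks like inside the notebook (the file path and HDU index below are placeholders, not files from this repository):

```python
from pyspark.sql import SparkSession

# Re-use the Spark session started by pyspark; spark-fits is already on the
# classpath thanks to the --packages option above.
spark = SparkSession.builder.getOrCreate()

# Read one HDU of a FITS catalogue as a Spark DataFrame.
# "path/to/catalogue.fits" and hdu=1 are placeholder values.
df = spark.read.format("fits").option("hdu", 1).load("path/to/catalogue.fits")

df.printSchema()
print(df.count())
```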

### On a cluster

Standalone mode:

PACK="com.github.astrolabsoftware:spark-fits_2.11:0.7.2"
PYSPARK_DRIVER_PYTHON_OPTS="jupyter-notebook --debug --no-browser --port=$PORT1" pyspark \
     --master $SPARKURL \
     --packages $PACK \
     --driver-memory $MEMDRIVER --executor-memory $MEMEXEC --executor-cores $EXECCORES --total-executor-cores $TOTALCORES
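
In the command above, `$PORT1` (the port used by the Jupyter server), `$SPARKURL` (the URL of the Spark master, e.g. `spark://host:7077`), and the resource settings (`$MEMDRIVER`, `$MEMEXEC`, `$EXECCORES`, `$TOTALCORES`) are placeholders that must be set to match your cluster configuration.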

### DESC members: working at NERSC

Source your DESC environment, then go to the NERSC JupyterLab web interface and execute the notebook with the `desc-pyspark` kernel.
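
As a quick sanity check that the kernel gives you a working Spark session, you can run something like the following in the first cell (a generic PySpark snippet, not specific to the `desc-pyspark` kernel):

```python
from pyspark.sql import SparkSession

# Create (or re-use) the Spark session provided by the kernel.
spark = SparkSession.builder.getOrCreate()

# Print the Spark version and run a trivial job to confirm executors respond.
print(spark.version)
print(spark.range(1000).count())  # should print 1000
```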