Here is a proposal:
- MINIAOD in Brazos
- C++ code will add the HME to the MINIAOD and save them on stica ()
- python miniAOD2RDD.py to save RDD on stica (/data/RDD)
- use these RDDs to run the main analysis
- the main analysis will produce an RDD with additional boolean columns (collections of 0s and 1s, each representing a selection)
- train the DNN with Keras on this selection and save the RDD with the DNN output
- create inputs for limits
- TO BE FILLED
- Inputs for miniAOD2RDD.py are defined in utilities/
- Outputs will be in /data/RDD
- Execution:
spark-submit --class org.apache.spark.deploy.master.Master \
--packages org.diana-hep:spark-root_2.11:0.1.15,org.diana-hep:histogrammar-sparksql_2.11:1.0.4 \
--master spark://cmstca:7077 \
--deploy-mode client $PWD/miniAOD2RDD.py
(or python miniAOD2RDD.py)
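As a rough sketch, miniAOD2RDD.py presumably reads the ROOT trees with the spark-root package loaded above and writes them out as parquet; the input file name and output file name below are placeholders:

# Sketch: convert a MiniAOD ROOT file to parquet with spark-root.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("miniAOD2RDD").getOrCreate()

# spark-root exposes ROOT TTrees as Spark DataFrames
df = spark.read.format("org.dianahep.sparkroot").load("/path/to/miniAOD_with_HME.root")  # placeholder input
df.printSchema()

# Persist as parquet so the later steps can read it back quickly
df.write.mode("overwrite").parquet("/data/RDD/miniAOD_with_HME.parquet")  # placeholder output name

spark.stop()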
- Inputs for analyzeRDD.py are defined in utilities/
- Execution:
spark-submit --class org.apache.spark.deploy.master.Master \
--packages org.diana-hep:histogrammar-sparksql_2.11:1.0.4 \
--master spark://cmstca:7077 \
--deploy-mode client $PWD/analyzeRDD.py
(or python analyzeRDD.py)
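A minimal sketch of the selection step, assuming the parquet output of miniAOD2RDD.py and purely illustrative branch names and cut values (muon_pt, jet_pt, n_bjets):

# Sketch: read the parquet RDD and add one boolean column per selection, as in the proposal above.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("analyzeRDD").getOrCreate()

df = spark.read.parquet("/data/RDD/miniAOD_with_HME.parquet")  # placeholder input

# Each new column holds 0/1 and represents one selection
df = (df
      .withColumn("pass_lepton", (F.col("muon_pt") > 20.0).cast("int"))
      .withColumn("pass_jets",   (F.col("jet_pt")  > 30.0).cast("int"))
      .withColumn("pass_btag",   (F.col("n_bjets") >= 2).cast("int")))

df.write.mode("overwrite").parquet("/data/RDD/analyzed.parquet")  # placeholder output

spark.stop()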
- TO BE FILLED
For DY, first take the initial Ntuples and convert them into .parquet files:
- python DYminiAOD2RDD.py (or use spark-submit)
Then run the ML code:
- python trainDY.py (or use spark-submit)
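A minimal sketch of trainDY.py, assuming illustrative feature and label column names, placeholder file names, and that Keras is installed in the same Python environment as pyspark:

# Sketch: load the parquet files, train a small Keras DNN, and save the RDD with the DNN output.
from pyspark.sql import SparkSession
from keras.models import Sequential
from keras.layers import Dense

spark = SparkSession.builder.appName("trainDY").getOrCreate()

features = ["hme_mass", "mll", "met"]  # placeholder feature columns
pdf = spark.read.parquet("/data/RDD/DY.parquet").select(features + ["label"]).toPandas()  # placeholder input

X = pdf[features].values
y = pdf["label"].values

model = Sequential()
model.add(Dense(32, activation="relu", input_dim=len(features)))
model.add(Dense(16, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=256, validation_split=0.2)

# Attach the DNN output and save the RDD again, as in the proposal
pdf["dnn_output"] = model.predict(X).ravel()
spark.createDataFrame(pdf).write.mode("overwrite").parquet("/data/RDD/DY_with_DNN.parquet")  # placeholder output

spark.stop()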
Start the master:
/data/spark-2.3.0-bin-hadoop2.7/sbin/start-master.sh
- This gives you a log file where you can find the master URL: spark://cmstca:7077
Start the slave:
/data/spark-2.3.0-bin-hadoop2.7/sbin/start-slave.sh spark://cmstca:7077
Then you can submit the job:
spark-submit --class org.apache.spark.deploy.master.Master \
--master spark://cmstca:7077 \
--deploy-mode client /home/lpernie/HHWWbb/HHWWbb_analysis/analyzeRDD.py
And at the end:
/data/spark-2.3.0-bin-hadoop2.7/sbin/stop-all.sh
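When a script is run with plain python instead of spark-submit, it presumably has to point its SparkSession at the standalone master itself; a minimal sketch (the app name is arbitrary):

# Sketch: connect to the standalone master started above from inside a script,
# so that "python analyzeRDD.py" uses the cluster rather than local mode.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://cmstca:7077")
         .appName("HHWWbb")
         .getOrCreate())

print(spark.sparkContext.master)  # should print spark://cmstca:7077

spark.stop()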
- On stica:
ipython notebook --no-browser --port=8889
jupyter notebook list
- Save the token from the token=XXX message you receive
- In your local browser, go to 'localhost:8888' (this assumes an SSH tunnel forwarding local port 8888 to port 8889 on stica) and use the token as the password
Some useful HDFS commands:
hdfs dfs -df -h
hdfs dfs -df -h /data/
hadoop fs -ls /
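If the parquet files are kept on HDFS rather than on local disk, Spark can read them back directly from an hdfs:// path; a sketch (the path below is a placeholder and relies on the configured default filesystem):

# Sketch: read parquet stored on HDFS (path is a placeholder).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("readFromHDFS").getOrCreate()
df = spark.read.parquet("hdfs:///data/RDD/analyzed.parquet")  # placeholder path
df.show(5)
spark.stop()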
#Exporting stuff
export PYTHONPATH=/home/demarley/Downloads/root/lib:$PYTHONPATH
export LD_LIBRARY_PATH=/home/demarley/anaconda2/lib/:/home/demarley/Downloads/root/lib:$LD_LIBRARY_PATH
export SPARK_HOME=/data/spark
export PATH="/home/demarley/anaconda2/bin:/home/demarley/.local/bin:$PATH"
#SPARK and HADOOP
export PATH=$SPARK_HOME/bin:$PATH
export HADOOP_HOME=/data/hadoop-3.1.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin
#JAVA
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH
#ROOT
source /data/root/bin/thisroot.sh
#Histogrammar
export PYTHONPATH=/data/histogrammar/histogrammar-python:$PYTHONPATH
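A quick sanity check that the exports above are picked up, assuming PyROOT, pyspark, and histogrammar are importable from this Python installation (pyspark e.g. pip-installed or added to PYTHONPATH):

# Sketch: verify that ROOT, Spark and Histogrammar are importable after the setup above.
import ROOT
import pyspark
import histogrammar

print("ROOT    " + ROOT.gROOT.GetVersion())
print("pyspark " + pyspark.__version__)
print("histogrammar imported from " + histogrammar.__file__)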
- If you get 'java.io.FileNotFoundException: [...] (Too many open files)':
- sudo vim /etc/security/limits.conf and increase the maximum number of open files for your user, e.g.:
- username hard nofile 10000
- username soft nofile 6000