This repository contains the code and documentation for a distributed computing analysis that identifies high-redshift Lyman-break galaxies from spectral files. The project was undertaken as part of Statistics 405 at the University of Wisconsin-Madison, in collaboration with the Center for High Throughput Computing (CHTC).
The work covers both the design and the execution of that strategy: the large set of spectral files is split across many jobs, each job scores its share of the files, and the results are combined afterwards. The analysis was written in R and Bash and run on the HTCondor platform, with attention to job scheduling and data processing efficiency.
Research Objective: The primary objective of this project is to identify high-redshift Lyman-break galaxies from a dataset consisting of 2.5 million spectral files.

Tools Used: R, Bash scripting, HTCondor, Git, Shell

Collaborators: Center for High Throughput Computing (CHTC), University of Wisconsin-Madison, Statistics 405

Key Achievements:
- Developed and executed a distributed computing strategy for analyzing spectral files.
- Orchestrated 2459 parallel computing jobs on the HTCondor platform.
- Implemented data analysis techniques to filter and prioritize galaxy candidates.
- Automated data merging and analysis workflow for efficient handling of large datasets.
Data Preprocessing: Scripts and code for preprocessing spectral files before analysis.

Job Scheduling: Scripts and documentation for job scheduling and optimization with HTCondor.

Data Analysis: R scripts and documentation for analyzing spectral data and identifying Lyman-break galaxies.

Automation: Shell scripts and utilities for automating workflow processes and data merging.
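As a rough illustration of the merging step, the sketch below shows one way per-job outputs could be combined and ranked in Bash. It is not taken from the repository's scripts. The function name `merge_top`, the directory layout, and the headerless CSV format `distance,spectrum_id,shift` (smaller distance meaning a closer match) are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical merge step, not the repository's actual script.
# Assumption: every job writes a headerless CSV whose lines look like
#   distance,spectrum_id,shift
# where a smaller distance means a better candidate.

merge_top() {
  # $1: directory holding the per-job CSV files (assumed layout)
  # $2: number of best candidates to keep
  local dir="$1" n="$2"
  # Concatenate all per-job files, sort numerically on the first field,
  # and keep the first n rows. LC_ALL=C keeps "." as the decimal point.
  find "$dir" -type f -name '*.csv' -exec cat {} + \
    | LC_ALL=C sort -t, -k1,1n \
    | head -n "$n"
}
```

With those assumptions, a call such as `merge_top results 100 > top_candidates.csv` would write the 100 lowest-distance rows to a single file. Both `results` and `top_candidates.csv` are placeholder names.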
Jake Christensen
Center for High Throughput Computing (CHTC)
University of Wisconsin-Madison, Statistics 405