Use Balsam to run a vLLM service API with Llama 70B for Protein-Protein Interaction (PPI) identification on Polaris. Balsam enables code execution on remote systems. This guide covers setting up a Balsam site on Polaris and running a vLLM service for PPI discovery.
- Clone this repository.
- Obtain the necessary validation files for PPI analysis from this link.
Create a conda environment for vllm and Balsam on Polaris:
module load conda
conda create -n balsam-vllm-polaris python=3.9 -y
conda activate balsam-vllm-polaris
pip install --pre balsam
pip install vllm pandas
- Set up a Balsam site on Polaris for job submission:
balsam login
balsam site init polaris-site
cd polaris-site
balsam site start
- Review and adjust your job-template.sh as demonstrated here.
Note: Balsam offers elastic queue features that adjust automatically based on job demand. Consult my settings.yml for configuration details.
Note: To submit jobs remotely, first configure the Polaris site, then install Balsam, vllm, and pandas on your local machine (Python 3.9) so you can manage jobs and retrieve results from there. When `balsam site init` prompts you on the local machine, select Local.
Within the Polaris site:
- Make sure the csv files are placed in the directory from which you run the app and submit jobs.
- Update the conda environment in the `shell_preamble` method of define_app.py to match your setup. Also ensure `command_template` references vllm_batch.py correctly.
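As a rough sketch of the two values to check in define_app.py, the snippet below builds a preamble string and a command template. The environment name comes from the setup step above; the helper `build_shell_preamble` and the `input_csv` placeholder are hypothetical names chosen for illustration, and the `{{ ... }}` placeholder syntax is assumed to match what your define_app.py already uses.

```python
# Environment created earlier in this guide; replace with your own if it differs.
CONDA_ENV = "balsam-vllm-polaris"


def build_shell_preamble(env_name=CONDA_ENV):
    """Return the shell lines that should run before the app command."""
    return "\n".join(["module load conda", f"conda activate {env_name}"])


# Hypothetical template: the placeholder name `input_csv` is an assumption,
# not taken from the repository. Keep whatever placeholders your script expects.
COMMAND_TEMPLATE = "python3 vllm_batch.py {{ input_csv }}"

print(build_shell_preamble())
print(COMMAND_TEMPLATE)
```

In the actual script, the string returned here would be what `shell_preamble` returns, and `COMMAND_TEMPLATE` would be assigned to `command_template`.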
Note: In define_jobs.py, the line `df = df.iloc[0:99]` limits the run to a subset of rows instead of all 19k proteins; modify the slice to change how many are processed. The application currently processes 100 proteins per instance. Adjust the batch size in `self.get_word_batches(df,100)` as required.
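To make the slice and batch-size settings concrete, here is a minimal sketch using plain Python lists, which follow the same half-open slicing rule as `iloc`. The helper `get_batches` is a hypothetical stand-in, not the repository's `get_word_batches`, and the protein names are placeholders.

```python
def get_batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder identifiers standing in for rows of the protein table.
proteins = [f"PROT{i}" for i in range(250)]

# A 0:99 slice is half-open: it keeps indices 0 through 98, i.e. 99 items.
subset = proteins[0:99]
print(len(subset))

# With a batch size of 100, 250 items split into batches of 100, 100, and 50.
print([len(batch) for batch in get_batches(proteins, 100)])
```

Note that a `0:99` slice yields 99 rows, so use `0:100` if you want exactly one full batch of 100.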
- Finally, run:
python3 define_app.py
python3 define_jobs.py
Note: Use `qstat` and `balsam job ls` to check whether the jobs are running or have finished. Other Balsam commands are listed in the Balsam docs.
- After the jobs complete, run either parallel_dot_construction.py or serial_dot_construction.py to process the output. Check that `output_path` and `dot_file_path` are set correctly.
python3 parallel_dot_construction.py
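The construction scripts themselves are not reproduced here. As an illustration of what a DOT file of interactions looks like, the sketch below assumes the job output has already been reduced to pairs of protein names and treats interactions as undirected. The function `pairs_to_dot` and the example pairs are hypothetical and may differ from what the repository's scripts do.

```python
def pairs_to_dot(pairs, graph_name="ppi"):
    """Render undirected protein pairs as Graphviz DOT text, skipping duplicate edges."""
    seen = set()
    lines = [f"graph {graph_name} {{"]
    for a, b in pairs:
        # Sort each pair so (A, B) and (B, A) count as the same edge.
        edge = tuple(sorted((a, b)))
        if edge in seen:
            continue
        seen.add(edge)
        lines.append(f'    "{edge[0]}" -- "{edge[1]}";')
    lines.append("}")
    return "\n".join(lines)


# Placeholder pairs for illustration only; not results from this pipeline.
example_pairs = [("BRCA1", "BARD1"), ("BARD1", "BRCA1"), ("TP53", "MDM2")]
print(pairs_to_dot(example_pairs))
```

The resulting text can be written to the location given by `dot_file_path` and rendered with Graphviz.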