

Protein-Protein Interaction Analysis with Balsam on Polaris

This guide uses Balsam to run a vLLM service API backed by Llama 70B for protein-protein interaction (PPI) identification on Polaris. Balsam enables code execution on remote systems, so jobs can be submitted to Polaris and managed from elsewhere. The steps below cover setting up a Balsam site on Polaris and running the vLLM service for PPI discovery.

Prerequisites

  • Clone this repository.
  • Obtain the necessary validation files for PPI analysis from this link.


Set Up the vLLM and Balsam Environment

Create a conda environment for vllm and Balsam on Polaris:

module load conda
conda create -n balsam-vllm-polaris python=3.9 -y
conda activate balsam-vllm-polaris
pip install --pre balsam
pip install vllm pandas
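After installing, a quick way to confirm the environment is usable is to check that the interpreter can find each package. The snippet below is a minimal, standard-library sketch; the package list mirrors the pip commands above, and missing_packages is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Packages installed by the pip commands above.
REQUIRED = ["balsam", "vllm", "pandas"]


def missing_packages(names):
    """Return the subset of top-level package `names` that cannot be found.

    importlib.util.find_spec returns None for a top-level module that is
    not importable from the current environment.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the balsam-vllm-polaris environment activated; an empty result means all three packages are visible to that Python.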

Create and Start Balsam Sites

  1. Log in to Balsam, then create and start a site on Polaris for job submission:
balsam login
balsam site init polaris-site
cd polaris-site
balsam site start
  2. Review and adjust the site's job-template.sh as demonstrated here.

Note: Balsam's elastic queue feature automatically requests compute nodes from the scheduler based on how many Balsam jobs are waiting to run. See my settings.yml for the configuration I used.

Note: For remote job submission, first configure the Polaris site, then install Balsam, vllm, and pandas locally (Python 3.9) so you can manage jobs and retrieve results from your own machine. When you run balsam site init locally, select Local at the prompt.

Execute the Application and Submit Jobs

From within the Polaris site directory:

  1. Place the CSV files in the directory from which you run the application and submit jobs.

  2. In define_app.py, update the conda environment activated in the shell_preamble method to match your setup, and make sure command_template points to the correct location of vllm_batch.py.

Note: In define_jobs.py, the line df = df.iloc[0:99] limits the run to the first 99 proteins (the end index is exclusive) instead of all ~19k; adjust or remove it to control how many proteins are processed. Each application instance currently processes 100 proteins; change the batch size in self.get_word_batches(df,100) as required.
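The batching described in the note above can be pictured with a small standard-library sketch. get_batches and num_jobs are hypothetical stand-ins, not the repository's get_word_batches, whose exact behavior may differ; the protein identifiers are placeholders for the CSV rows. It assumes one batch becomes one application instance (one Balsam job).

```python
import math


def get_batches(items, batch_size=100):
    """Split `items` into consecutive chunks of at most `batch_size` elements.

    Hypothetical stand-in for get_word_batches in define_jobs.py.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def num_jobs(n_items, batch_size=100):
    """Jobs needed if each job handles exactly one batch."""
    return math.ceil(n_items / batch_size)


# Placeholder identifiers standing in for rows of the input CSV.
proteins = [f"PROT{i:05d}" for i in range(250)]

subset = proteins[0:99]             # like df.iloc[0:99]: end index is exclusive
print(len(subset))                  # 99
print(len(get_batches(subset)))     # 1 batch, so 1 job
print(len(get_batches(proteins)))   # 3 batches: 100 + 100 + 50
print(num_jobs(19000))              # 190 jobs for roughly 19k proteins
```

A larger batch size means fewer, longer-running jobs; a smaller one spreads the work across more jobs that the elastic queue can schedule in parallel.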

  3. Finally, run the two scripts to define the application and create the jobs:
python3 define_app.py
python3 define_jobs.py

Note: Use qstat and balsam job ls to check whether the jobs are running or have finished. See the Balsam documentation for other commands.

  4. After the jobs complete, run either parallel_dot_construction.py or serial_dot_construction.py to process the output. First make sure output_path and dot_file_path are set correctly in the script.
python3 parallel_dot_construction.py
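The post-processing step produces a Graphviz DOT file. The format written by vllm_batch.py is not described here, so the sketch below assumes the interactions have already been parsed into (protein_a, protein_b) pairs. pairs_to_dot is a hypothetical helper, not the repository's code, and the example pairs are placeholders rather than real results. It treats interactions as undirected and writes each pair once.

```python
def _quote(name):
    """Quote a node name for DOT, escaping any embedded double quotes."""
    return '"' + str(name).replace('"', '\\"') + '"'


def pairs_to_dot(pairs, graph_name="PPI"):
    """Build an undirected DOT graph from (protein_a, protein_b) pairs.

    A--B and B--A count as the same interaction and are emitted once.
    """
    seen = set()
    lines = [f"graph {graph_name} {{"]
    for a, b in pairs:
        key = tuple(sorted((a, b)))  # order-independent edge key
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"    {_quote(key[0])} -- {_quote(key[1])};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Placeholder pairs; the first two describe the same interaction.
example_pairs = [("ProtB", "ProtA"), ("ProtA", "ProtB"), ("ProtC", "ProtD")]
dot_text = pairs_to_dot(example_pairs)
print(dot_text)
```

The resulting text could be written to the file configured as dot_file_path and rendered with Graphviz, for example dot -Tsvg graph.dot -o graph.svg.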