To load these Hugging Face models in bfloat16 format, we recommend the following GPU allocations. How do we calculate them? bfloat16 uses 2 bytes per parameter, so a 1B-parameter model needs roughly 2 GB of GPU RAM for the weights alone. We suggest allocating a bit more to leave headroom for activations and other runtime overhead.
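As a quick sanity check, you can estimate the weight footprint directly from the parameter count; the table below then adds headroom on top of that:

```bash
# bfloat16 weights take 2 bytes per parameter.
# 11B parameters -> ~22 GB for weights alone, hence the >25 GB row below.
python3 -c "params = 11e9; print(f'{params * 2 / 1e9:.0f} GB')"   # prints: 22 GB
```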
| LLM Name | Required GPU RAM | Recommended GPU Allocation |
|---|---|---|
| Llama-3.2-11B-Vision-Instruct | >25 GB | 1 A100 or H100 |
| Llama-3.2-90B-Vision-Instruct | >190 GB | 4 A100 or H100 |
First, request a node with an A100 (for the Llama 3.2 11B model) and note the compute node's ID, e.g. `c0021`.
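As a rough sketch (the exact partition, account, and GRES names depend on your cluster's Slurm configuration, so check `sinfo` for what is available to you), an interactive allocation might look like:

```bash
# Hypothetical example: request one A100 for two hours, then find the node ID.
salloc --gres=gpu:a100:1 --time=02:00:00
squeue -u $USER   # the NODELIST column shows the assigned node, e.g. c0021
```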
Then, on the compute node:

```bash
# If you have already downloaded Llama 3.2 to a local directory
export LLAMA_MODEL_PATH=/mnt/beegfs/bulk/mirror/hf_models/llama-3.2-11b
./run_chat.sh --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --model_local_path $LLAMA_MODEL_PATH

# If not, let Hugging Face download the model for you.
# By default it is stored in your local cache: https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache
huggingface-cli login
./run_chat.sh --model_name meta-llama/Llama-3.2-11B-Vision-Instruct
```
Now, on your local machine (not on the cluster), open a new terminal and run `./scripts/forward_streamlit_port_slurm_to_mac.sh aida c0021`.
Here `aida` is a Host entry in my local machine's `~/.ssh/config`; you can use any other Host you have set up in your SSH config. Otherwise, specify `username@remote`, e.g. `username@aida.cac.cornell.edu`.
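For reference, a minimal `~/.ssh/config` entry for `aida` might look like the sketch below (the `User` value is a placeholder for your own account name):

```
# ~/.ssh/config
Host aida
    HostName aida.cac.cornell.edu
    User your_netid
```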
Then open http://localhost:8501 in your browser. The script forwards port 8501 from the compute node `c0021`, through the AIDA head node, to your laptop.
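If you prefer to set up the tunnel by hand, the script is roughly equivalent to a single SSH port forward through the head node (sketch only; assumes the `aida` Host entry from your SSH config and that the head node can reach the compute node directly):

```bash
# Forward local port 8501, via the head node, to port 8501 on compute node c0021.
ssh -N -L 8501:c0021:8501 aida
```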