From 1391c58a33aada9dea23963dc537f1edb19c8254 Mon Sep 17 00:00:00 2001
From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com>
Date: Tue, 4 Nov 2025 15:59:31 +0000
Subject: [PATCH 1/3] Add storage location and file path details to model caching documentation

---
 serverless/endpoints/model-caching.mdx | 38 ++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/serverless/endpoints/model-caching.mdx b/serverless/endpoints/model-caching.mdx
index dd61db62..307247f8 100644
--- a/serverless/endpoints/model-caching.mdx
+++ b/serverless/endpoints/model-caching.mdx
@@ -59,6 +59,44 @@ flowchart TD
 ```
 
+## Where models are stored
+
+Cached models are stored on the worker container's local disk, separate from any attached network volumes. Runpod automatically manages this internal storage to optimize loading speed.
+
+The cache persists across requests on the same worker, so once a worker initializes, you'll see consistent performance. Since the models live on local disk rather than network volumes, they won't appear on your attached network volumes.
+
+## Accessing cached models
+
+Cached models are stored at `/runpod-volume/huggingface-cache/hub/`. The directory structure follows Hugging Face cache conventions, where forward slashes (`/`) in the model name are replaced with double dashes (`--`).
+
+The path structure follows this pattern:
+
+```
+/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/
+```
+
+For example, `meta-llama/Llama-3.2-1B-Instruct` would be stored at:
+
+```
+/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/
+```
+
+## Using cached models in applications
+
+You can access cached models in your application two ways:
+
+**Direct configuration**: Configure your application to load models directly from `/runpod-volume/huggingface-cache/hub/`. Many frameworks and tools let you specify a custom cache directory for Hugging Face models.
+
+**Symbolic links**: Create symbolic links from your application's expected model directory to the cache location. This is particularly useful for applications like ComfyUI that expect models in specific directories.
+
+For example, create a symbolic link like this:
+
+```bash
+ln -s /runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/ /workspace/models/llama-3.2
+```
+
+This lets your application access cached models without modifying its configuration.
+
 ## Enabling cached models
 
 Follow these steps to select and add a cached model to your Serverless endpoint:

From f96793e16e4fc8c84b3b43404c635605955f5841 Mon Sep 17 00:00:00 2001
From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com>
Date: Tue, 18 Nov 2025 21:05:54 +0000
Subject: [PATCH 2/3] Sync documentation updates

---
 serverless/endpoints/model-caching.mdx | 30 ++++++++++----------------
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/serverless/endpoints/model-caching.mdx b/serverless/endpoints/model-caching.mdx
index 307247f8..34b1e1f7 100644
--- a/serverless/endpoints/model-caching.mdx
+++ b/serverless/endpoints/model-caching.mdx
@@ -61,41 +61,31 @@ flowchart TD
 ## Where models are stored
 
-Cached models are stored on the worker container's local disk, separate from any attached network volumes. Runpod automatically manages this internal storage to optimize loading speed.
+Cached models are stored in a Runpod-managed Docker volume and mounted at `/runpod-volume/huggingface-cache/hub/`. This creates a "blended view" where you can see both your network volume contents and cached models under the same `/runpod-volume/` path.
 
-The cache persists across requests on the same worker, so once a worker initializes, you'll see consistent performance. Since the models live on local disk rather than network volumes, they won't appear on your attached network volumes.
+The model cache loads significantly faster than network volumes, reducing cold start times. The cache is automatically managed and persists across requests on the same worker. You'll see cached models overlaid onto your network volume mount point.
 
-## Accessing cached models
+## Accessing cached models in your application
 
-Cached models are stored at `/runpod-volume/huggingface-cache/hub/`. The directory structure follows Hugging Face cache conventions, where forward slashes (`/`) in the model name are replaced with double dashes (`--`).
+Runpod caches models at `/runpod-volume/huggingface-cache/hub/` following Hugging Face cache conventions. The directory structure replaces forward slashes (`/`) from the original model name with double dashes (`--`), and includes a version hash subdirectory.
 
 The path structure follows this pattern:
 
 ```
-/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/
+/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/snapshots/{version-hash}/
 ```
 
-For example, `meta-llama/Llama-3.2-1B-Instruct` would be stored at:
+For example, the model `gensyn/qwen2.5-0.5b-instruct` would be stored at:
 
 ```
-/runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/
+/runpod-volume/huggingface-cache/hub/models--gensyn--qwen2.5-0.5b-instruct/snapshots/317b7eb96312eda0c431d1dab1af958a308cb35e/
 ```
 
-## Using cached models in applications
+### Current limitations
 
-You can access cached models in your application two ways:
+The version hash in the path currently prevents direct integration with some applications (like ComfyUI worker) that expect to predict paths based solely on model name. We're working on removing the version hash requirement.
 
-**Direct configuration**: Configure your application to load models directly from `/runpod-volume/huggingface-cache/hub/`. Many frameworks and tools let you specify a custom cache directory for Hugging Face models.
-
-**Symbolic links**: Create symbolic links from your application's expected model directory to the cache location. This is particularly useful for applications like ComfyUI that expect models in specific directories.
-
-For example, create a symbolic link like this:
-
-```bash
-ln -s /runpod-volume/huggingface-cache/hub/models--meta-llama--Llama-3.2-1B-Instruct/ /workspace/models/llama-3.2
-```
-
-This lets your application access cached models without modifying its configuration.
+If your application requires specific paths, configure it to scan `/runpod-volume/huggingface-cache/hub/` for models.
 
 ## Enabling cached models

From 495de1805b797711b83cecd765af0c2ad8cbef16 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Tue, 18 Nov 2025 17:35:30 -0500
Subject: [PATCH 3/3] Clarify cached models storage and access details

Updated the section on cached models to clarify storage and access details, and removed outdated limitations.
---
 serverless/endpoints/model-caching.mdx | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/serverless/endpoints/model-caching.mdx b/serverless/endpoints/model-caching.mdx
index 34b1e1f7..f1b00fb9 100644
--- a/serverless/endpoints/model-caching.mdx
+++ b/serverless/endpoints/model-caching.mdx
@@ -61,18 +61,20 @@ flowchart TD
 ## Where models are stored
 
-Cached models are stored in a Runpod-managed Docker volume and mounted at `/runpod-volume/huggingface-cache/hub/`. This creates a "blended view" where you can see both your network volume contents and cached models under the same `/runpod-volume/` path.
+Cached models are stored in a Runpod-managed Docker volume mounted at `/runpod-volume/huggingface-cache/hub/`. The model cache is automatically managed and persists across requests on the same worker.
 
-The model cache loads significantly faster than network volumes, reducing cold start times. The cache is automatically managed and persists across requests on the same worker. You'll see cached models overlaid onto your network volume mount point.
+
+While cached models use the same mount path as network volumes (`/runpod-volume/`), a model loaded from the cache loads significantly faster than the same model loaded from a network volume.
+
 
 ## Accessing cached models in your application
 
-Runpod caches models at `/runpod-volume/huggingface-cache/hub/` following Hugging Face cache conventions. The directory structure replaces forward slashes (`/`) from the original model name with double dashes (`--`), and includes a version hash subdirectory.
+Models are cached on your workers at `/runpod-volume/huggingface-cache/hub/` following Hugging Face cache conventions. The directory structure replaces forward slashes (`/`) from the original model name with double dashes (`--`), and includes a version hash subdirectory.
 
 The path structure follows this pattern:
 
 ```
-/runpod-volume/huggingface-cache/hub/models--{organization}--{model-name}/snapshots/{version-hash}/
+/runpod-volume/huggingface-cache/hub/models--HF_ORGANIZATION--MODEL_NAME/snapshots/VERSION_HASH/
 ```
 
 For example, the model `gensyn/qwen2.5-0.5b-instruct` would be stored at:
@@ -81,10 +83,6 @@ For example, the model `gensyn/qwen2.5-0.5b-instruct` would be stored at:
 /runpod-volume/huggingface-cache/hub/models--gensyn--qwen2.5-0.5b-instruct/snapshots/317b7eb96312eda0c431d1dab1af958a308cb35e/
 ```
 
-### Current limitations
-
-The version hash in the path currently prevents direct integration with some applications (like ComfyUI worker) that expect to predict paths based solely on model name. We're working on removing the version hash requirement.
-
 If your application requires specific paths, configure it to scan `/runpod-volume/huggingface-cache/hub/` for models.
 
 ## Enabling cached models
@@ -109,4 +107,4 @@ Follow these steps to select and add a cached model to your Serverless endpoint:
 
 
 
-You can add a cached model to an existing endpoint by selecting **Manage → Edit Endpoint** in the endpoint details page and updating the **Model (optional)** field.
\ No newline at end of file
+You can add a cached model to an existing endpoint by selecting **Manage → Edit Endpoint** in the endpoint details page and updating the **Model (optional)** field.
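
The final revision above tells applications that need specific paths to scan `/runpod-volume/huggingface-cache/hub/` for models. A minimal sketch of that approach in bash, assuming only the documented `models--{organization}--{model-name}/snapshots/{version-hash}/` layout; the `HF_HUB_CACHE` variable is the standard Hugging Face cache override and the `MODEL_ID` value is illustrative, neither is configured by these patches:

```bash
# Sketch, not part of the patches: point Hugging Face tooling at the Runpod
# model cache. huggingface_hub and transformers read HF_HUB_CACHE to locate
# the hub cache directory (the folder containing the models--* entries).
export HF_HUB_CACHE=/runpod-volume/huggingface-cache/hub

# Resolve the snapshot directory for a cached model without hard-coding the
# version hash: take the first (typically only) entry under snapshots/.
MODEL_ID="gensyn/qwen2.5-0.5b-instruct"
MODEL_DIR=$(find "${HF_HUB_CACHE}/models--${MODEL_ID//\//--}/snapshots" \
  -mindepth 1 -maxdepth 1 -type d | head -n 1)
echo "Cached snapshot: ${MODEL_DIR}"
```

If `MODEL_DIR` comes back empty, the model is not in the cache and the worker should fall back to its normal download path.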
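
For applications that expect models at fixed locations (the ComfyUI case mentioned in the first patch), the resolved snapshot can be linked into place, mirroring the `ln -s` example the first patch adds; the `/workspace/models` target below is an assumption for illustration:

```bash
# Illustrative only: expose a cached snapshot under the directory the
# application scans for models. The snapshot path comes from the example in
# the patches; the /workspace/models target is assumed.
SNAPSHOT="/runpod-volume/huggingface-cache/hub/models--gensyn--qwen2.5-0.5b-instruct/snapshots/317b7eb96312eda0c431d1dab1af958a308cb35e"
mkdir -p /workspace/models
ln -s "${SNAPSHOT}" /workspace/models/qwen2.5-0.5b-instruct
```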