Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support downloading multiple model files #3216

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

umialpha
Copy link

@umialpha umialpha commented Sep 28, 2024

fix #3181

Changes

  1. Add two more more fields for ModelInfo
pub struct ModelInfo {
pub urls_sha256: Option<Vec<String>>,
pub entrypoint: String,
}
  • urls and urls_sha256 are for multiple files besides the original use(find the download host)
  • entrypoint is for the entrypoint of the models. it is set to "model.gguf" for legacy case, if there are multiple parts of model, entrypoint must be set.
  1. If urls are set, the filename are inferenced by the function tryget_download_filename

example:

{
    "license_name": "Apache 2.0",
    "license_url": "https://choosealicense.com/licenses/apache-2.0/",
    "prompt_template": "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>",
    "name": "Qwen2.5-Coder-7B-Instruct-GGUF",
    "provider_url": "https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",
    "urls": [
      "https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf",
      "https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q8_0-00002-of-00003.gguf",
      "https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q8_0-00003-of-00003.gguf"
    ],
    "urls_sha256":[
   "e2fc5918a2b579d8e03a3752ad74dd191bc0f43204c90a29070f273f5283fee1",
   "912b7876d43dc19bbcf09368f4472f6cfea3458067a5bcaa660a68a9958276db",
   "478f6a6b37072eeda02a98a59b6ef0b1a9131c9eae9a1181b6077f5e255fa6b2",
],
    "sha256": "e2fc5918a2b579d8e03a3752ad74dd191bc0f43204c90a29070f273f5283fee1",
"entrypoint": "qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf"
  }

@umialpha
Copy link
Author

@zwpaper @wsxiaoys PTAL

@umialpha
Copy link
Author

related registry-tabby pr
TabbyML/registry-tabby#16

@wsxiaoys
Copy link
Member

I would not recommend reusing the urls field for a partitioned model file. Instead:

  1. Create a new partitioned_urls field for models with partitions.
  2. For each partition, have a corresponding sha256 value.

@umialpha
Copy link
Author

umialpha commented Sep 29, 2024

I would not recommend reusing the urls field for a partitioned model file. Instead:

  1. Create a new partitioned_urls field for models with partitions.
  2. For each partition, have a corresponding sha256 value.

I considered adding partitioned_urls and partitioned_sha256, but this requires considering:

  • The structure of partitioned_urls is the same with urls, which may lead to confusion.
  • If we continue supporting different download_hosts semantics, the content of partitioned_urls are much like urls.

So, I've expanded the semantic of urls to include various partitioned_sha256 for different download_hosts, then filtering them based on download_hosts and override variables.

fn filter_download_urls(model_info: &ModelInfo) -> Vec<String> {
    let download_host = tabby_common::env::get_download_host();
    model_info
        .urls
        .iter()
        .flatten()
        .filter_map(|f| {
            if f.contains(&download_host) {
                if let Some(mirror_host) = tabby_common::env::get_huggingface_mirror_host() {
                    Some(f.replace("huggingface.co", &mirror_host))
                } else {
                    Some(f.to_owned())
                }
            } else {
                None
            }
        })
        .collect()

Let me know if partitioned_urls is necessary and how to maintain compatibility with existing urls. Adding partitioned_urls is not troublesome for me.

@wsxiaoys
Copy link
Member

wsxiaoys commented Oct 4, 2024

  1. I still recommend adding partitioned_urls (perhaps using a typed struct that contains both the URL and the SHA256 value).
  2. With proper refactoring, the mirror host semantics can also be applied to partitioned_urls.

@umialpha
Copy link
Author

umialpha commented Oct 6, 2024 via email

@zwpaper
Copy link
Member

zwpaper commented Oct 9, 2024

Hi @umialpha, sorry for the late reply and thanks for raising the PR!

before heading to the code changes, may I ask some questions to make sure we are on the same page.

  1. I would also stand for creating a new field partitioned_urls
  2. with 1., we should also support multiple hosts for each part of the partitioned_urls
  3. what is the filenames for partitioned model? it seems that 00001-00003 is some kinds of standard the splitted gguf models used, maybe we can keep this convention
  4. like 3., llama-cpp-server use the first part, 00001-00003, as default entrypoint of splitted models, maybe we can also follow this and drop the entry point field?

WDYT @wsxiaoys ?

@umialpha
Copy link
Author

umialpha commented Oct 10, 2024

Hi @umialpha, sorry for the late reply and thanks for raising the PR!

before heading to the code changes, may I ask some questions to make sure we are on the same page.

  1. I would also stand for creating a new field partitioned_urls
  2. with 1., we should also support multiple hosts for each part of the partitioned_urls
  3. what is the filenames for partitioned model? it seems that 00001-00003 is some kinds of standard the splitted gguf models used, maybe we can keep this convention
  4. like 3., llama-cpp-server use the first part, 00001-00003, as default entrypoint of splitted models, maybe we can also follow this and drop the entry point field?

WDYT @wsxiaoys ?

@zwpaper The filename is inferred, first by looking for "CONTENT_DISPOSITION" in the HTTP header, and second by deducing from the URL.

I reviewed the llma.cpp documentation and found only one relevant content, here. It doesn't provide a detailed explanation of name. I am not sure whether it is a good idea to couple our implementation with other internal implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should support download multiple files model, e.g., qwen2.5
3 participants