add llama.cpp+qwen3vl gguf notebook #41
base: main
Conversation
Summary of Changes

Hello @yrk111222, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new, detailed Jupyter Notebook tutorial designed to empower users to deploy and interact with the Qwen3-VL visual language model locally, by leveraging llama.cpp and GGUF-format weights.
Code Review
This PR adds a Jupyter Notebook tutorial on deploying Qwen3-VL GGUF models with llama.cpp. The tutorial is thorough and very helpful, covering the entire workflow from environment setup, model download, and compilation through to launching the server and testing it.
My review raises a few suggestions, mainly aimed at making the notebook easier to use and more robust:
- Robustness: improve the server startup check by polling instead of sleeping for a fixed interval, and by catching a more specific exception.
- Usability: remove hard-coded paths and manual copy-paste steps; resolving paths and data in code lets users run the whole tutorial smoothly.
- Correctness: point out an inappropriate IP address in the client code and fix a typo in the documentation.
Overall this is a high-quality tutorial; with a few small changes it will be even better.
| " time.sleep(20)\n", | ||
| " try:\n", | ||
| " resp = requests.get(\"http://localhost:8080/health\", timeout=5)\n", | ||
| " if resp.status_code == 200:\n", | ||
| " print(f\"\\n🎊 服务器启动成功!\")\n", | ||
| " print(f\" 📍 本地访问: http://localhost:8080\")\n", | ||
| " print(f\" 💬 API已就绪: http://localhost:8080/v1/chat/completions\")\n", | ||
| " return process\n", | ||
| " except:\n", | ||
| " print(\"⏳ 服务器正在努力加载模型,请再等待一分钟...\")\n", | ||
| " print(\" 完成后可手动在浏览器访问 http://localhost:8080 查看\")\n", | ||
| " return process\n", |
The current server-startup check is not robust. time.sleep(20) uses a fixed wait, which may not be enough for the model to finish loading on slower machines (the code's own comments note it can take 1-3 minutes). In addition, a bare except: catches every exception type, including user interrupts (Ctrl+C), which is bad practice. Consider a polling mechanism that checks the /health endpoint periodically within a window (e.g. 3 minutes) until the service is ready or the deadline passes, and catch a more specific exception such as requests.exceptions.RequestException.
time.sleep(5)  # brief wait for the process to start
print("\n⏳ Waiting for the server to become ready... (expect 1-3 minutes)")
max_wait = 180  # wait at most 3 minutes
start_time = time.time()
while time.time() - start_time < max_wait:
    try:
        resp = requests.get("http://localhost:8080/health", timeout=5)
        if resp.status_code == 200:
            print(f"\n🎊 Server started successfully!")
            print(f"   📍 Local access: http://localhost:8080")
            print(f"   💬 API ready: http://localhost:8080/v1/chat/completions")
            return process
    except requests.exceptions.RequestException:
        # Server not ready yet; keep polling
        pass
    time.sleep(10)  # retry after 10 seconds
print("⏳ Server startup timed out or is still loading. Check the subprocess output.")
print("   Once it finishes, you can open http://localhost:8080 in a browser to check")
return process
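One detail worth noting in this suggestion: the helper still returns process after the timeout rather than raising, so the caller keeps the subprocess handle and can re-check /health manually in case the server is simply still loading.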
| " time.sleep(20)\n", | ||
| " try:\n", | ||
| " resp = requests.get(\"http://localhost:8080/health\", timeout=5)\n", | ||
| " if resp.status_code == 200:\n", | ||
| " print(f\"\\n🎊 服务器启动成功!\")\n", | ||
| " print(f\" 📍 本地访问: http://localhost:8080\")\n", | ||
| " print(f\" 💬 API已就绪: http://localhost:8080/v1/chat/completions\")\n", | ||
| " return process\n", | ||
| " except:\n", | ||
| " print(\"⏳ 服务器正在努力加载模型,请再等待一分钟...\")\n", | ||
| " print(\" 完成后可手动在浏览器访问 http://localhost:8080 查看\")\n", | ||
| " return process\n", |
Similarly, the server-startup check here is not robust. The fixed time.sleep(20) may not be long enough for the model to load, and the broad except: statement can mask real errors. Switch to a polling loop that repeatedly checks the health endpoint within a set timeout, and catch the more specific requests.exceptions.RequestException.
time.sleep(5)  # brief wait for the process to start
print("\n⏳ Waiting for the server to become ready... (expect 1-3 minutes)")
max_wait = 180  # wait at most 3 minutes
start_time = time.time()
while time.time() - start_time < max_wait:
    try:
        resp = requests.get("http://localhost:8080/health", timeout=5)
        if resp.status_code == 200:
            print(f"\n🎊 Server started successfully!")
            print(f"   📍 Local access: http://localhost:8080")
            print(f"   💬 API ready: http://localhost:8080/v1/chat/completions")
            return process
    except requests.exceptions.RequestException:
        # Server not ready yet; keep polling
        pass
    time.sleep(10)  # retry after 10 seconds
print("⏳ Server startup timed out or is still loading. Check the subprocess output.")
print("   Once it finishes, you can open http://localhost:8080 in a browser to check")
return process
| "%%bash\n", | ||
| "curl -X POST http://localhost:8080/v1/chat/completions \\\n", | ||
| "-H \"Content-Type: application/json\" \\\n", | ||
| "-d '{\n", | ||
| " \"model\": \"qwen3-vl\",\n", | ||
| " \"messages\": [\n", | ||
| " {\n", | ||
| " \"role\": \"user\",\n", | ||
| " \"content\": [\n", | ||
| " {\n", | ||
| " \"type\": \"text\",\n", | ||
| " \"text\": \"图片里有什么?\"\n", | ||
| " },\n", | ||
| " {\n", | ||
| " \"type\": \"image_url\",\n", | ||
| " \"image_url\": {\n", | ||
| " \"url\": \"data:image/jpeg;base64,在这里粘贴你刚刚复制的完整Base64字符串\"\n", | ||
| " }\n", | ||
| " }\n", | ||
| " ]\n", | ||
| " }\n", | ||
| " ],\n", | ||
| " \"max_tokens\": 300,\n", | ||
| " \"temperature\": 0.6\n", | ||
| "}'\n" | ||
| ] |
This curl command asks the user to manually copy and paste a Base64-encoded image string, which is awkward and error-prone inside a Jupyter Notebook. A shell variable can automate the step: encode the image with the base64 command into a variable, then reference that variable inside curl's JSON payload, so no manual copying is needed.
%%bash
IMAGE_B64=$(base64 -i bird.jpg | tr -d '\n')
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{
\"model\": \"qwen3-vl\",
\"messages\": [
{
\"role\": \"user\",
\"content\": [
{
\"type\": \"text\",
\"text\": \"图片里有什么?\"
},
{
\"type\": \"image_url\",
\"image_url\": {
\"url\": \"data:image/jpeg;base64,${IMAGE_B64}\"
}
}
]
}
],
\"max_tokens\": 300,
\"temperature\": 0.6
}"
| " paths = {\n", | ||
| " 'server': '/your/path/to/llama-server',\n", | ||
| " 'model': '/your/path/to/Qwen3VL-2B-Instruct-Q4_K_M.gguf',\n", | ||
| " 'mmproj': '/your/path/to/mmproj-Qwen3VL-2B-Instruct-F16.gguf'\n", | ||
| " }\n", |
These placeholder paths are one of the hard-coded spots the summary's usability point refers to: nothing runs until the user edits every /your/path/to/... entry by hand.
| " path_server = '/your/path/to/llama-server'\n", | ||
| " path_mmproj = '/your/path/to/mmproj.gguf'\n", |
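Both excerpts above carry the same usability problem. As a minimal sketch of what automatic path resolution could look like — build_dir, models_dir, and the glob patterns are assumptions, not paths taken from the PR:

import shutil
from pathlib import Path

# Assumed locations -- adjust to wherever llama.cpp was built and the GGUF files live.
build_dir = Path.home() / "llama.cpp" / "build" / "bin"
models_dir = Path.home() / "models"

# Prefer a llama-server already on PATH, else fall back to the build directory.
server = shutil.which("llama-server") or str(build_dir / "llama-server")

# Find the GGUF files by pattern instead of hard-coding exact filenames.
model = next(models_dir.glob("Qwen3VL-*-Q4_K_M.gguf"), None)
mmproj = next(models_dir.glob("mmproj-Qwen3VL-*.gguf"), None)

if model is None or mmproj is None:
    raise FileNotFoundError(f"No Qwen3-VL GGUF files found under {models_dir}")

paths = {'server': server, 'model': str(model), 'mmproj': str(mmproj)}
print(paths)

Here shutil.which covers the case where llama-server was installed onto PATH, while the glob keeps the cell working even if the quantization suffix in the filename changes.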
| " img_b64 = base64.b64encode(f.read()).decode('utf-8')\n", | ||
| " \n", | ||
| " # 构建请求\n", | ||
| " url = \"http://0.0.0.0:8080/v1/chat/completions\"\n", |
This is the client-address issue flagged in the summary: 0.0.0.0 is the wildcard address a server binds to, not an address a client should connect to, so the request should target http://localhost:8080 (or the server's actual host) instead.
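A minimal sketch of the corrected client call, reusing the payload shape from the curl example above; bird.jpg is a hypothetical path, and localhost assumes the server runs on the same machine:

import base64
import requests

# Connect via localhost: 0.0.0.0 is a server-side bind address,
# not an address a client should dial.
url = "http://localhost:8080/v1/chat/completions"

# Hypothetical image path -- substitute your own file.
with open("bird.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "qwen3-vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in the image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 300,
    "temperature": 0.6,
}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])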
| "# 替换为你的图片路径,也可以修改问题\n", | ||
| "ask_ai_about_image(\"/your/path/to/xxx.png\", \"图片里有什么?\")" |
| "import base64\n", | ||
| "\n", | ||
| "# 读取图片并转换为Base64\n", | ||
| "image_path = \"/your/path/to/xxx.png\" # 替换为你的图片路径\n", |
| "outputs": [], | ||
| "source": [ | ||
| "# 带图片的 curl\n", | ||
| "!base64 -i /your/path/to/xxx.png" |
This cell is the manual step the curl suggestion above removes: the user runs base64 by hand and pastes the output into the next request, whereas the shell-variable version (and the Python client sketch earlier) encode the image automatically.
README.md
Outdated
In addition, we have added a Blogs section bringing in-depth analysis of cutting-edge technology.

## 🌿 Changelog
- ⭐️**2025.12.17** Added a usage example for deploying Qwen3vl GGUF with llama.cpp; see the [LLama.cpp example](./LLM-tutorial/notebook/llamacpp+qwen3vl+gguf.ipynb).
There is a typo in the link text LLama.cpp; correct it to Llama.cpp to stay consistent with the project name llama.cpp.
Before:
- ⭐️**2025.12.17** Added a usage example for deploying Qwen3vl GGUF with llama.cpp; see the [LLama.cpp example](./LLM-tutorial/notebook/llamacpp+qwen3vl+gguf.ipynb).
After:
- ⭐️**2025.12.17** Added a usage example for deploying Qwen3vl GGUF with llama.cpp; see the [Llama.cpp example](./LLM-tutorial/notebook/llamacpp+qwen3vl+gguf.ipynb).
No description provided.