Conversation

@yrk111222 (Collaborator)

No description provided.

@gemini-code-assist

Summary of Changes

Hello @yrk111222, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a detailed Jupyter Notebook tutorial that walks users through deploying and interacting with the Qwen3-VL vision-language model locally. Leveraging llama.cpp and the GGUF model format, the tutorial provides step-by-step instructions for downloading the necessary model files, compiling the llama.cpp engine, and launching a local AI server capable of both text-based conversation and image understanding. The addition aims to simplify running powerful multimodal AI models in a local, privacy-preserving environment.
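
For orientation, here is a minimal sketch of the launch step the notebook automates, assuming llama.cpp has already been compiled and the GGUF files downloaded. The file names, relative paths, and port are illustrative (taken from the review comments below); consult the notebook for the exact invocation:

    import subprocess

    # Start llama-server with the Qwen3-VL weights plus the multimodal
    # projector (mmproj) needed for image input; it serves an
    # OpenAI-compatible API on the chosen port.
    process = subprocess.Popen([
        "build/bin/llama-server",                           # built in an earlier step
        "-m", "Qwen3VL-2B-Instruct-Q4_K_M.gguf",            # quantized model weights
        "--mmproj", "mmproj-Qwen3VL-2B-Instruct-F16.gguf",  # vision projector
        "--port", "8080",
    ])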

Highlights

  • New Jupyter Notebook: A comprehensive tutorial notebook, llamacpp+qwen3vl+gguf.ipynb, has been added to guide users through local deployment.
  • Local Qwen3-VL Deployment: The notebook provides step-by-step instructions for deploying the Qwen3-VL visual language model locally using llama.cpp and the GGUF format.
  • Multimodal Interaction: It demonstrates how to set up a local server for Qwen3-VL and interact with it for both text-based chat and image understanding capabilities.
  • README Update: The README.md file has been updated to include a new entry in the update log, linking to this newly added tutorial.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This PR adds a Jupyter Notebook tutorial on deploying Qwen3-VL GGUF models with llama.cpp. The tutorial is thorough, covering the whole workflow from environment setup, model download, and compilation to launching the service and testing it, and is very helpful to users.

My review raises a few suggestions, mainly focused on improving the notebook's usability and robustness:

  • Robustness: improve the server startup check by polling instead of a fixed delay, and catch a more specific exception.
  • Usability: remove hardcoded paths and manual copy-paste steps; obtain paths and data programmatically so users can run the whole tutorial smoothly.
  • Correctness: flag an inappropriate IP address in the client code and fix one typo in the documentation.

Overall this is a high-quality tutorial; with a few small changes it will be even better.

Comment on lines +306 to +317
" time.sleep(20)\n",
" try:\n",
" resp = requests.get(\"http://localhost:8080/health\", timeout=5)\n",
" if resp.status_code == 200:\n",
" print(f\"\\n🎊 服务器启动成功!\")\n",
" print(f\" 📍 本地访问: http://localhost:8080\")\n",
" print(f\" 💬 API已就绪: http://localhost:8080/v1/chat/completions\")\n",
" return process\n",
" except:\n",
" print(\"⏳ 服务器正在努力加载模型,请再等待一分钟...\")\n",
" print(\" 完成后可手动在浏览器访问 http://localhost:8080 查看\")\n",
" return process\n",

high

The current server startup check is not robust. time.sleep(20) uses a fixed delay, which may not be enough for the model to finish loading on machines of varying performance (the comments themselves note it can take 1-3 minutes). In addition, the bare except: catches every exception type, including user interrupts (Ctrl+C), which is bad practice. Use a polling mechanism instead: check the /health endpoint periodically within a time window (e.g. 3 minutes) until the service is ready or the wait times out, and catch a more specific exception such as requests.exceptions.RequestException.

    time.sleep(5)  # brief wait for the process to start
    print("\n⏳ Waiting for the server to become ready... (expected to take 1-3 minutes)")
    max_wait = 180  # wait at most 3 minutes
    start_time = time.time()

    while time.time() - start_time < max_wait:
        try:
            resp = requests.get("http://localhost:8080/health", timeout=5)
            if resp.status_code == 200:
                print(f"\n🎊 Server started successfully!")
                print(f"   📍 Local access: http://localhost:8080")
                print(f"   💬 API ready: http://localhost:8080/v1/chat/completions")
                return process
        except requests.exceptions.RequestException:
            # server not ready yet; keep polling
            pass
        time.sleep(10)  # retry after 10 seconds

    print("⏳ Server startup timed out or the model is still loading. Check the subprocess output.")
    print("   Once it finishes, you can open http://localhost:8080 in a browser to check.")
    return process

Comment on lines +381 to +392
" time.sleep(20)\n",
" try:\n",
" resp = requests.get(\"http://localhost:8080/health\", timeout=5)\n",
" if resp.status_code == 200:\n",
" print(f\"\\n🎊 服务器启动成功!\")\n",
" print(f\" 📍 本地访问: http://localhost:8080\")\n",
" print(f\" 💬 API已就绪: http://localhost:8080/v1/chat/completions\")\n",
" return process\n",
" except:\n",
" print(\"⏳ 服务器正在努力加载模型,请再等待一分钟...\")\n",
" print(\" 完成后可手动在浏览器访问 http://localhost:8080 查看\")\n",
" return process\n",

high

Likewise, the server startup check here is not robust. The fixed time.sleep(20) may not be long enough for the model to load, and the broad except: statement can mask real errors. Switch to a polling loop that repeatedly checks the health endpoint within a set timeout, and catch the more specific requests.exceptions.RequestException.

    time.sleep(5)  # brief wait for the process to start
    print("\n⏳ Waiting for the server to become ready... (expected to take 1-3 minutes)")
    max_wait = 180  # wait at most 3 minutes
    start_time = time.time()

    while time.time() - start_time < max_wait:
        try:
            resp = requests.get("http://localhost:8080/health", timeout=5)
            if resp.status_code == 200:
                print(f"\n🎊 Server started successfully!")
                print(f"   📍 Local access: http://localhost:8080")
                print(f"   💬 API ready: http://localhost:8080/v1/chat/completions")
                return process
        except requests.exceptions.RequestException:
            # server not ready yet; keep polling
            pass
        time.sleep(10)  # retry after 10 seconds

    print("⏳ Server startup timed out or the model is still loading. Check the subprocess output.")
    print("   Once it finishes, you can open http://localhost:8080 in a browser to check.")
    return process

Comment on lines +632 to +657
"%%bash\n",
"curl -X POST http://localhost:8080/v1/chat/completions \\\n",
"-H \"Content-Type: application/json\" \\\n",
"-d '{\n",
" \"model\": \"qwen3-vl\",\n",
" \"messages\": [\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"text\",\n",
" \"text\": \"图片里有什么?\"\n",
" },\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": \"data:image/jpeg;base64,在这里粘贴你刚刚复制的完整Base64字符串\"\n",
" }\n",
" }\n",
" ]\n",
" }\n",
" ],\n",
" \"max_tokens\": 300,\n",
" \"temperature\": 0.6\n",
"}'\n"
]

high

This curl command asks the user to manually copy and paste the Base64-encoded image string, which is awkward and error-prone inside a Jupyter Notebook. A shell variable can automate the process: first encode the image into a variable with the base64 command, then reference that variable in curl's JSON payload. No manual step is needed.

%%bash
IMAGE_B64=$(base64 -i bird.jpg | tr -d '\n')

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{
  \"model\": \"qwen3-vl\",
  \"messages\": [
    {
      \"role\": \"user\",
      \"content\": [
        {
          \"type\": \"text\",
          \"text\": \"图片里有什么?\"
        },
        {
          \"type\": \"image_url\",
          \"image_url\": {
            \"url\": \"data:image/jpeg;base64,${IMAGE_B64}\"
          }
        }
      ]
    }
  ],
  \"max_tokens\": 300,
  \"temperature\": 0.6
}"

Comment on lines +287 to +291
" paths = {\n",
" 'server': '/your/path/to/llama-server',\n",
" 'model': '/your/path/to/Qwen3VL-2B-Instruct-Q4_K_M.gguf',\n",
" 'mmproj': '/your/path/to/mmproj-Qwen3VL-2B-Instruct-F16.gguf'\n",
" }\n",

medium

The paths here are hardcoded and must be filled in by hand. To make the notebook easier to use and less error-prone, reuse the model_path and mmproj_path variables already obtained in the previous code cell. The llama-server path can likewise be resolved via the relative path build/bin/llama-server, since an earlier cell already changed into the llama.cpp directory. Users could then run the cell without any manual edits. If users might have run a different download cell, add a note or a check that the variables exist. A sketch follows below.
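
A minimal sketch of that suggestion (illustrative; it assumes model_path and mmproj_path were defined by the earlier download cell and that the working directory is the llama.cpp checkout):

    # Reuse the variables set by the download cell instead of hardcoding paths.
    paths = {
        'server': 'build/bin/llama-server',  # relative path; built in an earlier cell
        'model': model_path,                 # GGUF weights path from the download cell
        'mmproj': mmproj_path                # projector path from the download cell
    }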

Comment on lines +354 to +355
" path_server = '/your/path/to/llama-server'\n",
" path_mmproj = '/your/path/to/mmproj.gguf'\n",

medium

As in the previous code cell, the paths here are hardcoded. path_server can use the relative path build/bin/llama-server. path_mmproj must currently be supplied by hand; to improve the user experience, consider reading the mmproj_path variable from the earlier cell, or give clearer guidance on how to obtain this file path (for example, prompt the user to run the earlier download cell). One possible form is sketched below.
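
One possible form (illustrative; assumes the earlier download cell defined mmproj_path and the working directory is the llama.cpp checkout):

    path_server = 'build/bin/llama-server'  # relative to the llama.cpp checkout
    path_mmproj = mmproj_path               # set by the earlier download cell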

" img_b64 = base64.b64encode(f.read()).decode('utf-8')\n",
" \n",
" # 构建请求\n",
" url = \"http://0.0.0.0:8080/v1/chat/completions\"\n",

medium

In the client request, use http://localhost:8080 or http://127.0.0.1:8080 rather than http://0.0.0.0:8080. 0.0.0.0 is normally a server-side address meaning "listen on all available network interfaces"; a client initiating a connection should target a specific address.

    url = "http://localhost:8080/v1/chat/completions"

Comment on lines +465 to +466
"# 替换为你的图片路径,也可以修改问题\n",
"ask_ai_about_image(\"/your/path/to/xxx.png\", \"图片里有什么?\")"

medium

The image path here is hardcoded, requiring users to supply their own image and edit the path. To make the tutorial easier to pick up and reproduce, add a command to the notebook that downloads a sample image (e.g. with !wget) and then use that image's path directly. Users can then run this cell and see results without any manual changes.

# Download a sample image and ask a question about it
!wget -q --show-progress https://modelscope.oss-cn-beijing.aliyuncs.com/demo/images/frigate_bird.jpg -O bird.jpg
ask_ai_about_image("bird.jpg", "What is in the image?")

"import base64\n",
"\n",
"# 读取图片并转换为Base64\n",
"image_path = \"/your/path/to/xxx.png\" # 替换为你的图片路径\n",

medium

As with the earlier cells, the image path here is hardcoded. To improve the notebook's reproducibility and user experience, use the sample image downloaded earlier, or provide a download command, so the cell runs without manual edits.

image_path = "bird.jpg"  # 替换为你的图片路径,或使用之前下载的 bird.jpg

"outputs": [],
"source": [
"# 带图片的 curl\n",
"!base64 -i /your/path/to/xxx.png"

medium

The image path here is hardcoded. For convenience, use the sample image bird.jpg downloaded earlier.

!base64 -i bird.jpg

README.md Outdated
In addition, we have added a Blogs section, bringing you in-depth analyses of cutting-edge technologies.

## 🌿Update Log
- ⭐️**2025.12.17** Added a usage example for deploying Qwen3vl GGUF with llama.cpp; see the [LLama.cpp example](./LLM-tutorial/notebook/llamacpp+qwen3vl+gguf.ipynb).

medium

The link text LLama.cpp contains a typo; correct it to Llama.cpp to stay consistent with the project name llama.cpp.

Suggested change
- ⭐️**2025.12.17** Added a usage example for deploying Qwen3vl GGUF with llama.cpp; see the [LLama.cpp example](./LLM-tutorial/notebook/llamacpp+qwen3vl+gguf.ipynb).
- ⭐️**2025.12.17** Added a usage example for deploying Qwen3vl GGUF with llama.cpp; see the [Llama.cpp example](./LLM-tutorial/notebook/llamacpp+qwen3vl+gguf.ipynb).
