
BUG: ollama multimodal models don't support parallel requests yet #8531

Open
AAEE86 opened this issue Sep 18, 2024 · 2 comments
Labels
🐞 bug Something isn't working

Comments

@AAEE86
Contributor

AAEE86 commented Sep 18, 2024

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, or they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

ollama log:

time=2024-09-18T09:52:53.395Z level=WARN source=sched.go:137 msg="multimodal models don't support parallel requests yet"

Dify error:

Query or prefix prompt is too long, you can reduce the prefix prompt, or shrink the max token, or switch to a llm with a larger token limit size.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

@dosubot dosubot bot added the 🐞 bug Something isn't working label Sep 18, 2024

dosubot bot commented Sep 18, 2024

I found some similar discussions and issues that might be relevant to your report:

  1. Adding a local ollama model error - This discussion is still open and was last updated on September 12, 2024 [1].
  2. An error occurred during credentials validation - This discussion is still open and was last updated on September 4, 2024 [2].
  3. May I ask, is the issue I mentioned a bug? How can it be resolved? - This discussion is still open and was last updated on April 24, 2024 [3].

Additionally, there was a closed issue related to parallelism.

For handling parallel requests, you can adjust the configuration options in Dify:

  • SERVER_WORKER_AMOUNT: Configure the number of API server workers.

    SERVER_WORKER_AMOUNT: <number_of_workers>
  • CELERY_WORKER_AMOUNT: Configure the number of Celery workers.

    CELERY_WORKER_AMOUNT: <number_of_workers>

These settings can help manage parallel processing more efficiently [5].
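
For example, in a self-hosted Docker deployment these are typically set in the .env file read by docker-compose. A minimal sketch, assuming the standard Dify Docker layout (the values are illustrative, not recommendations):

    # .env for the Dify docker-compose deployment
    SERVER_WORKER_AMOUNT=4    # Gunicorn workers for the API server
    CELERY_WORKER_AMOUNT=2    # Celery workers for background tasks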

For the 'Query or prefix prompt is too long' error, consider the following optimizations:

  1. Reduce the prefix prompt length.
  2. Adjust the max_tokens parameter to a smaller value.
  3. Use a model with a larger token limit, such as GPT-4.
  4. Optimize the prompt structure by separating context, pre-prompt, history, and query [6][7][8][9].
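
To see why items 1-3 help: the prompt (prefix, history, and query) plus the tokens reserved for the completion must fit within the model's context window. A rough budget check as a sketch (the 4096-token window, 512-token reservation, and function name are assumptions, not Dify internals):

    # Rough token-budget check (illustrative numbers)
    CONTEXT_WINDOW = 4096   # model's total context size in tokens
    MAX_TOKENS = 512        # tokens reserved for the completion

    def prompt_fits(prompt_tokens: int) -> bool:
        # The prompt and the reserved completion budget must
        # both fit inside the context window.
        return prompt_tokens + MAX_TOKENS <= CONTEXT_WINDOW

    print(prompt_fits(3800))  # False: 3800 + 512 > 4096 -> shrink the prompt
    print(prompt_fits(3500))  # True: 3500 + 512 <= 4096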

Regarding the current state of multimodal model support, Dify supports ASR models and rich-text models up to GPT-4 specifications. The documentation lists various model providers and their supported capabilities, including speech-to-text and text-to-speech functionalities [10][11].

To continue talking to Dosu, mention @dosu.

@AAEE86
Contributor Author

AAEE86 commented Sep 18, 2024

@dosu
Uploading pictures using a URL produces another error:
[ollama] Error: API request failed with status code 400: {"error":"illegal base64 data at input byte 5"}
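
The "illegal base64 data" error is consistent with a raw URL being passed where Ollama expects base64-encoded image bytes: Ollama's generate API takes images as a list of base64 strings, not URLs. A minimal sketch of encoding an image before calling the API directly (the model name, file path, and default localhost endpoint are assumptions):

    import base64
    import json
    import urllib.request

    # Ollama expects base64-encoded image data, so read and encode the file
    with open("photo.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    payload = {
        "model": "llava",                  # assumed multimodal model
        "prompt": "Describe this image.",
        "images": [image_b64],             # base64 strings, not URLs
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # default Ollama endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])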
