
Use circular buffer of infer requests in VLM components #1833

Open
mzegla wants to merge 3 commits into master from parallel_embeddings
Conversation

@mzegla (Collaborator) commented Mar 3, 2025

Use a circular buffer of infer requests in the vision encoder to enable parallel processing of images in VLM pipelines. This reduces the synchronized section.
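The pattern described here is essentially an object pool: a fixed set of infer requests sits in a queue, worker threads borrow one, run inference, and return it, so up to N images can be encoded concurrently instead of serializing on a single request. A minimal sketch of the idea (class and parameter names are illustrative, not the actual GenAI implementation):

```python
import queue
import threading


class InferRequestQueue:
    """Sketch of a circular buffer (pool) of infer requests.

    Workers borrow a request with acquire(), run inference, and hand it
    back with release(). acquire() blocks when all requests are busy,
    which bounds concurrency to the pool size.
    """

    def __init__(self, create_request, pool_size):
        self._free = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._free.put(create_request())

    def acquire(self):
        # Blocks until a request is free.
        return self._free.get()

    def release(self, request):
        # Return the request to the pool for reuse.
        self._free.put(request)


# Usage sketch: a callable stands in for a real compiled-model request.
def encode_image(pool, image):
    req = pool.acquire()
    try:
        return req(image)  # with a real request this would be req.infer(...)
    finally:
        pool.release(req)
```

With the pool held by a shared pointer to a single compiled model (as one of the commits below describes), all workers reuse one set of weights rather than each holding a model copy.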

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) no-match-files labels Mar 3, 2025
@mzegla mzegla force-pushed the parallel_embeddings branch from 7ebc9db to 3ff58a0 Compare March 3, 2025 14:40
@mzegla mzegla marked this pull request as ready for review March 3, 2025 16:15
@mzegla mzegla requested review from popovaan and ilya-lavrenov March 3, 2025 16:15
@ilya-lavrenov ilya-lavrenov added this to the 2025.1 milestone Mar 3, 2025
@github-actions github-actions bot added category: Python API Python API for GenAI category: GenAI C++ API Changes in GenAI C++ public headers labels Mar 4, 2025
@mzegla mzegla force-pushed the parallel_embeddings branch from 051a9ac to 581db0b Compare March 4, 2025 10:38
@mzegla mzegla force-pushed the parallel_embeddings branch from 44b3534 to 71242f8 Compare March 5, 2025 09:49
@mzegla mzegla requested review from Wovchena and ilya-lavrenov March 5, 2025 13:30
@mzegla mzegla force-pushed the parallel_embeddings branch from 71242f8 to 9dd1f21 Compare March 11, 2025 15:26
@github-actions github-actions bot added category: tokenizers Tokenizer class or submodule update labels Mar 12, 2025
mzegla added 2 commits March 12, 2025 12:29
use circular buffer of infer requests in embedder and hold shared pointer to single embedding model instead of making copies

fix non remote context path

rename InferRequest field

separate plugin config for inputs embedder

review fixes

py_openvino_genai adjustment

post rebase

use ireq queues in other VLM models

use infer() instead of start_async->wait flow

Update src/python/py_continuous_batching_pipeline.cpp

Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>

remove additional encode method in vision encoder
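One of the commits above replaces the `start_async() -> wait()` pair with a single `infer()` call. Once each worker thread owns a dedicated request from the pool, there is nothing useful to overlap between starting and waiting, so the blocking form is equivalent and simpler. A toy sketch of the two flows (the class below is a stand-in for illustration, not the real OpenVINO `InferRequest`):

```python
class StubInferRequest:
    """Toy stand-in for an inference request, used only to contrast the
    synchronous infer() flow with the start_async()/wait() flow."""

    def __init__(self, fn):
        self._fn = fn
        self._result = None

    def infer(self, x):
        # Synchronous: run the computation and return the result directly.
        return self._fn(x)

    def start_async(self, x):
        # Asynchronous flow: kick off work (here computed eagerly for
        # simplicity) and stash the result.
        self._result = self._fn(x)

    def wait(self):
        # Block until the async work is done, then return its result.
        return self._result
```

When the caller immediately waits on the result, `req.infer(x)` and `req.start_async(x); req.wait()` produce the same value; the pool, not the async API, is what provides the parallelism here.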
@mzegla mzegla force-pushed the parallel_embeddings branch from 9d14472 to a9fe0a5 Compare March 12, 2025 11:45
@ilya-lavrenov ilya-lavrenov enabled auto-merge March 12, 2025 11:58