
Use circular buffer of infer requests in VLM components #1833

Open
mzegla wants to merge 3 commits into master from parallel_embeddings
Conversation

@mzegla (Collaborator) commented Mar 3, 2025

Use a circular buffer of infer requests in the vision encoder to enable parallel processing of images in VLM pipelines. This reduces the synchronized section.
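The pattern described here is essentially an object pool: a fixed set of infer requests sits in a queue, worker threads borrow one, run inference, and return it, so up to N images can be encoded concurrently instead of serializing on a single request. A minimal sketch of the idea (class and parameter names are illustrative, not the actual GenAI implementation):

```python
import queue
import threading


class InferRequestQueue:
    """Sketch of a circular buffer (pool) of infer requests.

    Workers borrow a request with acquire(), run inference, and hand it
    back with release(). acquire() blocks when all requests are busy,
    which bounds concurrency to the pool size.
    """

    def __init__(self, create_request, pool_size):
        self._free = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._free.put(create_request())

    def acquire(self):
        # Blocks until a request is free.
        return self._free.get()

    def release(self, request):
        # Return the request to the pool for reuse.
        self._free.put(request)


# Usage sketch: a callable stands in for a real compiled-model request.
def encode_image(pool, image):
    req = pool.acquire()
    try:
        return req(image)  # with a real request this would be req.infer(...)
    finally:
        pool.release(req)
```

With the pool held by a shared pointer to a single compiled model (as one of the commits below describes), all workers reuse one set of weights rather than each holding a model copy.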

@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: LLM LLM pipeline (stateful, static) no-match-files labels Mar 3, 2025
@mzegla mzegla force-pushed the parallel_embeddings branch from 7ebc9db to 3ff58a0 Compare March 3, 2025 14:40
@mzegla mzegla marked this pull request as ready for review March 3, 2025 16:15
@mzegla mzegla requested review from popovaan and ilya-lavrenov March 3, 2025 16:15
@ilya-lavrenov ilya-lavrenov added this to the 2025.1 milestone Mar 3, 2025
@github-actions github-actions bot added category: Python API Python API for GenAI category: GenAI C++ API Changes in GenAI C++ public headers labels Mar 4, 2025
@mzegla mzegla force-pushed the parallel_embeddings branch from 051a9ac to 581db0b Compare March 4, 2025 10:38
@mzegla mzegla force-pushed the parallel_embeddings branch from 44b3534 to 71242f8 Compare March 5, 2025 09:49
@mzegla mzegla requested review from Wovchena and ilya-lavrenov March 5, 2025 13:30
@mzegla mzegla force-pushed the parallel_embeddings branch from 71242f8 to 9dd1f21 Compare March 11, 2025 15:26
@github-actions github-actions bot added category: tokenizers Tokenizer class or submodule update labels Mar 12, 2025
mzegla added 2 commits March 12, 2025 12:29
use circular buffer of infer requests in embedder and hold shared pointer to single embedding model instead of making copies

fix non remote context path

rename InferRequest field

separate plugin config for inputs embedder

review fixes

py_openvino_genai adjustment

post rebase

use ireq queues in other VLM models

use infer() instead of start_async->wait flow

Update src/python/py_continuous_batching_pipeline.cpp

Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>

remove additional encode method in vision encoder
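One of the commits above replaces the `start_async() -> wait()` pair with a single `infer()` call. Once each worker thread owns a dedicated request from the pool, there is nothing useful to overlap between starting and waiting, so the blocking form is equivalent and simpler. A toy sketch of the two flows (the class below is a stand-in for illustration, not the real OpenVINO `InferRequest`):

```python
class StubInferRequest:
    """Toy stand-in for an inference request, used only to contrast the
    synchronous infer() flow with the start_async()/wait() flow."""

    def __init__(self, fn):
        self._fn = fn
        self._result = None

    def infer(self, x):
        # Synchronous: run the computation and return the result directly.
        return self._fn(x)

    def start_async(self, x):
        # Asynchronous flow: kick off work (here computed eagerly for
        # simplicity) and stash the result.
        self._result = self._fn(x)

    def wait(self):
        # Block until the async work is done, then return its result.
        return self._result
```

When the caller immediately waits on the result, `req.infer(x)` and `req.start_async(x); req.wait()` produce the same value; the pool, not the async API, is what provides the parallelism here.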
@mzegla mzegla force-pushed the parallel_embeddings branch from 9d14472 to a9fe0a5 Compare March 12, 2025 11:45
@ilya-lavrenov ilya-lavrenov enabled auto-merge March 12, 2025 11:58