GSoC 2026: Project #6 Develop an OpenVINO-Domain Specialized Coder Model with SFT/GRPO/RAG #34261
Replies: 10 comments 5 replies
-
Hey @abhayuvi, this is a fantastic and well-thought-out breakdown for GSoC! Your focus on GRPO reward functions (compilation success/API validity) and quantization targets (NNCF/INT4) shows you've really looked into the technical constraints. Regarding the base model, Qwen2.5-Coder-7B has been showing great performance in recent benchmarks, so that could be a strong contender. For the RAG knowledge base, don't forget the Jupyter Notebooks in the OpenVINO training materials—they often have the most up-to-date API usage patterns. Good luck with the proposal!
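A reward like the compilation-success/API-validity one mentioned above could be prototyped cheaply with Python's `ast` module. This is a minimal sketch, not the project's actual reward spec: the expected API names are illustrative assumptions.

```python
import ast

# Hypothetical API markers for the reward; these names are assumptions
# chosen for illustration, not a definitive list of OpenVINO entry points.
EXPECTED_API_CALLS = {"ov.Core", "compile_model", "convert_model"}

def compilation_reward(code: str) -> float:
    """Return a reward in [0, 1]: 0 if the code does not parse, otherwise
    0.5 base plus up to 0.5 for referencing expected API names."""
    try:
        ast.parse(code)
    except SyntaxError:
        return 0.0  # unparseable generations get no reward
    # Count how many expected API name suffixes appear in the source text.
    hits = sum(1 for name in EXPECTED_API_CALLS if name.split(".")[-1] in code)
    return 0.5 + 0.5 * (hits / len(EXPECTED_API_CALLS))

good = "import openvino as ov\ncore = ov.Core()\nmodel = core.compile_model('m.xml')"
bad = "core = ov.Core("  # syntax error
```

In a real GRPO loop this would be one of several reward signals; a stricter variant could actually run the snippet in a sandbox and check that `compile_model` succeeds on a toy IR.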
-
Hi @abhayuvi,
Curated Data: We should prioritize the latest OpenVINO 2024.x/2025.x documentation. In particular, curating examples of "Legacy API -> New API" migrations, OpenVINO GenAI API usage, and NNCF quantization snippets will drastically improve RAG performance.
Looking forward to discussing this further with you and seeing how your implementation evolves!
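The curation idea above could start as simply as tagging chunks at ingestion time so the retriever can boost migration and quantization examples. A minimal sketch, where the marker strings and tag names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DocChunk:
    text: str
    source: str
    tags: set = field(default_factory=set)

# Hypothetical marker strings; a real pipeline would use a proper
# classifier or the docs' own section metadata instead.
MIGRATION_MARKERS = ("IECore", "load_network", "ie_api")  # legacy API hints
QUANT_MARKERS = ("nncf", "compress_weights", "INT4")

def tag_chunk(chunk: DocChunk) -> DocChunk:
    """Attach coarse retrieval tags based on substring markers."""
    if any(m in chunk.text for m in MIGRATION_MARKERS):
        chunk.tags.add("legacy-migration")
    if any(m in chunk.text for m in QUANT_MARKERS):
        chunk.tags.add("quantization")
    return chunk
```

At query time, the retriever could then up-weight chunks tagged `legacy-migration` whenever the prompt mentions a deprecated API name.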
-
Thank you for putting so much thought into this. This is an excellent analysis, and I completely agree with your assessment. Here are my thoughts.
Using GitHub to communicate is fine. Looking forward to your next move.
-
Also @7taozhou7, @yinquan251 — I have been looking through the repo for good first issues to contribute to before the proposal, but couldn't find any open ones currently. Would appreciate any pointers on where a meaningful contribution could be made, whether that's documentation, tests, or any area related to the project scope.
-
Wow, this is sick. I'm currently working on reinforcement learning and RAG alignment, and I had thought of drafting a proposal for this project myself, but I think @abhayuvi is the right fit here. The architecture explains a lot without being too complex. Good luck man.
-
Hi @7taozhou7 and @yinquan251, wanted to share a quick update on the architecture and the proposal.
The updated architecture diagram and full proposal draft have also been shared over email from abhayuvi.raj@gmail.com for review. Would really appreciate any feedback or suggestions before the deadline; happy to make changes if anything needs to be reconsidered. Also, thanks for your previous feedback, which helped correct and improve the architecture.
-
Hey @abhayuvi, really nice work on the clean roadmap. I was exploring similar ideas yesterday, but it looks like you've already structured it much better. Given the timeline, I'd be happy to collaborate and contribute wherever it helps. I genuinely think working together on a clear direction like this can speed things up a lot. Feel free to ping me anytime if you need any help, good luck!
-
Hello everyone, I have a few doubts regarding the GSoC 2026 proposal and would really appreciate it if someone could help clear them up.
Thanks.
-
@abhayuvi I think this is a really cool project too. I was also browsing the GSoC projects and was thinking of writing a proposal for this project, but it seems like you've got it down. I'd love to help out if needed as well, feel free to ping me!
-
Hey @7taozhou7 and @yinquan251, while going through the project in more depth, I found a few things worth discussing that were not covered in the initial architecture.

1. Teacher Model Cost

If going with the GPT-4o or Claude API for generating the SFT dataset, the cost for 10K–50K samples could land anywhere between $500–$2500. Since we are already working with Qwen2.5-Coder-7B, why not use the same model on cloud compute (Kaggle/Colab) to generate the training samples and then train on those? It reduces the cost to zero and keeps everything within the same ecosystem.

There is one risk with this approach worth discussing: model collapse. If the model trains on its own outputs, it can start reinforcing its own mistakes and biases, gradually degrading in quality over iterations. The generated data starts looking like the model's existing patterns rather than diverse, high-quality examples. But this can be handled:
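One way to sketch that mitigation in code: reject invalid and exactly duplicated generations before they enter the training set, which limits the feedback loop where the model reinforces its own errors. This is a minimal illustration; a real pipeline would add semantic deduplication and mix in held-out human-written data.

```python
import ast
import hashlib

def filter_self_generated(samples):
    """Keep only parseable, non-duplicate code samples from
    self-generated data (simple collapse-mitigation filter)."""
    seen, kept = set(), []
    for code in samples:
        try:
            ast.parse(code)          # drop syntactically invalid generations
        except SyntaxError:
            continue
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest in seen:           # drop exact duplicates
            continue
        seen.add(digest)
        kept.append(code)
    return kept
```

The same hook is a natural place to plug in the compilation-check reward from the GRPO side, so the SFT and RL stages share one notion of "valid OpenVINO code."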
Research also suggests that 10K–15K high-quality samples is generally enough for effective domain adaptation of a pretrained model, so going up to 50K feels like overkill in terms of cost, time, and effort. Also, if the organisation is open to covering the API costs, we can go with external models. Would love to know your preference on this.

2. 100ms TTFT Optimization

After breaking down the latency budget, the initial estimate looked like this:
This is already within 100ms but leaves very little headroom for any unexpected delays. A few optimizations that could bring this down further:
With these the revised estimate looks like:
This gives a comfortable 20ms buffer under the 100ms target. As a fallback mechanism after evaluation, if latency still crosses 100ms, reducing top-k chunks from 5 to 3 is an option worth keeping in mind.

3. GRPO Reward Weighting

I'm also thinking about adding weights to the GRPO reward functions, since multiple reward signals can conflict. For example, R4 (efficiency) could conflict with R3 (correctness) if the model learns to write shorter but less correct code. A weighted approach that prioritises R1 and R3 with higher weights and keeps R4 lower would prevent this kind of conflict during training. Would love to hear your thoughts on all of these, and whether there are any other scenarios that should be kept in mind.
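The weighting idea in point 3 could look something like this. The weight values are illustrative assumptions, not tuned numbers; the point is only that efficiency (R4) is capped low enough that it can never outweigh correctness.

```python
# Hypothetical weights for the four reward signals from the discussion
# (R1..R4); illustrative values, not a tuned configuration.
WEIGHTS = {"R1": 0.35, "R2": 0.15, "R3": 0.35, "R4": 0.15}

def combined_reward(signals: dict) -> float:
    """Weighted sum of per-signal rewards, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

# Short-but-wrong code cannot beat correct code: a perfect R4 score
# contributes at most 0.15 to the total.
wrong_fast = combined_reward({"R1": 0.2, "R2": 0.5, "R3": 0.1, "R4": 1.0})
right_slow = combined_reward({"R1": 0.9, "R2": 0.5, "R3": 0.9, "R4": 0.3})
```

Keeping the weights in one dict also makes them easy to sweep during GRPO ablations, which is probably worth a small experiment in the proposal timeline.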
-
Hello @7taozhou7, @yinquan251 and everyone! Hope you're doing well.
I'm Yuvraj, graduating this summer with a B.Tech in CS with AI & ML. For the past 10 months I've been working as a full-time intern at a US-based MNC, where I've built and shipped RAG and SFT pipelines that are currently in production and used globally, so the core technologies in this project are ones I work with daily, not just academically.
I've been following the OpenVINO repo since my 2nd year, and this year Project-6 "Develop an OpenVINO-Domain Specialized Coder Model with SFT/GRPO/RAG" immediately caught my attention. I spent this weekend doing a deep dive into the project description and the referenced HuggingFace cookbook, and put together a draft architecture covering the full pipeline:
After working through the design, I have a few questions I'd love your input on:
I know the project is marked Hard and I've likely missed things in my initial design , that's exactly what I'm hoping to discuss. I'd love to collaborate and contribute meaningfully to this.
If anyone else in the community has worked on similar OpenVINO + LLM projects or has thoughts on the architecture, I'd love to hear your input too! Looking forward to the discussion.