GSoC 2026: Project #6 Develop an OpenVINO-Domain Specialized Coder Model with SFT/GRPO/RAG #34261
Replies: 10 comments 5 replies
-
Hey @abhayuvi, this is a fantastic and well-thought-out breakdown for GSoC! Your focus on GRPO reward functions (compilation success/API validity) and quantization targets (NNCF/INT4) shows you've really looked into the technical constraints. Regarding the base model, Qwen2.5-Coder-7B has been showing great performance in recent benchmarks, so that could be a strong contender. For the RAG knowledge base, don't forget the Jupyter Notebooks in the OpenVINO training materials—they often have the most up-to-date API usage patterns. Good luck with the proposal!
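A reward like the compilation-success/API-validity one mentioned above could be prototyped cheaply with Python's `ast` module. This is a minimal sketch, not the project's actual reward spec: the expected API names are illustrative assumptions.

```python
import ast

# Hypothetical API markers for the reward; these names are assumptions
# chosen for illustration, not a definitive list of OpenVINO entry points.
EXPECTED_API_CALLS = {"ov.Core", "compile_model", "convert_model"}

def compilation_reward(code: str) -> float:
    """Return a reward in [0, 1]: 0 if the code does not parse, otherwise
    0.5 base plus up to 0.5 for referencing expected API names."""
    try:
        ast.parse(code)
    except SyntaxError:
        return 0.0  # unparseable generations get no reward
    # Count how many expected API name suffixes appear in the source text.
    hits = sum(1 for name in EXPECTED_API_CALLS if name.split(".")[-1] in code)
    return 0.5 + 0.5 * (hits / len(EXPECTED_API_CALLS))

good = "import openvino as ov\ncore = ov.Core()\nmodel = core.compile_model('m.xml')"
bad = "core = ov.Core("  # syntax error
```

In a real GRPO loop this would be one of several reward signals; a stricter variant could actually run the snippet in a sandbox and check that `compile_model` succeeds on a toy IR.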
-
Hi @abhayuvi,
Curated Data: We should prioritize the latest OpenVINO 2024.x/2025.x documentation. In particular, curating examples of "Legacy API -> New API" migrations, OpenVINO GenAI API usage, and NNCF quantization snippets will drastically improve RAG performance.
Looking forward to discussing this further with you and seeing how your implementation evolves!
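The curation idea above could start as simply as tagging chunks at ingestion time so the retriever can boost migration and quantization examples. A minimal sketch, where the marker strings and tag names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DocChunk:
    text: str
    source: str
    tags: set = field(default_factory=set)

# Hypothetical marker strings; a real pipeline would use a proper
# classifier or the docs' own section metadata instead.
MIGRATION_MARKERS = ("IECore", "load_network", "ie_api")  # legacy API hints
QUANT_MARKERS = ("nncf", "compress_weights", "INT4")

def tag_chunk(chunk: DocChunk) -> DocChunk:
    """Attach coarse retrieval tags based on substring markers."""
    if any(m in chunk.text for m in MIGRATION_MARKERS):
        chunk.tags.add("legacy-migration")
    if any(m in chunk.text for m in QUANT_MARKERS):
        chunk.tags.add("quantization")
    return chunk
```

At query time, the retriever could then up-weight chunks tagged `legacy-migration` whenever the prompt mentions a deprecated API name.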
-
Thank you for putting so much thought into this. This is an excellent analysis, and I completely agree with your assessment. Here are my thoughts.
Using GitHub to communicate is fine. Looking forward to your next move.
-
Also @7taozhou7, @yinquan251 — I have been looking through the repo for good first issues to contribute to before the proposal, but couldn't find any open ones currently. Would appreciate any pointers on where a meaningful contribution could be made, whether that's documentation, tests, or any area related to the project scope.
-
Wow, this is sick. I'm currently working on reinforcement learning and RAG alignment, and I had thought of drafting a proposal for this project myself, but I think @abhayuvi is the right fit here. The architecture explains a lot without being too complex. Good luck man.
-
Hi @7taozhou7 and @yinquan251, wanted to share a quick update on the architecture and the proposal.
The updated architecture diagram and full proposal draft have also been shared over email from abhayuvi.raj@gmail.com for review. Would really appreciate any feedback or suggestions before the deadline; happy to make changes if anything needs to be reconsidered. Also, thanks for your previous feedback, which helped correct and improve the architecture.
-
Hey @abhayuvi, really nice work on the clean roadmap. I was exploring similar ideas yesterday, but it looks like you've already structured it much better. Given the timeline, I'd be happy to collaborate and contribute wherever it helps. I genuinely think working together on a clear direction like this can speed things up a lot. Feel free to ping me anytime if you need any help, good luck!
-
Hello everyone, I have a few doubts regarding the GSoC 2026 proposal and would really appreciate it if someone could help clear them up.
Thanks.
-
@abhayuvi I think this is a really cool project too. I was also browsing the GSoC projects and was thinking of writing a proposal for this project, but it seems like you've got it down. I'd love to help out if needed as well, feel free to ping me!
-
Hey @7taozhou7 and @yinquan251, while going through the project in more depth, I found a few things worth discussing that were not covered in the initial architecture.

1. Teacher Model Cost

If going with the GPT-4o or Claude API for generating the SFT dataset, the cost for 10K–50K samples could land anywhere between $500–$2500. Since we are already working with Qwen2.5-Coder-7B, why not use the same model on cloud compute (Kaggle/Colab) to generate the training samples and then train on those? It reduces the cost to zero and keeps everything within the same ecosystem.

There is one risk with this approach worth discussing: model collapse. If the model trains on its own outputs, it can start reinforcing its own mistakes and biases, gradually degrading in quality over iterations. The generated data starts looking like the model's existing patterns rather than diverse, high-quality examples. But this can be handled:
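One way to sketch that mitigation in code: reject invalid and exactly duplicated generations before they enter the training set, which limits the feedback loop where the model reinforces its own errors. This is a minimal illustration; a real pipeline would add semantic deduplication and mix in held-out human-written data.

```python
import ast
import hashlib

def filter_self_generated(samples):
    """Keep only parseable, non-duplicate code samples from
    self-generated data (simple collapse-mitigation filter)."""
    seen, kept = set(), []
    for code in samples:
        try:
            ast.parse(code)          # drop syntactically invalid generations
        except SyntaxError:
            continue
        digest = hashlib.sha256(code.encode()).hexdigest()
        if digest in seen:           # drop exact duplicates
            continue
        seen.add(digest)
        kept.append(code)
    return kept
```

The same hook is a natural place to plug in the compilation-check reward from the GRPO side, so the SFT and RL stages share one notion of "valid OpenVINO code."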
Research also suggests that 10K–15K high-quality samples is generally enough for effective domain adaptation of a pretrained model, so going up to 50K feels like overkill in terms of cost, time, and effort. Also, if the organisation is open to covering the API costs, we can go with external models. Would love to know your preference on this.

2. 100ms TTFT Optimization

After breaking down the latency budget, the initial estimate looked like this:
This is already within 100ms but leaves very little headroom for any unexpected delays. A few optimizations that could bring this down further:
With these the revised estimate looks like:
This gives a comfortable 20ms buffer under the 100ms target. As a fallback mechanism after evaluation, if latency still crosses 100ms, reducing top-k chunks from 5 to 3 is an option worth keeping in mind.

3. GRPO Reward Weighting

I'm also thinking about adding weights to the GRPO reward functions, since multiple reward signals can conflict. For example, R4 (efficiency) could conflict with R3 (correctness) if the model learns to write shorter but less correct code. A weighted approach that prioritises R1 and R3 with higher weights and keeps R4 lower would prevent this kind of conflict during training. Would love to hear your thoughts on all of these, and whether there are any other scenarios that should be kept in mind.
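The weighting idea in point 3 could look something like this. The weight values are illustrative assumptions, not tuned numbers; the point is only that efficiency (R4) is capped low enough that it can never outweigh correctness.

```python
# Hypothetical weights for the four reward signals from the discussion
# (R1..R4); illustrative values, not a tuned configuration.
WEIGHTS = {"R1": 0.35, "R2": 0.15, "R3": 0.35, "R4": 0.15}

def combined_reward(signals: dict) -> float:
    """Weighted sum of per-signal rewards, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

# Short-but-wrong code cannot beat correct code: a perfect R4 score
# contributes at most 0.15 to the total.
wrong_fast = combined_reward({"R1": 0.2, "R2": 0.5, "R3": 0.1, "R4": 1.0})
right_slow = combined_reward({"R1": 0.9, "R2": 0.5, "R3": 0.9, "R4": 0.3})
```

Keeping the weights in one dict also makes them easy to sweep during GRPO ablations, which is probably worth a small experiment in the proposal timeline.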
-
Hello @7taozhou7, @yinquan251 and everyone! Hope you're doing well.
I'm Yuvraj, graduating this summer with a B.Tech in CS with AI & ML. For the past 10 months I've been working as a full-time intern at a US-based MNC, where I've built and shipped RAG and SFT pipelines that are currently in production and used globally, so the core technologies in this project are ones I work with daily, not just academically.
I've been following the OpenVINO repo since my 2nd year, and this year Project-6 "Develop an OpenVINO-Domain Specialized Coder Model with SFT/GRPO/RAG" immediately caught my attention. I spent this weekend doing a deep dive into the project description and the referenced HuggingFace cookbook, and put together a draft architecture covering the full pipeline:
After working through the design, I have a few questions I'd love your input on:
I know the project is marked Hard and I've likely missed things in my initial design , that's exactly what I'm hoping to discuss. I'd love to collaborate and contribute meaningfully to this.
If anyone else in the community has worked on similar OpenVINO + LLM projects or has thoughts on the architecture, I'd love to hear your input too! Looking forward to the discussion.