Hi authors, thanks for the great work!
In the paper, the RAG filter verification is described as having the solver itself answer using the collected RAG documents (with no search tools) in order to verify the proposer's questions:
“...we collect all the search results in the proposer’s trajectory as the RAG documents, and let the solver answer without using search tools. If the proposer’s question is correct... the solver should already have sufficient information to correctly predict the answer...”
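To make sure I'm reading this correctly, here is a minimal sketch of that verification loop as I understand it. `rag_filter`, `solver_generate`, the prompt format, and the exact-match check are all my own placeholders, not names from the paper or the codebase:

```python
# Sketch of the paper's described RAG filter (my reading, not repo code):
# the solver answers the proposer's question using only the search results
# collected in the proposer's trajectory, with no search tools available.

def rag_filter(solver_generate, question, gold_answer, proposer_search_results):
    """Return True if the solver can recover the answer from the RAG docs alone.

    solver_generate: callable(prompt: str) -> str, backed by the solver
        (i.e., the actor model used in rollout).
    proposer_search_results: list of documents gathered during the
        proposer's trajectory.
    """
    rag_documents = "\n\n".join(proposer_search_results)
    prompt = (
        "Answer the question using ONLY the documents below; "
        "no search tools are available.\n\n"
        f"Documents:\n{rag_documents}\n\n"
        f"Question: {question}\nAnswer:"
    )
    predicted = solver_generate(prompt)
    # Exact match is just a stand-in; a model-based match would also work here.
    return predicted.strip().lower() == gold_answer.strip().lower()
```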
However, in the released code, the RAG filter seems to call an external LLM-as-a-judge API, rather than the solver model:
- `quarl/utils/problem_extraction.py` initializes `llm_judge` via `get_global_judge(...)` with `QUARK_BASE_URL`/`QUARK_MODEL`.
- When `use_rag_filter` is enabled, `_validate_with_external_llm(...)` is called, which uses `llm_judge.model_based_answer(...)` and `llm_judge.model_based_match(...)`.
This appears to be a different model/service from the solver (the actor model used in rollout).
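For concreteness, a solver-backed variant of this path might look like the sketch below. `validate_with_solver`, `solver.generate`, and `answers_match` are hypothetical placeholders of mine; only the `llm_judge.*` names referenced in the comments come from the released code:

```python
# Hypothetical solver-backed replacement for _validate_with_external_llm(...).
# The two steps mirror the roles of llm_judge.model_based_answer(...) and
# llm_judge.model_based_match(...), but use the solver (actor) model instead
# of the external QUARK_BASE_URL / QUARK_MODEL judge.

def validate_with_solver(solver, question, gold_answer, rag_documents, answers_match):
    # Step 1 (analogue of model_based_answer): answer from the RAG
    # documents only, with no search tools.
    predicted = solver.generate(
        f"Documents:\n{rag_documents}\n\nQuestion: {question}\nAnswer:"
    )
    # Step 2 (analogue of model_based_match): check the prediction
    # against the gold answer.
    return answers_match(predicted, gold_answer)
```

If the paper's description is the intended behavior, something like this (pointed at the actor model) is what I would have expected in place of the external judge call.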
- Is the RAG filter actually intended to use a different model (an external judge), or should it use the solver itself, as described in the paper? If the solver is intended, is the current code path incorrect or incomplete?
- In your experience, which works better in practice: an external judge model or the solver itself for RAG verification?
- In my experiments, using a stronger external model increases the RAG filter pass rate in early training, but it may also introduce harder questions that the solver later struggles to solve. Is this expected, and do you recommend any mitigation strategies?