Hi authors, thanks for the great work!
In the paper, the RAG filter verification is described as having the solver itself answer using the collected RAG documents (with no search tools) in order to verify the proposer's questions:
“...we collect all the search results in the proposer’s trajectory as the RAG documents, and let the solver answer without using search tools. If the proposer’s question is correct... the solver should already have sufficient information to correctly predict the answer...”
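To make sure I'm reading this correctly, here is a minimal sketch of that verification loop as I understand it. `rag_filter`, `solver_generate`, the prompt format, and the exact-match check are all my own placeholders, not names from the paper or the codebase:

```python
# Sketch of the paper's described RAG filter (my reading, not repo code):
# the solver answers the proposer's question using only the search results
# collected in the proposer's trajectory, with no search tools available.

def rag_filter(solver_generate, question, gold_answer, proposer_search_results):
    """Return True if the solver can recover the answer from the RAG docs alone.

    solver_generate: callable(prompt: str) -> str, backed by the solver
        (i.e., the actor model used in rollout).
    proposer_search_results: list of documents gathered during the
        proposer's trajectory.
    """
    rag_documents = "\n\n".join(proposer_search_results)
    prompt = (
        "Answer the question using ONLY the documents below; "
        "no search tools are available.\n\n"
        f"Documents:\n{rag_documents}\n\n"
        f"Question: {question}\nAnswer:"
    )
    predicted = solver_generate(prompt)
    # Exact match is just a stand-in; a model-based match would also work here.
    return predicted.strip().lower() == gold_answer.strip().lower()
```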
However, in the released code, the RAG filter seems to call an external LLM-as-a-judge API, rather than the solver model:
- `quarl/utils/problem_extraction.py` initializes `llm_judge` via `get_global_judge(...)` with `QUARK_BASE_URL`/`QUARK_MODEL`.
- When `use_rag_filter` is enabled, `_validate_with_external_llm(...)` is called, which uses `llm_judge.model_based_answer(...)` and `llm_judge.model_based_match(...)`.
This appears to be a different model/service from the solver (the actor model used in rollout).
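For concreteness, a solver-backed variant of this path might look like the sketch below. `validate_with_solver`, `solver.generate`, and `answers_match` are hypothetical placeholders of mine; only the `llm_judge.*` names referenced in the comments come from the released code:

```python
# Hypothetical solver-backed replacement for _validate_with_external_llm(...).
# The two steps mirror the roles of llm_judge.model_based_answer(...) and
# llm_judge.model_based_match(...), but use the solver (actor) model instead
# of the external QUARK_BASE_URL / QUARK_MODEL judge.

def validate_with_solver(solver, question, gold_answer, rag_documents, answers_match):
    # Step 1 (analogue of model_based_answer): answer from the RAG
    # documents only, with no search tools.
    predicted = solver.generate(
        f"Documents:\n{rag_documents}\n\nQuestion: {question}\nAnswer:"
    )
    # Step 2 (analogue of model_based_match): check the prediction
    # against the gold answer.
    return answers_match(predicted, gold_answer)
```

If the paper's description is the intended behavior, something like this (pointed at the actor model) is what I would have expected in place of the external judge call.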
- Is the RAG filter actually intended to use a different model (an external judge), or should it use the solver itself, as described in the paper? If the solver is intended, is the current code path incorrect or incomplete?
- In your experience, which works better in practice: an external judge model or the solver itself for RAG verification?
- In my experiments, using a stronger external model increases the RAG filter pass rate in early training, but it may also introduce harder questions that the solver later struggles to solve. Is this expected, and do you recommend any mitigation strategies?