RAG filter in code uses external judge API, but paper says solver itself does verification #8

@528why

Hi authors, thanks for the great work!

The paper describes the RAG filter as verifying the proposer's questions with the solver itself: the solver answers using only the collected RAG documents, with no search tools:

“...we collect all the search results in the proposer’s trajectory as the RAG documents, and let the solver answer without using search tools. If the proposer’s question is correct... the solver should already have sufficient information to correctly predict the answer...”

However, in the released code, the RAG filter seems to call an external LLM-as-a-judge API, rather than the solver model:

  • quarl/utils/problem_extraction.py initializes llm_judge via get_global_judge(...) with QUARK_BASE_URL / QUARK_MODEL.
  • When use_rag_filter is enabled, _validate_with_external_llm(...) is called, which uses llm_judge.model_based_answer(...) and llm_judge.model_based_match(...).

This appears to be a different model/service than the solver (which is the actor model used in rollout).
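
To make the difference concrete, here is a minimal sketch of how I read the filter logic. All names in it (rag_filter, answer_fn, match_fn, exact_match, toy_solver) are hypothetical and not taken from the repo, and rejecting questions with no collected documents is my own assumption. The only point is that the model answering from the RAG documents is a pluggable choice: the solver in the paper's description, an external judge in the released code.

```python
from typing import Callable, Sequence

# Hypothetical signature: (question, documents) -> predicted answer.
AnswerFn = Callable[[str, Sequence[str]], str]
# Hypothetical signature: (predicted, reference) -> whether they agree.
MatchFn = Callable[[str, str], bool]


def exact_match(predicted: str, reference: str) -> bool:
    """Stand-in matcher: case- and whitespace-insensitive equality."""
    return predicted.strip().lower() == reference.strip().lower()


def rag_filter(
    question: str,
    reference_answer: str,
    rag_docs: Sequence[str],
    answer_fn: AnswerFn,
    match_fn: MatchFn = exact_match,
) -> bool:
    """Keep a proposer question only if it is answerable from rag_docs alone.

    answer_fn must not use search tools; it sees only the documents
    collected from the proposer's trajectory.
    """
    if not rag_docs:
        # Assumption: with no collected documents there is nothing to verify against.
        return False
    predicted = answer_fn(question, rag_docs)
    return match_fn(predicted, reference_answer)


def toy_solver(question: str, docs: Sequence[str]) -> str:
    """Toy stand-in for the actor model: answers only from the given documents."""
    for doc in docs:
        if "Paris" in doc:
            return "Paris"
    return "unknown"


question = "What is the capital of France?"
kept = rag_filter(question, "paris", ["Paris is the capital of France."], toy_solver)
dropped = rag_filter(question, "paris", ["Berlin is in Germany."], toy_solver)
```

Under the paper's description, answer_fn would wrap the solver (actor) model. In the released code, that role appears to be played by llm_judge.model_based_answer(...), with llm_judge.model_based_match(...) in the position of match_fn.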

  1. Is it actually intended to use a different model (external judge) for the RAG filter, or should it use the solver itself as described in the paper? If the solver is intended, is the current code path incorrect or incomplete?
  2. From your experience, which works better in practice: using an external judge model vs. using the solver itself for RAG verification?
  3. In my experiments, using a stronger external model increases the RAG filter pass rate in early training, but it may also introduce harder questions that the solver later struggles to solve. Is this expected, and do you recommend any mitigation strategies?
