Evaluation details of the RSE experiment #4

Dear Authors,

Thank you for your insightful paper, "Quantifying Distillation in Large Language Models." We found the proposed RSE (Response Similarity Evaluation) method particularly interesting and have been studying it in detail.

Sections 3.2 and 4.1.2 state that RSE uses an "LLM-as-a-judge" approach to assess response similarity, but they do not say which model serves as the judge. The ICE experiment (Section 4.1.1) explicitly uses GPT-4o-mini for evaluation. It is therefore unclear whether RSE used the same model or a different one (e.g., GPT-4o).

Could you kindly clarify which specific model served as the judge in the RSE experiments? This information would greatly assist us in better understanding the methodology and replicating the experiments.

Thank you once again for your valuable work. We look forward to your response.

Best regards,
xy
