Feature/compute cometkiwi metric by tanhaow · Pull Request #57 · Princeton-CDH/muse

tanhaow · 2026-03-19T18:06:43Z

Associated Issue(s): resolves #52

Changes in this PR

Include all key changes in this pull request

Added compute_cometkiwi() function to metrics.py using the Unbabel/wmt22-cometkiwi-da model
Integrated CometKiwi metric into evaluate_corpus.py script; CSV output now includes cometkiwi column alongside chrf and comet
Added HuggingFace authentication error handling with user guidance for the gated HuggingFace model (open question: should this guidance move to DeveloperNotes instead?)

Notes

The first run downloads the ~2GB model and caches it for subsequent use. Model caching via the LOADED_METRICS dictionary was added so that loaded models are reused across evaluations.
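
The in-memory caching described in this note could look roughly like the sketch below. This is an illustration only, not the code in metrics.py: the `LOADED_METRICS` name comes from the PR, while `get_metric`, `loader`, and `fake_loader` are hypothetical names introduced here.

```python
from typing import Any, Callable, Dict, List

# Module-level cache: model name -> loaded model object.
LOADED_METRICS: Dict[str, Any] = {}


def get_metric(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached model for `name`, loading it on first use.

    `loader` is a hypothetical stand-in for the expensive download/load step.
    """
    if name not in LOADED_METRICS:
        LOADED_METRICS[name] = loader(name)
    return LOADED_METRICS[name]


# Demonstration with a fake loader that records each call it receives.
load_calls: List[str] = []


def fake_loader(name: str) -> Dict[str, str]:
    load_calls.append(name)
    return {"model": name}


first = get_metric("Unbabel/wmt22-cometkiwi-da", fake_loader)
second = get_metric("Unbabel/wmt22-cometkiwi-da", fake_loader)
```

The second call returns the same object without invoking the loader again, which is the behavior that matters when one script evaluates many segments.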

Reviewer Checklist

Include discrete checks that should be done by the reviewer beyond looking through
code and/or file changes. Note that this check list will correspond to tasks within
the PR overview page.

Verify that the compute_cometkiwi() function signature matches the existing pattern (takes tr_text and src_text, returns float)
Check that evaluate_corpus.py correctly computes and writes CometKiwi scores to CSV
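
One possible way to script the two checks above is sketched here. The `compute_cometkiwi` below is a placeholder that only carries the signature named in the checklist; it does not call the real model, and its 0.0 scores are dummy values. The column order is also an assumption, since the PR only states that the CSV includes chrf, comet, and cometkiwi.

```python
import csv
import inspect
import io


def compute_cometkiwi(tr_text: str, src_text: str) -> float:
    """Placeholder with the checklist's signature; not the real metric."""
    return 0.0


# Signature check: parameter names and the return annotation.
sig = inspect.signature(compute_cometkiwi)
param_names = list(sig.parameters)
returns_float = sig.return_annotation is float

# CSV check: the header should carry all three metric columns.
# The column order here is assumed for illustration.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["chrf", "comet", "cometkiwi"])
writer.writeheader()
writer.writerow(
    {"chrf": 0.0, "comet": 0.0, "cometkiwi": compute_cometkiwi("a", "b")}
)
header = buffer.getvalue().splitlines()[0]
```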

laurejt

This is looking pretty good, but needs some changes regarding logging and exception handling.

Within compute_cometkiwi,

Do not suppress the model loading progress bar
Update the exception handling so that it only catches the specific exception types of interest and does not rely directly on the messages of the exceptions

laurejt · 2026-03-25T13:32:19Z

src/muse/evaluation/metrics.py

+            # Suppress stdout/stderr during model loading
+            with (
+                contextlib.redirect_stdout(io.StringIO()),
+                contextlib.redirect_stderr(io.StringIO()),
+            ):


Generally, we should never redirect stdout/stderr. Packages usually have a means of suppressing or turning off logging and progress bars. For HuggingFace, see the package's utilities.

In this context, it is useful to see that the HuggingFace model is being loaded because loading can be a choke point for machines with limited RAM. So, the model loading progress bar should not be suppressed.
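
As a rough illustration of the alternative to redirecting streams, the sketch below quiets a single named logger through the standard `logging` module and leaves stdout/stderr, and therefore any progress bar, untouched. The helper `quiet_logger` and the logger name `noisy_package` are hypothetical. Library-specific switches for HuggingFace are documented in that package's utilities and are not reproduced here.

```python
import contextlib
import logging
from typing import Iterator


@contextlib.contextmanager
def quiet_logger(name: str, level: int = logging.ERROR) -> Iterator[None]:
    """Temporarily raise the threshold of one named logger."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        # Always restore the previous level, even if the body raises.
        logger.setLevel(previous)


# "noisy_package" is a hypothetical logger name used for illustration.
noisy = logging.getLogger("noisy_package")
noisy.setLevel(logging.INFO)

with quiet_logger("noisy_package"):
    info_enabled_inside = noisy.isEnabledFor(logging.INFO)
info_enabled_after = noisy.isEnabledFor(logging.INFO)
```

Because only the logger threshold changes, anything the library writes directly to the terminal still appears.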

laurejt · 2026-03-25T13:34:41Z

src/muse/evaluation/metrics.py

+        except Exception as e:
+            # Check if this is an authentication/gated model error
+            # The comet package wraps authentication errors in a KeyError with
+            # "not supported by COMET" message, so we need to check the cause chain
+            error_msg = str(e).lower()
+
+            # Check the exception cause chain for authentication-related errors
+            is_auth_error = False
+            current = e
+            while current is not None:
+                current_msg = str(current).lower()
+                if any(
+                    keyword in current_msg
+                    for keyword in [
+                        "403",
+                        "gated",
+                        "authentication",
+                        "authorized",
+                        "forbidden",
+                    ]
+                ):
+                    is_auth_error = True
+                    break
+                current = getattr(current, "__cause__", None)
+
+            # Also check if it's the specific "not supported" error from comet
+            # which typically indicates an authentication issue with gated models
+            if "not supported by comet" in error_msg or is_auth_error:
+                msg = (
+                    "Authentication required for CometKiwi model. "
+                    "Please:\n"
+                    "1. Visit https://huggingface.co/Unbabel/wmt22-cometkiwi-da and accept the license\n"
+                    "2. Run: hf auth login\n"
+                    "3. Enter your HuggingFace token when prompted"
+                )
+                raise RuntimeError(msg) from e
+            # Re-raise other errors
+            raise


Only catch the types of exceptions you're attempting to handle. Code should be checking the type of the exception, not its message.
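
A minimal sketch of what type-based handling could look like follows. It assumes the underlying failure is attached through `__cause__`, which may not hold for every library. `GatedModelError`, `find_cause`, and `load_model` are hypothetical names, and the stand-in exception class is defined locally rather than imported from any real package.

```python
from typing import Optional, Type


class GatedModelError(Exception):
    """Hypothetical stand-in for a library's gated/unauthorized exception."""


def find_cause(
    exc: BaseException, exc_type: Type[BaseException]
) -> Optional[BaseException]:
    """Walk the __cause__ chain and return the first instance of exc_type."""
    current: Optional[BaseException] = exc
    while current is not None:
        if isinstance(current, exc_type):
            return current
        current = current.__cause__
    return None


def load_model() -> None:
    """Fake loader that wraps an auth failure, for illustration only."""
    try:
        raise GatedModelError("access denied")
    except GatedModelError as err:
        raise KeyError("model not available") from err


outcome = "ok"
try:
    load_model()
except KeyError as err:
    # Catch only the expected type, then decide by type, not by message text.
    if find_cause(err, GatedModelError) is None:
        raise
    outcome = "auth"
```

Matching on exception classes keeps the handler stable if a library rewords its error messages.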

laurejt

Why are you no longer using comet's provided download_model method? It is what is recommended by the package. If this is necessary, then it should be documented.

tanhaow · 2026-03-31T14:27:58Z

Why are you no longer using comet's provided download_model method? It is what is recommended by the package. If this is necessary, then it should be documented.

It was because download_model catches all exceptions internally and re-raises every one of them as a generic KeyError, so we can't see the specific HuggingFace exception type like we could with snapshot_download. But thanks for pointing out that download_model is what the package recommends. I have changed back to using it.

laurejt

Thanks for explaining the error behavior of comet's download_model. Add a quick comment in the code, so we can document this idiosyncrasy. Otherwise this looks ready to go. 🚀

laurejt · 2026-03-31T14:32:20Z

src/muse/evaluation/metrics.py

+        except KeyError as e:
+            msg = (


Just add a quick comment here mentioning that comet.download_model catches all exceptions and re-raises them as KeyErrors. Thanks for identifying this issue, it's a bit of an odd one.
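
The requested comment might sit in the handler roughly as sketched below. `download_model_stub` is a hypothetical stand-in that only mimics the behavior described in this thread (every internal failure surfaces as a KeyError); it is not comet's implementation. `load_checkpoint` and the message text are likewise illustrative.

```python
def download_model_stub(name: str) -> str:
    """Hypothetical stand-in: any internal failure surfaces as a KeyError."""
    try:
        raise PermissionError("gated repository")  # simulated underlying failure
    except Exception:
        # The original exception type is hidden from the caller.
        raise KeyError(f"Model '{name}' not supported")


def load_checkpoint(name: str) -> str:
    try:
        return download_model_stub(name)
    except KeyError as e:
        # The downloader catches every exception and re-raises it as KeyError,
        # so KeyError is the only type available to catch here.
        msg = (
            f"Could not download {name}. If the model is gated, accept its "
            "license on HuggingFace and log in before retrying."
        )
        raise RuntimeError(msg) from e


caught = None
try:
    load_checkpoint("Unbabel/wmt22-cometkiwi-da")
except RuntimeError as err:
    caught = err
```

Chaining with `from e` keeps the KeyError attached as `__cause__`, so the traceback still shows what the downloader raised.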

tanhaow added 2 commits March 19, 2026 13:49

add cometkiwi metric

0b83807

modify the script

12729a3

tanhaow self-assigned this Mar 19, 2026

Update metrics.py

cfe714a

tanhaow requested a review from laurejt March 19, 2026 18:17

laurejt requested changes Mar 25, 2026

tanhaow force-pushed the feature/compute-cometkiwi-metric branch from dd16f55 to cfe714a March 30, 2026 18:40

revise per @laurejt review

ea02d37

tanhaow requested a review from laurejt March 30, 2026 18:55

laurejt reviewed Mar 31, 2026

use download_model

0d1aa17

tanhaow requested a review from laurejt March 31, 2026 14:28

laurejt approved these changes Mar 31, 2026

add a note about download_model

85c57c5

tanhaow merged commit 67cd36d into develop Mar 31, 2026
1 check failed

tanhaow deleted the feature/compute-cometkiwi-metric branch March 31, 2026 15:00

tanhaow merged 6 commits into develop from feature/compute-cometkiwi-metric
