Feature/compute cometkiwi metric #57

Merged
tanhaow merged 6 commits into develop from feature/compute-cometkiwi-metric
Mar 31, 2026

@tanhaow tanhaow commented Mar 19, 2026

Associated Issue(s): resolves #52

Changes in this PR

Include all key changes in this pull request

  • Added compute_cometkiwi() function to metrics.py using the Unbabel/wmt22-cometkiwi-da model
  • Integrated CometKiwi metric into evaluate_corpus.py script; CSV output now includes cometkiwi column alongside chrf and comet
  • Added HuggingFace authentication error handling with helpful user guidance for the gated HuggingFace model (--> or should we move this to DeveloperNotes instead?)

Notes

  • First run will download ~2GB model and cache it for subsequent use (Added model caching via LOADED_METRICS dictionary to reuse loaded models across evaluations)
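
The caching note above can be illustrated with a minimal sketch. Only the `LOADED_METRICS` name comes from this PR; `get_model` and `fake_loader` are hypothetical helpers used for illustration, and the real loader in metrics.py may look different.

```python
from typing import Callable, Dict

# Module-level cache keyed by model name, mirroring the LOADED_METRICS idea.
LOADED_METRICS: Dict[str, object] = {}


def get_model(name: str, loader: Callable[[str], object]) -> object:
    """Return a cached model, calling `loader` only on first use."""
    if name not in LOADED_METRICS:
        LOADED_METRICS[name] = loader(name)
    return LOADED_METRICS[name]


# Hypothetical stand-in loader that records how often it is called.
calls = []


def fake_loader(name: str) -> object:
    calls.append(name)
    return {"model": name}


first = get_model("Unbabel/wmt22-cometkiwi-da", fake_loader)
second = get_model("Unbabel/wmt22-cometkiwi-da", fake_loader)
# The second lookup reuses the cached object, so the loader ran only once.
```

With this pattern the expensive download and load happens once per process, and later evaluations reuse the same object.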

Reviewer Checklist

Include discrete checks that should be done by the reviewer beyond looking through
code and/or file changes. Note that this check list will correspond to tasks within
the PR overview page.

  • Verify compute_cometkiwi() function signature matches pattern (takes tr_text and src_text, returns float)
  • Check that evaluate_corpus.py correctly computes and writes CometKiwi scores to CSV
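
One possible way to run the CSV check is sketched below. The column names `chrf`, `comet`, and `cometkiwi` come from this PR; the `id` column, the sample values, and the `check_scores_csv` helper are made up for illustration and are not part of evaluate_corpus.py.

```python
import csv
import io

REQUIRED_COLUMNS = ("chrf", "comet", "cometkiwi")


def check_scores_csv(handle) -> int:
    """Verify the score columns exist and parse as floats; return the row count."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    n_rows = 0
    for row in reader:
        for column in REQUIRED_COLUMNS:
            float(row[column])  # raises ValueError if the cell is not numeric
        n_rows += 1
    return n_rows


# Made-up sample data standing in for the script's real output file.
sample = io.StringIO("id,chrf,comet,cometkiwi\n1,54.2,0.81,0.77\n2,40.0,0.65,0.70\n")
checked_rows = check_scores_csv(sample)
```

To check a real run, the same helper could be given an open file handle for the CSV produced by the script.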

@tanhaow tanhaow self-assigned this Mar 19, 2026
@tanhaow tanhaow requested a review from laurejt March 19, 2026 18:17

@laurejt laurejt left a comment

This is looking pretty good, but needs some changes regarding logging and exception handling.

Within compute_cometkiwi,

  • Do not suppress the model loading progress bar
  • Update the exception handling so that it only catches the specific exception types of interest and does not rely directly on the messages of the exceptions

Comment on lines +121 to +125
# Suppress stdout/stderr during model loading
with (
    contextlib.redirect_stdout(io.StringIO()),
    contextlib.redirect_stderr(io.StringIO()),
):

Generally, we should never be redirecting stdout/stderr. Packages usually provide their own means of suppressing or turning off logging and progress bars. For HuggingFace, see the package's utilities.

In this context, it is useful to see that the HuggingFace model is being loaded because it can be a choking point for machines with limited RAM. So, the model loading progress bar should not be suppressed.
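
As a rough illustration of the alternative to redirecting streams, the sketch below raises the level of a chatty logger through the standard `logging` module and leaves stdout, stderr, and any progress bar untouched. The logger name `some_noisy_package` and the `quiet_logger` helper are placeholders, not names from this PR. For progress bars specifically, `huggingface_hub.utils` provides helpers such as `disable_progress_bars()`, but per the comment above the loading bar should stay visible here.

```python
import logging


def quiet_logger(name: str, level: int = logging.WARNING) -> logging.Logger:
    """Raise a package logger's threshold instead of redirecting stdout/stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# Placeholder logger name; substitute the package that is too verbose.
noisy = quiet_logger("some_noisy_package")
noisy.info("this message is filtered out")
noisy.warning("warnings and errors still reach the user")
```

This keeps real errors and download progress visible while dropping routine informational messages.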

Comment on lines +128 to +165
except Exception as e:
    # Check if this is an authentication/gated model error
    # The comet package wraps authentication errors in a KeyError with
    # "not supported by COMET" message, so we need to check the cause chain
    error_msg = str(e).lower()

    # Check the exception cause chain for authentication-related errors
    is_auth_error = False
    current = e
    while current is not None:
        current_msg = str(current).lower()
        if any(
            keyword in current_msg
            for keyword in [
                "403",
                "gated",
                "authentication",
                "authorized",
                "forbidden",
            ]
        ):
            is_auth_error = True
            break
        current = getattr(current, "__cause__", None)

    # Also check if it's the specific "not supported" error from comet
    # which typically indicates an authentication issue with gated models
    if "not supported by comet" in error_msg or is_auth_error:
        msg = (
            "Authentication required for CometKiwi model. "
            "Please:\n"
            "1. Visit https://huggingface.co/Unbabel/wmt22-cometkiwi-da and accept the license\n"
            "2. Run: hf auth login\n"
            "3. Enter your HuggingFace token when prompted"
        )
        raise RuntimeError(msg) from e
    # Re-raise other errors
    raise

Only catch the types of exceptions you're attempting to handle. Code should be checking the type of the exception, not its message.
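
A minimal sketch of a type-based check, assuming the underlying error is available on the `__cause__` chain (which may not hold for every library). `GatedModelError` and `find_cause` are hypothetical names; in practice the wanted type would be whatever the library actually raises, for example an error class such as `GatedRepoError` from huggingface_hub.

```python
from typing import Optional, Tuple, Type


class GatedModelError(Exception):
    """Hypothetical stand-in for a library's gated/unauthorized error type."""


def find_cause(
    exc: BaseException, wanted: Tuple[Type[BaseException], ...]
) -> Optional[BaseException]:
    """Walk the __cause__ chain and return the first exception of a wanted type."""
    current: Optional[BaseException] = exc
    seen = set()  # guards against cycles in the cause chain
    while current is not None and id(current) not in seen:
        if isinstance(current, wanted):
            return current
        seen.add(id(current))
        current = current.__cause__
    return None


try:
    try:
        raise GatedModelError("access denied")
    except GatedModelError as inner:
        raise KeyError("model not supported") from inner
except KeyError as outer:
    # The decision uses the exception's type, never its message text.
    match = find_cause(outer, (GatedModelError,))
    unrelated = find_cause(ValueError("other"), (GatedModelError,))
```

Because the check relies on `isinstance`, rewording an error message upstream does not change the behavior.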

@tanhaow tanhaow force-pushed the feature/compute-cometkiwi-metric branch from dd16f55 to cfe714a Compare March 30, 2026 18:40
@tanhaow tanhaow requested a review from laurejt March 30, 2026 18:55

@laurejt laurejt left a comment

Why are you no longer using comet's provided download_model method? It is what is recommended by the package. If this is necessary, then it should be documented.

tanhaow commented Mar 31, 2026

> Why are you no longer using comet's provided download_model method? It is what is recommended by the package. If this is necessary, then it should be documented.

It was because download_model catches all exceptions internally and re-raises them as a generic KeyError, so we can't see the specific HuggingFace exception type like we could with snapshot_download. But thanks for pointing out that download_model is recommended by the package. I have changed back to using it.

@tanhaow tanhaow requested a review from laurejt March 31, 2026 14:28

@laurejt laurejt left a comment

Thanks for explaining the error behavior of comet's download_model. Add a quick comment in the code, so we can document this idiosyncrasy. Otherwise this looks ready to go. 🚀

Comment on lines +123 to +124
except KeyError as e:
    msg = (

Just add a quick comment here mentioning that comet.download_model catches all and re-raises any exceptions as KeyErrors. Thanks for identifying this issue, it's a bit of an odd one.
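
The requested comment might look roughly like the sketch below. `fake_download_model` is a hypothetical stand-in that imitates the behavior described in this thread (every failure surfaces as a `KeyError`); it is not comet's implementation, and `load_checkpoint` is an illustrative wrapper rather than the merged code.

```python
def fake_download_model(name: str) -> str:
    """Hypothetical stand-in mimicking the behavior described in this thread."""
    try:
        raise PermissionError("gated model")  # placeholder underlying failure
    except Exception:
        # Every failure surfaces as a KeyError, hiding the original type.
        raise KeyError(f"Model '{name}' not supported by COMET.")


def load_checkpoint(name: str, download=fake_download_model) -> str:
    try:
        return download(name)
    except KeyError as e:
        # NOTE: the downloader catches all exceptions and re-raises them as
        # KeyError, so a KeyError here may indicate an authentication problem
        # with a gated model rather than an unknown model name.
        msg = (
            "Could not load the model. If it is gated, accept the license "
            "on its HuggingFace page and log in before retrying."
        )
        raise RuntimeError(msg) from e
```

Chaining with `from e` keeps the original `KeyError` attached to the `RuntimeError`, so the traceback still shows what the downloader reported.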

@tanhaow tanhaow merged commit 67cd36d into develop Mar 31, 2026
1 check failed
@tanhaow tanhaow deleted the feature/compute-cometkiwi-metric branch March 31, 2026 15:00