Conversation
laurejt
left a comment
There was a problem hiding this comment.
This is looking pretty good, but needs some changes regarding logging and exception handling.
Within compute_cometkiwi,
- Do not suppress the model loading progress bar
- Update the exception handling so that it only catches the specific exception types of interest and does not rely directly on the messages of the exceptions
src/muse/evaluation/metrics.py
Outdated
| # Suppress stdout/stderr during model loading | ||
| with ( | ||
| contextlib.redirect_stdout(io.StringIO()), | ||
| contextlib.redirect_stderr(io.StringIO()), | ||
| ): |
There was a problem hiding this comment.
Generally, we should never be redirecting stdout/stderr. Packages usually have means of suppress/turning off logging and progress bars. For HuggingFace, see the packages utilities.
In this context, it is useful to see that the HuggingFace model is being loaded because it can be a choking point for machines with limited RAM. So, the model loading progress bar should not be suppressed.
src/muse/evaluation/metrics.py
Outdated
| except Exception as e: | ||
| # Check if this is an authentication/gated model error | ||
| # The comet package wraps authentication errors in a KeyError with | ||
| # "not supported by COMET" message, so we need to check the cause chain | ||
| error_msg = str(e).lower() | ||
|
|
||
| # Check the exception cause chain for authentication-related errors | ||
| is_auth_error = False | ||
| current = e | ||
| while current is not None: | ||
| current_msg = str(current).lower() | ||
| if any( | ||
| keyword in current_msg | ||
| for keyword in [ | ||
| "403", | ||
| "gated", | ||
| "authentication", | ||
| "authorized", | ||
| "forbidden", | ||
| ] | ||
| ): | ||
| is_auth_error = True | ||
| break | ||
| current = getattr(current, "__cause__", None) | ||
|
|
||
| # Also check if it's the specific "not supported" error from comet | ||
| # which typically indicates an authentication issue with gated models | ||
| if "not supported by comet" in error_msg or is_auth_error: | ||
| msg = ( | ||
| "Authentication required for CometKiwi model. " | ||
| "Please:\n" | ||
| "1. Visit https://huggingface.co/Unbabel/wmt22-cometkiwi-da and accept the license\n" | ||
| "2. Run: hf auth login\n" | ||
| "3. Enter your HuggingFace token when prompted" | ||
| ) | ||
| raise RuntimeError(msg) from e | ||
| # Re-raise other errors | ||
| raise |
There was a problem hiding this comment.
Only catch the types of exceptions you're attempting to handle. Code should be checking the type of the exception, not its message.
dd16f55 to
cfe714a
Compare
It was because |
laurejt
left a comment
There was a problem hiding this comment.
Thanks for explaining the error behavior of comet's download_model. Add a quick comment in the code, so we can document this idiosyncrasy. Otherwise this looks ready to go. 🚀
src/muse/evaluation/metrics.py
Outdated
| except KeyError as e: | ||
| msg = ( |
There was a problem hiding this comment.
Just add a quick comment here mentioning that comet.download_model catches all and re-raises any exceptions as KeyErrors. Thanks for identifying this issue, it's a bit of an odd one.
Associated Issue(s): resolves #52
Changes in this PR
Include all key changes in this pull request
compute_cometkiwi()function tometrics.pyusing theUnbabel/wmt22-cometkiwi-damodelevaluate_corpus.pyscript; CSV output now includescometkiwicolumn alongsidechrfandcometNotes
LOADED_METRICSdictionary to reuse loaded models across evaluations)Reviewer Checklist
Include discrete checks that should be done by the reviewer beyond looking through
code and/or file changes. Note that this check list will correspond to tasks within
the PR overview page.
compute_cometkiwi()function signature matches pattern (takestr_textandsrc_text, returns float)evaluate_corpus.pycorrectly computes and writes CometKiwi scores to CSV