-
-
Notifications
You must be signed in to change notification settings - Fork 2
Do not translate markers #153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #153 +/- ##
==========================================
+ Coverage 88.30% 88.38% +0.07%
==========================================
Files 275 275
Lines 16235 16389 +154
==========================================
+ Hits 14337 14485 +148
- Misses 1898 1904 +6 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should very minimally affect my current code
Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)
machine/corpora/paratext_project_text_updater_base.py
line 30 at r1 (raw file):
text_behavior: UpdateUsfmTextBehavior = UpdateUsfmTextBehavior.PREFER_EXISTING, embed_behavior: UpdateUsfmIntraVerseMarkerBehavior = UpdateUsfmIntraVerseMarkerBehavior.PRESERVE, syle_behavior: UpdateUsfmIntraVerseMarkerBehavior = UpdateUsfmIntraVerseMarkerBehavior.STRIP,
"style" is misspelled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @johnml1135)
machine/corpora/scripture_ref_usfm_parser_handler.py
line 35 at r1 (raw file):
return ScriptureTextType.NONE if len(self._cur_text_type_stack) == 0 else self._cur_text_type_stack[-1] def get_in_note_text(self) -> bool:
This method should be prefixed with an underscore, since it is only intended to be used internally or by subclasses. Also, _is_in_note_text
would be a more consistent name.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 122 at r1 (raw file):
self._in_embed = False def start_embed(self, state: UsfmParserState, marker: str, caller: str, category: Optional[str]) -> None:
This method should be prefixed with an underscore, since it is only intended to be used internally or by subclasses.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 130 at r1 (raw file):
self._next_element(marker) def end_embed(
This method should be prefixed with an underscore, since it is only intended to be used internally or by subclasses.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 173 at r1 (raw file):
def _end_non_verse_text(self, state: UsfmParserState, scripture_ref: ScriptureRef) -> None: ... def start_note_text(self, state: UsfmParserState):
This method should be prefixed with an underscore, since it is only intended to be used internally or by subclasses.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 180 at r1 (raw file):
def _start_note_text(self, state: UsfmParserState, scripture_ref: ScriptureRef) -> None: ... def end_note_text(self, state: UsfmParserState):
This method should be prefixed with an underscore, since it is only intended to be used internally or by subclasses.
tests/testutils/data/usfm/Tes/41MATTes.SFM
line 2 at r1 (raw file):
\id MAT - Test \f + \fr 1.0 \ft \f*
We are trying to transition away from testing USFM reading/writing using these test files. I would like to stop using this file to test new cases.
tests/corpora/tests.cs
line 0 at r1 (raw file):
I think this file got checked-in accidentally.
machine/corpora/update_usfm_parser_handler.py
line 18 at r1 (raw file):
class UpdateUsfmIntraVerseMarkerBehavior(Enum):
I think we can just shorten this to UpdateUsfmMarkerBehavior
.
machine/corpora/update_usfm_parser_handler.py
line 25 at r1 (raw file):
class UpdateUsfmParserHandler(ScriptureRefUsfmParserHandler): _untranslatable_paragraph_tags = ("r", "rem")
This class could be used for other cases where you want to update the text in USFM and not just translation. Can we handle these specific translation cases in Serval, i.e. don't pass in translation segments for these tags?
Previously, isaac091 (Isaac Schifferer) wrote…
Nice catch :-) |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
It's different for dotnet - underscores are only for private variables, not protected. I'll update it. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
I tried to keep the changes minimal and move many of the tests other places. Because the parsing was different, I needed to at least update some of the results, especially around the notes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @isaac091 and @johnml1135)
machine/corpora/scripture_ref_usfm_parser_handler.py
line 130 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
It's different for dotnet - underscores are only for private variables, not protected. I'll update it.
Yeah, by convention, we've been using underscores for private and protected fields/methods in Machine.py.
tests/testutils/data/usfm/Tes/41MATTes.SFM
line 2 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I tried to keep the changes minimal and move many of the tests other places. Because the parsing was different, I needed to at least update some of the results, especially around the notes.
I'm not sure I understand why changes were necessary to this file. Shouldn't UsfmTextBase
still parse this file the same way it did before? The tests for this file are in place to guard against regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
Previously, ddaspit (Damien Daspit) wrote…
Ok - this really needs to be done in Serval then, not in Machine. I'll tear it out. |
Previously, ddaspit (Damien Daspit) wrote…
Yes, I could have made the changes with independent tests, rather then overloading these. The only actual "change" per say is the 1:esb changing to 2:esb - it parses differently because the \fm tag registers as an embed. If we need to, I can remove the modifications and add more individual tests to verify the edge cases. |
Previously, johnml1135 (John Lambert) wrote…
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)
tests/testutils/data/usfm/Tes/41MATTes.SFM
line 2 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Yes, I could have made the changes with independent tests, rather then overloading these. The only actual "change" per say is the 1:esb changing to 2:esb - it parses differently because the \fm tag registers as an embed. If we need to, I can remove the modifications and add more individual tests to verify the edge cases.
Yes, that is the plan moving forward. We no longer want to use this file for new tests. It has just gotten too difficult to maintain and it isn't explicit what we are testing for. We should create individual tests for all new cases.
Previously, johnml1135 (John Lambert) wrote…
@ddaspit - I am inclined to leave it as is right now, but if you think it best, I can work on minimizing changes to the MAT file. |
Previously, johnml1135 (John Lambert) wrote…
Ok - I will create the new tests. |
…m machine.py Updated logic * Add "nested" embeds * If we strip embeds and there is an updated embed, strip it All tests pass
38c87db
to
ba783fb
Compare
@ddaspit - the changes requested have been implemented. |
Previously, johnml1135 (John Lambert) wrote…
done |
49edd20
to
5ebd6af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @johnml1135)
tests/testutils/data/usfm/Tes/41MATTes.SFM
line 2 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
done
You forgot to revert the footnote marker that occurs right after the \id
marker. I believe that was added to check for a specific issue.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 278 at r4 (raw file):
return marker == "ft" def _is_embed_part(self, marker: Optional[str]) -> bool:
The name of this method does not clearly communicate the purpose of this method is and how it is different than other "embed" methods.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 281 at r4 (raw file):
return marker is not None and marker.startswith(EMBED_STARTING_CHARS) def _is_embed_character(self, marker: Optional[str]) -> bool:
I wasn't sure what this meant at first. I would rename this to _is_embed_character_style
.
machine/corpora/scripture_ref_usfm_parser_handler.py
line 282 at r4 (raw file):
def _is_embed_character(self, marker: Optional[str]) -> bool: return marker in ("f", "fe", "fig", "fm", "x")
This set can be a constant.
machine/corpora/usfm_text_base.py
line 208 at r4 (raw file):
state.token is not None and self._is_in_embed(state.token.marker) and (not self._is_in_note_text() or self._is_in_nested_embed(state.token.marker))
It is unclear what this if
statement is for. Could you put a comment explaining or (probably better) put the conditional in a method/property with a name that describes what the conditional is checking?
Previously, ddaspit (Damien Daspit) wrote…
Whether the marker is in the group of markers associated with all types of embeds - figs, f, fe, ft, fr, xt, xta, etc. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, johnml1135 (John Lambert) wrote…
_is_embed_part_style |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
It was - but I moved it to another test. The numbering is now done differently so that if I keep it, all the numbers are thrown off. By removing it here and moving it to a different test, it minimizes the changes needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)
machine/corpora/scripture_ref_usfm_parser_handler.py
line 281 at r4 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Done.
Now that I understand the difference between the methods, I would rename this to _is_embed_style
so that it is consistent with _is_embed_part_style
. I promise this is the last comment.
Previously, ddaspit (Damien Daspit) wrote…
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r6, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
This is a mirror PR to sillsdev/machine#281.
This change is