Unclosed \fe breaks usfm parser #467

Open
jcuenod opened this issue Jul 30, 2024 · 5 comments
Labels
pipeline 6: infer (Issue related to using a trained model to translate.)
usfm (USFM parsing issue)

Comments

@jcuenod

jcuenod commented Jul 30, 2024

I'm getting a "missing caller parameter after \fe" error (raised by the parser's "missing caller parameter after \\{token.name}" check) when an \fe marker has no caller and is never closed. For example, a line of USFM:

\v 31 At Ala nen yi mendek ndi mendek wakkagagerak nogo abok aret enegen kage nagagerik, <<Abu obeelom aret kagi,>> mbareegerak. Ti eereegerak nogo o kiip age, o weege, eeke ne ambi 6 aret agagerak.\fe

The text I'm working with actually has a bunch of these. I realize this is probably not valid USFM (I'm not sure), but is there a way to parse it less strictly and allow the \fe to close implicitly?

(Perhaps related to #201)
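
As a rough illustration of how the affected lines could be located before translating, here is a minimal sketch in Python. It is not part of silnlp. The function name find_unclosed_fe is hypothetical, and the sketch assumes that a note never spans more than one line, which may not hold for every project.

```python
import re
from typing import List, Tuple

# Matches "\fe" as an opening marker or "\fe*" as a closing marker.
# The lookahead stops "\fe" from matching a longer marker that merely starts with "fe".
_FE_TOKEN = re.compile(r"\\fe(\*|(?![A-Za-z0-9]))")


def find_unclosed_fe(usfm_text: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) pairs for each \\fe opener with no \\fe* closer.

    Assumption (not from the issue): a note opens and closes on the same line.
    """
    unclosed: List[Tuple[int, int]] = []
    for line_no, line in enumerate(usfm_text.splitlines(), start=1):
        open_col = None  # column of the opener we are currently inside, if any
        for match in _FE_TOKEN.finditer(line):
            if match.group(1) == "*":
                # A closer ends whatever note was open.
                open_col = None
            else:
                # A new opener while one is still open means the earlier one was never closed.
                if open_col is not None:
                    unclosed.append((line_no, open_col))
                open_col = match.start() + 1
        if open_col is not None:
            # The line ended while a note was still open.
            unclosed.append((line_no, open_col))
    return unclosed


sample = "\\v 1 text\\fe\n\\v 2 ok\\fe + note\\fe* more"
print(find_unclosed_fe(sample))
```

The sketch only reports positions and leaves the file untouched. Whether a reported marker should be removed, closed, or given a caller depends on what the \fe was meant to represent in the source text.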

@ddaspit
Collaborator

ddaspit commented Jul 30, 2024

In what context is this occurring? When you try to translate a book?

@jcuenod
Author

jcuenod commented Jul 30, 2024

Yes:

2024-07-30 09:54:25,115 - silnlp.nmt.translate - ERROR - Was not able to translate EST.
Traceback (most recent call last):
  File "/home/james/silnlp/silnlp/nmt/translate.py", line 123, in translate_books
    translator.translate_book(
  File "/home/james/silnlp/silnlp/common/translator.py", line 321, in translate_book
    self.translate_usfm(
  File "/home/james/silnlp/silnlp/common/translator.py", line 346, in translate_usfm
    doc: List[sfm.Element] = list(usfm.parser(book_file, stylesheet=stylesheet, canonicalise_footnotes=False))
  File "/home/james/silnlp/silnlp/sfm/__init__.py", line 696, in default
    e.extend(sub_parser(e))
  File "/home/james/silnlp/silnlp/sfm/__init__.py", line 696, in default
    e.extend(sub_parser(e))
  File "/home/james/silnlp/silnlp/sfm/__init__.py", line 696, in default
    e.extend(sub_parser(e))
  [Previous line repeated 1 more time]
  File "/home/james/silnlp/silnlp/sfm/usfm.py", line 430, in NoteText
    self._error(ErrorLevel.Content, "missing caller parameter after \\{token.name}", parent)
  File "/home/james/silnlp/silnlp/sfm/__init__.py", line 599, in _error
    raise SyntaxError(msg)
SyntaxError: /home/james/silnlp_data/Paratext/projects/WDR/WDREST.SFM: line 180,1 [id c p]: missing caller parameter after \fe
Traceback (most recent call last):
  File "/home/james/.pyenv/versions/3.8.19/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/james/.pyenv/versions/3.8.19/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/james/silnlp/silnlp/nmt/translate.py", line 389, in <module>
    main()
  File "/home/james/silnlp/silnlp/nmt/translate.py", line 358, in main
    translator.translate_books(
  File "/home/james/silnlp/silnlp/nmt/translate.py", line 140, in translate_books
    raise ValueError(f"Some books failed to translate: {' '.join(translation_failed)}")
ValueError: Some books failed to translate: EST

Line numbers may not match perfectly, because I have some local changes (logging things).

So perhaps #444 and #449 are relevant
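
The two tracebacks above suggest a pattern in which each book is attempted separately, a failure is logged, and a single ValueError is raised once every book has been tried. A minimal sketch of that pattern follows. It is an illustration only and not silnlp's actual code: the translate_one callback and the function signature are hypothetical, and only the two message strings are taken from the log.

```python
import logging
from typing import Callable, Iterable, List

LOGGER = logging.getLogger(__name__)


def translate_books(books: Iterable[str], translate_one: Callable[[str], None]) -> None:
    """Try every book, log each failure, and raise once at the end (hypothetical sketch)."""
    failed: List[str] = []
    for book in books:
        try:
            translate_one(book)
        except Exception:
            # Log the full traceback but keep going so the other books still get translated.
            LOGGER.exception("Was not able to translate %s.", book)
            failed.append(book)
    if failed:
        raise ValueError(f"Some books failed to translate: {' '.join(failed)}")
```

With this shape, a parser SyntaxError in one book shows up first as a logged traceback and then again in the final ValueError, which would explain why the output contains two tracebacks for a single bad marker.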

ddaspit added the "pipeline 6: infer" and "usfm" labels Jul 30, 2024
@ddaspit
Collaborator

ddaspit commented Jul 30, 2024

This issue is related to a host of issues with the USFM parser we are using in the translate script. We are in the process of replacing it with a parser that isn't as strict.

@ddaspit
Collaborator

ddaspit commented Sep 30, 2024

@isaac091 Can we close this issue?

@isaac091
Collaborator

I didn't close it initially because I couldn't test the exact book, but I would imagine it did get fixed with the change.

Projects
Status: 📋 Backlog
Development

No branches or pull requests

3 participants