Skip to content

Conversation

@laura-burdick-sil
Copy link
Collaborator

@laura-burdick-sil laura-burdick-sil commented Dec 2, 2025

Previously, the verse segmentation algorithm was crashing when there was a passage that was only a single verse. This fixes that behavior.


This change is Reviewable

@laura-burdick-sil laura-burdick-sil added the bug Something isn't working label Dec 2, 2025
@laura-burdick-sil laura-burdick-sil moved this to 🔖 Ready in SIL-NLP Research Dec 2, 2025
@laura-burdick-sil laura-burdick-sil moved this from 🔖 Ready to 👀 In review in SIL-NLP Research Dec 2, 2025
Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@Enkidu93 reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @benjaminking)

@laura-burdick-sil
Copy link
Collaborator Author

I've also added some code to output the resulting verses in vref format (so that they can easily be fed directly into NLLB). @Enkidu93, would you also be able to review this code? Thank you!

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! This looks good.

My one concern is versification: It looks like the source and target verses are created without having a versification specified. In both cases, the versification will default to the English versification (luckily the same!). The vref.txt format, I believe, follows the Original versification.

I think we could address handling different versifications properly in a separate issue (unless you'd like to just do it now). For the time being could you confirm that there's a mismatch and if so just change the versification of the verse refs in verse_map to Original before str()ing them so the lookup will be apples-to-apples?

@Enkidu93 reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @benjaminking)

@laura-burdick-sil
Copy link
Collaborator Author

That's a good point, thanks for catching that! I've been digging into the code around versification, but I've never worked with it before, so I'm not sure that I've implemented this correctly.

In Paratext, the source project is marked as having a versification of 4 (would that be English versification?). I'm not sure what the versification of the target project is, since I don't have access to their Paratext project.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versification is a hairy thing 😬 😁 . 4 would be English, yes. Yeah, it seems like we just didn't think of this before. Probably not making much difference at the moment but we'll definitely want to address this as we move to a more established pipeline.

This works well, but I put in a comment for consideration!

@Enkidu93 reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @benjaminking and @laura-burdick-sil)


silnlp/alignment/segment_verses.py line 729 at r3 (raw file):

        for trg_passage in trg_segmented_passages:
            for verse in trg_passage.verses:
                verse.reference.change_versification(ORIGINAL_VERSIFICATION)

You might consider using VerseRef.to_versification which doesn't mutate the original reference and call it while iterating below. Besides saving a few lines, it would guarantee that steps after this step which may access those verse references will access the unchanged references.

@laura-burdick-sil
Copy link
Collaborator Author

Thanks for the comment - that's a good suggestion! I'll add an issue about addressing this versification issue more thoroughly at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: 👀 In review

Development

Successfully merging this pull request may close these issues.

4 participants