New feature: uppercase first letter in sentence, code cleanup #20

Open
wants to merge 2 commits into
base: main

Conversation

@ivoras ivoras commented Aug 19, 2024

Major change: making the first letter in the word following a "." or "?" uppercase (optional, defaults to off).
Minor changes: code cleanup, whitespace removal.
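For readers who want to see the rule in isolation, here is a minimal sketch of the behaviour described above. It is not the code from this PR: the helper name `capitalize_after_punctuation` and the regex approach are assumptions made for illustration, and it works on already punctuated text.

```python
import re

# Hypothetical helper, not taken from the PR: matches a "." or "?"
# followed by whitespace and the first character of the next word.
_AFTER_SENTENCE_END = re.compile(r"([.?]\s+)(\w)")


def capitalize_after_punctuation(text: str) -> str:
    """Uppercase the first letter of each word that follows "." or "?"."""
    return _AFTER_SENTENCE_END.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )


print(capitalize_after_punctuation("this is an test. my name is oliver."))
```

Note that this sketch leaves the very first word of the text untouched, because nothing precedes it for the pattern to match.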

@oliverguhr
Owner

Thanks for the PR! I always had the idea to add true casing to the model.

However, I see an issue here. For example, given the following text:

this is an test my name is oliver

the output would be:

this is an test. My name is oliver.

This true casing would only work after a "." or "?", not at the beginning of the first sentence, and not after "!", as we don't detect exclamation marks.

@ivoras
Author

ivoras commented Aug 21, 2024

> This true casing would only work after a "." or "?", not at the beginning of the first sentence, and not after "!", as we don't detect exclamation marks.

I don't know what you mean about "!", since the patch doesn't use it, but I've also noticed that it doesn't capitalise the first sentence of the text, so I've updated the patch.
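To illustrate the start-of-text case, here is a rough sketch that treats the first word as the start of a sentence. It is not the updated patch itself. It assumes the input is a sequence of hypothetical `(word, label)` pairs, where the label is either `"0"` for no punctuation or the punctuation mark that follows the word; the function name `join_with_casing` is also made up for this example.

```python
from typing import Iterable, Tuple

# Assumption: only these labels end a sentence ("!" is not detected).
SENTENCE_END = {".", "?"}


def join_with_casing(pairs: Iterable[Tuple[str, str]]) -> str:
    """Join (word, label) pairs into text, capitalising sentence starts."""
    out = []
    capitalize_next = True  # the first word of the text starts a sentence
    for word, label in pairs:
        if capitalize_next and word:
            word = word[0].upper() + word[1:]
        # The next word starts a sentence only if this one ends with "." or "?".
        capitalize_next = label in SENTENCE_END
        out.append(word if label == "0" else word + label)
    return " ".join(out)


pairs = [
    ("this", "0"), ("is", "0"), ("an", "0"), ("test", "."),
    ("my", "0"), ("name", "0"), ("is", "0"), ("oliver", "."),
]
print(join_with_casing(pairs))
```

Keeping a single `capitalize_next` flag means the start of the text and the word after a "." or "?" go through the same code path.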

I know this is not proper true-casing, as that would probably also involve applying it to possible names inside sentences, but it's good enough for my needs. There's a model on Hugging Face that attempts to do that (1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase), but it's too buggy.
