This repository has been archived by the owner on Sep 21, 2023. It is now read-only.
Implement formatting check for long hyphen #261
Closed
fulminatingmoat wants to merge 4 commits into GrafeasGroup:main from fulminatingmoat:LongHyphenFormattingCheck
Commits:
ddbdadf Implement formatting check for long hyphen (fulminatingmoat)
dc19711 Update formatting_validation.py (fulminatingmoat)
236666b Update test_formatting_validation.py (itsthejoker)
5167a03 Merge branch 'main' into LongHyphenFormattingCheck (fulminatingmoat)
9 changes: 9 additions & 0 deletions
test/validation/transcriptions/invalid/autogenerated-long-hyphen.txt
@@ -0,0 +1,9 @@
*Image Transcription: Facebook*

-—

I had to come here to vent because I'm scared and sad .... my 300sqft camper that I birthed my son in unassisted has been "under construction" for two months and I'm due in November and wanted to start an outdoor space before baby came so I could move more freely and privately ... my baby daddy has destroyed all the building supplies $3k+ worth of lumber and insulation left in a big storm instead of being brought inside ... he's been getting drunk and sleeping there instead of working and I've been working full time and doing all the meal prep and laundry and shopping and managing the kids schedules while homeschooling and he doesn't do anything but game and smoke and drink and I have no help or support - not to mention there's an assault charge against him for abusing me now which means CPS is involved and that's stressful itself .... we're homeless and having authorities questioned our whole lives and I have no family or friends and all my energy and money is SPENT doing literally EVERYTHING alone and paying for babysitting... baby daddy won't shower or eat at all hasn't been sleeping just gaming and his car has mold growth and mushrooms in the floor boards.... his mouth is badly infected from lack of hygiene... he canceled all my therapy appointments for the next two months .... and he's due to potentially go to jail in September he refused council for court ... I cry every day.... I'm so alone and sad and overwhelmed and I just want to be in my tiny home but I cannot build it myself ... anyway still planning an unassisted but will probably be totally and utterly alone with me and my two small kids which is .... sad to say the least and probably end up having it on a beach somewhere because I'm legitimately homeless

—-

^^I'm a human volunteer content transcriber for Reddit and you could be too! [If you'd like more information on what we do and why we do it, click here!](https://www.reddit.com/r/TranscribersOfReddit/wiki/index)
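The two separators in this fixture look almost like `---`, but each pairs an ASCII hyphen with a longer dash. The sketch below uses the standard-library `unicodedata` module to make the difference visible. It assumes the long character is U+2014 EM DASH, as it appears on this page; `describe` is a hypothetical helper introduced only for illustration and is not part of the repository.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return the Unicode name of every character in text."""
    return [unicodedata.name(ch) for ch in text]


# The separator a transcription is expected to contain.
print(describe("---"))

# The two variants used in the fixture (assumed to contain U+2014).
print(describe("-\u2014"))
print(describe("\u2014-"))
```

Printing the names rather than the characters avoids relying on how a given font renders the two dashes.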
@@ -68,6 +68,10 @@
# Image Transcription
VALID_HEADERS = ["Audio Transcription", "Image Transcription", "Video Transcription"]

# Regex to recognize separators being replaced with autogenerated long hyphens by mobile devices
# —- or -— instead of ---
AUTOGENERATED_LONG_HYPHEN_PATTERN = re.compile(r"—-|-—")


def check_for_bold_header(transcription: str) -> Optional[FormattingIssue]:
    """Check if the transcription has a bold instead of italic header."""

@@ -149,6 +153,24 @@ def check_for_fenced_code_block(transcription: str) -> Optional[FormattingIssue]
    )


def check_for_autogenerated_long_hyphen(transcription: str) -> Optional[FormattingIssue]:
    """Check if the transcription contains autogenerated long hyphens as a result of mobile device 'assistance'

    Separator should look like this:
    ---

    Mobile devices may convert the separator to:
    —- or -—

    These don't display correctly on all devices
    """
    return (
        FormattingIssue.AUTOGENERATED_LONG_HYPHEN
        if AUTOGENERATED_LONG_HYPHEN_PATTERN.search(transcription) is not None
        else None
    )


def check_for_unescaped_username(transcription: str) -> Optional[FormattingIssue]:
    """Check if the transcription contains an unescaped username.

Review comment on the line `if AUTOGENERATED_LONG_HYPHEN_PATTERN.search(transcription) is not None`:
Why not drop the regex and do a simple

@@ -224,6 +246,7 @@ def check_for_formatting_issues(transcription: str) -> Set[FormattingIssue]:
        check_for_heading_with_dashes(transcription),
        check_for_missing_separators(transcription),
        check_for_fenced_code_block(transcription),
        check_for_autogenerated_long_hyphen(transcription),
        check_for_unescaped_username(transcription),
        check_for_unescaped_subreddit(transcription),
        check_for_unescaped_heading(transcription),
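The check added in this diff can be exercised on its own with a small stand-in for the project's types. The following is a minimal sketch under stated assumptions, not the repository's implementation: the `FormattingIssue` enum here is a one-member stand-in, `collect_issues` is a hypothetical aggregator that assumes `None` results are dropped, and `check_with_substrings` is only one plausible reading of the reviewer's regex-free idea, since the comment's code snippet did not survive on this page.

```python
import re
from enum import Enum
from typing import Callable, List, Optional, Set


class FormattingIssue(Enum):
    """Stand-in for the project's enum; only the member used here."""

    AUTOGENERATED_LONG_HYPHEN = "autogenerated_long_hyphen"


# Same pattern as in the diff: a long dash next to a hyphen, in either order.
AUTOGENERATED_LONG_HYPHEN_PATTERN = re.compile(r"—-|-—")


def check_for_autogenerated_long_hyphen(transcription: str) -> Optional[FormattingIssue]:
    """Regex version, mirroring the function added in the diff."""
    if AUTOGENERATED_LONG_HYPHEN_PATTERN.search(transcription) is not None:
        return FormattingIssue.AUTOGENERATED_LONG_HYPHEN
    return None


def check_with_substrings(transcription: str) -> Optional[FormattingIssue]:
    """Hypothetical regex-free variant using plain substring membership."""
    if "—-" in transcription or "-—" in transcription:
        return FormattingIssue.AUTOGENERATED_LONG_HYPHEN
    return None


def collect_issues(
    transcription: str,
    checks: List[Callable[[str], Optional[FormattingIssue]]],
) -> Set[FormattingIssue]:
    """Run every check and keep only the results that are not None (assumed behavior)."""
    results = (check(transcription) for check in checks)
    return {issue for issue in results if issue is not None}


broken = "*Image Transcription: Facebook*\n\n-—\n\nSome text\n\n—-\n"
clean = "*Image Transcription: Facebook*\n\n---\n\nSome text\n\n---\n"

print(collect_issues(broken, [check_for_autogenerated_long_hyphen]))
print(collect_issues(clean, [check_for_autogenerated_long_hyphen]))
print(check_with_substrings(broken) == check_for_autogenerated_long_hyphen(broken))
```

For a fixed two-alternative pattern like this one, the regex and the substring test flag the same inputs; the regex mainly pays off if the pattern later grows more cases.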
The long hyphen isn't always auto-generated, so perhaps "accidental" would be a better descriptor than "autogenerated"?
Side note: if settings for "smart quotes" (and the like) are enabled on macOS, it automatically replaces `--` with `—` there too. Not just mobile. 😄
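The substitution described in this comment can be modelled in a few lines to show how a typed `---` ends up as one of the flagged sequences. This is a toy model only: `smart_dash` is a hypothetical function, and real operating-system behavior depends on settings and on when the replacement fires while typing. The pattern is written with `\u2014` escapes on the assumption that the long character is U+2014 EM DASH.

```python
import re

EM_DASH = "\u2014"

# Same two alternatives as the pattern in the diff, written with escapes.
PATTERN = re.compile(r"\u2014-|-\u2014")


def smart_dash(typed: str) -> str:
    """Toy model: replace each pair of hyphens with one em dash, left to right."""
    return typed.replace("--", EM_DASH)


three = smart_dash("---")
four = smart_dash("----")

# Three hyphens become an em dash followed by a hyphen, which the pattern flags.
print(three == EM_DASH + "-", PATTERN.search(three) is not None)

# Four hyphens become two em dashes, which the pattern does not flag.
print(four == EM_DASH * 2, PATTERN.search(four) is not None)
```

In this model the second case slips past the pattern, so a check meant to cover all accidental long dashes might need more than the two alternatives shown in the diff.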