Commit

fix: handle special citation format in WARNOCK file
- Force pattern_C (simple page number matching) for WARNOCK file
- Prevents false positive matches with complex citation patterns
- Addresses issue with unique text formatting causing regex conflicts

Issue: pattern_A and pattern_B were incorrectly matching page numbers in this file
Solution: Force the simpler pattern_C when 'WARNOCK' appears in the filename (case-insensitive)
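
The override described above can be sketched as follows. This is a minimal illustration, not the repository's code: `pattern_C` is copied from the diff, but `PATTERN_A_STANDIN` and `choose_pattern` are hypothetical names, because the real `pattern_A` and `pattern_B` are not shown in this commit. The sketch also passes the filename in explicitly, whereas the diff reads a `name` variable that is not a parameter of `get_regex_pattern(string)` and presumably comes from an enclosing scope.

```python
import re

# Copied from the diff: matches simple page notation such as "p. 12".
PATTERN_C = r'([\*_]*[pP]\\?\s*\.\s*[\*_]*(\d+).*?)'

# Hypothetical stand-in for a "complex" citation pattern; the real
# pattern_A and pattern_B are not visible in this commit.
PATTERN_A_STANDIN = r'(\[(\d+)\])'


def choose_pattern(text: str, name: str) -> str:
    """Pick a regex for `text`, forcing PATTERN_C for WARNOCK files."""
    # Case-insensitive filename check, mirroring the diff's
    # `'WARNOCK'.lower() in name.lower()`.
    if 'warnock' in name.lower():
        return PATTERN_C
    # Otherwise fall back to content-based detection.
    if re.search(PATTERN_A_STANDIN, text):
        return PATTERN_A_STANDIN
    return PATTERN_C


sample = "See [12] and p. 34 for details."
print(choose_pattern(sample, "SMITH_essay.md") == PATTERN_A_STANDIN)
print(choose_pattern(sample, "Warnock_chapter.md") == PATTERN_C)
```

Checking the filename before inspecting the content means a WARNOCK file never reaches the content-based branches, which is what prevents the false positives the commit message describes.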
MatiasAgelvis committed Jan 26, 2025
1 parent 5c97b21 commit aab6255
Showing 1 changed file with 15 additions and 4 deletions.
19 changes: 15 additions & 4 deletions docusaurus_nb.py
@@ -61,13 +61,19 @@ def get_regex_pattern(string):
# matches simple page notation
pattern_C = r'([\*_]*[pP]\\?\s*\.\s*[\*_]*(\d+).*?)'

# Special case for WARNOCK file: The text formatting in this file
# causes false positives with pattern_A and pattern_B due to its unique citation style.
# When 'WARNOCK' is in the filename, force using pattern_C (simple page number matching)
# to avoid regex matching issues with the more complex patterns.
# TODO: Consider updating the name extraction regex to better handle these cases,
# or create a specific pattern for this citation style.
if 'WARNOCK'.lower() in name.lower():
return pattern_C

if re.search(pattern_A, string):
return pattern_A
elif re.search(pattern_B, string):
return pattern_B
else:
print('pattern_C\n'*10)
return pattern_C
return pattern_B


# for some reason when mammoth exports md to a directory
@@ -99,6 +105,11 @@ def get_regex_pattern(string):
flags = re.IGNORECASE
pattern = get_regex_pattern(content)
content = re.sub(pattern, r'\n## \2\n\1', content, flags=flags)

if 'Warnock' in parent_folder:
print('\n', parent_folder, ':', re.findall(r'([\*_]*[pP]\\?\s*\.\s*[\*_]*(\d+).*?)', content))
print('\n', pattern)

# remove heading white spaces
# content = content.lstrip()
