Cb changes 20250131 #11218

tiff · 2025-01-31T15:21:57Z

Summary by CodeRabbit

New Features
- Enhanced German language support with new podcast-related terms and names.
- Expanded vocabulary in English and German spelling dictionaries, including scientific and contemporary terms.
- Improved language disambiguation for technical terms like "I/O".
- Introduced new rules for capitalization related to political references in German and for unnecessary pronouns.
Bug Fixes
- Refined comma whitespace rule to focus specifically on parentheses.
- Updated grammar rules for more accurate language checking.
Documentation
- Added multiple new names and terms to multi-token suggestion lists.

coderabbitai · 2025-01-31T15:22:07Z

Walkthrough

This pull request encompasses multiple changes across various LanguageTool language modules, focusing on refining grammatical and punctuation rules, as well as expanding language resources. The modifications include updates to spelling dictionaries, ignore files, disambiguation rules, and grammar configurations in both German and English modules. Key changes involve modifying bracket detection logic to focus solely on parentheses, removing specific test cases related to mismatched brackets, and adding numerous words, names, and technical terms to enhance language processing and spell-checking capabilities.

Changes

File Path	Change Summary
`languagetool-core/src/main/java/org/languagetool/rules/CommaWhitespaceRule.java`	Modified bracket detection logic to focus only on parentheses
`languagetool-core/src/test/java/org/languagetool/rules/CommaWhitespaceRuleTest.java`	Removed test cases for mismatched bracket scenarios
`languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt`	Added numerous words, brands, names, and cultural references
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt`	Added "Nietzard/S", "Deepseek/S", "Leavitt/S", "Adschman/S", "Alijew/S", "Yonpyong/S" to the ignore list
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt`	Added podcast-related terms, new adjectives, and verb variations
`languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt`	Added names of public figures and personalities
`languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml`	Added new rule for "Die Linke" capitalization
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml`	Added rule for correctly interpreting "I/O"
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt`	Added technical terms, names, and specialized vocabulary
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt`	Added scientific and technical terminology
`languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml`	Updated `SPEND_IT_FOR` rule's antipattern structure
`languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml`	Added new antipattern for specific grammatical structures
`languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt`	Updated replacement rules according to the 2024 German spelling reform
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/compounds.txt`	Added new compound word "Winnie-the-Pooh"
`languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt`	Added entries for "West African" and "West Africans"

Possibly related PRs

Cb changes 20241203 #11081: The changes in the main PR, specifically in the CommaWhitespaceRule class, are related to the modifications in the spelling_global.txt file from the retrieved PR, as both involve adjustments to how specific types of characters (brackets and words) are processed.
Cb changes 20241121 #11051: The changes in the main PR, specifically in the CommaWhitespaceRule class, are related to the modifications in the CommaWhitespaceRuleTest class, which involves removing tests that previously checked for various bracket types, aligning with the narrowed focus on parentheses in the main PR.
Cb changes 20241217 #11120: The changes in the main PR, specifically in the CommaWhitespaceRule class, are related to the modifications in the spelling_global.txt file from the retrieved PR, as both involve adjustments to how specific terms (including punctuation and vocabulary) are processed.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1)
Line range hint 2-11007: Well-formed additions to the ignore list.

The new entries maintain consistent formatting and include a good mix of proper nouns, technical terms, and compound words. The preservation of case sensitivity and diacritical marks (e.g., "Kantō") is appropriate.

Consider grouping related terms for better maintainability.

For variants like "Kanto" and "Kantō", consider adding a comment to indicate their relationship or grouping them together in the file. This would make future maintenance easier.
  # Japanese region names and variants
  Kanto
  Kantō
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml (1)
17185-17194: Consider expanding the rule to handle more variations.

The current rule might miss common variations of "I/O". Consider adding support for:

Plural forms (e.g., "I/Os")

Alternative separators (e.g., "I-O")

Lowercase variations (e.g., "i/o")

Here's a suggested expansion of the rule:
     <rule id="I_O_NO_PRP" name="I in I/O is no PRP">
         <pattern>
             <marker>
-                <token case_sensitive="yes">I</token>
+                <token regexp="yes">(?i)I|i</token>
             </marker>
-            <token>/</token>
+            <token regexp="yes">[/-]</token>
-            <token>O</token>
+            <token regexp="yes">(?i)O|o</token>
+            <token regexp="yes">s?</token>
         </pattern>
         <disambig action="remove" postag="PRP.*"/>
     </rule>

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba526dc and d95cba7.

📒 Files selected for processing (11)

languagetool-core/src/main/java/org/languagetool/rules/CommaWhitespaceRule.java (1 hunks)
languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (1 hunks)
languagetool-core/src/test/java/org/languagetool/rules/CommaWhitespaceRuleTest.java (0 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (1 hunks)

💤 Files with no reviewable changes (1)

languagetool-core/src/test/java/org/languagetool/rules/CommaWhitespaceRuleTest.java

✅ Files skipped from review due to trivial changes (1)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Analyze (java-kotlin)

🔇 Additional comments (13)

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (1)

106880-106883: LGTM! Good addition of the "for...sake" antipattern.

The new antipattern correctly handles common idiomatic expressions like "for heaven's sake", "for goodness sake", etc. The skip="4" attribute appropriately allows for variations in these phrases.
languagetool-core/src/main/java/org/languagetool/rules/CommaWhitespaceRule.java (1)

236-236: Verify the impact of bracket handling changes.

Let's analyze the usage of this rule across the codebase to understand the full impact of these changes.

Also applies to: 244-244

✅ Verification successful

Bracket handling changes are safely isolated to core implementation

The change to parenthesis handling in isLeftBracket() is inherited by all language-specific implementations without any conflicts or overrides. No language-specific customizations of bracket handling were found.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Analyze usage and impact of CommaWhitespaceRule changes

echo "1. Finding direct usages of CommaWhitespaceRule:"
rg -l "CommaWhitespaceRule"

echo -e "\n2. Finding subclasses:"
ast-grep --pattern 'class $_ extends CommaWhitespaceRule'

echo -e "\n3. Finding test files containing bracket-related assertions:"
rg -l "assertMatches.*[\[\{\}\]].*CommaWhitespaceRule"

echo -e "\n4. Finding language-specific configurations:"
fd -e xml | xargs rg "COMMA_PARENTHESIS_WHITESPACE"
Length of output: 4800

Script:
#!/bin/bash
# Check language-specific implementations and tests

echo "1. Examining German implementation:"
rg -A 5 -B 5 "class GermanCommaWhitespaceRule" languagetool-language-modules/de/src/main/java/org/languagetool/rules/de/GermanCommaWhitespaceRule.java

echo -e "\n2. Examining Ukrainian implementation:"
rg -A 5 -B 5 "class UkrainianCommaWhitespaceRule" languagetool-language-modules/uk/src/main/java/org/languagetool/rules/uk/UkrainianCommaWhitespaceRule.java

echo -e "\n3. Examining Arabic implementation:"
rg -A 5 -B 5 "class ArabicCommaWhitespaceRule" languagetool-language-modules/ar/src/main/java/org/languagetool/rules/ar/ArabicCommaWhitespaceRule.java

echo -e "\n4. Examining Persian implementation:"
rg -A 5 -B 5 "class PersianCommaWhitespaceRule" languagetool-language-modules/fa/src/main/java/org/languagetool/rules/fa/PersianCommaWhitespaceRule.java

echo -e "\n5. Looking for bracket-related tests:"
rg -A 5 -B 5 "test.*[].*CommaWhitespaceRule" languagetool-core/src/test/java/org/languagetool/rules/CommaWhitespaceRuleTest.java
Length of output: 2690

Script:
#!/bin/bash
# Examine core test cases and implementation details

echo "1. Looking for bracket-related test cases in CommaWhitespaceRuleTest:"
rg -A 2 "[].*CommaWhitespaceRule" languagetool-core/src/test/java/org/languagetool/rules/CommaWhitespaceRuleTest.java

echo -e "\n2. Check for isLeftBracket or similar method overrides:"
ast-grep --pattern 'class $_ extends CommaWhitespaceRule {
  $$$
  isLeftBracket($_) {
    $$$
  }
  $$$
}'

echo -e "\n3. Examining the core implementation context:"
rg -A 5 -B 5 "isLeftBracket" languagetool-core/src/main/java/org/languagetool/rules/CommaWhitespaceRule.java
Length of output: 1484
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (4)

66006-66020: LGTM! Well-structured podcast-related terminology.

The compound words follow German word formation rules correctly, with proper gender pairs and inflection markers.

66005-66005: LGTM! Well-formed adjectives with correct inflection markers.

The adjectives follow German compound formation rules and are correctly marked for inflection.

Also applies to: 66021-66022, 66033-66033

66023-66034: LGTM! Consistent marking of separable prefix verbs.

The verbs use underscore consistently to mark separable prefixes, following German verb formation rules.

66035-66036: LGTM! Complex compounds and adjectives are well-formed.

The terms follow German word formation rules with appropriate inflection markers.

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1)

696-710: LGTM! The new words are properly formatted and follow the existing pattern.

The additions include correctly formatted scientific and technical terms:

Biology terms: "four-limbed", "strong-limbed", "limbed", "uroepithelium", "uroepithelia", "uroepithelial"

Chemistry terms: "isocitrate", "isocitrates", "decarboxylation", "alphaketoglutarate", "alphaketoglutarates"

Architecture terms: "stylobate", "stylobates"

Historical terms: "Perizzite", "Perizzites"

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1)

3215-3221: LGTM! The new entries are properly formatted and follow the existing pattern.

The additions include correctly formatted names of public figures with appropriate "/S" suffix:

News presenters: "Susanne Daubner/S", "Constantin Schreiber/S", "Dunja Hayali/S", "Marietta Slomka/S", "Linda Zervakis", "Jan Hofer/S"

Artists: "Katy Perry/S"

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1)

Line range hint 1-1: Clear and informative header comment.

The header comment effectively communicates the file's purpose and auto-generated nature.

languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1)

2587-2595: LGTM! Rule correctly enforces capitalization of the political party name.

The rule appropriately checks for proper capitalization of "Die Linke" (The Left), a German political party, ensuring consistent capitalization in text.
languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (2)

27133-27133: ⚠️ Potential issue

Remove the line number that got mixed into the entries.

The number "27133" appears to be a line number that accidentally got included in the spelling dictionary.
-27133
Likely invalid or redundant comment.

27134-27145: Verify proper names and consider adding common variations.

The entries look good, but consider adding these variations for consistency:

"Boom" as a standalone entry (since "CoreWeave" is included as standalone)

"Siena" as a standalone entry (since it's part of "Banca Monte dei Paschi di Siena")

Let's verify the accuracy of proper names:
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml (1)

17185-17194: LGTM! The rule correctly prevents "I" in "I/O" from being tagged as a personal pronoun.

The implementation uses appropriate case sensitivity and pattern matching to identify the "I/O" context.

coderabbitai · 2025-01-31T15:24:48Z

languagetool-core/src/main/java/org/languagetool/rules/CommaWhitespaceRule.java

@@ -233,15 +233,15 @@ private static boolean isLeftBracket(String str) {
      return false;
    }
    char c = str.charAt(0);
-    return c == '(' || c == '[' || c == '{';
+    return c == '(';


⚠️ Potential issue

Breaking change: Bracket handling restricted to parentheses only.

The changes in isLeftBracket and isRightBracket methods now only handle parentheses (), removing support for square brackets [] and curly braces {}. This is a breaking change that conflicts with:

The class documentation which states: "A rule that matches periods, commas and closing parenthesis preceded by whitespace and opening parenthesis followed by whitespace."

The existing test cases mentioned in the AI summary that check for various bracket types.

Consider one of these solutions:

Revert the changes to maintain backward compatibility:

- return c == '('; + return c == '(' || c == '[' || c == '{';

- return c == ')'; + return c == ')' || c == ']' || c == '}';

If the intent is to restrict bracket handling:

Update the class documentation to clarify the scope

Update the test cases to remove bracket type checks

Consider creating a new rule for handling other bracket types

Add a migration guide for users

Also applies to: 244-244

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (9)

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/compounds.txt (1)

9044-9044: Compound Entry Addition: "Winnie-the-Pooh"*

The new compound entry "Winnie-the-Pooh*" has been added correctly. It follows the expected format by including hyphens and appending the appropriate asterisk (*) indicator. Please verify that its placement aligns with the intended ordering (alphabetical or grouping) within the file.
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml (2)
75401-75407: New Antipattern Rule Enhancement

The newly added <antipattern> element is well-structured. It introduces a rule that matches a sentence starting with a SENT_START token, an optional punctuation token (matching quotation marks), then the literal "musste", followed by tokens for adverbs and articles. This rule should help refine the grammatical checking capabilities for specific constructs in German.

Consider adding an inline comment to clarify the purpose of this rule for future maintainers.

75401-75407: Optional Inline Documentation Recommendation

To further enhance maintainability, an inline comment describing the rationale behind this antipattern could be beneficial. For example, documenting that this rule detects cases where sentences begin with optional quotation marks followed by "musste" and specific grammatical tokens would provide useful context.

You might consider applying a diff similar to the following:
-                <antipattern>
+                
+                <antipattern>
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1)
2587-2595: New Rule Addition for 'Die Linke'

The new rule in the AI_DE_GGEC_REPLACEMENT_ORTHOGRAPHY_UPPERCASE rule group is structured correctly and follows the established XML format. It checks for the pattern where the token "Die" is marked and is immediately followed by the token "Linke". The example provided clearly illustrates the intended usage:
<pattern case_sensitive="yes">
  <marker>
    <token>Die</token>
  </marker>
  <token>Linke</token>
</pattern>
<example correction="">Ich glaube, <marker>Die</marker> Linke hat keine Chance bei der nächsten Wahl.</example>
Consideration:
Please verify whether this rule might inadvertently flag valid occurrences of the political party "Die Linke" (if that is not the intended target) versus instances where "die" should be lowercase. If the correction should not apply when referring to the party, an exception or additional context might be needed to avoid false positives.
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt (2)
1-8108: Consider organizing entries alphabetically for better maintainability.

While the file contains a comprehensive list of multiword terms, the entries appear to be added in a somewhat random order. Consider reorganizing them alphabetically to make it easier to:

Find existing entries

Avoid duplicates

Maintain the file over time

Here's a script to help sort the entries while preserving comments:
#!/bin/bash
# Sort entries while preserving comments
grep "^#" multiwords.txt > comments.tmp
grep -v "^#" multiwords.txt | sort > entries.tmp
cat comments.tmp entries.tmp > multiwords_sorted.txt
rm comments.tmp entries.tmp
1-8108: Consider adding POS tag validation.

The file uses various POS tags (NNP, NNPS, NN, NNS, etc.) but there's no explicit validation of allowed tags. Consider adding a validation step to ensure consistency.

Here's a script to analyze current POS tag usage:
#!/bin/bash
# Extract and count unique POS tags
grep -v "^#" | awk '{print $NF}' | sort | uniq -c
languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (3)

27131-27150: Evaluate new entries and maintain consistency.

The additions in this hunk introduce several proper nouns and names (e.g., "Willem-Alexander", "Yomiuri Shimbun", "Asahi Shimbun", etc.). Please verify that these new entries adhere to the established formatting and capitalization conventions. It’s also worth confirming whether the order (typically alphabetical) is maintained or if reordering is required for consistency with other dictionary sections.

27157-27175: Review naming variations and duplicate entries.

This hunk adds multiple new entries such as "Kumbh Mela", "Boom Supersonic", and several personal and corporate names. Note the presence of closely related variants (e.g., "Arabella Rose Kushner", "Arabella Kushner", and "Arabella Rose")—please confirm that each variant is necessary and not an accidental duplicate. Additionally, ensure the naming conventions (spacing, hyphenation) remain consistent.

27186-27192: Validate accuracy and relevance of additional entries.

The entries introduced in this hunk (e.g., "Ramanujan", "Rick Deckard", "Vernon Lee", "Karoline Leavitt") expand the dictionary's scope. Please double-check that these names are accurate, relevant, and align with the overall goals of the spelling resource. Maintaining consistency in style with the rest of the file is crucial.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 009d66e and 2172943.

📒 Files selected for processing (13)

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (3 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt (2 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (3 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/compounds.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt

🧰 Additional context used

🪛 LanguageTool

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt

[misspelling] ~3226-~3226: This word is normally spelled as one.
Context: ...onkunas Peter Joppich/S Abd al-Rahman/S Exxon Mobil/S Masar-e Scharif/S Tall Abyad/S #name ...

(EN_COMPOUNDS_EXXON_MOBIL)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Analyze (java-kotlin)

🔇 Additional comments (10)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (2)

66005-66038: Validation & Consistency Check for Added Podcast and "simsen" Terms

This hunk introduces a batch of new entries ranging from "drachenmäßig/A" through "Zustrombegrenzungsgesetze/SN". These include podcast-related terms (e.g., "Podcastwerbung", "Podcastmarketing/S", "Podcastranking/S") and various verbal forms using “simsen” (e.g., "rum_simsen", "weg_simsen", "herum_simsen", etc.), as well as adjectives such as "geschlechterparitätisch/A". Please verify that each entry adheres to the established Hunspell dictionary format and that the morphological flags (like /A, /S, /SN, /NS) are accurate and consistent with similar, existing entries.

It might be useful to run an automated script or manual check against the dictionary’s parsing logic to ensure none of these entries inadvertently break lookup expectations.

66046-66067: 🛠️ Refactor suggestion

Review of Additional New Technical and Loanword Entries

This section adds further vocabulary items starting with "Slusheis" and continuing through "Tanzim-Miliz". Notably, entries such as "Slusheis" versus "Slush-Eis" and the two variations for the ice machine ("Slush-Eis-Maschine/N" and "Slusheismaschine/N") suggest an intention to capture regional or orthographic variants. Additionally, terms like "Al-Aksa-Moschee", "al-Aqsa-Moschee", "al-Aqsa", and "al-Aksa" appear with subtle differences that should be cross-checked for consistency with any preexisting standards or guidelines for loanwords and hyphenation. The inline remark in "Stop-Loss #eng" is acceptable if similar notation is used elsewhere; otherwise, consider standardizing its format.

Please ensure that these entries do not introduce unintended duplicates or conflicts with existing dictionary terms.

languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/replace.txt (2)

853-854: New Replacement Rules for “Bär” Variants

The new entries
Puh der Bär=Pu der Bär
Poo der Bär=Pu der Bär
are correctly added to standardize these expressions in line with the 2024 German spelling reform. Please verify that the chosen replacement “Pu der Bär” aligns with overall project conventions.

855-869: Review of Updated Replacement Entries

The subsequent lines (855–869) update several replacement rules (e.g. “Fettarmekost”, “Gutelaune”, “Hugo Ekitike[s]”, “Bloc Quebecois”, etc.) with the correct diacritical marks and spellings per the recent reform. These modifications look consistent with the updated guidelines.

🧰 Tools

🪛 LanguageTool

[uncategorized] ~859-~859: Do not mix variants of the same word (‘Québécois’ and ‘Quebecois’) within a single text.
Context: ...tikes=Hugo Ekitikés Bloc Quebecois=Bloc Québécois Le Meridien=Le Méridien Francois Bayrou...

(EN_WORD_COHERENCY)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (4)

3215-3235: New Multi-token Suggestion Entries

A large set of new entries has been appended to enhance multi-token suggestion coverage. The additions include several personal names and terms such as:
Susanne Daubner/S, Constantin Schreiber/S, Dunja Hayali/S, Marietta Slomka/S, Linda Zervakis, Jan Hofer/S, and Katy Perry/S (lines 3215–3221).
These entries appear to be well-formed in terms of syntax and suffix usage. Please confirm that they meet the intended criteria and that the use of “/S” (or the absence thereof, as with “Linda Zervakis”) is deliberate.

🧰 Tools

🪛 LanguageTool

[misspelling] ~3226-~3226: This word is normally spelled as one.
Context: ...onkunas Peter Joppich/S Abd al-Rahman/S Exxon Mobil/S Masar-e Scharif/S Tall Abyad/S #name ...

(EN_COMPOUNDS_EXXON_MOBIL)

3226-3227: Spelling Check for “Exxon Mobil/S”

The entry
Exxon Mobil/S (line ~3227)
triggers a static analysis hint: it is normally written as one word (“ExxonMobil”). Please verify whether the space is intentional due to domain-specific usage or if it should be corrected for consistency.

🧰 Tools

🪛 LanguageTool

[misspelling] ~3226-~3226: This word is normally spelled as one.
Context: ...onkunas Peter Joppich/S Abd al-Rahman/S Exxon Mobil/S Masar-e Scharif/S Tall Abyad/S #name ...

(EN_COMPOUNDS_EXXON_MOBIL)

3228-3233: Validation of Additional New Entries

Entries such as
Masar-e Scharif/S,
Tall Abyad/S #name,
Tripoli/S #name,
Umm Qasr/S,
Umm Kasr/S, and
Willem-Alexander/S #name
(from lines 3228–3233) have been added. They effectively expand the multi-token suggestion dataset. It is recommended to double-check that the usage of suffixes (e.g. “/S”) consistently applies to all names as intended.

3234-3234: Consistency Check for “Pu der Bär” Entry

The new entry
Pu der Bär (line 3234)
is added without any suffix. In the other resource file, similar terms use suffixes to indicate token modifications. Please verify if this omission is intentional or if a suffix (such as “/S”) should be appended to maintain consistency with the overall format.
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt (2)

1-4: LGTM! Clear and helpful header comments.

The header comments effectively explain the purpose and usage rules for this multiwords file, including:

Its use in EnglishHybridDisambiguator

Criteria for adding terms (single POS, no disambiguation needed)

Spelling handling behavior

Case sensitivity rules

316-317: Verify consistency with other geographic terms.

The newly added entries "West African" (NNP) and "West Africans" (NNPS) follow the same pattern as other geographic terms in the file like "East Flanders" (NNP) and "West Flanders" (NNP). Let's verify there are no similar terms missing.

✅ Verification successful

Consistency Confirmed for Directional African Terms

The review comment is addressed as the grep results show that the directional geographic terms in our repository follow a consistent pattern. In the English multiwords file, similar entries like “East Flanders” and “West Flanders” are used, and there are no other “African” directional terms (e.g., North African, South African, or East African) expected or missing. The added “West African” (NNP) and “West Africans” (NNPS) entries align with this established convention.

The grep output from other geographic terms confirms consistent directional naming.

There is no evidence of missed similar directional terms regarding “African” in the English multiwords file.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for other directional geographic terms that may need similar treatment
rg -i "^(north|south|east|west)\s+\w+" | grep -v "African"
Length of output: 31487

tiff added 5 commits January 31, 2025 16:18

[core] remove } and ] from CommaWhiteSpaceRule

b48e0c0

[core] add global spelling

cfb3cd2

[de] add spelling

3d4cf08

[de] improve rule

97cc91c

[en] improve rules

d95cba7

coderabbitai bot reviewed Jan 31, 2025

View reviewed changes

tiff added 6 commits January 31, 2025 16:56

[de] improve rule

01ffc54

[de] improve rule

009d66e

[core] add global spelling

29c464f

[de] improve rules

1deb9bb

[en] improve rules

34bf951

Merge branch 'master' into cb-changes-20250131

2172943

coderabbitai bot reviewed Feb 3, 2025

View reviewed changes

tiff merged commit a8f2643 into master Feb 4, 2025
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cb changes 20250131 #11218

Cb changes 20250131 #11218

tiff commented Jan 31, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 31, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Jan 31, 2025

coderabbitai bot left a comment

Cb changes 20250131 #11218

Cb changes 20250131 #11218

Conversation

tiff commented Jan 31, 2025 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Jan 31, 2025 • edited Loading

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 31, 2025

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

tiff commented Jan 31, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 31, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)