
meriam2303 commented Sep 29, 2025

French model draft to test; the Arabic and Maghrebi examples are commented out.

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

meriam2303 marked this pull request as draft September 29, 2025 17:25
mawiesne changed the title from "french model" to "Add tagging examples to verify French POS model" Sep 30, 2025
mawiesne (Contributor)

Thx @meriam2303 for the PR. Let's see if it passes the tests.

mawiesne added the tests label Sep 30, 2025
mawiesne (Contributor) commented Sep 30, 2025

@meriam2303 It seems there is a syntax error:

POSTaggerMEIT.java:157:6: mismatched input 'Arguments' expecting ')'

Could you check, correct it and push a fix to the same branch?

Please also add a new static constant to that test:

private static final String FRENCH = "fr";

See the other constants near the class definition (POLISH, GERMAN, ENGLISH, ...).

mawiesne self-assigned this Sep 30, 2025
mawiesne requested a review from rzo1 September 30, 2025 07:03
mawiesne marked this pull request as ready for review September 30, 2025 15:51
pazairfog

Hi,

There are still some indentation and trailing-whitespace errors, mostly in the French data:

diff --git a/opennlp-core/opennlp-runtime/src/test/java/opennlp/tools/postag/POSTaggerMEIT.java b/opennlp-core/opennlp-runtime/src/test/java/opennlp/tools/postag/POSTaggerMEIT.java
index 0963f67e..3bd10889 100644
--- a/opennlp-core/opennlp-runtime/src/test/java/opennlp/tools/postag/POSTaggerMEIT.java
+++ b/opennlp-core/opennlp-runtime/src/test/java/opennlp/tools/postag/POSTaggerMEIT.java
@@ -135,15 +135,15 @@ public class POSTaggerMEIT {
       Arguments.of(POLISH, 0,
         "Działacze stosowali też różne formy nacisku na polski konsulat , żeby zaopiekował się " +
         "bezrobotnymi z Polski albo dał im choćby na bezpłatny bilet do kraju .",
-          new String[]{"NOUN", "VERB", "PART", "ADJ", "NOUN", "NOUN", "ADP", "ADJ", "NOUN", "PUNCT", "SCONJ", 
-            "VERB", "PRON", "ADJ", "ADP", "PROPN", "CCONJ", "VERB", "PRON", "PART", "ADP", "ADJ", "NOUN", 
+          new String[]{"NOUN", "VERB", "PART", "ADJ", "NOUN", "NOUN", "ADP", "ADJ", "NOUN", "PUNCT", "SCONJ",
+            "VERB", "PRON", "ADJ", "ADP", "PROPN", "CCONJ", "VERB", "PRON", "PART", "ADP", "ADJ", "NOUN",
             "ADP", "NOUN", "PUNCT"}),
       // via: @kinow
       Arguments.of(CATALAN, 1,
         "Un gran embossament d'aire fred es comença a despenjar cap al centre d'Europa.",
           // OpenNLP, different at: idx pos 2, 3, 5, and 13(+14) -> however, only pos 5 is "wrong" (ref)
           new String[]{"DET", "ADJ", "NOUN", "ADP", "NOUN", "ADJ", "PRON", "VERB", "ADP", "VERB", "NOUN",
-              "ADP+DET", "NOUN", "ADP", "PROPN", "PUNCT"}),
+            "ADP+DET", "NOUN", "ADP", "PROPN", "PUNCT"}),
       // REFERENCE ("gold"):
       // "DET", "ADJ", "NOUN", "ADP", "NOUN", "ADJ", "PRON", "VERB", "ADP", "VERB", "NOUN", "ADP+DET",
         // "NOUN", "ADP", "PROPN", "PUNCT"})
@@ -153,22 +153,22 @@ public class POSTaggerMEIT {
         // "NOUN", "PROPN", "PROPN", "PUNCT"
         // ok! ,  ok! ,  ??? ,  ???   ,  ok!  ,  ok! ,  ok!  ,  ok!  ,  ok! ,  ok!  ,  ok!  ,  ok!  +  ok! ,
         // ok!  ,  ???   ,  ok!   ,  ok!
-      // via: @meriam2303 , original by Guillaume Musso:
-      // La jeune fille et la nuit, S.469 
+      // via: @meriam2303, original by Guillaume Musso:
+      // La jeune fille et la nuit, S.469
       Arguments.of(FRENCH, 0,
-      "Vivre avec elle me faisait souffrir, mais vivre sans elle m'aurait tué.",
-      new String[]{"VERB","ADP","PRON","PRON","AUX","VERB","PUNCT","CCONJ","VERB","ADP","PRON","PRON","AUX",
-          "VERB","PUNCT"})
+        "Vivre avec elle me faisait souffrir, mais vivre sans elle m'aurait tué.",
+          new String[]{"VERB", "ADP", "PRON", "PRON", "AUX", "VERB", "PUNCT", "CCONJ", "VERB", "ADP", "PRON",
+            "PRON", "AUX", "VERB", "PUNCT"})
       // via @meriam2303, original by Hind Choueykh Ben Salah
       // التجريد في الشّعر العربي , S. 42
       //Arguments.of(ARABIC,0,
       //"عشق أبو نواس جارية تدعى جنان",
-      //new String[]{"VERB","PROPN","NOUN","VERB","PROPN"})  
+      //new String[]{"VERB","PROPN","NOUN","VERB","PROPN"})
       // via @meriam2303, original by Mohamed Laarousi Elmetoui
       // التوت المر , S.7
       //Arguments.of(MARGHREBI_ARABIC_FRENCH,0,
       //"Wassa3 belek ya baba...",
-      //new String[]{"VERB","NOUN","ITNJ","NOUN","PUNCT"})    
+      //new String[]{"VERB","NOUN","ITNJ","NOUN","PUNCT"})
     );
   }
 }

rzo1 (Contributor) left a comment

Fixed the checkstyle issues, but the test currently fails, apparently because of a missing model (?):

Error:  Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.277 s <<< FAILURE! -- in opennlp.tools.postag.POSTaggerMEIT
Error:  opennlp.tools.postag.POSTaggerMEIT.testPOSTagger(String, int, String, String[])[8] -- Time elapsed: 0.005 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "opennlp.tools.tokenize.Tokenizer.tokenize(String)" because the return value of "java.util.Map.get(Object)" is null
	at opennlp.tools.postag.POSTaggerMEIT.testPOSTagger(POSTaggerMEIT.java:66)
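
The `NullPointerException` at `POSTaggerMEIT.java:66` means that `Map.get(...)` returned `null` for the French row. In other words, no tokenizer was registered under the `"fr"` key when that row ran. The sketch below reproduces this failure mode in plain Java and shows one way to fail fast with a readable message. It is a minimal illustration, not the actual test code. The map name `TOKENIZERS`, the helper `tokenizerFor`, the language-code values, and the use of `Function<String, String[]>` as a stand-in for OpenNLP's `Tokenizer` are all hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

public class TokenizerLookupSketch {

  // Language-code constants in the style of the test class (values assumed here).
  static final String POLISH = "pl";
  static final String FRENCH = "fr";

  // Hypothetical stand-in for the test's per-language tokenizer map.
  // Only Polish is registered, so a lookup for French returns null.
  static final Map<String, Function<String, String[]>> TOKENIZERS =
      Map.of(POLISH, s -> s.split("\\s+"));

  // Fails fast with a descriptive message instead of an NPE later in the test body.
  static Function<String, String[]> tokenizerFor(String lang) {
    return Objects.requireNonNull(TOKENIZERS.get(lang),
        () -> "No tokenizer registered for language '" + lang + "'");
  }

  public static void main(String[] args) {
    // Whitespace split of a Polish fragment: prints 3
    System.out.println(tokenizerFor(POLISH).apply("Działacze stosowali też").length);
    try {
      tokenizerFor(FRENCH);
    } catch (NullPointerException e) {
      // Prints: No tokenizer registered for language 'fr'
      System.out.println(e.getMessage());
    }
  }
}
```

With the guard in place, a missing registration shows up as a message that names the language. A generic "return value of Map.get(Object) is null" gives no such hint.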

rzo1 (Contributor) commented Oct 14, 2025

Fixed the test setup. Now we have

Vivre_VERB avec_ADP elle_PRON me_PRON faisait_VERB souffrir_VERB ,_PUNCT mais_CCONJ vivre_VERB sans_ADP elle_PRON m'_PRON aurait_AUX tué_VERB ._PUNCT 
Vivre[VERB] <-- OK
avec[ADP] <-- OK
elle[PRON] <-- OK
me[PRON] <-- OK
faisait[VERB] <-- NOK, pos=4
souffrir[VERB] <-- OK
,[PUNCT] <-- OK
mais[CCONJ] <-- OK
vivre[VERB] <-- OK
sans[ADP] <-- OK
elle[PRON] <-- OK
m'[PRON] <-- OK
aurait[AUX] <-- OK
tué[VERB] <-- OK
.[PUNCT] <-- OK

which should be AUX according to the provided reference.

I think this is an edge case:

Actually, faisait is the imperfect of “faire”; in faisait souffrir it functions as a causative semi-auxiliary. Tagging it as AUX is acceptable in Universal Dependencies, where “faire” + infinitive is treated as an auxiliary construction (I'm not a native French speaker, though).

I tried some other (online) taggers, and they label faisait as VERB.
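
The per-token report in the comment above can be reproduced with a small comparison loop. The helper below is a hypothetical sketch, not the code in `POSTaggerMEIT`. It prints each token with its predicted tag and marks it OK or NOK together with its 0-based position. It also returns the mismatching indices. The token and tag arrays are copied from this thread: the tagger output reported above and the French reference row from the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class TagComparisonSketch {

  // Prints "token[predicted] <-- OK" or "token[predicted] <-- NOK, pos=i" (0-based)
  // and returns the indices where the predicted tag differs from the expected one.
  static List<Integer> compare(String[] tokens, String[] predicted, String[] expected) {
    if (tokens.length != predicted.length || predicted.length != expected.length) {
      throw new IllegalArgumentException("tokens, predicted and expected must have the same length");
    }
    List<Integer> mismatches = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      boolean ok = predicted[i].equals(expected[i]);
      if (!ok) {
        mismatches.add(i);
      }
      System.out.println(tokens[i] + "[" + predicted[i] + "] <-- " + (ok ? "OK" : "NOK, pos=" + i));
    }
    return mismatches;
  }

  public static void main(String[] args) {
    // Tokens and predicted tags as reported in the thread.
    String[] tokens = {"Vivre", "avec", "elle", "me", "faisait", "souffrir", ",", "mais",
        "vivre", "sans", "elle", "m'", "aurait", "tué", "."};
    String[] predicted = {"VERB", "ADP", "PRON", "PRON", "VERB", "VERB", "PUNCT", "CCONJ",
        "VERB", "ADP", "PRON", "PRON", "AUX", "VERB", "PUNCT"};
    // Reference tags from the French test row.
    String[] expected = {"VERB", "ADP", "PRON", "PRON", "AUX", "VERB", "PUNCT", "CCONJ",
        "VERB", "ADP", "PRON", "PRON", "AUX", "VERB", "PUNCT"};
    System.out.println("Mismatches at: " + compare(tokens, predicted, expected)); // [4]
  }
}
```

The length check rejects data rows whose tag array does not line up with the tokenizer output. That kind of misalignment is easy to introduce when a tokenizer splits elisions such as m' into separate tokens.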

mawiesne changed the title from "Add tagging examples to verify French POS model" to "OPENNLP-1782: Add tagging examples to verify French POS model" Oct 15, 2025

Labels

tests Pull requests that add or update test code

4 participants