NEWS

May 5, 2023
* Release 0.6
  * Change in generated Orthography - Adapted to the official orthograpy of the Academia Aragonesa de la Lengua (https://www.academiaaragonesadelalengua.org/sites/default/files/ficheros-pdf/Ortograf%C3%ADa%20de%20l%27aragon%C3%A9s_web_an.pdf)
  * Keeps analysis compatibility with previous orthography (apertium-arg)
  * New entries added.
  * Minor fixes performed.
  * Bilingual dictionary:
     > 26437 entries (8015 proper nouns).
  * Main pending issues: improve disambiguation with CG, add support for generating dialectal preferences

8 January 2021
* Release 0.5 (with change in the language codes apertium-es-an --> apertium-spa-arg)
   * Added constraint grammar support (included in spa->arg direction, added support in arg->spa direction)
   * Changed arg.prob file
   * Adapted to changes in apertium-spa and apertium-arg
   * Added support for new np labels and acronyms
   * Definite articles generated according to "Gramatica Basica de l'Aragonés" neutral standard paradigm.
   * Support for short ordinals and digital hour
   * Improved apostrophation rules in post generation. Fix problems with "de + en (prep)".
   * Added support for superlatives with "muito/muit/molt"
   * Minor fixes performed
   * Bilingual dictionary: 
     > 25040 entries (8038 proper nouns)

* Main pending Issues:
   * Still room to improve on coverage (in apertium-arg).
   * Still much to work on transfer.
   * Improve morphological disambiguation, mainly in arg->spa direction (GC rules)
   * Develop lexical selection rules.

* Release 0.4 (with change in the language codes apertium-es-an --> apertium-spa-arg)
* Main changes since version 0.3:
   * Adoption of three-letter codes: spa (old es) and arg (old an).
   * Use monolingual packages apertium-spa and apertium-arg (separated monolingual and bilingual data).
   * Trimming of the monolingual dictionary (in both directions, although only the spa monodix is really trimmed     since arg monodix has grown in paralell to the bidix).
   * Added lexical selection support (translate-to-default-equivalent.xsl not used anymore).
   * Improvement in coverage and morphology of Aragonese monolingual dictionary.
     > 552 paradigms,
     > 21271 lemmae (26103 entries, of which, 8150 proper nouns), including 2518 multi-words.
     Na�ve coverage: from 89.8% in an.wikipedia to 92.4% in a corpus of narrative texts.
   * Some improvements and fixes in verbal morphology.
   * Several agrammatical or rare combinations of enclitic pronouns are now ignored.
   * Some changes in the preferred (generated form): 'pa' (instead of 'ta') to translate 'para', 'cantatz' (instead of 'cantat') as p2.pl imperative, 'coixeyo' rather than 'coixeo'...
   * Extended support for dialectal verbal morphology analysis ('anatz', 'veden', 'anirem', 'quisto', benasques strong infinitives)
   * new paradigms ("coch/�n__n", "troz__n", "restaur/ant__n", "contpa__pr")
   * duplicated paradigms eliminated (e.g. "tapi/z__n")
* Main pending Issues:
   * Still room to improve on coverage.
   * Still much to work on transfer.
   * Develop lexical selection rules

Tue 29 Dec 2012 9:00:00 UTC
* Release 0.3
* Improvements since version 0.2:
   * Improved tagger (PoS desambiguator) for Aragonese. Changes in the tagger
   rules (...an.tsx) and unsupervised training in the Wikipedia corpus. 
   * Improvement in coverage and morphology of Aragonese monolingual dictionary.
     > 546 paradigms, 
     > 21073 lemmae (of which, 8047 proper nouns), including 1921 multi-words. 
     ~568 000 different analyzable surface forms, 
     ~117 500 different generated surface forms. 
     Na�ve coverage: from 88.4% in an.wikipedia to 94.0% in a corpus of narrative texts.
   * Some paradigms fixed (e.g. seguir)
   * Some rules fixed (e.g. prep+el+que --> prep+o+que)
   * Some rules added (v�ase --> se veiga, l�aselo --> se lo leiga, behaviour of
   buen, mal before a noun)
   * Improved distinction beteen a<pr> and a<det> (a2) and two meanings of "ta" so
   that the postgenerator can apply different rules to them.
   * Use of apocopated posesives mi, tu, su is dealt with in analysis.
   * Periphrastic past dealt with in analysis (synthetic forms are used for
   generation, although rules for using periphrastic instead are written and
   commented).
* Main pending Issues:
   * Still room to improve on coverage.
   * Still much to work on transfer.

Tue 6 Sept 2011 9:00:00 UTC
* Release 0.2.1
* Solved postgeneration of a (preposition) + a (article f sg) using double
label <a/><a/> for a (article f sg).
* See release 0.2.0 for main pending issues.

Sun 10 Jul 2011 19:20:14 IST
* Release 0.2.0
* First operative bidirectional release.
* Improvements:
   * Aragonese monodix and bidix in the initial release (0.1.0) were checked 
     and corrected.
   * Huge improvement in the completeness of verbal and nominal morphology. 
     Wide paradigms analysing most dialectal forms and orthographic variations 
     (see Note on orthography below).
   * Closed classes completed.
   * Treatment of enclitics (up to combinations of two enclitics), including 
     dialectal non-standard combinations.
   * Treatment of apostrophation (analysis and (post)-generation).
   * Orthographic variation concentrated on monodix.
   * Dialectal variation concentrated (when possible) on monodix.
   * Possesives fixed.
   * Impersonal haber-ie dealt with.
   * Important increase in the dictionary size and coverage. 
     > 510 paradigms, 
     > 17500 lemmae (of which, 8025 proper nouns), including 1786 multi-words. 
     ~500 000 analyzable surface forms, 
     ~120 000 generated surface forms. 
     86.3% na�ve coverage in wikipedia.
   * es monodix trimmed to bidix using ignore label (i="yes").
* Main pending Issues:
   * Still room to improve on coverage.
   * Still much to work on transfer (current version based on a reduced and 
     tuned subset of rules from es-ca).
   * Current POS disambiguator is directly taken from es-ca.
     Note: Orthography is taken from "Academia de l'Aragon�s - Estudio de 
     Filoloch�a Aragonesa", a board created in the II Congreso de l'Aragon�s.  
     http://www.academiadelaragones.org/biblio/EDACAR7_2.pdf . 
     It is the orthography used in Wikipedia in Aragonese 
     (http://an.wikipedia.org). 
     There is  partial compatibility in analysis with other previously used 
     orthographies.

Sun Sep 26 15:42:44 IST 2010
 * Initial release (0.1.0)
 * Caveats:
   - Functions only in an->es direction
   - Several closed category words missing from an analyser
     (including "ir")
   - "Cowboys, Ted!"
     This system has been put together in a very shoddy, MacGuyver-ish
     way:
     The majority of the lexicon has been composed on the basis
     of presumed cognates. For the most part, this has been 
     restricted to Latin derivatives, but on more than one occasion, I
     simply went nuts and pulled in anything the Spanish analyser would
     recognise.
     The only bitexts available were the UN Declaration of Human Rights
     and the welcome message for new users of the Aragonese Wikipedia. 
     Statistical methods were not widely employed.
     To deal with the spelling variations, I abused the heck out of sed,
     filtering unknowns repeatedly before passing the result through the
     analyser, to pluck out the results. Much of the ~8000 words in the
     bilingual lexicon are mere variations. (In a particularly ironic
     twist, it has 3 variations of 'normalizaci�n'). These variants will 
     need to be sorted out to have es->an: the first translation made with
     this system before release was of the document on an.wikipedia 
     describing the new spelling rules.
     Although I got some notes from Juan Pablo Mart�nez on the equivalents
     of ser and estar, I was not able to get further information. My  
     "solution" is to ignore the issue and come back to it later.
     Also, Juan Pablo added some vocabulary to the analyser, most of which
     I have not been able to use for lack of translations. Hopefully, we
     can get these reinstated soon.
     A tagger has yet to be trained for Aragonese; during development, I found
     the Spanish tagger to be sufficient, and so have used that. This is a
     temporary measure.