Skip to content

acoli-repo/acoli-morph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Morphological resources in OntoLex-Morph

idea:

  • represent all major types of morphological resources in OntoLex
  • inflectional morphology of an OntoLex entry (morphosyntactic features)
  • derivational morphology of an OntoLex entry (morphological segmentation plus linguistic analysis)
  • morphological rules

resources/phenomena covered:

  • uder/: derivation, sample conversion for German, includes bootstrapping of morph:Morphs and morph:DerivationalRules
  • unimorph/: inflection, sample conversion for German, includes bootstrapping of morph:Morphs and morph:InflectionalRules
  • germanet/: compounding, German only, includes bootstrapping of morph:Morphs and morph:CompoundingRules
  • morphisto/: German FST morphology, focus is the preservation of rules in OntoLex, also shows how to use SPARQL to bootstrap a regexp-based morphological generator from morph:InflectionRules.
  • linking/: owl:sameAs links across all aforementioned datasets

release:

  • release/: release of the interlined graph in binary (RDF-HDT) format

The repository contains the release data, to re-build from scratch, run

$> make refresh

OntoLex-Morph is still under development, so this is explorative work, mostly. The vocabulary may change.

At the moment, OntoLex-Morph is capable of covering one-level morphologies, but not two-level morphologies. Morphisto is a two-level morphology for which we provide a reconstruction. The drop in performance in comparison to the native implementation is the omission of two-level morphology rules (i.e., filters and morphophonological processes that operate over "deep" morphology created in the first level)

Multilingual extension

At the moment, we focus on German. However, the converters can be extended to other languages more or less directly:

  • UniMorph: 142 languages
  • UDer/DeriNet: 20 languages (open source data only: Catalan, Croatian, Czech, English, Estonian, Finnish, French, Gaelic, German, Italian, Latin, Persian, Polish, Portuguese, Russian, Serbo-Croatian, Slovenian, Spanish, Swedish, Turkish)

For SFST grammars (ala Morphisto), this is a bit more complicated, as converters need to be adjusted for internal naming conventions. But numerous SFST-based generators/analyzers are in existence, e.g.,