Coptic Dependency Parser

⚠️ Warning: This program is still at an experimental stage. Use it at your own risk!

A neural-symbolic dependency parser for Coptic texts that combines neural dependency parsing (DiaParser) with symbolic, Prolog-based grammatical validation.

📖 Description

The Coptic Dependency Parser is a specialized NLP tool for analyzing Coptic texts using the Universal Dependencies formalism. It integrates:

- Neural Parsing: DiaParser (biaffine attention parser) for dependency structure prediction
- Symbolic Validation: Prolog-based grammatical rules for Coptic-specific syntactic patterns
- Text Normalization: Automatic handling of combining diacritical marks to prevent unknown tokens
- Interactive Visualization: Graphical dependency tree display with transliteration

This hybrid neural-symbolic approach aims to improve parsing accuracy by combining data-driven learning with explicit linguistic knowledge of Coptic grammar, including:

- VSO (Verb-Subject-Object) word order patterns
- Tripartite nominal sentences (Subject-Copula-Predicate)
- Coptic article and pronoun systems
- Morphological agreement rules

✨ Features

Core Functionality

- 🔍 Dependency Parsing: Neural dependency parsing using DiaParser trained on Coptic Scriptorium data
- 📝 Interactive GUI: User-friendly interface with virtual Coptic keyboard
- 🌳 Tree Visualization: Graphical dependency trees with arc labels and POS tags
- 📊 Multiple Export Formats: HTML and PDF export of parsing results
- ⚙️ Text Normalization: Automatic preprocessing to handle combining diacritics
- ✅ Grammatical Validation: Prolog-based pattern matching and error detection
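
As an illustration of the text normalization feature, the sketch below removes combining marks with Python's standard `unicodedata` module. It is not the code in coptic_text_normalizer.py: the function name `strip_combining_marks` and the choice to drop every character of Unicode category Mn are assumptions made for this example.

```python
import unicodedata

def strip_combining_marks(text: str) -> str:
    """Remove combining marks so that only base letters remain.

    Hypothetical helper; the project's normalizer may behave differently.
    """
    # Decompose first so that precomposed characters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers strokes such as U+0304.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)

# The first letter carries a combining macron (U+0304).
sample = "ⲙ\u0304ⲙⲟϥ"
print(strip_combining_marks(sample))
```

Dropping the marks keeps the set of distinct tokens small, at the cost of discarding whatever information the marks carried.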

Linguistic Features

- Tripartite Pattern Recognition: Automatic detection of Coptic nominal sentences
- VSO Word Order Validation: Coptic-specific syntactic constraint checking
- Morphological Analysis: Article stripping and clitic identification
- POS Tagging: Stanza-based part-of-speech tagging for Coptic
- Lemmatization: Basic lemmatization support
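
To give an idea of what article stripping involves, here is a minimal sketch. It is only an assumption about the general technique, not the logic of the project's Till-based modules: the article list, the toy lexicon, and the function name `strip_article` are all hypothetical. A prefix is removed only when the remainder is a known stem, which avoids splitting words that merely begin with the same letter as an article.

```python
from typing import Optional, Tuple

# Toy data for illustration only; the project's lexicon is coptic_lexicon.pl.
ARTICLES = ["ϩⲉⲛ", "ⲟⲩ", "ⲡ", "ⲧ", "ⲛ"]  # longest prefixes first
TOY_LEXICON = {"ⲛⲟⲩⲧⲉ", "ⲣⲱⲙⲉ"}

def strip_article(token: str) -> Tuple[Optional[str], str]:
    """Return (article, stem); article is None when nothing is stripped."""
    for article in ARTICLES:
        stem = token[len(article):]
        # Strip only if the remainder is a known stem.
        if token.startswith(article) and stem in TOY_LEXICON:
            return article, stem
    return None, token

print(strip_article("ⲡⲛⲟⲩⲧⲉ"))
print(strip_article("ⲡⲉ"))
```

The copula ⲡⲉ is left intact because its remainder is not in the lexicon, which is the case a naive prefix rule would get wrong.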

🚀 Installation

Prerequisites

Python 3.8 or higher

SWI-Prolog (for Prolog integration)

```
# Ubuntu/Debian
sudo apt-get install swi-prolog

# macOS
brew install swi-prolog

# Windows: download the installer from https://www.swi-prolog.org/download/stable
```
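
Before continuing, it can help to confirm that both prerequisites are in place. The following is a small sketch using only the standard library; the function name `check_prerequisites` is hypothetical, and the check assumes that SWI-Prolog's executable is called `swipl` and is on the PATH.

```python
import shutil
import sys

def check_prerequisites(min_version=(3, 8)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d or higher is required" % min_version)
    # Assumption: SWI-Prolog is installed as an executable named "swipl".
    if shutil.which("swipl") is None:
        problems.append("SWI-Prolog executable 'swipl' not found on PATH")
    return problems

if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```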

Installation Steps

Clone the repository

```
git clone https://github.com/Rogaton/coptic-dependency-parser.git
cd coptic-dependency-parser
```

Install Python dependencies
```
pip install -r requirements.txt
```

Download Stanza models for Coptic

```
python3 -c "import stanza; stanza.download('cop')"
```

Download or train the DiaParser model

The parser can work with either:
- Pre-trained Coptic model (place in models/cop.diaparser)
- Stanza's built-in dependency parser (automatic fallback)
See config.py for model path configuration.

📚 Usage

Running the GUI Application

```
python3 coptic-parser.py
```

Using the Parser

- Input Text: Type or paste Coptic text in the input field, or use the virtual keyboard
- Parse: Click "Parse & Analyze Dependencies" to process the text
- View Results:
  - Parse Text Tab: See detailed token-level analysis
  - Dependency Graph Tab: Navigate through visual dependency trees
  - Dependency Table Tab: Export results to HTML or PDF

Example Input

ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ

This tripartite sentence ("I am God") will be analyzed with:

- Dependency structure showing subject-copula-predicate relations
- Automatic pattern recognition
- POS tagging and lemmatization
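
The pattern recognition step for a sentence like this one can be sketched as follows. This is a deliberately naive, surface-level illustration, not the Prolog rules in coptic_grammar.pl: the function name `find_tripartite` is hypothetical, and it assumes that the first copula form (ⲡⲉ, ⲧⲉ, ⲛⲉ) with material on both sides marks the split.

```python
COPULAS = {"ⲡⲉ", "ⲧⲉ", "ⲛⲉ"}  # masculine, feminine and plural copula forms

def find_tripartite(tokens):
    """Split tokens into subject, copula and predicate, or return None.

    Hypothetical helper: it only inspects surface forms and ignores
    POS tags and dependency relations.
    """
    # The copula needs at least one token on each side.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in COPULAS:
            return {
                "subject": tokens[:i],
                "copula": tokens[i],
                "predicate": tokens[i + 1:],
            }
    return None

print(find_tripartite("ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ".split()))
```

A real detector would also consult POS tags, since these surface forms are not always copulas.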

Programmatic Usage

```
from coptic_prolog_rules import create_prolog_engine
import stanza

# Initialize NLP pipeline
nlp = stanza.Pipeline('cop', processors='tokenize,pos,lemma,depparse')

# Initialize Prolog validation
prolog = create_prolog_engine()

# Parse text
text = "ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ"
doc = nlp(text)

# Validate with Prolog
for sentence in doc.sentences:
    words = [word.text for word in sentence.words]
    pos_tags = [word.upos for word in sentence.words]
    # Stanza heads are 1-based word indices; 0 marks the root
    heads = [word.head for word in sentence.words]
    deprels = [word.deprel for word in sentence.words]

    validation = prolog.validate_parse_tree(words, pos_tags, heads, deprels)
    print(validation)
```
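
Independently of the Prolog rules, the `heads` list can be sanity-checked in plain Python before it is passed on. The sketch below is an assumption about what a useful pre-check looks like, not part of the project: `check_tree` is a hypothetical name, and it relies only on the Stanza convention that heads are 1-based with 0 marking the root.

```python
def check_tree(heads):
    """Return a list of structural problems for a list of 1-based heads.

    heads[i] is the head of word i + 1; 0 marks the root.
    """
    n = len(heads)
    errors = []
    if sum(1 for h in heads if h == 0) != 1:
        errors.append("expected exactly one root")
    if any(h < 0 or h > n for h in heads):
        errors.append("head index out of range")
        return errors  # cannot follow arcs safely
    # Walk from every word towards the root; revisiting a word means a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                errors.append(f"cycle involving word {start}")
                break
            seen.add(node)
            node = heads[node - 1]
    return errors

print(check_tree([3, 3, 0]))  # a well-formed three-word tree
print(check_tree([2, 1, 0]))  # words 1 and 2 point at each other
```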

📁 Project Structure

```
coptic-dependency-parser/
├── coptic-parser.py              # Main GUI application
├── coptic_prolog_rules.py        # Prolog integration module
├── coptic_text_normalizer.py     # Text preprocessing
├── config.py                     # Configuration management
├── coptic_grammar.pl             # Prolog dependency grammar rules
├── coptic_lexicon.pl             # Coptic lexical database (6,842+ entries)
├── requirements.txt              # Python dependencies
├── LICENSE                       # CC BY-NC-SA 4.0 License
├── README.md                     # This file
├── tools/                        # Evaluation and comparison tools
│   ├── parser_comparison_tool.py # Compare with CopticScriptorium
│   └── evaluate_baseline.py      # Baseline performance evaluation
├── docs/                         # Documentation
│   └── evaluation/               # Evaluation reports and documentation
│       ├── CORPUS_COMPARATIVE_ANALYSIS.md
│       └── COPTICSCRIPTORIUM_README.md
├── data/
│   ├── depparse/                 # Training/evaluation data (CoNLL-U format)
│   ├── lexicon/                  # Lexical resources
│   └── tokenize/                 # Tokenization data
└── models/
    └── cop.diaparser             # DiaParser model (download separately)
```
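
The files under data/depparse/ are in CoNLL-U format, which stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `#` comment lines and blank lines between sentences. A minimal reader could look like the sketch below; the function name `read_conllu` is hypothetical, and the sample rows use placeholder tokens rather than real corpus data.

```python
def read_conllu(lines):
    """Yield sentences as lists of (id, form, upos, head, deprel) tuples."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        sentence.append((int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence

# Placeholder rows for illustration; not taken from the corpus.
sample = [
    "# sent_id = demo-1",
    "\t".join(["1", "w1", "_", "PRON", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "w2", "_", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
]
for sentence in read_conllu(sample):
    print(sentence)
```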

🔧 Configuration

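Model paths and settings are configured in config.py. The DiaParser model path can be set through an environment variable or by placing the model file in the default location:

```
# Set custom model path via environment variable
export COPTIC_DIAPARSER_MODEL=/path/to/your/cop.diaparser

# Or place model file in:
# ./models/cop.diaparser
```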
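
One way such a lookup could work is sketched below. This is not the content of config.py: the function name `resolve_model_path` and the precedence (environment variable first, then ./models/cop.diaparser, then `None` to signal the Stanza fallback) are assumptions based on the description above.

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_MODEL = Path("models") / "cop.diaparser"

def resolve_model_path(env=None) -> Optional[Path]:
    """Return the DiaParser model path, or None to signal the Stanza fallback.

    Hypothetical helper; the precedence order is an assumption.
    """
    env = os.environ if env is None else env
    candidates = []
    override = env.get("COPTIC_DIAPARSER_MODEL")
    if override:
        candidates.append(Path(override))
    candidates.append(DEFAULT_MODEL)
    # Return the first candidate that actually exists on disk.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None

print(resolve_model_path())
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.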

📊 Data Sources

This parser uses linguistic data from:

- Coptic Scriptorium - UD-annotated Coptic corpus for training
- Universal Dependencies - Dependency annotation scheme
- Comprehensive Coptic Lexicon - Extracted morphological information

📈 Evaluation & Comparison

Parser Comparison Tool

Compare the dependency parser with CopticScriptorium's morpheme-level tagger to understand their complementary strengths:

```
# Run comparison on example texts
python3 tools/parser_comparison_tool.py

# Compare specific text
python3 tools/parser_comparison_tool.py "ⲁϥⲥⲱⲧⲙ ⲙⲙⲟϥ"
```

Key Differences:

- Dependency Parser: Word-level tokenization, syntactic structure, UD framework
- CS Tagger: Morpheme-level segmentation, TreeTagger format, corpus annotation

Both tools share underlying components (the Till analyzers and text normalization) but serve different research purposes. See docs/evaluation/COPTICSCRIPTORIUM_README.md for details.

Performance Metrics

Evaluated across diverse Coptic text genres:

| Corpus Type | Coverage | Characteristics |
|---|---|---|
| Documentary Papyri | 95.6% | Simple, formulaic syntax |
| Monastic Literature | 93.7% | Standardized prescriptive language |
| Biblical Texts | 93.1% | Translation Greek (Koine → Sahidic) |
| Literary Texts | 82.4% | Complex rhetorical structures |

Across these genres the parser reaches between 82.4% and 95.6% coverage using Till's grammar modules (§35-50 Articles, §292-304 Conjunctions, §309-319 Negations, §245-268 Morphology). See docs/evaluation/CORPUS_COMPARATIVE_ANALYSIS.md for detailed evaluation results.
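
This README does not define how coverage is computed. One plausible reading, shown here purely as an assumption, is the percentage of tokens that at least one analyzer recognizes. The function name `coverage` and the `is_recognized` callback are hypothetical.

```python
def coverage(tokens, is_recognized):
    """Percentage of tokens for which is_recognized(token) is true.

    Assumed definition; the project's evaluation may measure something else.
    """
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if is_recognized(token))
    return 100.0 * hits / len(tokens)

# Toy recognizer: a token counts as covered if it is in this small set.
known = {"ⲁⲛⲟⲕ", "ⲡⲉ"}
print(round(coverage(["ⲁⲛⲟⲕ", "ⲡⲉ", "ⲡⲛⲟⲩⲧⲉ"], known.__contains__), 1))  # 66.7
```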

Running Baseline Evaluation

```
# Evaluate parser performance on test corpora
python3 tools/evaluate_baseline.py

# Evaluate on specific corpus files
python3 tools/evaluate_baseline.py corpus1.txt corpus2.txt
```
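
What evaluate_baseline.py reports is not documented here. For readers who want to score parses against gold CoNLL-U annotations themselves, dependency parsers are commonly evaluated with unlabeled and labeled attachment scores (UAS and LAS). The sketch below computes both; the function name `attachment_scores` is hypothetical and the example data is invented.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for equal-length lists of (head, deprel) pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    # UAS counts correct heads; LAS also requires the correct relation label.
    uas_hits = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    las_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return uas_hits / len(gold), las_hits / len(gold)

# Invented example: all heads are right, one label is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```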

🧪 Technical Details

Architecture

- Neural Parser: Biaffine attention mechanism (Dozat & Manning, 2017)
- POS Tagger: Stanza neural pipeline for Coptic
- Validation Engine: PySwip integration with SWI-Prolog
- Visualization: Matplotlib-based dependency graph rendering
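
The core of biaffine arc scoring can be shown in a few lines. The sketch below follows the general form of the scorer in Dozat & Manning (2017), where the score for word i taking word j as its head is a bilinear term plus a head-only bias term. It uses plain Python lists and tiny hand-picked vectors; it says nothing about DiaParser's actual implementation, and the name `biaffine_arc_scores` is hypothetical.

```python
def biaffine_arc_scores(dep, head, U, u):
    """scores[i][j] = head[j]^T U dep[i] + head[j]^T u.

    dep and head hold one vector per word; U is a square matrix and u a
    bias vector. In a real parser these come from trained networks.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = []
    for d in dep:
        Ud = [dot(row, d) for row in U]  # matrix-vector product U @ d
        scores.append([dot(h, Ud) + dot(h, u) for h in head])
    return scores

# Two words with two-dimensional toy representations.
dep = [[1, 0], [0, 1]]
head = [[1, 0], [0, 1]]
U = [[0, 1], [1, 0]]
u = [0.5, 0]
print(biaffine_arc_scores(dep, head, U, u))  # [[0.5, 1.0], [1.5, 0.0]]
```

A full parser would then choose heads from these scores with a decoding step that guarantees a well-formed tree.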

Performance

- Handles multi-sentence documents
- Real-time validation with Prolog constraints
- Supports batch processing via command-line interface

Linguistic Patterns Supported

- VSO transitive/intransitive sentences
- Tripartite nominal sentences (standard & converted)
- Determiner-noun phrases with gender agreement
- Adjective modification (post-nominal)
- Prepositional phrases
- Coordination structures
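
As an example of what one of these checks amounts to, the sketch below tests determiner-noun agreement for the definite articles ⲡ (masculine singular), ⲧ (feminine singular) and ⲛ (plural). It is a Python illustration under stated assumptions rather than the Prolog rule itself: `article_agrees` is a hypothetical name, and gender and number are assumed to arrive as UD-style feature values.

```python
# Definite articles mapped to (gender, number); None means "any gender".
ARTICLE_FEATURES = {
    "ⲡ": ("Masc", "Sing"),
    "ⲧ": ("Fem", "Sing"),
    "ⲛ": (None, "Plur"),  # the plural article does not mark gender
}

def article_agrees(article, gender, number="Sing"):
    """True if the definite article is compatible with the noun's features."""
    if article not in ARTICLE_FEATURES:
        return False
    art_gender, art_number = ARTICLE_FEATURES[article]
    return art_number == number and art_gender in (None, gender)

print(article_agrees("ⲡ", "Masc"))  # True
print(article_agrees("ⲧ", "Masc"))  # False
```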

📄 License

Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

You are free to:

- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material

Under the following terms:

- Attribution: You must give appropriate credit
- NonCommercial: You may not use the material for commercial purposes
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license

Third-Party Dependencies: Retain their respective original licenses (Apache 2.0, MIT, etc.). See LICENSES/ directory for details.

🙏 Credits and Attribution

This parser integrates multiple open-source NLP tools and resources:

Core NLP Tools

- Coptic Scriptorium
  - Coptic NLP models and annotated corpus
  - Citation: Zeldes, A., & Schroeder, C. T. (2016). "An NLP Pipeline for Coptic"
  - License: Creative Commons licenses (varies by component)
- Stanza - Stanford NLP Library
  - Tokenization, POS tagging, and lemmatization for Coptic
  - Citation: Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). "Stanza: A Python Natural Language Processing Toolkit for Many Human Languages"
  - License: Apache 2.0
- DiaParser - Biaffine Dependency Parser
  - Neural dependency parsing implementation
  - Citation: Attardi, G., et al. (2009)
  - License: Apache 2.0
- Stanford CoreNLP
  - Neural dependency parsing architecture
  - Citation: Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., & McClosky, D. (2014). "The Stanford CoreNLP Natural Language Processing Toolkit"
  - License: GPL v3+

Linguistic Resources

- Universal Dependencies Project - Annotation scheme and guidelines
- Comprehensive Coptic Lexicon - Morphological database

Additional Libraries

- PySwip - SWI-Prolog Python interface
- Matplotlib - Visualization library
- WeasyPrint - PDF export functionality

🐛 Known Issues

- Experimental stage: Results should be manually validated
- Large model files (50+ MB) require Git LFS or separate download
- Prolog integration requires SWI-Prolog installation
- Some rare Coptic constructions may not be recognized

🤝 Contributing

As this is an experimental research project, contributions and feedback are welcome:

- Report bugs or issues via GitHub Issues
- Suggest linguistic patterns or grammatical rules
- Contribute test cases or example texts

📧 Contact

For questions, suggestions, or collaboration inquiries:

- Author: André Linden
- Email: relanir@bluewin.ch
- Project: Coptic NLP Research

📜 Citation

If you use this parser in your research, please cite:

```
@software{coptic_dependency_parser,
  author = {Linden, André},
  title = {Coptic Dependency Parser: A Neural-Symbolic Approach},
  year = {2024-2025},
  url = {https://github.com/Rogaton/coptic-dependency-parser},
  note = {Experimental version}
}
```

Note: This is research software under active development. Always verify parsing results manually for critical applications.
