These notes are for a course of 6 lectures and 4 labs, as taught at FaMAF (Córdoba, Argentina) in 2017:
- Class 1: Intro to IE
- Lab 1: (octroy master branch) Perl baseline, Maven, UIMA pipeline
- Class 2: Named Entity Recognition
- Lab 2: (octroy branch class 2) OpenNLP part of the pipeline, retraining
- Class 3: Rule-based IE
- Class 4: Statistical (CRFs) IE
- Lab 3: (octroy branch class 3) UIMA RuTA part of the pipeline
- Class 5: Hybrid IE
- Lab 4: (octroy master branch) ClearTK, training and execution
- Class 6: Research directions
Lab 1 objectives: familiarize the participant with the UIMA XMI format, the UIMA Eclipse environment, and command-line compilation and execution using Maven. Evaluation using ruta-evaluation-standalone.
Lab 2 objectives: delve into training and executing Apache OpenNLP MaxEnt named entity models, both within Apache UIMA and standalone. Prepare the background for ClearTK.
Lab 3 objectives: familiarize the participant with the UIMA RuTA Workbench. Deployment of UIMA RuTA scripts written in the workbench. Debugging of scripts.
Lab 4 objectives: create CRF annotators using ClearTK. Feature extraction. Training and deployment.
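To make the feature-extraction step of Lab 4 concrete, here is a minimal sketch of the kind of per-token features a linear-chain CRF for NER typically consumes: lowercased word, capitalization flags, prefix/suffix, a collapsed word shape, and the neighbouring words. It is written in plain Python for illustration only; it is NOT the ClearTK API (ClearTK is Java and has its own feature extractor classes), and the function names `token_features`, `word_shape` and `sentence_features` as well as the feature names are hypothetical.

```python
def word_shape(tok):
    """Map characters to X (upper), x (lower), d (digit), keep others; collapse repeats."""
    shape = []
    for ch in tok:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        # Collapse runs so "Córdoba" -> "Xx" rather than "Xxxxxxx"
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(tokens, i):
    """Hypothetical feature dict for tokens[i], in the style commonly fed to a CRF."""
    tok = tokens[i]
    feats = {
        "word.lower": tok.lower(),
        "word.istitle": tok.istitle(),
        "word.isupper": tok.isupper(),
        "word.isdigit": tok.isdigit(),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "shape": word_shape(tok),
    }
    # Context window of one token on each side; mark sentence boundaries.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens):
    """One feature dict per token: the input sequence for a linear-chain CRF."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for f in sentence_features(["Juan", "vive", "en", "Córdoba", "."]):
        print(f)
```

In a CRF these observation features are combined with the label transitions, so the model can learn, for example, that a title-cased token preceded by "en" is likely the start of a location.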