TEI Import format
Ronald Haentjens Dekker edited this page Jul 11, 2014 · 2 revisions
TEI input:
Discussion 2014-07-08 (Lausanne) David J. Birnbaum / Ronald Haentjens Dekker:
- each witness in a separate TEI document
- take the <body> element (ignore the rest)
- get rid of the hierarchy by converting tags into ranges or milestones
- tokenize on whitespace and punctuation (djb: is this what we should do with punctuation?)
- create normalized version
- collate
- generate variant graph
- TEI output issue: you can't raise the hierarchy again in a direct way because the collation markup introduces an overlapping hierarchy
- Solution: raising the hierarchy again is not the responsibility of CollateX; the output keeps the milestones in place (attach each milestone to the nearest token, with "nearest" still to be defined)
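
The preprocessing steps above (take the `<body>`, flatten the hierarchy into milestones, tokenize, normalize) could be sketched roughly as follows. This is an illustration only, not CollateX code: the function names `flatten` and `tokenize_witness`, the `before`/`after` keys, lowercasing as the normalization, and treating each punctuation mark as its own token are all assumptions, and the punctuation question raised above is still open. "Nearest" is read here as "the following token", with trailing milestones attached to the last token; that too is just one possible definition. The `t`/`n` keys are chosen to resemble the token/normalized pair used in CollateX's JSON token input.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Assumption: a token is a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def flatten(elem, events):
    """Walk the tree in document order, turning tags into start/end milestones."""
    name = elem.tag.split("}")[-1]  # drop the namespace part of the tag
    events.append(("start", name))
    if elem.text:
        events.append(("text", elem.text))
    for child in elem:
        flatten(child, events)
        if child.tail:
            events.append(("text", child.tail))
    events.append(("end", name))
    return events


def tokenize_witness(tei_xml):
    """Return a flat token list for one witness, with milestones attached."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:
        body = root.find(".//body")  # fall back to un-namespaced input
    if body is None:
        raise ValueError("no <body> element found")

    tokens = []
    pending = []  # milestones seen since the previous token
    # [1:-1] drops the start/end events of <body> itself.
    for kind, value in flatten(body, [])[1:-1]:
        if kind == "text":
            for t in TOKEN_RE.findall(value):
                # Assumption: normalization is plain lowercasing.
                tokens.append({"t": t, "n": t.lower(), "before": pending, "after": []})
                pending = []
        else:
            pending.append((kind, value))
    if pending and tokens:
        tokens[-1]["after"] = pending  # trailing milestones go to the last token
    return tokens
```

For a witness whose body is `<p>The <hi>quick</hi> fox.</p>`, this yields four tokens (`The`, `quick`, `fox`, `.`), with the `<p>` start milestone attached before `The` and the `<p>` end milestone attached after the final `.`. Each witness would be processed separately and the resulting token lists passed on to collation.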