|
115 | 115 | format. Backward conversion is possible in many cases, with limitations inherent in the
|
116 | 116 | destination target format. <ptr type="software" xml:id="R8" target="#teicorpo"/>
|
117 | 117 | <rs type="soft.name" ref="#R8">TEICORPO</rs> can run the <ptr type="software" xml:id="R9"
|
118 |
| - target="#treetager"/> |
| 118 | + target="#treetagger"/> |
119 | 119 | <rs type="soft.name" ref="#R9">Treetagger</rs> part-of-speech tagger and the <ptr
|
120 | 120 | type="software" xml:id="R10" target="#stanfordcorenlp"/>
|
121 | 121 | <rs type="soft.name" ref="#R10">Stanford CoreNLP</rs> tools on TEI files and can export
|
|
231 | 231 | <div xml:id="similarities">
|
232 | 232 | <head>Similarities with and Differences from Other Approaches</head>
|
233 | 233 | <p>Many software packages dedicated to editing spoken language transcription contain
|
234 |
| - utilities that can convert many formats: for example, <ptr type="software" xml:id="15" |
| 234 | + utilities that can convert many formats: for example, <ptr type="software" xml:id="R15" |
235 | 235 | target="#exmaralda"/><rs type="soft.name" ref="#R15">EXMARaLDA</rs> (<rs
|
236 | 236 | type="Bib.Ref" target="#R15"><ref type="bibl" target="#schmidt2004">Schmidt 2004</ref>
|
237 | 237 | </rs>; see <rs type="URL" target="#R15"><ptr target="https://exmaralda.org"/></rs>),
|
238 |
| - <ptr type="software" xml:id="16" target="#anvil"/> |
| 238 | + <ptr type="software" xml:id="R16" target="#anvil"/> |
239 | 239 | <rs type="soft.name" ref="#R16">Anvil</rs> (<rs type="Bib.Ref" target="#R16">
|
240 | 240 | <ref type="bibl" target="#kipp2001">Kipp 2001</ref></rs>; see <rs type="URL"
|
241 | 241 | target="#R16"><ptr target="https://www.anvil-software.org"/></rs>), and <ptr
|
242 |
| - type="software" xml:id="17" target="#elan"/><rs type="soft.name" ref="#17">ELAN</rs> |
| 242 | + type="software" xml:id="R17" target="#elan"/><rs type="soft.name" ref="#R17">ELAN</rs> |
243 | 243 | (<rs type="bib.ref" target="#R17"><ref type="bibl" target="#wittenburg2006">Wittenburg
|
244 | 244 | et al. 2006</ref></rs>; see <rs type="URL" target="#R17">
|
245 | 245 | <ptr target="https://archive.mpi.nl/tla/elan"/></rs>). However, in all cases, the
|
|
257 | 257 | <p>The list of tools that are considered in the two projects is nearly the same. The only
|
258 | 258 | tools missing in the <ptr type="software" xml:id="R18" target="#teicorpo"/>
|
259 | 259 | <rs type="soft.name" ref="#R18">TEICORPO</rs> approach are <ptr type="software"
|
260 |
| - xml:id="19" target="#exmaralda"/><rs type="soft.name" ref="#R19">EXMARaLDA</rs> and |
| 260 | + xml:id="R19" target="#exmaralda"/><rs type="soft.name" ref="#R19">EXMARaLDA</rs> and |
261 | 261 | <ptr type="software" xml:id="R19" target="#folker"/>FOLKER (<rs type="bib.ref"
|
262 | 262 | target="#R19"><ref type="bibl" target="#schmidts2010">Schmidt and Schütte
|
263 | 263 | 2010</ref></rs>; see <rs type="URL" target="#R19"><ptr
|
|
620 | 620 | tools, a single-level annotation structure within the <gi>spanGrp</gi> elements is
|
621 | 621 | insufficient to represent the complex organization that can be constructed with the
|
622 | 622 | <ptr type="software" xml:id="R78" target="#elan"/><rs type="soft.name" ref="#R78"
|
623 |
| - >ELAN</rs> and <ptr type="software" xml:id="R78" target="#praat"/> |
| 623 | + >ELAN</rs> and <ptr type="software" xml:id="R79" target="#praat"/> |
624 | 624 | <rs type="soft.name" ref="#R79">Praat</rs> tools. <ptr type="software" xml:id="R80"
|
625 | 625 | target="#elan"/><rs type="soft.name" ref="#R80">ELAN</rs> is a tool used by many
|
626 | 626 | researchers to describe data of greater complexity than the data presented in the
|
|
792 | 792 | <figure xml:id="fig4">
|
793 | 793 | <graphic url="media/image2.PNG" width="620px" height="980px"/>
|
794 | 794 | <head type="legend"><ptr type="software" xml:id="R98" target="#elan"/><rs
|
795 |
| - type="soft.name" ref="#98">ELAN</rs> example of a temporal division</head> |
| 795 | + type="soft.name" ref="#R98">ELAN</rs> example of a temporal division</head> |
796 | 796 | </figure>
|
797 | 797 | <figure xml:id="example_code_4">
|
798 | 798 | <egXML xmlns="http://www.tei-c.org/ns/Examples">
|
|
851 | 851 | corpora to be used with other editing tools, some of which are suited to specific
|
852 | 852 | processing: for example, <ptr type="software" xml:id="R104" target="#praat"/>
|
853 | 853 | <rs type="soft.name" ref="#R104">Praat</rs> for phonetics/phonology; <ptr
|
854 |
| - type="software" xml:id="#R105" target="#transcriber"/> |
| 854 | + type="software" xml:id="R105" target="#transcriber"/> |
855 | 855 | <rs type="soft.name" ref="#R105">Transcriber</rs>/<ptr type="software" xml:id="R106"
|
856 | 856 | target="#clan"/>
|
857 | 857 | <rs type="soft.name" ref="#R106">CLAN</rs> for raw transcription; and <ptr
|
|
1076 | 1076 | <rs type="soft.name" ref="#R126">CLAN</rs> , <ptr type="software" xml:id="R127"
|
1077 | 1077 | target="#elan"/><rs type="soft.name" ref="#R127">ELAN</rs>, <ptr type="software"
|
1078 | 1078 | xml:id="R128" target="#praat"/>
|
1079 |
| - <rs type="soft.name" ref="R128">Praat</rs>, <ptr type="software" xml:id="R129" |
| 1079 | + <rs type="soft.name" ref="#R128">Praat</rs>, <ptr type="software" xml:id="R129" |
1080 | 1080 | target="#transcriber"/>
|
1081 | 1081 | <rs type="soft.name" ref="#R129">Transcriber</rs>, nor of course in TEI format.</p>
|
1082 | 1082 | <p><ptr type="software" xml:id="R130" target="#teicorpo"/>
|
|
1094 | 1094 | <rs type="soft.name" ref="#R134">TEICORPO</rs>: <ptr type="software" xml:id="R135"
|
1095 | 1095 | target="#treetagger"/>
|
1096 | 1096 | <rs type="soft.name" ref="#R135">TreeTagger</rs> and <ptr type="software" xml:id="R136"
|
1097 |
| - target="#corenlp"/> |
| 1097 | + target="#stanfordcorenlp"/> |
1098 | 1098 | <rs type="soft.name" ref="#R136">CoreNLP</rs>.</p>
|
1099 | 1099 | <div xml:id="treetagger">
|
1100 | 1100 | <head><ptr type="software" xml:id="R138" target="#treetagger"/>
|
|
1118 | 1118 | <rs type="soft.name" ref="#R140">TEICORPO</rs> should be used to generate an annotated
|
1119 | 1119 | file with lemma and POS information based on <ptr type="software" xml:id="R141"
|
1120 | 1120 | target="#treetagger"/>
|
1121 |
| - <rs type="soft.name" ref="#141">TreeTagger</rs>. <ptr type="software" xml:id="142" |
| 1121 | + <rs type="soft.name" ref="#R141">TreeTagger</rs>. <ptr type="software" xml:id="R142" |
1122 | 1122 | target="#treetagger"/>
|
1123 |
| - <rs type="soft.name" ref="#142">TreeTagger</rs> should be installed separately. The |
1124 |
| - implementation of <ptr type="software" xml:id="143" target="#treetagger"/> |
1125 |
| - <rs type="soft.name" ref="#143">TreeTagger</rs> in <ptr type="software" xml:id="R144" |
| 1123 | + <rs type="soft.name" ref="#R142">TreeTagger</rs> should be installed separately. The |
| 1124 | + implementation of <ptr type="software" xml:id="R143" target="#treetagger"/> |
| 1125 | + <rs type="soft.name" ref="#R143">TreeTagger</rs> in <ptr type="software" xml:id="R144" |
1126 | 1126 | target="#teicorpo"/>
|
1127 | 1127 | <rs type="soft.name" ref="#R144">TEICORPO</rs> includes the ability to use any
|
1128 | 1128 | syntactic model. For French data, we used the PERCEO model (<ref type="bibl"
|
|
1150 | 1150 | <gi>filename</gi></p></cell>
|
1151 | 1151 | <cell><p><gi>filename</gi> is the full location of the <ptr type="software"
|
1152 | 1152 | xml:id="R146" target="#treetagger"/>
|
1153 |
| - <rs type="soft.name" ref="#146">TreeTagger</rs> program, according to the system |
| 1153 | + <rs type="soft.name" ref="#R146">TreeTagger</rs> program, according to the system |
1154 | 1154 | used (Windows, MacOS, or Linux).</p></cell>
|
1155 | 1155 | </row>
|
1156 | 1156 | <row>
|
|
1163 | 1163 | <p>The environment variable TREE_TAGGER can be used to locate the model and the program.
|
1164 | 1164 | If no <code>-program</code> option is used, the default name for the <ptr
|
1165 | 1165 | type="software" xml:id="R147" target="#treetagger"/>
|
1166 |
| - <rs type="soft.name" ref="#147">TreeTagger</rs> program is used.</p> |
| 1166 | + <rs type="soft.name" ref="#R147">TreeTagger</rs> program is used.</p> |
1167 | 1167 | <p>The <code>-model</code> parameter is mandatory.</p>
|
1168 | 1168 | <p>The resulting filename ends with <code>.tei_corpo_ttg.tei_corpo.xml</code> or a
|
1169 | 1169 | specific name provided by the user (option <code>-o</code>).</p>
|
|
1279 | 1279 | </div>
|
1280 | 1280 | <div xml:id="stanford">
|
1281 | 1281 | <head><ptr type="software" xml:id="R148" target="#stanfordcorenlp"/>
|
1282 |
| - <rs type="soft.name" ref="#148">Stanford CoreNLP</rs></head> |
| 1282 | + <rs type="soft.name" ref="#R148">Stanford CoreNLP</rs></head> |
1283 | 1283 | <p><ptr type="software" xml:id="R149" target="#stanfordcorenlp"/>
|
1284 |
| - <rs type="soft.name" ref="#149">The Stanford Core Natural Language Processing</rs><note> |
1285 |
| - <p>Accessed March 11, 2021, <rs type="url" ref="#149"><ptr |
| 1284 | + <rs type="soft.name" ref="#R149">The Stanford Core Natural Language Processing</rs><note> |
| 1285 | + <p>Accessed March 11, 2021, <rs type="url" ref="#R149"><ptr |
1286 | 1286 | target="https://nlp.stanford.edu/software/"/></rs>.</p>
|
1287 | 1287 | </note> (<ptr type="software" xml:id="R150" target="#stanfordcorenlp"/>
|
1288 | 1288 | <rs type="soft.name" ref="#R150">CoreNLP</rs>) package is a suite of tools (<rs
|
|
1437 | 1437 | recent developments (see <ref type="bibl" target="#badin2021">Badin et al. 2021</ref>)
|
1438 | 1438 | made it possible to insert metadata stored in CSV files (including participant metadata)
|
1439 | 1439 | into the TEI files. This makes it possible to achieve more powerful corpus analysis
|
1440 |
| - using a tool such as <ptr type="software" xml:id="R177" target="txm"/><rs |
| 1440 | + using a tool such as <ptr type="software" xml:id="R177" target="#txm"/><rs |
1441 | 1441 | type="soft.name" ref="#R177">TXM</rs>.</p>
|
1442 | 1442 | <p>Our approach is somewhat similar to what is suggested in the conclusion of Schmidt,
|
1443 | 1443 | Hedeland, and Jettka (<ref type="bibl" target="#schmidt2017">2017</ref>), who describe a
|
|
1465 | 1465 | <div xml:id="conclusion">
|
1466 | 1466 | <head>Conclusion</head>
|
1467 | 1467 | <p><ptr type="software" xml:id="R183" target="#teicorpo"/>
|
1468 |
| - <rs type="soft.name" ref="R183">TEICORPO</rs> is a functional tool, created by the CORLI |
| 1468 | + <rs type="soft.name" ref="#R183">TEICORPO</rs> is a functional tool, created by the CORLI |
1469 | 1469 | network and ORTOLANG, that converts files created by software specializing in editing
|
1470 | 1470 | spoken-language data into TEI format. The result is fully compatible with the most recent
|
1471 | 1471 | developments in TEI, especially those that concern spoken-language material.</p>
|
|