|
234 | 234 | <p>Many software packages dedicated to editing spoken language transcription contain
|
235 | 235 | utilities that can convert many formats: for example, <ptr type="software" xml:id="R15"
|
236 | 236 | target="#exmaralda"/>
|
237 |
| - <rs type="soft.name" ref="#R15">EXMARaLDA</rs> ( <rs type="soft.Bib.Ref" target="#R15" |
| 237 | + <rs type="soft.name" ref="#R15">EXMARaLDA</rs> ( <rs type="soft.bib.ref" ref="#R15" |
238 | 238 | ><ref type="bibl" target="#schmidt2004">Schmidt 2004</ref>
|
239 |
| - </rs>; see <rs type="soft.url" target="#R15"><ptr target="https://exmaralda.org" |
| 239 | + </rs>; see <rs type="soft.url" ref="#R15"><ptr target="https://exmaralda.org" |
240 | 240 | /></rs>), <ptr type="software" xml:id="R16" target="#anvil"/>
|
241 |
| - <rs type="soft.name" ref="#R16">Anvil</rs> (<rs type="soft.Bib.Ref" target="#R16"> |
| 241 | + <rs type="soft.name" ref="#R16">Anvil</rs> (<rs type="soft.bib.ref" ref="#R16"> |
242 | 242 | <ref type="bibl" target="#kipp2001">Kipp 2001</ref></rs>; see <rs type="soft.url"
|
243 |
| - target="#R16"><ptr target="https://www.anvil-software.org"/></rs>), and <ptr |
| 243 | + ref="#R16"><ptr target="https://www.anvil-software.org"/></rs>), and <ptr |
244 | 244 | type="software" xml:id="R17" target="#elan"/><rs type="soft.name" ref="#R17">ELAN</rs>
|
245 |
| - (<rs type="soft.bib.ref" target="#R17"><ref type="bibl" target="#wittenburg2006" |
246 |
| - >Wittenburg et al. 2006</ref></rs>; see <rs type="soft.url" target="#R17"> |
| 245 | + (<rs type="soft.bib.ref" ref="#R17"><ref type="bibl" target="#wittenburg2006" |
| 246 | + >Wittenburg et al. 2006</ref></rs>; see <rs type="soft.url" ref="#R17"> |
247 | 247 | <ptr target="https://archive.mpi.nl/tla/elan"/></rs>). However, in all cases, the
|
248 | 248 | conversions are limited to the features implemented in the tool itself—for example, with
|
249 | 249 | a limited set of metadata—and they cannot always be used to prepare data to be used by
|
|
260 | 260 | tools missing in the <ptr type="software" xml:id="R18" target="#teicorpo"/>
|
261 | 261 | <rs type="soft.name" ref="#R18">TEICORPO</rs> approach are <ptr type="software"
|
262 | 262 | xml:id="R19" target="#exmaralda"/><rs type="soft.name" ref="#R19">EXMARaLDA</rs> and
|
263 |
| - <ptr type="software" xml:id="R19" target="#folker"/>FOLKER (<rs type="soft.bib.ref" |
264 |
| - target="#R19"><ref type="bibl" target="#schmidts2010">Schmidt and Schütte |
265 |
| - 2010</ref></rs>; see <rs type="soft.url" target="#R19"><ptr |
| 263 | + <ptr type="software" xml:id="R241" target="#folker"/>FOLKER (<rs type="soft.bib.ref" |
| 264 | + ref="#R241"><ref type="bibl" target="#schmidts2010">Schmidt and Schütte |
| 265 | + 2010</ref></rs>; see <rs type="soft.url" ref="#R241"><ptr |
266 | 266 | target="https://exmaralda.org/en/folker-en/"/></rs>), but this was only because the
|
267 | 267 | conversion tools from and to <ptr type="software" xml:id="R20" target="#EXMARaLDA"/><rs
|
268 | 268 | type="soft.name" ref="#R20">EXMARaLDA</rs>, <ptr type="software" xml:id="R21"
|
|
279 | 279 | <ptr type="software" xml:id="R25" target="#folker"/>
|
280 | 280 | <rs type="soft.name" ref="#R25">FOLKER</rs> software fit within the process chain of
|
281 | 281 | <ptr type="software" xml:id="R26" target="#teicorpo"/><rs type="soft.name"
|
282 |
| - target="#R26"> TEICORPO</rs>. This demonstrates the usefulness of a well-known and |
| 282 | + ref="#R26"> TEICORPO</rs>. This demonstrates the usefulness of a well-known and |
283 | 283 | efficient format such as TEI.</p>
|
284 | 284 | <p>There are, however, differences between the two projects that make them nonredundant
|
285 | 285 | but complementary, each project having specificities that can be useful or damaging
|
286 | 286 | depending on the user’s needs. One minor difference is that the <ptr type="software"
|
287 |
| - xml:id="R27" ref="#teicorpo"/> |
288 |
| - <rs type="soft.name" target="#R27">TEICORPO</rs> project is not a functionality of an |
| 287 | + xml:id="R27" target="#teicorpo"/> |
| 288 | + <rs type="soft.name" ref="#R27">TEICORPO</rs> project is not a functionality of an |
289 | 289 | editing tool, but is a standalone tool for converting data between one format and
|
290 | 290 | another. This had certain effects on the user interface and explains some of the choices
|
291 | 291 | made in the development of the two tools.</p>
|
292 | 292 | <p>There are two major differences between <ptr type="software" xml:id="R28"
|
293 | 293 | target="#teicorpo"/>
|
294 |
| - <rs type="soft.name" target="#R28">TEICORPO</rs> and Schmidt’s approach, which affected |
| 294 | + <rs type="soft.name" ref="#R28">TEICORPO</rs> and Schmidt’s approach, which affected |
295 | 295 | both the design of the tools and how they can be used. The first difference is that in
|
296 |
| - developing <ptr type="software" xml:id="R29" ref="#teicorpo"/><rs type="soft.name" |
297 |
| - target="#R29">TEICORPO</rs>, it was decided that the conversion between the original |
| 296 | + developing <ptr type="software" xml:id="R29" target="#teicorpo"/><rs type="soft.name" |
| 297 | + ref="#R29">TEICORPO</rs>, it was decided that the conversion between the original |
298 | 298 | formats and TEI had to be lossless (or as lossless as possible) because we wanted to
|
299 | 299 | offer a means to store the research data for long-term conservation and dissemination in
|
300 | 300 | a standard XML format instead of in proprietary formats such as those used by <ptr
|
|
1004 | 1004 | <rs type="soft.name" ref="#R117">TEICONVERT</rs> makes spoken language data available
|
1005 | 1005 | for <ptr type="software" xml:id="R118" target="#txm"/><rs type="soft.name" ref="#R118"
|
1006 | 1006 | >TXM</rs> (<rs type="soft.bib.ref" ref="#R118"><ref type="bibl" target="#heiden2010"
|
1007 |
| - >Heiden 2010</ref></rs>; see <rs type="soft.turl" ref="#R118"><ptr |
| 1007 | + >Heiden 2010</ref></rs>; see <rs type="soft.url" ref="#R118"><ptr |
1008 | 1008 | target="http://textometrie.ens-lyon.fr"/></rs>), <ptr type="software" xml:id="R119"
|
1009 | 1009 | target="#letrameur"/>
|
1010 | 1010 | <rs type="soft.name" ref="#R119">Le Trameur</rs> (<rs type="soft.bib.ref" ref="#R119"
|
|
1149 | 1149 | <rs type="soft.name" ref="#R144">TEICORPO</rs> includes the ability to use any
|
1150 | 1150 | syntactic model. For French data, we used the PERCEO model (<ref type="bibl"
|
1151 | 1151 | target="#benzitoun2012">Benzitoun, Fort, and Sagot 2012</ref>).</p>
|
1152 |
| - <p>The command line to be used is: <code>java -cp <ptr type="software" xml:id="R208" |
| 1152 | + <p>The command line to be used is: <ptr type="software" xml:id="R240" target="#java"/><code> |
| 1153 | + <rs type="soft.name" ref="#R240">java</rs> -cp <ptr type="software" xml:id="R208" |
1153 | 1154 | target="#teicorpo"/>
|
1154 | 1155 | <rs type="soft.name" ref="#R208">TEICORPO</rs>.jar fr.ortolang.<ptr type="software"
|
1155 | 1156 | xml:id="R209" target="#teicorpo"/>
|
1156 |
| - <rs type="soft.name" ref="#R209">TEICORPO</rs>.TeiTreeTagger filenames...</code> |
| 1157 | + <rs type="soft.name" ref="#R209">TEICORPO</rs>.<ptr type="software" xml:id="R239" target="#treetagger"/> |
| 1158 | + Tei <rs type="soft.name" ref="#R239">TreeTagger</rs> filenames...</code> |
1157 | 1159 | with additional parameters:</p>
|
1158 | 1160 | <table xml:id="table2">
|
1159 | 1161 | <row role="label">
|
|
1329 | 1331 | <rs type="soft.name" ref="#R153">TreeTagger</rs>. The -model and -syntaxformat
|
1330 | 1332 | parameters can be used in a similar way to specify the grammatical model to be used
|
1331 | 1333 | and the output format. A command line example is:</p>
|
1332 |
| - <p><code>java -cp "teicorpo.jar:directory_for_SNLP/*" fr.ortolang.teicorpo.TeiSNLP |
| 1334 | + <p><code><ptr type="software" xml:id="R236" target="#java"/> |
| 1335 | + <rs type="soft.name" ref="#R236">java</rs> -cp "<ptr type="software" xml:id="R237" target="#teicorpo"/> |
| 1336 | + <rs type="soft.name" ref="#R237">teicorpo</rs>.jar:directory_for_SNLP/*" fr.ortolang.<ptr type="software" xml:id="R238" target="#teicorpo"/> |
| 1337 | + <rs type="soft.name" ref="#R238">teicorpo</rs>.TeiSNLP |
1333 | 1338 | -syntaxformat svalue -model filename.tei_corpo.xml</code></p>
|
1334 | 1339 | <p>The <term>directory_for_SNLP</term> is the name of the location on a computer where
|
1335 | 1340 | all the <ptr type="software" xml:id="R212" target="#stanfordcorenlp"/>
|
|
1392 | 1397 | <p>Export can be done from TEI into a format used by textometric software (see <ptr
|
1393 | 1398 | target="#example_code_11" type="crossref"/>). This is the case for <ptr
|
1394 | 1399 | type="software" xml:id="R160" target="#txm"/><rs type="soft.name" ref="#R160">TXM</rs>,<note>
|
1395 |
| - <p>See the Textométrie website, last updated June 29, 2020, <rs type="soft.ulr" ref="#R160" |
| 1400 | + <p>See the Textométrie website, last updated June 29, 2020, <rs type="soft.url" ref="#R160" |
1396 | 1401 | ><ptr target="http://textometrie.ens-lyon.fr/?lang=en"/></rs>.</p>
|
1397 | 1402 | </note> a textometric software application. In this case, instead of using a partition
|
1398 | 1403 | representation, the information from the grammatical analysis is inserted at the word
|
|
1591 | 1596 | target="https://www.fon.hum.uva.nl/paul/papers/speakUnspeakPraat_glot2001.pdf"
|
1592 | 1597 | />.</bibl>
|
1593 | 1598 | </rs>
|
1594 |
| - <ptr type="software" xml:id="R226" target="#teimata"/> |
| 1599 | + <ptr type="software" xml:id="R226" target="#teimeta"/> |
1595 | 1600 | <rs type="soft.bib.ref" ref="#R226">
|
1596 | 1601 | <bibl xml:id="etienne"><rs type="soft.agent" ref="#R226"><author>Etienne,
|
1597 | 1602 | Carole</author></rs>, <rs type="soft.agent" ref="#R226"><author>Loïc
|
|