
Commit a16bfc3

manually merged software list
2 parents 6229749 + 5c0d0f6 commit a16bfc3

12 files changed: 550 additions and 272 deletions

data/JTEI/14_2021-23/jtei-burnard-shoch-odebrecht-194-source.xml

Lines changed: 25 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -200,15 +200,17 @@
200200
projects, though the TEI Consortium website has for many years offered a platform
201201
for one: <title level="a">Projects Using the TEI,</title> accessed May 17, 2021,
202202
<ptr target="https://tei-c.org/activities/projects/"/>. More recently, the
203-
TEIhub project lists more than 12,500 <ptr type="software" xml:id="GitHub"
204-
target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>-hosted TEI
205-
projects (last updated May 11, 2021, <ptr target="https://teihub.netlify.app/"/>);
206-
an associated bot called TEI Pelican provides a daily twitter feed of new <ptr
207-
type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
208-
ref="#GitHub">GitHub</rs> repositories containing a TEI header. We are unaware
209-
of any systematic analysis of the application types indicated by these data
210-
sources, but a glance gives the impression that traditional editorial and
211-
resource-building projects predominate.</note>
203+
TEIhub project lists more than 12,500 <ptr type="software" xml:id="R1"
204+
target="#GitHub"/><rs type="soft.name" ref="#R1">GitHub</rs>-hosted TEI
205+
projects (last updated May 11, 2021, <ptr type="software" xml:id="R7"
206+
target="#teipelican"/><rs type="soft.url" ref="#R7"><ptr
207+
target="https://teihub.netlify.app/"/></rs>); an associated bot called <rs
208+
type="soft.url" ref="#R7">TEI Pelican</rs> provides a daily twitter feed of new
209+
<ptr type="software" xml:id="R2" target="#GitHub"/><rs type="soft.name"
210+
ref="#R2">GitHub</rs> repositories containing a TEI header. We are unaware of
211+
any systematic analysis of the application types indicated by these data sources,
212+
but a glance gives the impression that traditional editorial and resource-building
213+
projects predominate.</note>
212214
</p>
213215
<p>The work of the Action<note>Further information about the Action is available from
214216
its website at <ptr target="https://www.distant-reading.net/"/>. For information
@@ -228,9 +230,8 @@
228230
issues of sampling and balance were prepared for discussion and approval by the
229231
members of WG1, and remain available from the Working Group’s website. <note>These
230232
and other documents are available from the Action’s <ptr type="software"
231-
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
232-
>GitHub</rs> page, accessed May 17, 2021, <ptr
233-
target="https://distantreading.github.io/"/>.</note>
233+
xml:id="R3" target="#GitHub"/><rs type="soft.name" ref="#R3">GitHub</rs> page,
234+
accessed May 17, 2021, <ptr target="https://distantreading.github.io/"/>.</note>
234235
</p>
235236
</div>
236237
<div xml:id="eltec">
@@ -653,7 +654,8 @@
653654
<p>In the ELTeC project, we begin by defining an ODD which selects from TEI all the
654655
components used by any ELTeC schema at any level. This ODD also contains
655656
documentation and specifies usage constraints applicable across every schema. This
656-
base ODD is then processed using the TEI standard odd2odd stylesheet to produce a
657+
base ODD is then processed using the <ptr type="software" xml:id="R8"
658+
target="#odd2odd"/><rs type="soft.name" ref="#R8">TEI standard odd2odd stylesheet</rs> to produce a
657659
stand-alone set of TEI specifications which we call eltec-library. Three different
658660
ODDs, eltec-0, eltec-1, and eltec-2, then derive specific schemas and documentation
659661
for each of the three ELTeC levels, using this library of specifications as a base
@@ -662,13 +664,14 @@
662664
resulting encoding standard. As with other ODDs, we are then able to produce
663665
documentation and formal schemas which reflect exactly the scope of each encoding
664666
level.</p>
665-
<p>The ODD sources and their outputs are maintained on <ptr type="software"
666-
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>
667-
and are also <ptr target="http://doi.org/10.5281/zenodo.3546326"/>published on Zenodo
668-
(<ref type="bibl" target="#odebrecht2019">Odebrecht et al. 2019</ref>) along with
669-
the ELTeC corpora.<note>The <ptr type="software" xml:id="GitHub" target="#GitHub"
670-
/><rs type="soft.name" ref="#GitHub">GitHub</rs> repository for the ELTeC
671-
collection (last updated May 17, 2021) is found at <ptr
667+
<p>The ODD sources and their outputs are maintained on <ptr type="software" xml:id="R4"
668+
target="#GitHub"/><rs type="soft.name" ref="#R4">GitHub</rs> and are also <ptr
669+
target="http://doi.org/10.5281/zenodo.3546326"/>published on <ptr type="software" xml:id="R9"
670+
target="#zenodo"/><rs type="soft.name" ref="#R9">Zenodo</rs> (<ref
671+
type="bibl" target="#odebrecht2019">Odebrecht et al. 2019</ref>) along with the
672+
ELTeC corpora.<note>The <ptr type="software" xml:id="R5" target="#GitHub"/><rs
673+
type="soft.name" ref="#R5">GitHub</rs> repository for the ELTeC collection
674+
(last updated May 17, 2021) is found at <ptr
672675
target="https://github.com/COST-ELTeC/"/>; the Zenodo community within which it
673676
is being published (last updated April 11, 2021) lives at <ptr
674677
target="https://zenodo.org/communities/eltec/"/>.</note>
@@ -689,8 +692,8 @@
689692
development and are expected to become available during the coming year. As noted
690693
above, up-to-date information about the current state of all corpora is publicly
691694
visible at <ptr target="http://distantreading.github.io/ELTeC/"/>, which includes
692-
links to the individual <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs
693-
type="soft.name" ref="#GitHub">GitHub</rs> repositories for each corpus.</p>
695+
links to the individual <ptr type="software" xml:id="R6" target="#GitHub"/><rs
696+
type="soft.name" ref="#R6">GitHub</rs> repositories for each corpus.</p>
694697
<p>As well as continuing to expand the collection, and continuing to fine-tune its
695698
composition, we hope to improve the consistency and reliability of the metadata
696699
associated with each text, as far as possible automatically. For example, we have

data/JTEI/14_2021-23/jtei-cc-pn-erjavec-195-source.xml

Lines changed: 54 additions & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -207,10 +207,11 @@
207207
<div xml:id="schema">
208208
<head>The Parla-CLARIN Schema</head>
209209
<p>Parla-CLARIN is written as a TEI ODD document, consisting of the prose guidelines and
210-
the schema specification, on the basis of which it is possible, using the standard TEI
211-
XSLT stylesheets, to derive an XML schema expressed either as a RelaxNG schema, a DTD,
212-
or a W3C schema, which is then used for formal validations of a Parla-CLARIN
213-
parliamentary corpus.</p>
210+
the schema specification, on the basis of which it is possible, using the <ptr
211+
type="software" xml:id="R5" target="#teistylesheets"/><rs type="soft.name" ref="#R5"
212+
>standard TEI XSLT stylesheets</rs>, to derive an XML schema expressed either as a
213+
RelaxNG schema, a DTD, or a W3C schema, which is then used for formal validations of a
214+
Parla-CLARIN parliamentary corpus.</p>
214215
<p>While the proposal tries to cater for many encoding needs, it is possible that new
215216
users will have to use TEI elements or attributes that are not discussed in the prose
216217
guidelines. Since the recommendations are still under development, the formal schema
@@ -324,20 +325,22 @@
324325
<div xml:id="presentation">
325326
<head>Presentation of Parla-CLARIN</head>
326327
<p>Like the TEI Guidelines, the Parla-CLARIN recommendations are available on <ref
327-
target="https://github.com/clarin-eric/parla-clarin/"><ptr type="software"
328-
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
329-
>GitHub</rs></ref>, as a project<note>Tomaž Erjavec and Andrej Pančur, Parla-CLARIN
330-
project <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
331-
ref="#GitHub">GitHub</rs> site, last updated March 17, 2021, <ptr
332-
target="https://github.com/clarin-eric/parla-clarin/"/>.</note> of the CLARIN ERIC
333-
collection. The project contains a folder for the schema (i.e., the Parla-CLARIN ODD
334-
document and XML schemas derived from it), a folder for the programs that convert the
335-
ODD into the XML schemas and to the HTML of the prose and schema definitions, and a
336-
folder for examples, which contains an artificial but fully worked out example of a
337-
Parla-CLARIN document and subfolders with various example resources, where each should
338-
contain: <list rend="ordered">
328+
target="https://github.com/clarin-eric/parla-clarin/"><ptr type="software" xml:id="R1"
329+
target="#GitHub"/><rs type="soft.name" ref="#R1">GitHub</rs></ref>, as a
330+
project<note>Tomaž Erjavec and Andrej Pančur, Parla-CLARIN project <ptr
331+
type="software" xml:id="R2" target="#GitHub"/><rs type="soft.name" ref="#R2"
332+
>GitHub</rs> site, last updated March 17, 2021, <ptr type="software" xml:id="R9"
333+
target="#parlaclarinscripts"/><rs type="soft.url" ref="#R9"><ptr
334+
target="https://github.com/clarin-eric/parla-clarin/"/></rs>.</note> of the CLARIN
335+
ERIC collection. The project contains a folder for the schema (i.e., the Parla-CLARIN
336+
ODD document and XML schemas derived from it), a folder for the <rs type="soft.name"
337+
ref="#R9">programs that convert the ODD into the XML schemas and to the HTML of the
338+
prose and schema definitions</rs>, and a folder for examples, which contains an
339+
artificial but fully worked out example of a Parla-CLARIN document and subfolders with
340+
various example resources, where each should contain: <list rend="ordered">
339341
<item>a sample of a corpus in its source encoding;</item>
340-
<item>XSLT script to convert it into Parla-CLARIN; and</item>
342+
<item><rs type="soft.name" ref="#R9">XSLT script to convert it into Parla-CLARIN</rs>;
343+
and</item>
341344
<item>the output of the conversion.</item>
342345
</list>
343346
</p>
@@ -495,12 +498,15 @@
495498
<p>Nevertheless, AKN is an important schema for modeling parliamentary proceedings,
496499
especially as the primary encoding standard used by various legislative bodies, so some
497500
of AKN’s solutions were used in developing the Parla-CLARIN proposal, in particular the
498-
typology of divisions of a document. Also developed was a partial, but non-trivial,
499-
conversion from AKN to Parla-CLARIN, which covers several AKN example documents. As
500-
mentioned in <ptr type="crossref" target="#presentation"/>, the example documents and
501-
conversion script can be found in the <ident>Examples</ident> folder of the Parla-CLARIN
502-
Git repository. The <ident>akn2tei.xsl</ident> script attempts to preserve the IDs of
503-
the source AKN document, converts the AKN addressee, role, and questions and answers to
501+
typology of divisions of a document. Also developed was a partial, but non-trivial, <ptr
502+
type="software" xml:id="R10" target="#parlaclarinscripts"/><rs type="soft.name"
503+
ref="#R10">conversion from AKN to Parla-CLARIN</rs>, which covers several AKN example
504+
documents. As mentioned in <ptr type="crossref" target="#presentation"/>, the example
505+
documents and conversion script can be found in the <ident>Examples</ident> folder of
506+
the Parla-CLARIN Git repository. The <ptr type="software" xml:id="R11"
507+
target="#parlaclarinscripts"/><rs type="soft.name" ref="#R11"
508+
><ident>akn2tei.xsl</ident></rs> script attempts to preserve the IDs of the source
509+
AKN document, converts the AKN addressee, role, and questions and answers to
504510
Parla-CLARIN, and maps FRBR data (which distinguishes a <soCalled>work</soCalled> from
505511
its <soCalled>expression</soCalled> and its expression from its
506512
<soCalled>manifestation</soCalled>) to the appropriate TEI elements and attributes.
@@ -572,9 +578,10 @@
572578
parliamentary proceedings meant for scholarly investigations. This scheme is currently a
573579
straightforward customization of the TEI Guidelines, with the majority of the effort
574580
having gone into the writing of the prose guidelines of the Parla-CLARIN recommendations
575-
and into developing the conversion from Akoma Ntoso to Parla-CLARIN. We have not included
576-
examples of the encoding, as these are readily available on the <ptr type="software"
577-
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>
581+
and into developing the <ptr type="software" xml:id="R12" target="#parlaclarinscripts"
582+
/><rs type="soft.name" ref="#R12">conversion from Akoma Ntoso to Parla-CLARIN</rs>. We
583+
have not included examples of the encoding, as these are readily available on the <ptr
584+
type="software" xml:id="R3" target="#GitHub"/><rs type="soft.name" ref="#R3">GitHub</rs>
578585
documentation page of the project, and large Parla-CLARIN encoded corpora are openly
579586
available.</p>
580587
<p>Apart from the siParl 2.0 corpus mentioned above (<ptr type="crossref"
@@ -601,15 +608,21 @@
601608
<p>As we wanted to have corpora that are not only interchangeable but interoperable as well,
602609
we created a bespoke ParlaMint XML schema directly in RelaxNG – the schema is compatible
603610
with Parla-CLARIN as it validates a subset of documents that would be validated against
604-
Parla-CLARIN. We produced common scripts that can convert any of the four corpora to plain
605-
text, to CoNLL-U format as used by the Universal Dependencies project, and to vertical
606-
format as used by the <ref target="http://cwb.sourceforge.net/">CWB</ref><note>The IMS
607-
Open Corpus Workbench (CWB), last modified March 30, 2021, <ptr
608-
target="http://cwb.sourceforge.net/"/>.</note> and <ref
609-
target="http://www.sketchengine.eu/">Sketch Engine</ref><note>Accessed January 13, 2022,
610-
<ptr target="http://www.sketchengine.eu/"/>.</note> (<ref type="bibl"
611-
target="#kilgarriff14">Kilgarriff et al. 2014</ref>) concordancers, as well as to
612-
extract complete speech metadata into TSV files.</p>
611+
Parla-CLARIN. We produced <ptr type="software" xml:id="R13" target="#parlaclarinscripts"
612+
/><rs type="soft.url" ref="#R13">common scripts that can convert any of the four corpora
613+
to plain text, to CoNLL-U format as used by the Universal Dependencies project, and to
614+
vertical format as used by the <ptr type="software" xml:id="R14" target="#cwb"/><rs
615+
type="soft.url" ref="#R14"><ref target="http://cwb.sourceforge.net/"
616+
>CWB</ref></rs></rs><note>The <rs type="soft.name" ref="#R14">IMS Open Corpus Workbench
617+
(CWB)</rs>, last modified March 30, 2021, <rs type="soft.url" ref="#R14"><ptr
618+
target="http://cwb.sourceforge.net/"/></rs>.</note> and <ptr type="software"
619+
xml:id="R15" target="#sketchengine"/><rs type="soft.url" ref="#R15"><ref
620+
target="http://www.sketchengine.eu/"><rs type="soft.name" ref="#R15">Sketch
621+
Engine</rs></ref></rs><note>Accessed January 13, 2022, <rs type="soft.url"
622+
ref="#R15"><ptr target="http://www.sketchengine.eu/"/></rs>.</note> (<rs
623+
type="soft.bib.ref" ref="#R15"><ref type="bibl" target="#kilgarriff14">Kilgarriff et al.
624+
2014</ref></rs>) concordancers, as well as to extract complete speech metadata into
625+
TSV files.</p>
613626
<p>In order for Parla-CLARIN to achieve its goal of becoming a widely recognized encoding
614627
format for corpora of parliamentary proceedings, significant work remains to be done. On
615628
the basis of the lessons learned in creating ParlaMint, we plan to revise the prose
@@ -619,10 +632,10 @@
619632
specification from the default ones in the TEI Guidelines to ones taken or adapted from
620633
the collected parliamentary corpora.</p>
621634
<p>Second, as we have already done for ParlaMint, we plan to add to the <ptr type="software"
622-
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>
623-
Parla-CLARIN project more down-conversion scripts with which we would increase the
624-
usability of the Parla-CLARIN corpora. As mentioned, work also needs to be done to develop
625-
a conversion to RDF.</p>
635+
xml:id="R4" target="#GitHub"/><rs type="soft.name" ref="#R4">GitHub</rs> Parla-CLARIN
636+
project more down-conversion scripts with which we would increase the usability of the
637+
Parla-CLARIN corpora. As mentioned, work also needs to be done to develop a conversion to
638+
RDF.</p>
626639
<p>Last, but not least, one of the great benefits of Git is the ability to support
627640
collaborative work, be it through posting issues, or through using pull requests to
628641
incorporate changes. While the community has not so far made use of these options, we hope
@@ -790,8 +803,8 @@
790803
<bibl xml:id="kilgarriff14"><author>Kilgarriff, Adam</author>, <author>Vít Baisa</author>,
791804
<author>Jan Bušta</author>, <author>Miloš Jakubíček</author>, <author>Vojtěch
792805
Kovář</author>, <author>Jan Michelfeit</author>, <author>Pavel Rychlý</author>, and
793-
<author>Vít Suchomel</author>. <date>2014</date>. <title level="a">The Sketch Engine:
794-
Ten Years On.</title>
806+
<author>Vít Suchomel</author>. <rs type="soft.bib.ref" ref="ewfew"><date>2014</date>.
807+
<title level="a">The Sketch Engine: Ten Years On.</title></rs>
795808
<title level="j">Lexicography: Journal of ASIALEX</title>
796809
<biblScope unit="volume">1</biblScope> (<biblScope unit="issue">1</biblScope>):
797810
<biblScope unit="page">7–36</biblScope>. doi:<idno type="DOI"
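
The recurring change in both files above is that the repeated xml:id="GitHub" on <ptr type="software"> is replaced by identifiers that are unique within each file (R1, R2, ...), with the ref of each following <rs> updated to match. One added line also carries ref="ewfew", which has no leading "#" and does not appear to match any xml:id visible in this diff. A minimal sketch of a check for both conditions, assuming Python's standard xml.etree.ElementTree; the function name check_ids and the SAMPLE fragment are hypothetical illustrations, not part of the commit:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the predeclared xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment modelled on the removed lines and on ref="ewfew".
SAMPLE = """<p>
  <ptr type="software" xml:id="GitHub" target="#GitHub"/>
  <rs type="soft.name" ref="#GitHub">GitHub</rs>
  <ptr type="software" xml:id="GitHub" target="#GitHub"/>
  <rs type="soft.name" ref="#GitHub">GitHub</rs>
  <rs type="soft.bib.ref" ref="ewfew">Kilgarriff et al. 2014</rs>
</p>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the check also works on TEI-namespaced files."""
    return tag.rsplit("}", 1)[-1]


def check_ids(xml_text):
    """Return (duplicate xml:id values, <rs> ref values that resolve to no xml:id)."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )
    duplicates = sorted(i for i, n in counts.items() if n > 1)

    dangling = set()
    for el in root.iter():
        ref = el.get("ref")
        if local_name(el.tag) != "rs" or ref is None:
            continue
        # A local pointer is expected to be '#' followed by an existing xml:id.
        if not ref.startswith("#") or ref[1:] not in counts:
            dangling.add(ref)
    return duplicates, sorted(dangling)


if __name__ == "__main__":
    print(check_ids(SAMPLE))
```

To check a real file, read it and pass the text to the function, for example check_ids(open(path, encoding="utf-8").read()). The sketch only treats ref values as local pointers; an <rs> that legitimately points to an external URI would need to be excluded first.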
