Public ADL text sources

The Archive for Danish Literature, ADL, comes to you via a collaboration between

At the time of writing, the corpus comprises 498 volumes totalling 165512 pages of Danish literature. The whole corpus has been encoded using TEI, but only about two-thirds of the pages have been subject to OCR and text encoding. This repository contains all those texts.

We also describe our data and, in particular, our encoding practices, and we give information on how we envisage submissions could be structured.

Getting text

As you might have noticed, all the texts are in an XML format called Text Encoding Initiative (TEI). For many purposes, if not all, that is a good format.

If you want to extract texts from the files, you can use the scripts

The first one (get_titles.xsl) creates a list of works inside a TEI file.

xsltproc  get_titles.xsl texts/hcaeventyr01val.xml 
workid57967;Eventyr, fortalte for Børn. Første Samling. Første Hefte. 1885.
workid58084;Fyrtøiet
workid59091;Lille Claus og store Claus
workid61051;Prindsessen paa Ærten
workid61317;Den lille Idas Blomster
workid62461;Eventyr, fortalte for Børn. Første Samling. Andet Hefte. 1885.
workid62544;Tommelise
workid64209;Den uartige Dreng
workid64656;Reisekammeraten

...
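The listing above is semicolon-separated: a work identifier, then the title. If you want to process it further, a minimal Python sketch could look like the one below. The function name `parse_titles` is our own, chosen for illustration, and we assume that only the first semicolon on a line is a separator, since a title might contain semicolons itself.

```python
def parse_titles(listing: str) -> list[tuple[str, str]]:
    """Split get_titles.xsl output into (work id, title) pairs."""
    works = []
    for line in listing.splitlines():
        line = line.strip()
        if not line or ";" not in line:
            continue  # skip blank lines and anything that is not id;title
        # Assumption: only the first ';' separates the id from the title.
        work_id, title = line.split(";", 1)
        works.append((work_id, title))
    return works


# Two lines taken from the example output above.
sample = """workid58084;Fyrtøiet
workid59091;Lille Claus og store Claus
"""
for work_id, title in parse_titles(sample):
    print(work_id, title)
```

To feed it live output, one option is to capture the `xsltproc` call shown above with `subprocess.run([...], capture_output=True, text=True)` and pass its `stdout` to `parse_titles`.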

The second script (get_the_text.xsl) creates one text file per title in the TEI file.

Finally, you can adapt the shell script extract_stuff.sh to do both things directly.
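If you would rather not use XSLT, the running text can also be pulled out with a few lines of Python. This is only a rough sketch and not what get_the_text.xsl does: the function name `body_text` is hypothetical, it does not split the file by work, and it assumes the files use the standard TEI namespace `http://www.tei-c.org/ns/1.0`.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents use the standard TEI P5 namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"


def body_text(xml_text: str) -> str:
    """Return the text of every TEI <body>, one body per line."""
    root = ET.fromstring(xml_text)
    chunks = []
    for body in root.iter(TEI + "body"):
        # itertext() yields all text nodes below the element;
        # split()/join() collapses runs of whitespace and line breaks.
        # Caveat: a <body> nested inside another <body> is emitted twice.
        chunks.append(" ".join("".join(body.itertext()).split()))
    return "\n".join(chunks)


# A tiny made-up document, for illustration only.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><title>Header only</title></teiHeader>"
    "<text><body><p>Der var  engang</p>\n<p>en Prinds</p></body></text>"
    "</TEI>"
)
print(body_text(sample))
```

Text in the TEI header is skipped because only `body` elements are visited.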

Contributing documents

Projects with a relevant scope can contribute documents to ADL, provided that

Copyright issues are resolved
They are accepted by DSL and KB
The XML is valid TEI
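Before submitting, a cheap pre-check can catch the most basic problems. The sketch below only tests that a document is well-formed XML and that its root element is `TEI` or `teiCorpus` in the standard TEI namespace. It is not a substitute for real validation against a schema (the repository ships tei_all.rng and tei_all.rnc), and the function name `looks_like_tei` is our own.

```python
import xml.etree.ElementTree as ET

# Assumption: contributions use the standard TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def looks_like_tei(xml_text: str) -> bool:
    """True if xml_text is well-formed and its root is a TEI element.

    This is a sanity check only; it does not validate against a schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed XML
    return root.tag in (f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}teiCorpus")


print(looks_like_tei('<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'))
print(looks_like_tei("<TEI/>"))  # no namespace, so rejected
```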

Contributions are received via a branch and pull request, as is the usual practice on GitHub.

