This repository contains descriptions of collections of historical and contemporary texts. The collections are part of Deutsches Textarchiv (DTA) and Digitales Wörterbuch der deutschen Sprache (DWDS). All descriptions are available as YAML files.
The directory schemata contains the corresponding JSON Schema files (in YAML format) as well as a script that can be used for validation.
The descriptions of DTA and DWDS collections were created in the context of Text+.
All files are being made accessible within the DTA infrastructure under a Creative Commons licence.
This repository provides:
dta
: contains full descriptions of collections of historical texts within DTA

dwds
: contains full descriptions of collections of contemporary texts within DWDS

textplus
: contains a reduced collection registry description as a basis for discussion within Text+

schemata
: contains schema files as well as a script for validation
Validate datasets:
schemata/validate-against-schema.pl --schema=schemata/dta.yml dta/*.yml dwds/*.yml
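When many descriptions are validated in one call, it can be hard to see which file caused a failure. The following is a minimal sketch, not part of this repository, that runs the validator once per file and reports each result. The function name `validate_each` and the `VALIDATOR` variable are hypothetical, and the sketch assumes that the validation script exits with a non-zero status when a file does not conform to the schema.

```shell
#!/bin/sh
# Sketch only: validate_each and VALIDATOR are illustrative names,
# not part of this repository.
# Assumption: the validator exits non-zero for a non-conforming file.
VALIDATOR=${VALIDATOR:-schemata/validate-against-schema.pl}

# validate_each SCHEMA FILE...
# Runs the validator on every file separately and returns 1
# if at least one file failed, 0 otherwise.
validate_each() {
    schema=$1
    shift
    failed=0
    for file in "$@"; do
        if "$VALIDATOR" "--schema=$schema" "$file"; then
            echo "ok: $file"
        else
            echo "FAILED: $file" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

With the function loaded, a call such as `validate_each schemata/dta.yml dta/*.yml dwds/*.yml` would cover the same files as the command above.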
Publish dataset for https://www.deutschestextarchiv.de/textplus/:
schemata/generate-and-publish-datasets.sh
Publish landing pages for https://www.deutschestextarchiv.de/sammlungen/:
make -C landing-pages landing-pages && make -C landing-pages publish
Publish box listing for https://www.dwds.de/collections/:
perl schemata/compile-catalog.pl dta/*.yml dwds/*.yml > presentation/catalog.json
rsync -av presentation/ kaskade:/var/www/collections
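Because the shell truncates presentation/catalog.json before the Perl script starts, a failed run would leave an empty catalog that the following rsync call then publishes. The sketch below shows one possible guard: it writes the output to a temporary file first and only overwrites the target when the command succeeded and produced output. The function name `write_if_ok` is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Sketch only: write_if_ok is an illustrative helper,
# not part of this repository.

# write_if_ok TARGET COMMAND [ARGS...]
# Runs COMMAND and copies its standard output into TARGET only if
# COMMAND exits with status 0 and the output is not empty.
# On failure, TARGET is left untouched and 1 is returned.
write_if_ok() {
    target=$1
    shift
    tmp=$(mktemp) || return 1
    if "$@" > "$tmp" && [ -s "$tmp" ]; then
        # Copy the content instead of moving the temporary file so that
        # an existing target keeps its permissions.
        cat "$tmp" > "$target"
        status=$?
    else
        echo "not updating $target: command failed or produced no output" >&2
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

A possible use would be `write_if_ok presentation/catalog.json perl schemata/compile-catalog.pl dta/*.yml dwds/*.yml && rsync -av presentation/ kaskade:/var/www/collections`, so that the upload only runs after a successful compile.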