
deutschestextarchiv/collections


DTA collections metadata

This repository contains descriptions of collections of historical and contemporary texts. The collections are part of Deutsches Textarchiv (DTA) and Digitales Wörterbuch der deutschen Sprache (DWDS). All descriptions are available as YAML files.

The schemata directory contains the corresponding JSON Schema files (written in YAML) as well as a script that can be used for validation.

The descriptions of DTA and DWDS collections were created in the context of Text+.

All files are made accessible within the DTA infrastructure under a Creative Commons licence.

Content

This repository provides:

  • dta: contains full descriptions of collections of historical texts within DTA
  • dwds: contains full descriptions of collections of contemporary texts within DWDS
  • textplus: contains a reduced collection registry description as a basis for discussion within Text+
  • schemata: contains schema files as well as a script for validation

HOWTO

Validate datasets:

schemata/validate-against-schema.pl --schema=schemata/dta.yml dta/*.yml dwds/*.yml
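
When many files are checked at once, it can help to see which individual file failed. The following is a minimal sketch, not part of the repository: it calls the validator once per file and counts failures. It assumes that validate-against-schema.pl exits with a non-zero status when a file does not validate, which the source does not state. The function name validate_all and the VALIDATOR and SCHEMA variables are hypothetical.

```shell
#!/bin/sh
# Sketch: validate each collection file on its own and report failures.
# Assumption (not confirmed by the README): the validator exits non-zero
# when a file is invalid.
VALIDATOR=${VALIDATOR:-schemata/validate-against-schema.pl}
SCHEMA=${SCHEMA:-schemata/dta.yml}

# validate_all is a hypothetical helper name.
validate_all() {
    failed=0
    for file in "$@"; do
        # Skip glob patterns that matched nothing.
        [ -e "$file" ] || continue
        if "$VALIDATOR" --schema="$SCHEMA" "$file"; then
            echo "ok: $file"
        else
            echo "FAILED: $file"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed validation"
    # Return success only if every file passed.
    [ "$failed" -eq 0 ]
}
```

Under these assumptions, it could be called as: validate_all dta/*.yml dwds/*.yml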

Publish dataset for https://www.deutschestextarchiv.de/textplus/:

schemata/generate-and-publish-datasets.sh

Publish landing pages for https://www.deutschestextarchiv.de/sammlungen/:

make -C landing-pages landing-pages && make -C landing-pages publish

Publish box listing for https://www.dwds.de/collections/:

perl schemata/compile-catalog.pl dta/*.yml dwds/*.yml > presentation/catalog.json
rsync -av presentation/ kaskade:/var/www/collections
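
Because the catalog is written directly to presentation/catalog.json, a failing run could leave a broken file behind before it is synced. The following is a hedged sketch of one way to guard against that: write to a temporary file, check that it parses as JSON, and only then move it into place. The helper names is_valid_json and build_catalog are hypothetical, and the check assumes that perl with the core module JSON::PP is available.

```shell
#!/bin/sh
# Sketch: replace the catalog only when the new output parses as JSON.
# Assumption: perl with the core module JSON::PP is installed.

# is_valid_json is a hypothetical helper: succeeds if the file parses as JSON.
is_valid_json() {
    perl -MJSON::PP -0777 -e 'my $text = <STDIN>; decode_json($text)' \
        < "$1" 2>/dev/null
}

# build_catalog is a hypothetical wrapper: the first argument is the target
# file, the remaining arguments are the command that prints the catalog.
build_catalog() {
    target=$1
    shift
    tmp="$target.tmp"
    if "$@" > "$tmp" && is_valid_json "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        echo "catalog not updated: command failed or output is not valid JSON" >&2
        return 1
    fi
}
```

With these helpers, the step above could be run as: build_catalog presentation/catalog.json perl schemata/compile-catalog.pl dta/*.yml dwds/*.yml && rsync -av presentation/ kaskade:/var/www/collections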
