Things to clarify about the standards

Nicolas Noé edited this page Mar 18, 2016 · 4 revisions

Where is the updated, authoritative document about Darwin Core Archives?

Doubts, questions and inconsistencies

  1. In http://rs.tdwg.org/dwc/terms/guides/text/index.htm, there's no default value for the metadata attribute of the archive element. According to the Darwin Core Archive Format, Reference Guide to the XML Descriptor File, if there's no metafile the metadata file should be named EML.xml. Nothing seems to be specified for the case where there is a metafile but no metadata attribute. Which would make more sense:
  • Also look for a default EML.xml?
  • Assume there's no metadata?
  2. According to the Reference Guide to the XML Descriptor File, archives without a metafile consist of a single CSV file (and possibly a metadata file). No further details are given about the CSV file, so the reader has to guess the CSV dialect (not too difficult) and the file encoding (impossible to guess correctly 100% of the time).

  3. According to the standard, the default metadata filename is EML.xml (uppercase). In practice, many archives use eml.xml (lowercase). This hasn't caused any issue yet because the metadata file was explicitly specified in the metafile, but this lowercase/uppercase question (and how it should work on case-sensitive filesystems) should ideally be clarified.

  4. In a sample archive from e-monocot (http://zingiberaceae.e-monocot.org/dwca.zip), I found HTML entities (for the " character) in fieldsEnclosedBy. Is that normal/valid? If so, we should update python-dwca-reader. Otherwise, we should inform e-monocot that they're producing invalid files.
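
To make the metadata-lookup and fieldsEnclosedBy questions above more concrete, here is a minimal Python sketch of one lenient way a reader could behave. It is only an illustration of the options under discussion: the standard does not prescribe this behaviour, and it is not what python-dwca-reader currently does. The function names resolve_metadata_name and decode_enclosure are hypothetical, and falling back to a case-insensitive match is an assumption.

```python
import html


def resolve_metadata_name(member_names, declared=None, default="EML.xml"):
    """Return the archive member that holds the metadata, or None.

    member_names: file names found in the archive.
    declared: value of the metadata attribute, if the metafile provides one.
    """
    # Assumption: when nothing is declared, look for the default name.
    wanted = declared if declared is not None else default
    if wanted in member_names:
        # An exact match always wins (relevant on case-sensitive filesystems).
        return wanted
    # Assumption: otherwise accept a single case-insensitive match,
    # so that eml.xml is found when EML.xml is expected.
    matches = [n for n in member_names if n.lower() == wanted.lower()]
    return matches[0] if len(matches) == 1 else None


def decode_enclosure(raw):
    """Turn an entity-encoded fieldsEnclosedBy value into the literal character.

    Values that contain no entity are returned unchanged.
    """
    return html.unescape(raw)
```

For example, resolve_metadata_name(["occurrence.txt", "eml.xml"]) returns "eml.xml", and decode_enclosure applied to an entity-encoded double quote returns a plain " character. Returning None when several names differ only by case keeps the ambiguity visible instead of silently picking one file.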