Things to clarify about the standards
- http://rs.tdwg.org/dwc/terms/guides/text/index.htm ?
- Darwin Core Archive Format, Reference Guide to the XML Descriptor File ?
- In http://rs.tdwg.org/dwc/terms/guides/text/index.htm, there's no default value for the metadata attribute of archive. According to the Darwin Core Archive Format, Reference Guide to the XML Descriptor File, if there's no Metafile, the metadata file should be named EML.xml. Nothing seems to be specified for the case where there is a Metafile but no metadata attribute. What would make more sense:
  - Also look for a default EML.xml?
  - Assume there's no metadata?
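The two options above can be sketched as a single switch. This is only an illustration, not how python-dwca-reader behaves: the function name `resolve_metadata_name`, the `fallback_to_default` flag and the idea of passing the archive's file names as a set are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

# Default name taken from the Reference Guide (case as written there).
DEFAULT_METADATA_NAME = "EML.xml"


def resolve_metadata_name(metafile_xml: str,
                          archive_files: Set[str],
                          fallback_to_default: bool) -> Optional[str]:
    """Return the metadata file name for an archive that HAS a Metafile.

    Hypothetical helper: `fallback_to_default` selects between the two
    candidate behaviours discussed above.
    """
    root = ET.fromstring(metafile_xml)
    # Un-prefixed attributes carry no namespace, so a plain lookup works
    # even though <archive> itself is namespaced.
    declared = root.get("metadata")
    if declared:
        return declared
    # Option 1: also look for a default EML.xml.
    if fallback_to_default and DEFAULT_METADATA_NAME in archive_files:
        return DEFAULT_METADATA_NAME
    # Option 2: assume there is no metadata.
    return None
```

With a Metafile that has no metadata attribute and an archive containing EML.xml, the first policy returns "EML.xml" and the second returns None, which is exactly the ambiguity the standard leaves open.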
- According to the Reference Guide to the XML Descriptor File, archives without a Metafile are composed of a single CSV file (and possibly a metadata file). No further details are given about the CSV file, so the reader is left guessing the CSV dialect (not too difficult) and the file encoding (impossible to guess correctly 100% of the time).
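To make the guessing concrete, here is a rough sketch using only the Python standard library. The candidate encodings, the delimiter list and both function names are assumptions chosen for illustration; nothing in the standard suggests them.

```python
import csv

# Assumed candidates, tried in order. latin-1 can decode ANY byte
# sequence, so it acts as a last resort rather than a real detection.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def guess_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes `raw` without error."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


def guess_dialect(sample_text: str, delimiters: str = ",;\t|"):
    """Let csv.Sniffer infer the dialect from a text sample.

    Raises csv.Error when the sniffer cannot decide.
    """
    return csv.Sniffer().sniff(sample_text, delimiters=delimiters)
```

The encoding half shows why a correct guess cannot be guaranteed: a successful decode only proves the bytes are legal in that encoding, not that it is the one the producer used.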
- According to the standard, the default metadata filename is EML.xml (uppercase). In practice, many archives use eml.xml (lowercase). This hasn't caused any issues yet because the filename was explicitly specified, but the lowercase/uppercase question (and how it should work on case-sensitive filesystems) should ideally be clarified.
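Until this is clarified, one tolerant reading would be to prefer an exact match and fall back to a case-insensitive one. The sketch below assumes that policy; `find_metadata_file` is a hypothetical helper, not an existing python-dwca-reader function.

```python
from typing import Iterable, Optional


def find_metadata_file(archive_files: Iterable[str],
                       expected: str = "EML.xml") -> Optional[str]:
    """Find the default metadata file, tolerating case differences.

    Assumed policy: an exact match wins; otherwise the first name (in
    sorted order, for determinism) that matches ignoring case.
    """
    names = list(archive_files)
    if expected in names:
        return expected
    wanted = expected.casefold()
    candidates = sorted(name for name in names if name.casefold() == wanted)
    return candidates[0] if candidates else None
```

On a case-sensitive filesystem an archive could contain both EML.xml and eml.xml; this sketch picks the exact match in that case, but that is a choice the standard does not currently make.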
- In a sample archive from e-monocot (http://zingiberaceae.e-monocot.org/dwca.zip), I discovered HTML entities (the escaped form of the double quote, ") in fieldsEnclosedBy. Is that normal/valid? If so, we should update python-dwca-reader. Otherwise, we should inform e-monocot that they're producing invalid files.
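One way to narrow the question down is to check what a standard XML parser returns for the attribute. The snippet assumes the entity in question is `&quot;` (the note does not say exactly which form appears in the file) and uses a bare `<core>` element as a stand-in for the real Metafile. It only shows XML-level behaviour and does not settle what the Darwin Core text guide intends.

```python
import xml.etree.ElementTree as ET


def read_fields_enclosed_by(core_xml: str) -> str:
    """Parse a minimal <core> element and return its fieldsEnclosedBy value."""
    return ET.fromstring(core_xml).get("fieldsEnclosedBy")


# A literal double quote inside a single-quoted attribute.
literal = read_fields_enclosed_by("""<core fieldsEnclosedBy='"'/>""")

# The predefined XML entity: the parser decodes it to the same character.
escaped = read_fields_enclosed_by('<core fieldsEnclosedBy="&quot;"/>')

# A double-escaped entity survives parsing as the six-character text &quot;
double_escaped = read_fields_enclosed_by('<core fieldsEnclosedBy="&amp;quot;"/>')
```

If the file uses the plain entity, an XML-aware reader sees an ordinary double quote and nothing special is needed. If it is double-escaped, the reader receives the entity text itself and would have to decode it explicitly (for example with `html.unescape`).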