This repository has been archived by the owner on Jul 7, 2024. It is now read-only.
Data Sharing Best Practices
embaldridge edited this page Feb 12, 2013
- Be sensible
- Your data is interesting, thus it should be shared.
- Make individual files as easy to understand and work with as possible.
- Make it as easy as possible to combine with other datasets.
- No proprietary formats: use CSV; no Excel, PDF, or HTML; tab-delimited is iffy
- Simple, clear, easy-to-read metadata (EML is hard for beginners); write defensive metadata
- Do not use cross tab formatting
- No spaces
- No extra commas/quotes
- Don't leave null values ambiguous: distinguish explicitly between zero and null (missing) values
- No special characters
- Consistent fields (units, type)
- No extra rows/cols
- Use a Unicode text encoding
- Meaningful column names and file names; long is okay
- Non-processed data
- Explicit linking tables
- Ability to link out
- Geographic coordinates
- Taxonomy, and perhaps how to handle unknown taxonomy
- Scripts to show processing
- Standard measures
- Where to archive: in an archive (not on a personal website)
- Explicit licenses, e.g. a Creative Commons license. The differences between licenses can be confusing to the novice, so it might be good to point people to a reference that explains the different types of licenses in easy-to-understand language.
- How to best format dates for easy machine reading and conversion
- Many data types
- focus on flat file
- centralizable
- Spatial data
- Examples of high-profile compilations from small studies
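
As an illustration of the "do not use cross tab formatting" item, here is a minimal sketch of turning a cross-tab (wide) table into one observation per row. The table, the column names (`site`, `date`, `species_a`, `species_b`), and the helper names `wide_to_long` and `rows_to_csv` are made up for the example; only Python's standard `csv` module is used.

```python
import csv
import io

# Hypothetical cross-tab (wide) table: one column per species.
WIDE_CSV = """site,date,species_a,species_b
plot1,2012-06-01,3,0
plot2,2012-06-01,1,5
"""


def wide_to_long(wide_text, id_columns, variable_name, value_name):
    """Turn each non-id column into its own row (one observation per row)."""
    reader = csv.DictReader(io.StringIO(wide_text))
    value_columns = [c for c in reader.fieldnames if c not in id_columns]
    rows = []
    for record in reader:
        for column in value_columns:
            row = {key: record[key] for key in id_columns}
            row[variable_name] = column   # former column header becomes a value
            row[value_name] = record[column]
            rows.append(row)
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize the long-format rows back to plain CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


long_rows = wide_to_long(WIDE_CSV, ["site", "date"], "species", "count")
print(rows_to_csv(long_rows, ["site", "date", "species", "count"]))
```

In the long form, adding a new species means adding rows rather than columns, which tends to make combining with other datasets easier.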
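
For the item on formatting dates for machine reading, one common choice is ISO 8601 (`YYYY-MM-DD`). The sketch below assumes that choice; the list of input formats and the function name `to_iso_date` are hypothetical and would need adjusting to the formats actually present in a dataset.

```python
from datetime import datetime

# Hypothetical input formats a dataset might contain; adjust to your own data.
# Note that "02/12/2013" is ambiguous (month/day vs. day/month); this sketch
# assumes month/day, which is exactly the kind of guess ISO dates avoid.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or raise ValueError if no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date: {text!r}")


print(to_iso_date("02/12/2013"))
print(to_iso_date("12 Feb 2013"))
```

Dates written this way sort correctly as plain text and can be parsed without knowing the author's regional conventions.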
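
The zero vs. null item can be sketched as follows: a true zero is written as `0`, and a missing measurement gets an explicit code. The code `NA`, the sample data, and the helper `parse_count` are assumptions for illustration; whichever code is chosen should be stated in the metadata.

```python
import csv
import io

MISSING = "NA"  # assumed missing-value code; document your own in the metadata

# Hypothetical data: plot1 was surveyed and nothing was found (a true zero),
# plot2 was not surveyed (missing).
DATA = """site,count
plot1,0
plot2,NA
plot3,7
"""


def parse_count(cell):
    """Return None for a missing value, otherwise an int (0 stays 0)."""
    if cell == MISSING:
        return None
    return int(cell)  # an empty or malformed cell raises ValueError here


counts = {
    row["site"]: parse_count(row["count"])
    for row in csv.DictReader(io.StringIO(DATA))
}
print(counts)
```

Because an empty cell raises an error instead of silently becoming zero, ambiguous entries are caught when the file is read.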