This repository has been archived by the owner on Jul 7, 2024. It is now read-only.

Data Sharing Best Practices

embaldridge edited this page Feb 12, 2013 · 2 revisions

Brainstorming session

General guidelines:

Be sensible
Your data is interesting, thus it should be shared.
Make individual files as easy to understand and work with as possible.
Make it as easy as possible to combine with other datasets.

Cross platform compatibility:

No proprietary formats: use CSV; avoid Excel, PDF, and HTML. Tab-delimited is iffy.
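
As one possible illustration of moving a tab-delimited file to CSV, here is a minimal sketch using only Python's standard `csv` module. The function name `tsv_to_csv` and the sample values are made up for this example; the point is that a proper CSV writer quotes fields that contain commas rather than leaving stray commas in the file.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-delimited text to comma-delimited text (hypothetical helper)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # The writer quotes only the fields that need it (e.g. those containing commas).
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Made-up sample: the "note" field contains a comma.
csv_text = tsv_to_csv("site\tnote\nplot_1\tdry, rocky\n")
print(csv_text)
```

When reading from and writing to real files, opening them with `encoding="utf-8"` and `newline=""` is the usual way to keep the output portable across platforms.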

Metadata:

Simple, clear, easy-to-read metadata. EML (Ecological Metadata Language) is hard for beginners. Defensive metadata.

Formatting your data:

Do not use cross tab formatting
No spaces
No extra commas/quotes
Don't leave null values ambiguous: keep zero and null (missing) values explicitly distinct
No special characters
Consistent fields (units, type)
No extra rows/cols
Unicode
Meaningful column names and file names; long is okay
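
To make the cross-tab and zero-versus-null points above concrete, here is a rough sketch of reshaping a cross-tab table into one row per observation. Everything in it is an assumption for illustration: the sample table, the function name `cross_tab_to_long`, and the choice of `NA` as the explicit missing-value code.

```python
import csv
import io

# Made-up cross-tab table: one row per site, one column per species.
# An empty cell means "not surveyed" (missing); 0 means "surveyed, none found".
CROSS_TAB = """site,species_a,species_b
plot_1,4,0
plot_2,,7
"""

def cross_tab_to_long(text, id_column, variable_name, value_name, missing="NA"):
    """Return long-format rows: one row per (id, variable) pair."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column == id_column:
                continue
            rows.append({
                id_column: record[id_column],
                variable_name: column,
                # Keep zeros as zeros; mark empty cells with an explicit code.
                value_name: value if value != "" else missing,
            })
    return rows

long_rows = cross_tab_to_long(CROSS_TAB, "site", "species", "count")
for row in long_rows:
    print(row)
```

In the long format each row stands on its own, so the table can be appended to other datasets without matching column layouts, and a true zero is never confused with a cell that was simply left blank.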

Making your data usable for other people:

Non-processed data
Explicit linking tables
Ability to link out
Geographic coordinates
Taxonomy, and maybe how to handle unknown taxonomy
Scripts to show processing
Standard measures
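
One way to picture an explicit linking table is sketched below: observations carry only a `species_id`, and a separate species table supplies the taxonomy. The table contents, column names, and the `link` function are all hypothetical; the sketch also reports rows whose key has no match instead of dropping them silently.

```python
import csv
import io

# Made-up observation table: refers to species only by an ID.
OBSERVATIONS = """site_id,species_id,count
S1,SP1,4
S2,SP2,7
S2,SP9,1
"""

# Made-up linking table: maps each species ID to its taxonomy.
SPECIES = """species_id,genus,species
SP1,Genus_a,species_x
SP2,Genus_a,species_y
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def link(observations, species, key="species_id"):
    """Join observations to species on `key`; return (linked, unmatched)."""
    lookup = {row[key]: row for row in species}
    linked, unmatched = [], []
    for obs in observations:
        match = lookup.get(obs[key])
        if match is None:
            unmatched.append(obs)  # keep unknown IDs visible for follow-up
        else:
            linked.append({**obs, **match})
    return linked, unmatched

linked, unmatched = link(read_rows(OBSERVATIONS), read_rows(SPECIES))
print(linked)
print(unmatched)
```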

Archiving data:

Where to archive: in an archive (not on a personal website)
Explicit licenses, e.g. a Creative Commons license. The differences between licenses can be confusing to the novice, so it might be good to have a reference to point people to that explains the different types of licenses in easy-to-understand language.

Other thoughts:

How to best format dates for easy machine reading and conversion
Many data types
- focus on flat file
- centralizable
Spatial data
Examples of high-profile compilations from small studies
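
On the question of date formatting raised above, one common option is ISO 8601 (`YYYY-MM-DD`), which sorts correctly as plain text and is parsed by most tools. The sketch below assumes the source format of each date is known and stated; the function name `to_iso_date` is made up for this example.

```python
from datetime import datetime

def to_iso_date(text, source_format):
    """Parse a date written in a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()

# The same day written two ways; without a stated format, 02/12/2013 is ambiguous.
print(to_iso_date("02/12/2013", "%m/%d/%Y"))  # 2013-02-12
print(to_iso_date("12.02.2013", "%d.%m.%Y"))  # 2013-02-12
```

Recording the original format in the metadata lets others check the conversion.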