This repository has been archived by the owner on Jul 7, 2024. It is now read-only.

Data Sharing Best Practices

embaldridge edited this page Feb 12, 2013 · 2 revisions

Brainstorming session

General guidelines:

  • Be sensible
  • Your data is interesting, thus it should be shared.
  • Make individual files as easy to understand and work with as possible.
  • Make it as easy as possible to combine with other datasets.

Cross platform compatibility:

  • No proprietary or presentation-only formats: use CSV; avoid Excel, PDF, and HTML. Tab-delimited is iffy.
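
As a minimal sketch of the CSV recommendation above, the snippet below serializes a small table with Python's standard `csv` module. The records, column names, and the helper `to_csv_text` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical survey records; the column names are illustrative only.
rows = [
    {"site_id": "A1", "species": "Dipodomys merriami", "count": 3},
    {"site_id": "A2", "species": "Chaetodipus baileyi", "count": 0},
]


def to_csv_text(records, fieldnames):
    """Serialize records to CSV text with a single header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows, ["site_id", "species", "count"])
print(csv_text)
```

To write the same text to disk, one option is `open(path, "w", encoding="utf-8", newline="")`, which keeps the file in Unicode and lets the `csv` module control line endings.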

Metadata:

  • Simple, clear, easy-to-read metadata. EML (Ecological Metadata Language) is hard for beginners. Defensive metadata.

Formatting your data:

  • Do not use cross-tab formatting
  • No spaces
  • No extra commas/quotes
  • Don't leave values blank: mark zeros and nulls explicitly so they can be told apart
  • No special characters
  • Consistent fields (units, type)
  • No extra rows/cols
  • Use Unicode
  • Meaningful column names and file names (long is okay)
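
The cross-tab and zero-vs-null points above can be sketched together. The example below converts a hypothetical cross-tab (one column per species) into one record per observation, and writes missing cells as `NA` so they stay distinct from true zeros. The data, the function name `crosstab_to_long`, and the choice of `NA` as the missing-value marker are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical cross-tab: one row per site, one column per species.
# An empty cell means "not surveyed" (missing); 0 means "surveyed, none found".
crosstab_text = """site_id,dipodomys_merriami,chaetodipus_baileyi
A1,3,0
A2,,5
"""


def crosstab_to_long(text, id_column, variable_name, value_name):
    """Turn each (row, non-id column) cell into its own record."""
    reader = csv.DictReader(io.StringIO(text))
    long_rows = []
    for row in reader:
        for column, cell in row.items():
            if column == id_column:
                continue
            # Assumption: "NA" marks a missing value, kept distinct from "0".
            value = cell if cell != "" else "NA"
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


long_rows = crosstab_to_long(crosstab_text, "site_id", "species", "count")
for record in long_rows:
    print(record)
```

Each output record has the same three fields, which tends to make the file easier to filter, append to, and combine with other datasets than the wide layout.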

Making your data usable for other people:

  • Non-processed data
  • Explicit linking tables
  • Ability to link out
  • Geographic coordinates
  • Taxonomy, possibly including how to handle unknown taxa
  • Scripts to show processing
  • Standard measures
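
To illustrate what an explicit linking table might look like in practice, the sketch below joins observations to site coordinates through a shared key. The tables, the key name `site_id`, the coordinate values, and the helper `join_on_key` are hypothetical.

```python
# Hypothetical tables: observations point to sites through a shared site_id key.
sites = [
    {"site_id": "A1", "latitude": 31.938, "longitude": -109.080},
    {"site_id": "A2", "latitude": 31.940, "longitude": -109.075},
]
observations = [
    {"site_id": "A1", "species": "Dipodomys merriami", "count": 3},
    {"site_id": "A2", "species": "Chaetodipus baileyi", "count": 5},
]


def join_on_key(left, right, key):
    """Attach the matching row from `right` to every row in `left`."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key])
        if match is None:
            # Fail loudly rather than silently dropping unlinked records.
            raise KeyError(f"No row in linking table for {key}={row[key]!r}")
        joined.append({**row, **match})
    return joined


joined = join_on_key(observations, sites, "site_id")
for record in joined:
    print(record)
```

Keeping the key column identical in both files is what makes this kind of join possible without guesswork.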

Archiving data:

  • Where to archive: in a dedicated data archive, not on a personal website
  • Explicit licenses, e.g. a Creative Commons license. The differences between licenses can confuse novices, so it would help to point people to a reference that explains the license types in plain language.

Other thoughts:

  • How to best format dates for easy machine reading and conversion
  • Many data types
    • focus on flat files
    • centralizable
  • Spatial data
  • Examples of high profile compilations from small studies
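
On the date-formatting question above, one possible approach is to store dates as ISO 8601 (`YYYY-MM-DD`), which sorts correctly as text and is unambiguous across regions. The sketch below uses Python's standard `datetime` module; the helper name `to_iso_date` and the sample strings are assumptions for illustration.

```python
from datetime import datetime


def to_iso_date(text, source_format):
    """Parse a date string in a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()


# The same string means different days depending on the convention assumed,
# which is why the source format has to be stated explicitly.
print(to_iso_date("02/12/2013", "%m/%d/%Y"))  # month first: 2013-02-12
print(to_iso_date("02/12/2013", "%d/%m/%Y"))  # day first:   2013-12-02
print(to_iso_date("12.02.2013", "%d.%m.%Y"))  # 2013-02-12
```

The first two calls show the ambiguity that an explicit, machine-readable format avoids.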