This GitHub site has been set up for development of FAIRER Data Management tools.
FAIRER = FAIR + Ethical + Reproducible
FAIR (Findable, Accessible, Interoperable, and Reusable) principles emphasize machine-actionability and have become a global norm across all data domains. However, data that are FAIR in all respects are not necessarily transparent; ethical and reproducible principles are also required across all domains.
In 2023, the international consortium Common Infrastructure for National Cohorts in Europe, Canada, and Africa (CINECA) stated (in Deliverable D7.3, First recommendations for implementation in IT Framework, including a Data Management Plan) that: *“While the FAIR principles have become a guiding technical resource for data sharing, legal and socio-ethical considerations are equally important for a fair data ecosystem for further uses of … data. FAIR data should be FAIRER, including also ethical and reproducible as key components”* (CINECA 2023). Data in particular fields may have additional qualities (e.g., real-time, resilient, high availability, timely, equitable, spatial, timeseries, connected, etc.), but these attributes don’t necessarily apply across the board. The FAIRER acronym embodies a universally essential set of data management principles that should apply across all data domains, with Ethical + Reproducible capturing the requirement for transparency.
Ethical data means that: (a) Data are collected and managed in compliance with relevant government and professional codes of conduct, values and ethics, scientific integrity and responsible conduct of research; (b) Restricted, confidential, and sensitive data are handled appropriately, for example by implementing user authentication and controlled access to the data and/or data anonymization and de-identification; (c) A statement is made as to whether or not Indigenous considerations exist and, where applicable, Indigenous data sovereignty is respected and data are managed in accordance with CARE, OCAP and UNDRIP principles; (d) Data assets are managed in a manner such that data used as input to Big Data or Artificial Intelligence applications can be confirmed to be relevant, accurate, and up-to-date, and can be tested for unintended biases (GC 2019); (e) Authors and contributors are identified and contact person information is provided.
Reproducible data and code means that the final data and code are computationally reproducible within some tolerance interval or defined limits of precision and accuracy, i.e., a 3rd party will be able to verify the data lineage and processing, reanalyze the data and obtain consistent computational results using the same input raw data, computational steps, methods, computer software and code, and conditions of analysis in order to determine if the same result emerges from the reprocessing and reanalysis. “Same result” can mean different things in different contexts: identical measures in a fully deterministic context, the same numeric results but differing in some irrelevant detail, statistically similar results in a non-deterministic context, or validation of a hypothesis. All data and code are made available for 3rd-party verification of reproducibility. Note that reproducibility is a different concept from replicability. In the latter case, the final published data are linked to sufficiently detailed methods and information for a 3rd party to be able to verify the results based on the independent collection of new raw data using similar or different methods but leading to comparable results. See also NASEM 2019.
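As a rough illustration of the first two senses of “same result” described above (bit-for-bit identical versus numerically equal within a tolerance), the following Python sketch compares a published set of numeric results with a 3rd-party rerun. The function names `sha256_of` and `compare_runs` and the default tolerance values are hypothetical, chosen only for this example; they are not part of any FAIRER tool. The statistical and hypothesis-validation senses are not covered here.

```python
import hashlib
import math


def sha256_of(values):
    """Hash a sequence of numbers via their float repr, for exact comparison."""
    digest = hashlib.sha256()
    for v in values:
        digest.update(repr(float(v)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def compare_runs(original, rerun, rel_tol=1e-9, abs_tol=0.0):
    """Classify a rerun against the original results.

    Returns "identical", "within tolerance", or "different".
    The tolerance defaults are illustrative assumptions; a real project
    would document its own limits of precision and accuracy.
    """
    if len(original) != len(rerun):
        return "different"
    # Fully deterministic case: every value matches exactly.
    if sha256_of(original) == sha256_of(rerun):
        return "identical"
    # Same numeric results up to a stated tolerance.
    if all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(original, rerun)
    ):
        return "within tolerance"
    return "different"


if __name__ == "__main__":
    published = [0.1 + 0.2, 1.0, 2.5]
    print(compare_runs(published, [0.1 + 0.2, 1.0, 2.5]))
    print(compare_runs(published, [0.3, 1.0, 2.5]))
    print(compare_runs(published, [0.31, 1.0, 2.5]))
```

Recording the tolerance alongside the published data lets a reviewer tell a rounding-level difference from a genuine discrepancy.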
For science programs, if our mission is to collect and manage world-class scientific data, if our vision is that scientific data are treated as a publicly available national asset, if our goal is state-of-the-art scientific data management, stewardship, and infrastructure that maximizes the public value of our scientific data, if our work is grounded in values of scientific integrity and excellence and principles of Open Science by-default-by-design, and if our work is agile, enabling, trusted, purposeful, and user-centred, then the evidence of success of the mission is tidy, FAIRER data.
Austin CC (2020). The Open Science Ecosystem: A Systematic Framework Anchored in Values, Ethics, and FAIRER Data. SSRN Preprint.
CINECA (2023). Deliverable D7.3: First recommendations for implementation in IT Framework, including a Data Management Plan. Common Infrastructure for National Cohorts in Europe, Canada, and Africa.
NASEM (2019). Reproducibility and Replicability in Science. National Academies of Sciences, Engineering, and Medicine.
This GitHub site is developed and maintained by Dominique Charles and Claire C. Austin, who work at Environment and Climate Change Canada.
Cite as: Dominique Charles and Claire C. Austin* (2024). *FAIRER data management: GitHub repository*. https://github.com/FAIRERdata
*Corresponding author: Claire C. Austin
Disclaimer: All views and opinions expressed are those of the co-authors, and do not necessarily reflect the official policy or position of their respective employers, or of any government, agency, or organization.
Licence: Creative Commons CC-BY-SA
Logo adapted by Dominique Charles from: Sangya Pundir, CC-BY-SA
Note: Individual tools on this site may have a different author statement, and different copyright/licence.