Releases: AustralianBioCommons/gen3schemadev

v0.1.0

15 Jan 02:18

📦 gen3schemadev Release - Archival Milestone

This release archives the current state of the gen3schemadev repository as it transitions to focus exclusively on data modeling and schema development. Previously, the repository also supported data generation and upload, synthetic file generation, and API-based data manipulation. Those functions are preserved in this release but will no longer be the focus of the repository.


🔧 Release Details

🗂️ Repository Summary

This repository provides tools for automating processes in the Gen3 ecosystem, specifically:

  • 📘 Data dictionary creation
  • 📊 Data simulation
  • Metadata validation
  • 📤 Data submission

🔑 Key Features

1️⃣ gen3schemadev: Object-Relational Mapper for Gen3 Schemas

  • Converts spreadsheets into YAML files for building Gen3 Data Dictionaries.
  • Example tool: sheet2yaml.py
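
As a rough illustration of the spreadsheet-to-schema idea, the sketch below groups property rows by node into a nested mapping of the kind a dictionary YAML file might hold. The column names (`node`, `property`, `type`, `required`), the function `rows_to_schemas`, and the output layout are assumptions made for this example; they are not the template or the behaviour of sheet2yaml.py.

```python
import csv
import io
import json

# Hypothetical sheet contents; the real template's columns may differ.
SHEET_CSV = """node,property,type,required
subject,submitter_id,string,yes
subject,age,integer,no
sample,submitter_id,string,yes
"""


def rows_to_schemas(csv_text):
    """Group property rows by node into one schema-like mapping per node."""
    schemas = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        schema = schemas.setdefault(
            row["node"],
            {"id": row["node"], "type": "object", "required": [], "properties": {}},
        )
        schema["properties"][row["property"]] = {"type": row["type"]}
        if row["required"].strip().lower() == "yes":
            schema["required"].append(row["property"])
    return schemas


schemas = rows_to_schemas(SHEET_CSV)
# Printed as JSON to stay within the standard library; a YAML library
# could serialise the same mapping to one file per node.
print(json.dumps(schemas["subject"], indent=2))
```

Keeping the intermediate result as a plain mapping means the same structure can be inspected, validated, or written out in whichever format is needed.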

2️⃣ Workflow for Editing Project Dictionaries

  • Edits are made in Google Sheets.
  • YAML schemas are generated and validated locally.
  • Simulated data is created, validated, and uploaded to Gen3.
  • Indexing services configured to integrate new dictionaries.

3️⃣ sheet2yaml-CLI.py: Command-Line Tool

  • Generates schemas from Google Sheets/tabs formatted according to the provided template.

4️⃣ Plausible Data Generator

  • Enhances simulated data by replacing random values with plausible ones based on defined distributions.
  • Input: JSON files and a CSV or Google Sheet describing plausible values.
  • Output: Edited JSONs and optional dummy sequencing/lipid files.

Example Usage:

python3 plausible_data_gen.py --path <PATH_TO_SIM_DATA> [--values <PATH_TO_CSV> | --gurl <GOOGLE_SHEET_URL>] --generate-files --file-types aligned_reads
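
The replacement step itself can be pictured with a small sketch: random placeholder values in a simulated record are redrawn from a weighted list of plausible values. The field names, the `PLAUSIBLE` table, and the function `make_plausible` are hypothetical; the real script reads its distributions from a CSV or Google Sheet whose format is not shown in these notes.

```python
import random

# Hypothetical plausible-value table: field -> (values, weights).
PLAUSIBLE = {
    "sex": (["female", "male"], [0.5, 0.5]),
    "smoking_status": (["never", "former", "current"], [0.6, 0.25, 0.15]),
}


def make_plausible(record, table, rng):
    """Return a copy of record with listed fields redrawn from their distributions."""
    updated = dict(record)
    for field, (values, weights) in table.items():
        if field in updated:
            updated[field] = rng.choices(values, weights=weights, k=1)[0]
    return updated


rng = random.Random(0)  # seeded so repeated runs give the same draws
simulated = {"submitter_id": "subject_001", "sex": "qwxz", "smoking_status": "lorem"}
plausible = make_plausible(simulated, PLAUSIBLE, rng)
print(plausible)
```

Fields that are not listed in the table, such as `submitter_id` here, pass through unchanged, so identifiers and links between records survive the rewrite.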

5️⃣ Metadata Validator

  • Validates metadata against defined schemas.
  • Includes a user guide and Jupyter notebook example.
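
To give a sense of what a schema compliance check involves, here is a minimal sketch that tests a record for missing required properties, unknown properties, and values outside an allowed list. It is a hand-rolled illustration, not the repository's validator and not a full JSON Schema implementation; `validate_record`, `SCHEMA`, and the example records are all hypothetical.

```python
def validate_record(record, schema):
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required property: {name}")
    for name, value in record.items():
        rules = schema.get("properties", {}).get(name)
        if rules is None:
            errors.append(f"unknown property: {name}")
        elif "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} not in {rules['enum']}")
    return errors


# Hypothetical schema and records, for illustration only.
SCHEMA = {
    "required": ["submitter_id", "sex"],
    "properties": {
        "submitter_id": {"type": "string"},
        "sex": {"enum": ["female", "male", "unknown"]},
    },
}

good = {"submitter_id": "subject_001", "sex": "female"}
bad = {"submitter_id": "subject_002", "sex": "F", "height": 170}
print(validate_record(good, SCHEMA))
print(validate_record(bad, SCHEMA))
```

Returning every problem at once, rather than stopping at the first, tends to suit metadata review, where a submitter usually wants the full list of fixes in one pass.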

6️⃣ Gen3 Data Submitter

  • Automates data submission to Gen3 with detailed usage instructions.

📌 Supported Workflows

  • 🛠️ Schema Development: YAML generation from spreadsheets.
  • 🎛️ Data Simulation: Plausible dataset creation and refinement.
  • 📑 Metadata Validation: Schema compliance checks.
  • 🚀 Data Submission: Automated upload and indexing in Gen3.

🔮 Moving Forward

The repository will now focus exclusively on data modeling and schema development. Other functionalities will no longer be maintained or supported.

What's Changed

Full Changelog: https://github.com/AustralianBioCommons/gen3schemadev/commits/v0.1.0