Releases: AustralianBioCommons/gen3schemadev
v0.1.0
📦 gen3schemadev Release - Archival Milestone
This release archives the current state of the gen3schemadev repository as it transitions to focus exclusively on data modeling and schema development. Previously, the repository also supported data generation, upload, synthetic file generation, and API-based data manipulation. These features are no longer the focus.
🔧 Release Details
🗂️ Repository Summary
This repository provides tools for automating processes in the Gen3 ecosystem, specifically:
- 📘 Data dictionary creation
- 📊 Data simulation
- ✅ Metadata validation
- 📤 Data submission
🔑 Key Features
1️⃣ gen3schemadev: Object-Relational Mapper for Gen3 Schemas
- Converts spreadsheets into YAML files for building Gen3 Data Dictionaries.
- Example tool: sheet2yaml.py
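As a rough illustration of the spreadsheet-to-YAML idea, the sketch below groups spreadsheet rows by node and renders each node as simple YAML text. It is not the code in sheet2yaml.py: the column names (`node`, `property`, `type`, `description`), the sample rows, and the helper functions are all hypothetical, and the YAML is written by hand rather than with a YAML library to keep the example dependency-free.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per property of a node.
# The column names are assumptions, not the template used by gen3schemadev.
SHEET_CSV = """node,property,type,description
subject,age,integer,Age at enrolment
subject,sex,string,Reported sex
sample,tissue,string,Tissue of origin
"""


def rows_to_schemas(csv_text):
    """Group spreadsheet rows by node into a dict of property definitions."""
    schemas = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        props = schemas.setdefault(row["node"], {})
        props[row["property"]] = {
            "type": row["type"],
            "description": row["description"],
        }
    return schemas


def schema_to_yaml(node, props):
    """Render one node as simple YAML text (plain strings only, no escaping)."""
    lines = [f"id: {node}", "type: object", "properties:"]
    for name, spec in props.items():
        lines.append(f"  {name}:")
        lines.append(f"    type: {spec['type']}")
        lines.append(f"    description: {spec['description']}")
    return "\n".join(lines) + "\n"


schemas = rows_to_schemas(SHEET_CSV)
# One YAML document per node, keyed by a hypothetical output file name.
yaml_files = {
    f"{node}.yaml": schema_to_yaml(node, props)
    for node, props in schemas.items()
}
print(yaml_files["subject.yaml"])
```

A real dictionary schema carries more fields (links, required properties, enums), so this only shows the row-grouping step.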
2️⃣ Workflow for Editing Project Dictionaries
- Edits made in Google Sheets.
- YAML schemas generated and validated locally.
- Simulated data created, validated, and uploaded to Gen3.
- Indexing services configured to integrate new dictionaries.
3️⃣ sheet2yaml-CLI.py: Command-Line Tool
- Generates schemas from Google Sheets/tabs formatted according to the provided template.
4️⃣ Plausible Data Generator
- Enhances simulated data by replacing random values with plausible ones based on defined distributions.
- Input: JSON files and a CSV or Google Sheet describing plausible values.
- Output: Edited JSONs and optional dummy sequencing/lipid files.
Example Usage:
python3 plausible_data_gen.py --path <PATH_TO_SIM_DATA> [--values <PATH_TO_CSV> | --gurl <GOOGLE_SHEET_URL>] --generate-files --file-types aligned_reads
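The replacement step can be pictured with the minimal sketch below, which swaps random values in simulated records for draws from simple distributions. It is an assumption-laden illustration, not the logic of plausible_data_gen.py: the `PLAUSIBLE` table layout, the property names, and both helper functions are hypothetical.

```python
import random

# Hypothetical "plausible values" table: a categorical property with weights
# and a numeric property described by a clipped normal distribution.
PLAUSIBLE = {
    "sex": {"choices": ["female", "male"], "weights": [0.5, 0.5]},
    "age": {"mean": 55.0, "sd": 12.0, "min": 18, "max": 90},
}


def plausible_value(spec, rng):
    """Draw one value from the distribution described by spec."""
    if "choices" in spec:
        return rng.choices(spec["choices"], weights=spec["weights"], k=1)[0]
    value = round(rng.gauss(spec["mean"], spec["sd"]))
    # Clip to the allowed range so outliers stay plausible.
    return max(spec["min"], min(spec["max"], value))


def make_plausible(records, plausible, seed=0):
    """Return copies of records with described properties replaced."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [
        {
            key: plausible_value(plausible[key], rng) if key in plausible else value
            for key, value in record.items()
        }
        for record in records
    ]


# Simulated records with random, implausible values.
simulated = [
    {"submitter_id": "subject_1", "sex": "xqzt", "age": -4031},
    {"submitter_id": "subject_2", "sex": "kjwe", "age": 99999},
]
edited = make_plausible(simulated, PLAUSIBLE)
print(edited)
```

Properties not listed in the table, such as `submitter_id` here, pass through unchanged.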
5️⃣ Metadata Validator
- Validates metadata against defined schemas.
- Includes a user guide and Jupyter notebook example.
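To make "validates metadata against defined schemas" concrete, here is a small hand-rolled sketch that checks required properties, types, and allowed values. It is not the repository's validator, which may well rely on a JSON Schema library; the `SCHEMA` fragment, property names, and `validate_record` function are hypothetical.

```python
# Hypothetical schema fragment for one node. Python types stand in for
# JSON Schema type names to keep the sketch dependency-free.
SCHEMA = {
    "required": ["submitter_id", "sex"],
    "properties": {
        "submitter_id": {"type": str},
        "sex": {"type": str, "enum": ["female", "male", "unknown"]},
        "age": {"type": int},
    },
}


def validate_record(record, schema):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for name in schema["required"]:
        if name not in record:
            errors.append(f"missing required property: {name}")
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown property: {name}")
            continue
        if not isinstance(value, spec["type"]):
            errors.append(
                f"wrong type for {name}: expected {spec['type'].__name__}"
            )
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"value not allowed for {name}: {value!r}")
    return errors


good = {"submitter_id": "subject_1", "sex": "female", "age": 40}
bad = {"sex": "other", "age": "40"}
print(validate_record(good, SCHEMA))
print(validate_record(bad, SCHEMA))
```

Collecting every error rather than stopping at the first one suits a reporting workflow, where a submitter wants the full list of problems in one pass.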
6️⃣ Gen3 Data Submitter
- Automates data submission to Gen3 with detailed usage instructions.
📌 Supported Workflows
- 🛠️ Schema Development: YAML generation from spreadsheets.
- 🎛️ Data Simulation: Plausible dataset creation and refinement.
- 📑 Metadata Validation: Schema compliance checks.
- 🚀 Data Submission: Automated upload and indexing in Gen3.
🔮 Moving Forward
The repository will now focus exclusively on data modeling and schema development. Other functionalities will no longer be maintained or supported.
What's Changed
- merge null removal by @mshadbolt in #1
- Patch synth data acdc mar 2024 by @JoshuaHarris391 in #3
- 6 metadata json validator by @JoshuaHarris391 in #7
- pull main into gsheet feature by @JoshuaHarris391 in #8
- 4 download gsheet option by @JoshuaHarris391 in #9
- 11 synthetic data file generator by @JoshuaHarris391 in #12
- Feature validation reporter v2 by @JoshuaHarris391 in #14
Full Changelog: https://github.com/AustralianBioCommons/gen3schemadev/commits/v0.1.0