@puja-trivedi (Collaborator) commented Dec 15, 2025

This PR contains changes to the LibraryGeneration model that unify the information in the MOWG metadata spreadsheet with the NIMP terminology list. Specifically, the changes focus on whether every attribute has a BICAN UUID and whether that UUID matches the one in the NIMP database.
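The check described above could be sketched roughly as follows. This is only an illustration, not the script used for this PR: the column names `attribute` and `bican_uuid`, the sample rows, and the `nimp_terms` mapping are all hypothetical placeholders.

```python
import csv
import io


def find_uuid_mismatches(attributes_tsv, nimp_terms):
    """Return (attribute, problem) pairs for rows whose UUID is missing
    or differs from the reference mapping."""
    problems = []
    reader = csv.DictReader(io.StringIO(attributes_tsv), delimiter="\t")
    for row in reader:
        name = row["attribute"]  # hypothetical column name
        uuid = (row.get("bican_uuid") or "").strip()  # hypothetical column name
        expected = nimp_terms.get(name)
        if not uuid:
            problems.append((name, "missing UUID"))
        elif expected is None:
            problems.append((name, "not in NIMP list"))
        elif uuid != expected:
            problems.append((name, "UUID differs from NIMP"))
    return problems


# Made-up sample data, for illustration only.
SAMPLE_TSV = (
    "attribute\tbican_uuid\n"
    "attr_a\tUUID-0001\n"
    "attr_b\t\n"
    "attr_c\tUUID-0003\n"
)
NIMP_TERMS = {"attr_a": "UUID-0001", "attr_b": "UUID-0002", "attr_c": "UUID-9999"}

if __name__ == "__main__":
    for name, problem in find_uuid_mismatches(SAMPLE_TSV, NIMP_TERMS):
        print(f"{name}: {problem}")
```

Reporting the three failure modes separately makes it easier to tell a spreadsheet gap from a genuine disagreement with the terminology list.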

…dCellSample.sample_preparation_date based off NIMP terminology list and MOWG spreadsheet
…mple.sample_preparation_date based off NIMP terminology list and MOWG spreadsheet
…abel_barcode. Also generalized the dissociated_cell_sample_cell_label_barcode valueset name to cell_label_barcode, since both EnrichedCellSample and DissociatedCellSample use this enum. Lastly, removed the assignment of the 'terminology_nhash' column to 'meaning'.
…based off NIMP terminology list and MOWG spreadsheet
…ff NIMP terminology list and MOWG spreadsheet
…off NIMP terminology list and MOWG spreadsheet. Also added more permissible values to barcoded_cell_sample_technique based off NIMP terminology list
…NIMP terminology list and MOWG spreadsheet. Also added more permissible values to library_technique based off NIMP terminology list
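As a rough illustration of the valueset rename mentioned in the commits above, the sketch below repoints slot rows from the old enum name to the generalized one. The row layout and the `range` key are assumptions made for the example; the actual TSV columns may differ.

```python
OLD_NAME = "dissociated_cell_sample_cell_label_barcode"
NEW_NAME = "cell_label_barcode"


def rename_value_set(slot_rows, old=OLD_NAME, new=NEW_NAME):
    """Return copies of the rows with references to `old` replaced by `new`.

    The "range" key is a hypothetical column holding the enum a slot uses.
    """
    renamed = []
    for row in slot_rows:
        updated = dict(row)  # copy so the input rows are left untouched
        if updated.get("range") == old:
            updated["range"] = new
        renamed.append(updated)
    return renamed


# Illustrative rows only; the slot names are placeholders.
ROWS = [
    {"class": "DissociatedCellSample", "slot": "slot_a", "range": OLD_NAME},
    {"class": "EnrichedCellSample", "slot": "slot_b", "range": OLD_NAME},
    {"class": "EnrichedCellSample", "slot": "slot_c", "range": "string"},
]

if __name__ == "__main__":
    for row in rename_value_set(ROWS):
        print(row["class"], row["slot"], row["range"])
```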
@gemini-code-assist

Summary of Changes

Hello @puja-trivedi, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a significant standardization and expansion of the data schema related to library generation processes. It involves a comprehensive refactoring of attribute and slot definitions across multiple TSV files, aiming to unify common fields, improve consistency, and enrich the permissible values for various experimental techniques and biological metadata. These changes are crucial for enhancing the clarity and interoperability of the data model.

Highlights

  • Schema Standardization: The headers and column order in Attributes.tsv and Slots.tsv have been significantly refactored and standardized, consolidating related fields and introducing new ones like 'Local Variable Name'.
  • Attribute and Slot Unification: Attributes and slots previously tied to specific process classes (e.g., CdnaAmplification, CellBarcoding) have been merged into their respective sample classes (e.g., AmplifiedCdna, BarcodedCellSample), streamlining the data model. The cell_label_barcode was unified across dissociated and enriched cell samples.
  • Expanded Value Sets: The ValueSets.tsv file has been updated to include a broader range of permissible values for barcoded_cell_sample_technique and library_technique, reflecting new experimental methods. A comprehensive list of species with NCBI Taxonomy IDs was also added.
  • Terminology Refinement: Age unit terminology in ValueSets.tsv was standardized to singular forms (e.g., 'day' instead of 'days'), and Donor's identifier slot was updated to subject_id with a new UUID and definition.
  • Cross-Reference Introduction: New xref slots have been introduced across various biological entities in Slots.tsv, standardizing how cross-references are handled within the schema.
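The attribute and slot unification in the highlights above can be pictured with a small sketch. The two class pairs come from the summary; the row layout (`class` and `slot` keys) and the slot names are hypothetical, and this is not the procedure actually used in the PR.

```python
# Process class -> sample class, as named in the summary above.
PROCESS_TO_SAMPLE = {
    "CdnaAmplification": "AmplifiedCdna",
    "CellBarcoding": "BarcodedCellSample",
}


def merge_into_sample_classes(rows):
    """Reassign process-class rows to their sample class and drop
    duplicate (class, slot) pairs, keeping the first occurrence."""
    merged = {}
    for row in rows:
        owner = PROCESS_TO_SAMPLE.get(row["class"], row["class"])
        key = (owner, row["slot"])
        merged.setdefault(key, {**row, "class": owner})
    return list(merged.values())


# Illustrative rows; slot names are placeholders.
ROWS = [
    {"class": "CdnaAmplification", "slot": "slot_x"},
    {"class": "AmplifiedCdna", "slot": "slot_x"},
    {"class": "CellBarcoding", "slot": "slot_y"},
]

if __name__ == "__main__":
    for row in merge_into_sample_classes(ROWS):
        print(row["class"], row["slot"])
```

Keeping the first occurrence is an arbitrary choice for the sketch; a real merge would need to reconcile differing definitions or UUIDs rather than silently discard one.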

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring of the schema attributes, primarily within the TSV files that define the LinkML model. The changes focus on unifying attributes under more appropriate classes and standardizing slot names, such as using preparation_date across different process classes. This greatly improves the consistency and maintainability of the schema. Additionally, new enum values have been added, and xref slots are introduced for better data integration. Overall, these are excellent improvements. I've found one minor typo in a slot definition that should be addressed.

puja-trivedi and others added 12 commits December 14, 2025 17:12
…codedCellSample.tag_local_name, LibraryAliquot.fastq_file_alignment_status. Added NIMP ID to slot:DissectionRoiPolygon.name. Added valueset for LibraryAliquot.fastq_file_alignment_status.
…ssName_nhash_id). Also removed extra spaces from some cells and renamed SpecimenDissectionROI to ROI to match the NIMP terminology browser (changed only in the NIMP category column, not in the LinkML class name).