Skip to content

Comments

adds dataframer doc#143

Merged
bwalsh merged 5 commits intodevelopmentfrom
feature/dataframer-doc
Aug 20, 2025
Merged

adds dataframer doc#143
bwalsh merged 5 commits intodevelopmentfrom
feature/dataframer-doc

Conversation

@bwalsh
Copy link
Collaborator

@bwalsh bwalsh commented Aug 19, 2025

Description

This PR adds comprehensive documentation for gen3_tracker/meta/dataframer.py and aligns the public field map to the simplified_resources unit-test fixture. It also includes a Mermaid sequence diagram covering ingestion/upsert and the analytical read path (flattened_procedures()).

Included docs (new):

  • docs/dataframer/dataframer_documentation.md — What dataframer.py is for, key components, entities deep-dive, notable behaviors.
  • docs/dataframer/Appendix_Field_Map_Aligned.md — Appendix field map aligned to tests/unit/dataframer/test_dataframer.py’s simplified_resources.
  • docs/dataframer/dataframer_sequence_diagram.md — Mermaid sequence diagram (copy/paste-ready for GitHub/Docs).

Improvements (enables tests without gen3-client-config.ini):

file comment
gen3_tracker/cli.py Improves profile loading with error handling and removes help check logic
gen3_tracker/config/init.py Fixes error message to use config.gen3.profile instead of undefined profile variable
gen3_tracker/git/cli.py Optimizes authentication initialization to only occur when not in dry-run mode

Motivation and Context

  • Clarifies how dataframer.py ingests, upserts, and flattens FHIR resources for analytics.
  • Removes ambiguity between the documentation field map and the actual keys asserted in the test fixture (simplified_resources), reducing onboarding friction and preventing future regressions.
  • Provides an architecture-level sequence diagram to aid code reviews and future contributions.

How Has This Been Tested?

  • Cross-checked the appendix table against the fixture in tests/unit/dataframer/test_dataframer.py (simplified_resources) to ensure all mapped keys match (e.g., single identifier string, flattened DocumentReference attachment fields like md5, source_path, contentType, size, url, title, creation, and Observation attributes such as effectiveDateTime, valueCodeableConcept, sequencer, Gene, etc.).
  • Verified that the Mermaid diagram renders in GitHub-flavored Markdown preview.
  • Ran unit tests locally for the dataframer module to confirm no behavioral impact from documentation changes.

Environment: Python 3.11, SQLite (system default), macOS/Linux; ran pytest -q tests/unit/dataframer/.

Types of Changes

  • New feature (non-breaking change which adds functionality) — Documentation only
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have updated the documentation accordingly (see new files under docs/dataframer/).
  • I have tested that this feature locally (doc build/preview and unit tests).
  • I have added tests to cover my changes. (N/A — documentation-only)
  • All new and existing tests passed. (Local)
  • Reviewer has tested this feature locally

Related files/refs:

  • Tests: tests/unit/dataframer/test_dataframer.py (fixture: simplified_resources)
  • Docs: docs/dataframer/dataframer_documentation.md, docs/dataframer/Appendix_Field_Map_Aligned.md, docs/dataframer/dataframer_sequence_diagram.md

@bwalsh bwalsh requested a review from Copilot August 19, 2025 16:55

This comment was marked as outdated.

This comment was marked as outdated.

bwalsh and others added 2 commits August 20, 2025 10:43
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive documentation for the dataframer.py module, including detailed explanations of its components, a field map aligned to test fixtures, and a sequence diagram. The PR also includes several minor code improvements related to configuration handling and workflow organization.

  • Adds new documentation files explaining the dataframer module's functionality and architecture
  • Fixes configuration profile handling to improve error resilience
  • Optimizes authentication initialization in CLI push command to avoid unnecessary calls during dry runs

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
docs/dataframer.md New comprehensive documentation for dataframer module with field mapping
gen3_tracker/cli.py Improves profile loading with error handling and removes help check logic
gen3_tracker/config/init.py Fixes error message to use config.gen3.profile instead of undefined profile variable
gen3_tracker/git/cli.py Optimizes authentication initialization to only occur when not in dry-run mode
tests/unit/test_mime_type.py Removes trailing whitespace
.github/workflows/*.yaml Updates workflow configurations and adds git setup for unit tests

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lbeckman314 lbeckman314 added the documentation Improvements or additions to documentation label Aug 20, 2025
@lbeckman314
Copy link
Contributor

lbeckman314 commented Aug 20, 2025

Review Steps

1. Install feature/dataframer-doc ✔️

➜ gh pr checkout 143
branch 'feature/dataframer-doc' set up to track 'upstream/feature/dataframer-doc'.
Switched to a new branch 'feature/dataframer-doc'

➜ python3 -m venv venv && source venv/bin/activate

➜ pip install -r requirements.txt

➜ pip install -r requirements-dev.txt

2. Configure Credentials (CALYPR-prod ✔️

# Download new credentials from https://calypr.ohsu.edu/Profile

➜ mv ~/Downloads/credentials.json ~/.gen3/credentials.json.calypr-prod

➜ gen3-client configure --profile=calypr-prod --cred=~/.gen3/credentials.json.calypr-prod --apiendpoint=https://calypr.ohsu.edu
Profile 'calypr-prod' has been configured successfully.

➜ export G3T_PROFILE=calypr-prod

➜ g3t ping
msg: 'Configuration OK: Connected using profile:calypr-prod'
endpoint: https://calypr.ohsu.edu
username: beckmanl@ohsu.edu
bucket_programs:
  cbds: cbds
  gdcdata: gdc
  get3test: fortera
  smmart-caliper-dev: smmart
your_access:
  ...

3. Run All Tests ⚠️

Note

Initially tried running all tests but encountered hanging test in test_end_to_end_workflow.py

This can be marked as expected behavior due to ongoing work with updating the Integration Tests:

Cleaning up GitHub Actions workflows :shipit: #123

➜ pytest tests/ --cov=gen3_util

=============== test session starts ===============
platform darwin -- Python 3.13.3, pytest-8.4.1, pluggy-1.6.0                                                                                                 rootdir: /Users/beckmanl/code/gen3_util
configfile: pytest.ini
plugins: anyio-4.10.0, cov-6.2.1
collected 39 items
tests/integration/test_bucket_import.py .  
tests/integration/test_bundle.py ..   
tests/integration/test_end_to_end_workflow.py ..^C   <---- Canceled hanging test ⚠️

4. Run Dataframer Tests ✔️

➜ pytest --cov=gen3_util tests/unit/dataframer/test_dataframer.py

pytest tests/unit/dataframer/test_dataframer.py
=============== test session starts ===============
platform darwin -- Python 3.13.3, pytest-8.4.1, pluggy-1.6.0
rootdir: /Users/beckmanl/code/gen3_util
configfile: pytest.ini
plugins: anyio-4.10.0, cov-6.2.1
collected 5 items

tests/unit/dataframer/test_dataframer.py .....                                                                                                        [100%]

=============== 5 passed in 0.51s ===============

Copy link
Contributor

@lbeckman314 lbeckman314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — great doc addition and dataframer tests passing ✔️

Review Steps

Small issue with a end-to-end test in test_end_to_end_workflow.py hanging (I might have been being too inpatient and not given it enough time to complete) but doesn't block dataframer tests or functionality. Approved!

@bwalsh
Copy link
Collaborator Author

bwalsh commented Aug 20, 2025

Small issue with a end-to-end test in test_end_to_end_workflow.py hanging (I might have been being too inpatient and not given it enough time to complete) but doesn't block dataframer tests or functionality. Approved!

Integration tests not expected to run. Only unit tests

@bwalsh bwalsh merged commit c2930af into development Aug 20, 2025
1 check passed
@bwalsh bwalsh deleted the feature/dataframer-doc branch August 20, 2025 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants