64 changes: 64 additions & 0 deletions CALYPR_DATAFRAMER.md
@@ -0,0 +1,64 @@
# Calypr Dataframer Repository

This directory contains the new `@calypr/dataframer` package extracted from the gen3_util repository. The package focuses exclusively on the metadata dataframe generation functionality.

## Repository Structure

```
calypr_dataframer/
├── __init__.py # Package initialization
├── cli.py # Command line interface
├── dataframer.py # Core dataframe generation logic
└── entities.py # FHIR resource simplification utilities

tests/
├── __init__.py
├── test_dataframer.py # Tests for core functionality
└── test_entities.py # Tests for entity utilities

setup.py # Package setup configuration
pyproject.toml # Modern Python packaging configuration
requirements.txt # Core dependencies
README.md # Package documentation
LICENSE # MIT license
.gitignore # Git ignore patterns
```

## Key Features Extracted

1. **LocalFHIRDatabase**: SQLite-based local FHIR data processing
2. **create_dataframe()**: Main function for generating dataframes from FHIR metadata
3. **SimplifiedResource**: FHIR resource flattening and normalization
4. **CLI Interface**: Command-line tool with dataframe generation command
5. **Multiple Resource Support**: DocumentReference, ResearchSubject, MedicationAdministration, Specimen, GroupMember

## What Was Removed

- All gen3-specific functionality (projects, collaborators, git operations)
- Gen3 client dependencies
- Complex configuration management
- Non-dataframe related CLI commands
- Gen3-specific authentication and profile management

## Dependencies Simplified

The new package has minimal dependencies:
- pandas, numpy (data processing)
- click (CLI)
- pydantic (data validation)
- ndjson, inflection, deepmerge (data processing utilities)

## Usage

```bash
# Install the package
pip install -e .

# Generate dataframe
calypr-dataframer dataframe DocumentReference ./META

# Interactive exploration
calypr-dataframer dataframe --dtale Specimen ./META
```

The result is a focused, lightweight tool for generating dataframes from FHIR metadata.
91 changes: 91 additions & 0 deletions DEMO.md
@@ -0,0 +1,91 @@
# Calypr Dataframer Demo

This demo shows how to use the `calypr_dataframer` package once its dependencies are installed.

## Package Installation

```bash
# Install the package in development mode
pip install -e .

# Or install only the core dependencies listed in requirements.txt
pip install -r requirements.txt
```

## Sample Usage

### Command Line Interface

```bash
# Generate DocumentReference dataframe
calypr-dataframer dataframe DocumentReference ./META

# Generate with custom output
calypr-dataframer dataframe Specimen ./META specimens.csv

# Interactive exploration
calypr-dataframer dataframe ResearchSubject ./META --dtale

# Show help
calypr-dataframer --help
calypr-dataframer dataframe --help
```

### Python API

```python
import tempfile
from calypr_dataframer.dataframer import create_dataframe

# Create dataframe from FHIR metadata
with tempfile.TemporaryDirectory() as work_dir:
    df = create_dataframe(
        directory_path="./META",
        work_path=work_dir,
        data_type="DocumentReference"
    )

print(f"Generated dataframe with {len(df)} rows and {len(df.columns)} columns")
print(f"Columns: {list(df.columns)}")

# Save to CSV
df.to_csv("output.csv", index=False)
```

### Expected Directory Structure

```
./META/
├── DocumentReference.ndjson
├── ResearchSubject.ndjson
├── Specimen.ndjson
├── Patient.ndjson
├── MedicationAdministration.ndjson
└── Group.ndjson
```
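
Before generating a dataframe, it can help to confirm which of these files are actually present. The sketch below is a minimal, standard-library check and is not part of the package; the helper name `check_meta_directory` is hypothetical, and whether the tool requires every file listed above is an assumption this document does not confirm.

```python
from pathlib import Path

# Resource types taken from the layout shown above; adjust to your own data.
EXPECTED = [
    "DocumentReference",
    "ResearchSubject",
    "Specimen",
    "Patient",
    "MedicationAdministration",
    "Group",
]


def check_meta_directory(meta_dir, expected=EXPECTED):
    """Return (present, missing) resource types based on <Type>.ndjson files.

    Hypothetical helper for illustration only.
    """
    meta = Path(meta_dir)
    present = [name for name in expected if (meta / f"{name}.ndjson").is_file()]
    missing = [name for name in expected if name not in present]
    return present, missing
```

Calling `check_meta_directory("./META")` returns two lists, which makes it easy to report missing inputs before running the CLI.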

### Supported Data Types

- `DocumentReference` - Document metadata with linked observations
- `ResearchSubject` - Research participants with patient data
- `MedicationAdministration` - Medication events with patient context
- `Specimen` - Biological specimens with patient source
- `GroupMember` - Group membership relationships

## Key Features Demonstrated

1. **Resource Flattening**: Converts nested FHIR to flat tables
2. **Reference Resolution**: Links Patient data to other resources
3. **Extension Processing**: Extracts FHIR extensions as columns
4. **Coding Normalization**: Standardizes coded values
5. **Column Optimization**: Reorders columns for readability
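
To make the idea of resource flattening concrete, here is a minimal sketch using only the standard library. It is an illustration of the general technique, not the package's `SimplifiedResource` implementation; the `flatten` function and the underscore-joined column names are assumptions made for this example.

```python
def flatten(resource, parent_key="", sep="_"):
    """Flatten nested dicts and lists into a single-level dict of column -> value.

    Illustrative only: the real package may name and select columns differently.
    """
    flat = {}
    for key, value in resource.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested object: recurse and prefix child keys with the parent key.
            flat.update(flatten(value, column, sep))
        elif isinstance(value, list):
            # List: add the element index to the column name.
            for index, item in enumerate(value):
                indexed = f"{column}{sep}{index}"
                if isinstance(item, dict):
                    flat.update(flatten(item, indexed, sep))
                else:
                    flat[indexed] = item
        else:
            flat[column] = value
    return flat


# A made-up Specimen resource for demonstration.
specimen = {
    "resourceType": "Specimen",
    "id": "s1",
    "subject": {"reference": "Patient/p1"},
    "type": {"coding": [{"system": "http://example.org", "code": "blood"}]},
}
row = flatten(specimen)
```

Each flattened dict corresponds to one row; a list of such rows can be passed to `pandas.DataFrame` to obtain a table.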

## Example Output

A DocumentReference dataframe might include columns like:
- `identifier`, `resourceType`, `patient_id`
- `status`, `type`, `category`
- `patient_name`, `patient_birthDate`
- `subject`, `id`

The package focuses exclusively on dataframe generation, making it lightweight and purpose-built for FHIR metadata analysis.
4 changes: 2 additions & 2 deletions LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 ACED-IDP
Copyright (c) 2024 Calypr Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.
183 changes: 144 additions & 39 deletions README.md
@@ -1,62 +1,167 @@
# Calypr Dataframer

![](docs/gen3_tracker-logo.png)
# Gen3 Tracker
A specialized tool for generating dataframes from FHIR metadata. This package extracts the dataframe functionality from the gen3_util repository and provides a focused, lightweight solution for FHIR metadata processing.

Utilities to manage Gen3 schemas, projects and submissions.
## Features

## Quick Start
### Installation
- Generate structured dataframes from FHIR metadata
- Support for multiple FHIR resource types:
  - DocumentReference
  - ResearchSubject
  - MedicationAdministration
  - Specimen
  - GroupMember
- Interactive data exploration with dtale (optional)
- CSV export functionality
- Built-in FHIR resource flattening and normalization

## Installation

```bash
pip install calypr-dataframer
```

$ pip install gen3_tracker
### Optional: Interactive Data Exploration

$ g3t version
version: 0.0.1
For interactive data exploration capabilities:

```
#### Optional: install the dtale package for interactive data exploration
* called from `g3t meta dataframe --dtale`
```
pip install g3t[dtale].
```bash
pip install calypr-dataframer[dtale]
```

## Quick Start

### Basic Usage

```bash
# Generate a CSV dataframe from DocumentReference resources
calypr-dataframer dataframe DocumentReference ./META

### Use
# Generate a CSV with custom output path
calypr-dataframer dataframe Specimen ./META my_specimens.csv

# Interactive exploration with dtale
calypr-dataframer dataframe ResearchSubject ./META --dtale
```
$ g3t --help
Usage: g3t [OPTIONS] COMMAND [ARGS]...

Gen3 Tracker: manage FHIR metadata and files.
### Directory Structure

Options:
--format [yaml|json|text] Result format. G3T_FORMAT [default: yaml]
--profile TEXT Connection name. G3T_PROFILE See
https://bit.ly/3NbKGi4
The tool expects FHIR metadata files in NDJSON format:

--version
--help Show this message and exit.
```
META/
├── DocumentReference.ndjson
├── ResearchSubject.ndjson
├── Specimen.ndjson
├── Patient.ndjson
└── ...
```
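
NDJSON stores one JSON object per line, so each file can be read incrementally. The following is a minimal sketch of such a reader using only the standard library; it is not the package's loader (which lists `ndjson` as a dependency), and the function name `read_ndjson` is hypothetical.

```python
import json


def read_ndjson(path):
    """Yield one parsed JSON object per non-empty line of an NDJSON file.

    Hypothetical helper for illustration; blank lines are skipped and
    malformed lines raise ValueError with the offending line number.
    """
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"{path}:{line_number}: invalid JSON: {error}") from error
```

For example, `list(read_ndjson("META/Patient.ndjson"))` would return the Patient resources as a list of dicts.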

Commands:
init Initialize a new repository.
add Update references to data files to the repository.
status Show changed files.
push Push changes to the remote repository.
pull Fetch from and integrate with a remote repository.
clone Clone a repository into a new directory
ls List files in the repository.
rm Remove a single file from the server index, and MANIFEST.
ping Verify gen3-client and test connectivity.
meta Manage the META directory.
collaborator Manage project membership.
projects Manage Gen3 projects.
### Command Line Interface

```bash
calypr-dataframer dataframe --help
```

**Arguments:**
- `DATA_TYPE`: The type of FHIR resource to process (required)
  - Options: Specimen, DocumentReference, ResearchSubject, MedicationAdministration, GroupMember
- `DIRECTORY_PATH`: Path to metadata directory (default: ./META)
- `OUTPUT_PATH`: Output CSV file path (default: {DATA_TYPE}.csv)

**Options:**
- `--dtale`: Launch interactive data exploration in browser
- `--debug`: Enable debug mode for troubleshooting
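
The way these arguments and defaults fit together can be sketched as follows. The package's CLI is built with click; this sketch deliberately uses the standard library's `argparse` instead so that it runs without extra dependencies. It is a stand-in for illustration, not the package's code, and the names `build_parser` and `parse` are hypothetical.

```python
import argparse

# Data types as listed in the documentation above.
DATA_TYPES = [
    "Specimen",
    "DocumentReference",
    "ResearchSubject",
    "MedicationAdministration",
    "GroupMember",
]


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dataframe-sketch")
    parser.add_argument("data_type", choices=DATA_TYPES)
    parser.add_argument("directory_path", nargs="?", default="./META")
    parser.add_argument("output_path", nargs="?", default=None)
    parser.add_argument("--dtale", action="store_true")
    parser.add_argument("--debug", action="store_true")
    return parser


def parse(argv):
    """Parse argv and apply the documented default output path, {DATA_TYPE}.csv."""
    args = build_parser().parse_args(argv)
    if args.output_path is None:
        args.output_path = f"{args.data_type}.csv"
    return args
```

With this sketch, `parse(["Specimen"])` resolves the directory to `./META` and the output path to `Specimen.csv`.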

## Python API

```python
from calypr_dataframer.dataframer import create_dataframe
import tempfile

# Create dataframe from FHIR metadata
with tempfile.TemporaryDirectory() as temp_dir:
    df = create_dataframe(
        directory_path="./META",
        work_path=temp_dir,
        data_type="DocumentReference"
    )

print(df.head())
df.to_csv("output.csv", index=False)
```

## Supported FHIR Resources

### DocumentReference
- Flattens document metadata
- Includes associated Observation resources
- Links to subject Patient data

### ResearchSubject
- Research study participant information
- Linked Patient demographics
- Enrollment details

### MedicationAdministration
- Medication administration events
- Patient linkage
- Dosage and timing information

### Specimen
- Biological specimen metadata
- Patient source information
- Collection and processing details

### GroupMember
- Group membership relationships
- Entity references
- Active/inactive status

## Data Processing Features

- **Resource Flattening**: Converts nested FHIR structures to flat tabular format
- **Reference Resolution**: Automatically resolves Patient references
- **Extension Handling**: Extracts and normalizes FHIR extensions
- **Coding Normalization**: Standardizes coded values and displays
- **Column Reordering**: Optimizes column order for better readability
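
As an illustration of reference resolution, the sketch below copies a few Patient fields onto a row that refers to the patient with a `Patient/<id>` reference string. It is a simplified, standard-library example rather than the package's logic; the function name `resolve_patient`, the `subject_reference` key, and the choice of copied fields are assumptions.

```python
def resolve_patient(row, patients_by_id, reference_key="subject_reference"):
    """Return a copy of row with Patient fields added when the reference resolves.

    Hypothetical helper: expects references shaped like 'Patient/<id>'.
    Rows whose reference cannot be resolved are returned unchanged.
    """
    resolved = dict(row)
    reference = row.get(reference_key, "")
    resource_type, _, resource_id = reference.partition("/")
    patient = patients_by_id.get(resource_id) if resource_type == "Patient" else None
    if patient is not None:
        resolved["patient_id"] = patient["id"]
        resolved["patient_birthDate"] = patient.get("birthDate")
    return resolved


# Made-up data for demonstration: index patients by id, then resolve one row.
patients = [{"resourceType": "Patient", "id": "p1", "birthDate": "1980-01-01"}]
patients_by_id = {patient["id"]: patient for patient in patients}
specimen_row = {"id": "s1", "subject_reference": "Patient/p1"}
resolved_row = resolve_patient(specimen_row, patients_by_id)
```

Indexing patients by `id` once keeps each lookup constant-time, which matters when many rows point at the same set of patients.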

## Requirements

- Python 3.8+
- pandas
- numpy
- click
- pydantic
- ndjson
- inflection
- deepmerge

## Development

### Setup Development Environment

```bash
git clone https://github.com/calypr/dataframer
cd dataframer
pip install -r requirements.txt
pip install -e .
```

## User Guide
* See [use cases and documentation](https://aced-idp.github.io/)
### Running Tests

```bash
pytest tests/
```

## Contributing
* See [CONTRIBUTING.md](CONTRIBUTING.md)

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Origins

This package extracts and focuses the dataframe functionality from the [gen3_util](https://github.com/ACED-IDP/gen3_util) repository, providing a lightweight, specialized tool for FHIR metadata processing.