@yadavchiragg

@yadavchiragg yadavchiragg commented Dec 7, 2025

Overview

Implements core database models for BHV following healthcare data management best practices and HIPAA compliance requirements.

Changes

  • User Model (bhv/models/user.py)

    • Role-based access control (patient, social_worker, admin)
    • Email-based authentication support
    • GitHub repository URL storage for GitHub integration
    • Timestamp tracking
  • Image Model (bhv/models/image.py)

    • Support for local and GitHub storage types
    • File hash (SHA-256) for duplicate detection
    • Thumbnail path storage
    • Foreign key relationship to users
    • Upload date indexing for performance
  • Narrative Model (bhv/models/narrative.py)

    • Patient story storage linked to images
    • Track narrative creator (patient or social worker)
    • Last modified timestamp with auto-update
    • Foreign key relationships to both images and users
  • AuditLog Model (bhv/models/audit_log.py)

    • HIPAA compliance audit trail
    • Log all admin actions (view, edit, delete, upload)
    • IP address and user agent tracking
    • Target user and image tracking
    • Timestamp indexing for efficient queries
  • Database Configuration (bhv/database.py)

    • SQLAlchemy setup
    • Database initialization helper
  • Initialization Script (scripts/init_db.py)

    • Create all tables with single command
    • Helpful output for verification
  • Documentation (bhv/models/README.md)

    • Schema documentation
    • Usage instructions
    • Relationship diagrams
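
The SHA-256 duplicate detection listed for the Image model might work roughly as follows. This is a minimal sketch using only the standard library; `file_sha256` and `is_duplicate` are hypothetical helper names that do not come from the PR, and the in-memory set stands in for a lookup against the indexed hash column.

```python
import hashlib
import io


def file_sha256(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(stream, known_hashes):
    """Hypothetical check: True if the stream's hash has been seen before."""
    return file_sha256(stream) in known_hashes


# The set below stands in for hashes already stored in the images table.
known = {file_sha256(io.BytesIO(b"first upload"))}
print(is_duplicate(io.BytesIO(b"first upload"), known))   # same bytes
print(is_duplicate(io.BytesIO(b"second upload"), known))  # different bytes
```

Reading in chunks keeps memory use flat for large images, which is why the helper takes a stream rather than a bytes object.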

Design Decisions

1. Enum Types for Safety

  • Used Python Enums for UserRole and StorageType to prevent invalid values
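
As an illustration of this decision, the two enums could look something like the sketch below. The member names and string values are assumptions based on the roles and storage types named in this description, and `parse_role` is a hypothetical helper, not part of the PR.

```python
from enum import Enum


class UserRole(Enum):
    PATIENT = "patient"
    SOCIAL_WORKER = "social_worker"
    ADMIN = "admin"


class StorageType(Enum):
    LOCAL = "local"
    GITHUB = "github"


def parse_role(value):
    """Convert a raw string to a UserRole, rejecting unknown values."""
    try:
        return UserRole(value)
    except ValueError:
        raise ValueError(f"unknown role: {value!r}") from None


print(parse_role("admin"))
```

Any value outside the enum fails at the boundary instead of reaching the database as a free-form string.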

2. Cascading Deletes

  • When a user is deleted, all their images and narratives are automatically removed
  • Maintains referential integrity
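
The cascading behaviour can be sketched at the SQL level with the standard library's `sqlite3` module. This is not the PR's Flask-SQLAlchemy code, which may configure cascades at the ORM level instead; the table and column names are assumptions taken from the schema diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
CREATE TABLE narratives (
    id INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES images(id) ON DELETE CASCADE,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO users (id, email) VALUES (1, 'patient@example.com')")
conn.execute("INSERT INTO images (id, user_id) VALUES (10, 1)")
conn.execute("INSERT INTO narratives (id, image_id, user_id) VALUES (100, 10, 1)")

# Deleting the user removes the dependent image and narrative rows too.
conn.execute("DELETE FROM users WHERE id = 1")

remaining_images = conn.execute("SELECT COUNT(*) FROM images").fetchone()[0]
remaining_narratives = conn.execute("SELECT COUNT(*) FROM narratives").fetchone()[0]
print(remaining_images, remaining_narratives)
```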

3. Indexing Strategy

  • Email (users) - for login queries
  • Upload date (images) - for timeline views
  • File hash (images) - for duplicate detection
  • Action + timestamp (audit_logs) - for audit queries
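
To check that indexes like these are actually used, one option is to inspect the query plan. The sketch below uses the standard library's `sqlite3` module rather than the PR's models; the index names (`ix_users_email`, `ix_audit_action_ts`) and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY,
    action TEXT NOT NULL,
    timestamp TEXT NOT NULL
);
CREATE INDEX ix_users_email ON users (email);
CREATE INDEX ix_audit_action_ts ON audit_logs (action, timestamp);
""")


def plan(sql, params=()):
    """Return the detail column of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


login_plan = plan("SELECT id FROM users WHERE email = ?", ("a@example.com",))
audit_plan = plan(
    "SELECT id FROM audit_logs WHERE action = ? AND timestamp >= ?",
    ("delete", "2025-12-01"),
)
print(login_plan)
print(audit_plan)
```

A composite index on (action, timestamp) serves queries that filter on the action and then on a time range, which matches the audit queries described above.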

4. IPv6 Support

  • IP address field supports both IPv4 and IPv6 (45 characters)
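
The 45-character limit corresponds to the longest textual form of an IPv6 address, where the last 32 bits are written in dotted-quad notation. A short check with the standard library's `ipaddress` module (the sample addresses are illustrative only):

```python
import ipaddress

# Longest textual form: an IPv6 address whose final 32 bits are written
# as an IPv4 dotted quad.
longest = "ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255"
samples = ["203.0.113.7", "2001:db8::1", longest]

for text in samples:
    addr = ipaddress.ip_address(text)  # raises ValueError if malformed
    assert len(text) <= 45
    print(addr.version, len(text))
```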

5. Flexible Details Field

  • AuditLog.details as TEXT field to store additional context as JSON
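
Storing JSON in a TEXT column means serializing on write and parsing on read. A minimal sketch with the standard library, assuming a simplified `audit_logs` table; the keys inside `details` are made up for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_logs (id INTEGER PRIMARY KEY, action TEXT, details TEXT)"
)

# Hypothetical context for an edit action.
details = {"reason": "duplicate upload", "fields_changed": ["title"]}
conn.execute(
    "INSERT INTO audit_logs (action, details) VALUES (?, ?)",
    ("edit", json.dumps(details, sort_keys=True)),
)

stored = conn.execute("SELECT details FROM audit_logs").fetchone()[0]
restored = json.loads(stored)
print(restored["reason"])
```

The trade-off is that the database treats the column as opaque text, so filtering on individual keys has to happen in application code unless the backend offers JSON functions.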

Security Considerations

  • Password hashing support (bcrypt integration in next PR)
  • Audit logging for all administrative actions
  • User data isolation through proper foreign keys
  • IP address tracking for security analysis

Database Schema

users (1) ──── (N) images (1) ──── (N) narratives
  │                │
  │                └──────── (N) audit_logs
  │
  └──────────────────────── (N) audit_logs

Testing

Local Testing Steps

# Install dependencies
pip install Flask==3.0.0 Flask-SQLAlchemy==3.1.1

# Initialize database
python scripts/init_db.py

# Verify tables created
# Check that bhv.db file is created
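
Beyond checking that the file exists, the table list can be verified directly. The sketch below is one way to do that, assuming the table names shown in the schema diagram; `missing_tables` is a hypothetical helper, and the in-memory database stands in for `bhv.db`.

```python
import sqlite3

# Table names taken from the schema diagram; adjust if the models differ.
EXPECTED_TABLES = frozenset({"users", "images", "narratives", "audit_logs"})


def missing_tables(conn, expected=EXPECTED_TABLES):
    """Return the expected table names that are absent from the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return set(expected) - {name for (name,) in rows}


# Demo against an in-memory database that has only one of the tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
print(sorted(missing_tables(demo)))
```

To check a real database, pass a connection opened on the generated file, for example `sqlite3.connect("bhv.db")`; an empty result means all expected tables are present.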

Test Results

✅ All tables created successfully
✅ Relationships established correctly
✅ No SQLAlchemy errors

Next Steps

This PR establishes the foundation for:

  • Authentication system (password hashing, login/logout)
  • Image upload handlers
  • Narrative CRUD operations
  • Admin dashboard with audit log viewer
  • GitHub storage integration

…ess control
- Create Image model with local/GitHub storage support
- Create Narrative model for patient stories
- Create AuditLog model for HIPAA compliance tracking
- Add database initialization script
- Add comprehensive model documentation

Features:
- Role-based user management (patient, social_worker, admin)
- Support for both local and GitHub storage
- File hash-based duplicate detection
- Audit trail for all admin actions
- Proper foreign key relationships and cascading deletes
- Database indexes for query performance

Security:
- Password hash storage (bcrypt integration in next PR)
- IP address logging for audit compliance
- User data isolation through relationships

@gemini-code-assist
Contributor

Summary of Changes

Hello @yadavchiragg, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the fundamental database schema for the Behavioral Health Vault (BHV) application. It introduces essential models for managing users, images, patient narratives, and audit logs, with a strong emphasis on healthcare data management best practices and HIPAA compliance. The changes lay a robust and secure groundwork for subsequent feature development, ensuring data integrity and traceability.

Highlights

  • Core Database Models Introduced: New SQLAlchemy models for User, Image, Narrative, and AuditLog have been added, forming the foundational data structure for the BHV application.
  • HIPAA Compliance & Security: The AuditLog model is specifically designed for HIPAA compliance, tracking administrative actions, IP addresses, and user agents. User data isolation and password hashing (planned for next PR) are also key considerations.
  • Robust Data Relationships & Integrity: Models include foreign key relationships (e.g., images to users, narratives to images/users) and cascading deletes to maintain referential integrity. Enum types are used for safety in roles and storage types.
  • Database Initialization & Configuration: A dedicated bhv/database.py module sets up SQLAlchemy, and a scripts/init_db.py script allows for easy creation of all database tables.
  • Comprehensive Documentation & Setup: Detailed documentation for the database schema (bhv/models/README.md) and installation instructions (docs/installation.md) have been added, alongside updated contributing guidelines.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@yadavchiragg yadavchiragg changed the base branch from main to dev December 7, 2025 08:49
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly refactors the project's contributing guidelines, making them more concise and beginner-friendly, while also establishing a new database layer using Flask-SQLAlchemy.

Key changes:

  • Adds *.db to .gitignore.
  • Overhauls CONTRIBUTING.md to simplify instructions and remove detailed templates.
  • Introduces bhv/database.py for SQLAlchemy initialization.
  • Adds new model files (user.py, image.py, narrative.py, audit_log.py) that define the core database schema, documented in bhv/models/README.md and collectively imported via bhv/models/__init__.py.
  • Adds an init_db.py script for database setup, alongside a basic installation.md guide and initial test_basic.py tests.
  • Updates requirements.txt with Flask-SQLAlchemy and related dependencies, though it contains redundant entries.

Review comments highlight several areas for improvement:

  • Ensure database-side timestamp generation using server_default=db.func.now() in model definitions (user.py, narrative.py).
  • Use an Enum for the action field in AuditLog for type safety.
  • Add a backref to resolve SQLAlchemy relationship ambiguity in audit_log.py.
  • Make file_size non-nullable in image.py.
  • Load the database URI from environment variables in init_db.py.

Additionally, the review points out unused imports (datetime in bhv/database.py, model imports in init_db.py), inconsistent formatting in bhv/models/__init__.py, and issues within CONTRIBUTING.md and installation.md regarding incorrect code block formatting and redundant sections.
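
The review's suggestion to use server_default=db.func.now() moves timestamp generation from Python into the database. The sketch below shows the same idea at the SQL level with the standard library's `sqlite3` module rather than the PR's Flask-SQLAlchemy models; the table layout and the `created_at` column name are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# The application supplies no timestamp; the database fills it in on insert.
conn.execute("INSERT INTO users (email) VALUES (?)", ("patient@example.com",))
created_at = conn.execute("SELECT created_at FROM users").fetchone()[0]
print(created_at)
```

With a database-side default, rows inserted by scripts or other clients get a timestamp from the same clock as rows inserted through the application.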

@yadavchiragg
Author

Hey @pradeeban and @mdxabu!

I've implemented the core database models for BHV as discussed. Since the project structure is on hold while you evaluate architecture options, I wanted to contribute something concrete that will be needed regardless of the final directory structure.

This PR establishes the foundational data layer with four main models:

User Model - Role-based access (patient/social worker/admin) with support for the GitHub repo integration you mentioned

Image Model - Handles both local and GitHub storage types, includes SHA-256 hashing for duplicate detection

Narrative Model - Links patient stories to images with proper authorship tracking

AuditLog Model - HIPAA compliance audit trail that logs all admin actions with IP tracking

Coming from a cybersecurity background, I focused on security-first design: proper foreign key constraints, cascading deletes for data integrity, and comprehensive audit logging. The models include indexes on frequently queried fields (email, upload_date, timestamps) for performance.

I've also included an initialization script (scripts/init_db.py) that creates all tables with a single command - keeping with BHV's "minimal deployment" philosophy.

I tested this locally and all tables create successfully. Ready to adapt if you decide on different architectural patterns, but I believe these models capture the core data requirements from the GSoC description.

Looking forward to your feedback!

@mdxabu mdxabu added the on hold Not merging this PR now. label Dec 7, 2025
@mdxabu
Member

mdxabu commented Dec 7, 2025

@yadavchiragg, thanks for the contribution. Note that we have planned to use MongoDB! Please also fix the issues raised in the bot's comments.

yadavchiragg and others added 3 commits December 28, 2025 18:21
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>