Merge the recent pull requests from dev into main. #80

Merged
pradeeban merged 155 commits into main from dev on Feb 4, 2026

Conversation

@pradeeban
Member

No description provided.

KrishanYadav333 and others added 30 commits December 8, 2025 08:30
…verview

docs: Add location-proximity module documentation and research foundation
docs: Add system architecture diagram + test plan outline
- Implemented modular validation system for DREAMS multimodal data
- Schema validator: JSON structure validation with optional jsonschema
- Path validator: Media file existence checks with remote URL support (http/https/s3/ftp)
- Temporal validator: Timestamp ordering with millisecond support
- Reporter: Unified error reporting with JSON serialization for CI/CD
- CLI: Full argparse interface with --json flag for pipeline integration
- Examples: Sample data with intentional errors for testing
- Documentation: Complete README with usage examples

Features:
- Non-invasive: Optional, read-only validation
- Extensible: Easy to add new validators
- CI/CD ready: JSON output mode for automated pipelines
- Edge cases: Remote URLs skipped, millisecond timestamps supported
- Exit codes: 0 for success, 1 for errors
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
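The path check described in this commit message can be illustrated with a minimal sketch. It assumes each record carries a hypothetical `media_path` field; the names `is_remote` and `check_media_paths` and the issue-string format are illustrative and are not taken from the repository. Remote URLs (http/https/s3/ftp) are skipped rather than fetched, as the commit message describes.

```python
import os
from urllib.parse import urlparse

# Schemes treated as remote, per the commit message (http/https/s3/ftp).
REMOTE_SCHEMES = {"http", "https", "s3", "ftp"}


def is_remote(path: str) -> bool:
    """Return True when the path is a URL with one of the remote schemes."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def check_media_paths(records, base_dir="."):
    """Return issue strings for local media files that do not exist.

    `records` is a list of dicts; `media_path` is a hypothetical field name.
    """
    issues = []
    for index, record in enumerate(records):
        path = record.get("media_path")
        if not path or is_remote(path):
            continue  # nothing to check, or a remote URL skipped by design
        if not os.path.exists(os.path.join(base_dir, path)):
            issues.append(f"record {index}: missing media file {path!r}")
    return issues
```

A caller could map a non-empty issue list to exit code 1 and an empty list to exit code 0, in line with the exit codes listed above.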
Implement Phase-1: Data Integrity Layer for Multimodal Temporal Validation in DREAMS
- Add immutable EmotionEvent and EmotionTimeline data models
- Enforce chronological ordering with __post_init__ validation
- Implement timeline builder for constructing timelines from records
- Add lightweight temporal utilities (ordering checks, time gaps)
- Add JSON export functionality for visualization
- Include comprehensive unit tests for immutability and structure
- Move emotion models to dedicated analytics module
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
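The immutable, chronologically ordered models described in this commit message could look roughly like the sketch below. Only the class names `EmotionEvent` and `EmotionTimeline`, the `__post_init__` check, and the `time_gaps` method name come from this page; the field names are assumptions, and the repository's actual definitions may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class EmotionEvent:
    timestamp: datetime
    emotion_label: str  # assumed field name


@dataclass(frozen=True)
class EmotionTimeline:
    subject_id: str  # assumed field name
    events: Tuple[EmotionEvent, ...]

    def __post_init__(self):
        # Reject any timeline whose events are not in chronological order.
        for earlier, later in zip(self.events, self.events[1:]):
            if later.timestamp < earlier.timestamp:
                raise ValueError("events must be in chronological order")

    def time_gaps(self) -> List[timedelta]:
        """Return the gaps between consecutive events."""
        return [b.timestamp - a.timestamp
                for a, b in zip(self.events, self.events[1:])]
```

Using `frozen=True` together with a tuple of events means neither the timeline nor its event sequence can be reassigned after construction, so the ordering check only has to run once.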
anish1206 and others added 27 commits January 30, 2026 12:08
…ing.

Add audit trail, stale-lock/processing cleanup, improved logging and tests.
feat: Implement self correcting CHIME model and federated learning
@pradeeban pradeeban merged commit e746e6a into main Feb 4, 2026
1 check passed
@gemini-code-assist
Contributor

Summary of Changes

Hello @pradeeban, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the DREAMS project by introducing critical new features and architectural improvements. It establishes a robust data integrity validation layer to maintain data quality, integrates a sophisticated multi-dimensional location-proximity analysis module to uncover deeper insights into emotional geography, and implements a federated learning system for continuous, self-correcting improvement of the CHIME emotion classification model. These additions collectively aim to provide a more reliable, insightful, and adaptive platform for analyzing personal recovery journeys through digitized memories.

Highlights

  • New Data Integrity Module: Introduced a data_integrity module to validate multimodal time-series data, ensuring structural correctness, media file existence, and temporal consistency before analysis. This module is non-invasive, read-only, and extensible.
  • Multi-Dimensional Location-Proximity Analysis: Added a location_proximity module that analyzes location similarity beyond just GPS coordinates, incorporating categorical, linguistic, and cultural dimensions. This enables deeper insights into how places influence emotional patterns in recovery journeys.
  • Federated Learning for CHIME Model: Implemented a Federated Learning (FL) system for the CHIME emotion classification model. This allows the model to self-correct and improve over time based on user feedback, featuring privacy-preserving training, validation gates to prevent degradation, and atomic model updates.
  • Architectural & Codebase Refinements: Refactored the sentiment analysis logic into a class-based SentimentAnalyzer for better model management and lazy loading. Integrated CHIME classification and aspect-based sentiment analysis into the ingestion pipeline. Updated core application logic to support MongoDB integration for data storage.
  • Comprehensive Documentation & Testing: Added extensive documentation including a high-level ARCHITECTURE.md, a detailed LOCATION_PROXIMITY_SUMMARY.md, and a federated-learning.md guide. New unit and end-to-end tests were introduced for the FL system, location proximity, and temporal narrative graph components.

Changelog
  • .gitignore
    • Added entries for virtual environments (venv310/, venv/).
    • Added entries for Federated Learning model artifacts (dreamsApp/app/models/production_chime_model/, dreamsApp/app/models/temp_training_artifact/).
  • .vscode/settings.json
    • Added VS Code settings for Python terminal environment and project management.
  • ARCHITECTURE.md
    • Added a new high-level architecture overview document for DREAMS, detailing system architecture, processing pipeline, data storage, and visualization layers.
    • Includes flowcharts for Location-Proximity Pipeline and Semantic Clustering Workflow.
    • Outlines component integration points, data flow, scalability, security, technology stack, and development phases.
  • LOCATION_PROXIMITY_SUMMARY.md
    • Added a new detailed summary document for the Location-Proximity Analysis Extension.
    • Describes its overview, module location, quick demo, key features (multi-dimensional proximity, emotion-location mapping, semantic clustering), research questions addressed, use cases, integration with DREAMS, metrics, research contribution, dependencies, testing, and future enhancements.
  • README.md
    • Updated "Current Progress" section (removed checkmarks).
    • Updated "Repository Structure" to reflect new modules (data_integrity/, location_proximity/, dream-integration/).
    • Updated Flask run command to use flask --app "dreamsApp.app:create_app()" run --debug.
  • data_integrity/README.md
    • Added a new README for the Data Integrity Layer, explaining its purpose, quick start, architecture, validation checks (schema, path, temporal), exit codes, options, design philosophy, dependencies, testing, and integration examples.
  • data_integrity/__init__.py
    • Added package initialization with a docstring and version.
  • data_integrity/__main__.py
    • Added entry point to allow running data_integrity as a module.
  • data_integrity/examples/invalid_schema_data.json
    • Added example JSON data with a missing required timestamp field for schema validation testing.
  • data_integrity/examples/millisecond_timestamps.json
    • Added example JSON data with mixed timestamp formats (milliseconds, seconds, ISO 8601) for temporal validation testing.
  • data_integrity/examples/remote_urls_data.json
    • Added example JSON data with remote URLs for path validation testing (to be skipped).
  • data_integrity/examples/sample_data.json
    • Added example JSON data with intentional errors (out-of-order timestamps, future timestamps, missing media files) for validation testing.
  • data_integrity/examples/sample_schema.json
    • Added example JSON schema for validating DREAMS multimodal time-series data.
  • data_integrity/examples/valid_data.json
    • Added example JSON data that is valid against the sample schema.
  • data_integrity/path_validator.py
    • Added module for validating media file paths, including skipping remote URLs and handling non-existent files.
  • data_integrity/reporter.py
    • Added module for unified error reporting with ValidationIssue and ValidationReport classes.
  • data_integrity/schema_validator.py
    • Added module for JSON Schema validation, with graceful fallback if jsonschema is not installed.
  • data_integrity/temporal_validator.py
    • Added module for temporal consistency validation, checking for future and out-of-order timestamps.
  • data_integrity/validator.py
    • Added CLI entry point for the Data Integrity Validator, orchestrating schema, path, and temporal validations.
  • docs/TEST_PLAN.md
    • Added a comprehensive testing strategy and validation plan for DREAMS, including overall strategy, functional/performance/security/usability validation, module-specific testing, CI pipeline, risk mitigation, and detailed test plans for the Location-Proximity module.
  • dream-integration/.gitignore
    • Added venv310/ to ignore virtual environment.
  • dream-integration/app/.env.example
    • Added example .env file for MongoDB connection string.
  • dream-integration/app/.gitignore
    • Added .env to ignore environment file.
  • dream-integration/app/app.py
    • Modified to integrate with MongoDB for user, sample, and analysis results storage.
    • Updated list_persons, list_samples, read_text, read_scores functions to use MongoDB.
    • Changed media serving from local filesystem to GridFS.
    • Removed local file system search for image, transcript, description, and audio.
    • Updated analyze route to save analysis results to MongoDB.
  • dream-integration/app/db.py
    • Added new module for MongoDB connection and collection setup using pymongo and GridFS.
  • dream-integration/contributing.md
    • Updated setup and usage instructions to reflect changes in app.py and general project structure.
  • dream-integration/data/person-01/sample-01/analysis/image_scores.json
    • Added example image analysis scores.
  • dream-integration/data/person-01/sample-01/analysis/text_scores.json
    • Added example text analysis scores.
  • dream-integration/script.py
    • Added a migration script to move local filesystem data (images, audio, text, analysis results) into MongoDB.
  • dreamsApp/analytics/emotion_episode.py
    • Added new module defining the Episode dataclass for representing a segment of emotional events.
  • dreamsApp/analytics/emotion_proximity.py
    • Added new module for time-aware emotion proximity, including mapping emotion labels, segmenting timelines into windows, aggregating scores, and comparing timelines.
  • dreamsApp/analytics/emotion_segmentation.py
    • Added new module for temporal segmentation, defining TimeWindow and functions for fixed-window and gap-based timeline segmentation.
  • dreamsApp/analytics/emotion_timeline.py
    • Added new module defining EmotionEvent and EmotionTimeline dataclasses for immutable, chronologically-ordered emotion data.
  • dreamsApp/analytics/episode_proximity.py
    • Added new module for episode proximity, including functions to compute temporal overlap, gap, and classify proximity relations (overlapping, adjacent, disjoint).
  • dreamsApp/analytics/episode_segmentation.py
    • Added new module to segment an EmotionTimeline into Episode objects based on gaps.
  • dreamsApp/analytics/temporal_narrative_graph.py
    • Added new module for building a TemporalNarrativeGraph from episodes, defining NarrativeEdge and graph construction logic.
  • dreamsApp/app/__init__.py
    • Updated blueprint import for auth module to relative import.
  • dreamsApp/app/auth.py
    • Updated User model import to relative import.
  • dreamsApp/app/builder.py
    • Added new module with build_emotion_timeline function to construct EmotionTimeline objects from records.
  • dreamsApp/app/dashboard/main.py
    • Refactored generate_wordcloud_b64 into a helper function.
    • Added CHIME Radar Chart visualization.
    • Implemented a new /correct_chime endpoint for user feedback on CHIME predictions, including rate limiting, atomic updates, and triggering federated learning.
    • Added _maybe_trigger_fl_training function for background FL training.
    • Updated profile route to pass latest_post for feedback.
  • dreamsApp/app/exporters.py
    • Added new module for exporting EmotionTimeline data to CSV rows and summary formats.
  • dreamsApp/app/fl_worker.py
    • Added new module for Federated Learning, including model loading, training loop, validation gate (anchor check, improvement check), atomic model swap, and database updates.
  • dreamsApp/app/ingestion/routes.py
    • Updated imports to relative.
    • Integrated extract_gps_from_image to extract location data.
    • Integrated get_chime_category and select_text_for_analysis for CHIME analysis.
    • Added chime_analysis and location fields to the stored post document.
  • dreamsApp/app/templates/dashboard/profile.html
    • Adjusted image max-height.
    • Added CHIME Radar Chart display.
    • Added a section for "Latest Entry Analysis" with user feedback mechanism for CHIME predictions (accept/edit).
    • Added JavaScript functions (toggleEdit, acceptPrediction, submitCorrection, submitCorrectionData) for handling user corrections.
  • dreamsApp/app/timeline_utils.py
    • Added a deprecation notice for timeline utilities, directing users to EmotionTimeline class methods.
  • dreamsApp/app/utils/keywords.py
    • Changed spacy model from en_core_web_lg to en_core_web_sm.
  • dreamsApp/app/utils/llms.py
    • Updated google.genai import.
  • dreamsApp/app/utils/location_extractor.py
    • Added new module for extracting GPS data from image EXIF metadata.
  • dreamsApp/app/utils/logger.py
    • Added new module for setting up a production-ready logger with file and console output.
  • dreamsApp/app/utils/sentiment.py
    • Refactored sentiment analysis logic into a SentimentAnalyzer class for better model management and lazy loading.
    • Integrated CHIME classification (get_chime_category) and Aspect-Based Sentiment Analysis (get_aspect_sentiment).
    • Added logic to load a locally fine-tuned CHIME model if available, falling back to Hugging Face.
    • Updated analyze_sentiment endpoint to include aspect and CHIME analysis.
  • dreamsApp/docs/federated-learning.md
    • Added new documentation detailing the Federated Learning implementation for the self-correcting CHIME model, including overview, architecture, how it works, file structure, configuration, logging, testing, safety mechanisms, future enhancements, troubleshooting, and API reference.
  • dreamsApp/location_proximity.py
    • Added a new module with stub functions for location proximity analysis, including Location and ProximityResult TypedDicts, and functions for extracting, computing, clustering, calculating distance, validating, and finding nearby locations.
  • location_proximity/README.md
    • Added a new README for the Location-Proximity Analysis Module, detailing its core concept (multi-dimensional proximity), research background, components, quick start, research applications, integration with DREAMS, dependencies, future enhancements, research contribution, citation, and contributing guidelines.
  • location_proximity/RESEARCH.md
    • Added a new research document outlining the theoretical and methodological foundation for location-emotion proximity analysis, covering research problem, theoretical framework, methodology, expected findings, applications, limitations, literature review, contribution, evaluation plan, and ethical considerations.
  • ml_experiments_anish/experiment1_chime_text_overfitting/EXP-001_overfitting_reduction.ipynb
    • Added a Jupyter Notebook for an ML experiment to reduce overfitting in CHIME text classification, comparing BERT and DistilBERT models with regularization.
  • tests/data/README.md
    • Added a README for the test data, describing locations.json, sentiments.json, and expected_results.json.
  • tests/data/expected_results.json
    • Added expected results for proximity, clustering, and emotion patterns for location-proximity testing.
  • tests/data/locations.json
    • Added synthetic location data for testing location-proximity analysis.
  • tests/data/sentiments.json
    • Added synthetic sentiment data for testing location-proximity analysis.
  • tests/test_chime.py
    • Added new unit tests for CHIME analysis, mocking the Hugging Face pipeline.
  • tests/test_fl.py
    • Added new end-to-end test for the Federated Learning loop, including mock data creation, worker execution, and verification of database and model updates.
  • tests/test_location_proximity.py
    • Added new unit tests for the location_proximity module, testing the stub functions.
  • tests/test_sentiment.py
    • Modified existing sentiment tests to use a mock for get_image_caption_and_sentiment and get_aspect_sentiment.
    • Updated tests to reflect changes in API (e.g., image_path_or_url is now required, empty caption is allowed).
    • Added tests for aspect sentiment integration.
  • tests/test_temporal_narrative_graph.py
    • Added new unit tests for Episode, EpisodeSegmentation, EpisodeProximity, NarrativeEdge, and TemporalNarrativeGraph classes.
  • tests/test_time_aware_proximity.py
    • Added new unit tests for align_timelines_by_window, temporal_distance, and proximity_matrix functions.
  • tests/test_timeline.py
    • Added new unit tests for EmotionEvent and EmotionTimeline immutability, chronological ordering, and time_gaps method.
    • Added tests for build_emotion_timeline function.
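The changelog entry for dreamsApp/analytics/episode_proximity.py describes computing temporal overlap and gap and classifying relations as overlapping, adjacent, or disjoint. A minimal sketch of that idea follows. The `Episode` fields, the function names, and the 30-minute `adjacency_threshold` are assumptions made for illustration; the repository's signatures may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Episode:
    start: datetime  # assumed field name
    end: datetime    # assumed field name


def temporal_overlap(a: Episode, b: Episode) -> timedelta:
    """Length of the shared interval, or zero when the episodes do not intersect."""
    return max(min(a.end, b.end) - max(a.start, b.start), timedelta(0))


def temporal_gap(a: Episode, b: Episode) -> timedelta:
    """Time between the end of the earlier episode and the start of the later one."""
    first, second = sorted((a, b), key=lambda episode: episode.start)
    return max(second.start - first.end, timedelta(0))


def classify_proximity(a: Episode, b: Episode,
                       adjacency_threshold=timedelta(minutes=30)) -> str:
    """Label a pair of episodes as overlapping, adjacent, or disjoint."""
    if temporal_overlap(a, b) > timedelta(0):
        return "overlapping"
    if temporal_gap(a, b) <= adjacency_threshold:
        return "adjacent"
    return "disjoint"
```

In this sketch, episodes that merely touch (one ends exactly when the other starts) count as adjacent rather than overlapping; that is a design choice of the example, not a documented behavior.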
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/data-integrity.yml
Activity
  • The pull request introduces several new features and extensive documentation.
  • The author pradeeban has merged recent pull requests from the dev branch into main, indicating a consolidation of new developments.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a substantial pull request that introduces several major new features and refactors, including a robust federated learning system for self-correcting models, a comprehensive data integrity validation module, and a suite of well-designed temporal analytics components. The new features are well-documented with architecture diagrams and test plans, which is excellent. The implementation of the federated learning worker, with its atomic locking and validation gates, is particularly well-designed for safety and concurrency.

I've found two issues: one critical syntax error in the new location extractor that will prevent the application from running, and one minor code redundancy in an integration script. My feedback is focused on fixing these points.

Comment on lines +31 to +34
def to_degrees(val):
    if not (isinstance(val, (tuple, list)) and len(val) == 3):
def to_degrees(val):
    return sum((c[0]/c[1] if isinstance(c, tuple) else float(c)) / 60**i for i, c in enumerate(val))
critical

This block contains a SyntaxError. The function to_degrees is defined twice, and the if statement on line 32 is incomplete. This will prevent the module from being imported and cause the application to crash.

Suggested change
- def to_degrees(val):
-     if not (isinstance(val, (tuple, list)) and len(val) == 3):
- def to_degrees(val):
-     return sum((c[0]/c[1] if isinstance(c, tuple) else float(c)) / 60**i for i, c in enumerate(val))
+ def to_degrees(val):
+     """Converts GPS coordinates from DMS to decimal degrees."""
+     if not isinstance(val, (tuple, list)) or len(val) != 3:
+         raise ValueError(f"Invalid GPS coordinate format: {val}")
+     return sum(float(c) / 60**i for i, c in enumerate(val))
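One detail worth noting: the suggested replacement calls `float(c)` on every component, whereas the original line also accepted `(numerator, denominator)` pairs. If the EXIF reader in use returns such pairs, `float(c)` would raise a `TypeError`. The following sketch keeps the validation from the suggestion and the pair handling from the original; it is an illustrative variant, not the code that was merged, and the inner helper name `as_float` is hypothetical.

```python
def to_degrees(val):
    """Convert a (degrees, minutes, seconds) triple to decimal degrees."""
    if not isinstance(val, (tuple, list)) or len(val) != 3:
        raise ValueError(f"Invalid GPS coordinate format: {val!r}")

    def as_float(component):
        # Some EXIF readers return rationals as (numerator, denominator) pairs.
        if isinstance(component, tuple):
            numerator, denominator = component
            return numerator / denominator
        return float(component)

    # degrees / 60**0 + minutes / 60**1 + seconds / 60**2
    return sum(as_float(c) / 60**i for i, c in enumerate(val))
```

For example, `to_degrees((40, 26, 46))` and `to_degrees(((40, 1), (26, 1), (46, 1)))` both evaluate 40 + 26/60 + 46/3600.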

Comment on lines +122 to +130
transcript_path = None
for pattern in ("transcript*.txt", "clip-*.txt"):
    matches = glob.glob(os.path.join(sample_dir, pattern))
    if matches:
        transcript_path = matches[0]
        break

description_matches = glob.glob(os.path.join(sample_dir, "description*.txt"))
description_path = description_matches[0] if description_matches else None
medium

This block of code for finding transcript_path and description_path is redundant. The correct paths are already found on lines 119 and 120 using the _find_first_file helper. This block can be removed to make the code more concise.


8 participants