Inbasekaran Perumal edited this page Jul 27, 2024 · 2 revisions

VisionGuard Architecture

High-Level Architecture

graph TD
    subgraph Client
        UI[User Interface]
        GVD[Real-time Gaze Vector Display]
        GCW[Gaze Calibration Window]
        STW[Ongoing Screen Time Widget]
        NAM[Notification Alert Message Box]
    end

    subgraph Backend
        CL[Core Logic]
        GDM[Gaze Detection Engine]
        GVC[Gaze Vector Calibration]
        EGT[Eye Gaze Time Tracker]
        BNS[Break Notification System]
    end

    subgraph Data
        DS[Data Storage]
        UP[User Preferences]
        CD[Calibration Data]
        UM[Usage Metrics]
    end

    UI -->|User Input| CL
    CL -->|Processed Data| UI
    CL -->|Store/Fetch Data| DS
    DS -->|Send Data| CL

    style UI fill:#f0f9ff,stroke:#0275d8,stroke-width:2px
    style GVD fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style GCW fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style STW fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style NAM fill:#f0f9ff,stroke:#0275d8,stroke-width:1px
    style CL fill:#fff3cd,stroke:#ffb22b,stroke-width:2px
    style GDM fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style GVC fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style EGT fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style BNS fill:#fff3cd,stroke:#ffb22b,stroke-width:1px
    style DS fill:#f2dede,stroke:#d9534f,stroke-width:2px
    style UP fill:#f2dede,stroke:#d9534f,stroke-width:1px
    style CD fill:#f2dede,stroke:#d9534f,stroke-width:1px
    style UM fill:#f2dede,stroke:#d9534f,stroke-width:1px

Description of Modules

  • Client (UI)

    • User Interface (UI): The main interface through which users interact with the application.
    • Real-time Gaze Vector Display (GVD): Displays the user's real-time gaze vector.
    • Gaze Calibration Window (GCW): Window for calibrating the gaze detection system.
    • Ongoing Screen Time Widget (STW): Widget showing the ongoing screen time.
    • Notification Alert Message Box (NAM): Box for displaying notifications and alerts.
  • Backend

    • Core Logic (CL): Central module that processes user input and manages the flow of data.
    • Gaze Detection Engine (GDM): Engine responsible for detecting and processing gaze vectors.
    • Gaze Vector Calibration (GVC): Handles the calibration of gaze vectors.
    • Eye Gaze Time Tracker (EGT): Tracks the amount of time the user's gaze is on the screen.
    • Break Notification System (BNS): Manages notifications to prompt the user to take breaks.
  • Data

    • Data Storage (DS): Central repository for storing all data.
    • User Preferences (UP): Stores user-specific preferences and settings.
    • Calibration Data (CD): Holds data related to the calibration of the gaze detection system.
    • Usage Metrics (UM): Keeps track of usage statistics and metrics.
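
As an illustration of how the Break Notification System (BNS) could consume the screen time reported by the Eye Gaze Time Tracker (EGT), here is a minimal sketch. The class name `BreakNotifier`, the `limit_s` parameter, and the 20-minute default are assumptions made for this example; they are not taken from the VisionGuard code.

```python
from dataclasses import dataclass, field


@dataclass
class BreakNotifier:
    """Hypothetical break notifier: signals each time another `limit_s` of screen time accrues."""

    limit_s: float = 20 * 60  # assumed default: prompt after 20 minutes of gaze time
    _next_alert_s: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # The first alert is due once the accumulated time reaches the limit.
        self._next_alert_s = self.limit_s

    def update(self, screen_time_s: float) -> bool:
        """Return True when the accumulated screen time reaches the next threshold."""
        if screen_time_s >= self._next_alert_s:
            # Schedule the following alert one full limit later.
            self._next_alert_s += self.limit_s
            return True
        return False
```

In this sketch the Core Logic would call `update()` with the latest accumulated gaze time and forward a `True` result to the Notification Alert Message Box.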

VisionGuard Backend

The VisionGuard Backend uses models from the OpenVINO model zoo to estimate a user's gaze and calculate the accumulated screen gaze time. It relies on the following networks:

Face Detection Model

This model identifies the locations of faces within an image. You can choose from the following networks:

Head Pose Estimation Model

This model estimates the head pose in Tait-Bryan angles. It outputs yaw, pitch, and roll angles in degrees, which serve as inputs for the gaze estimation model. The following network can be used:
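
To make the meaning of the three Tait-Bryan angles concrete, the sketch below turns yaw, pitch, and roll (in degrees) into a 3x3 rotation matrix. The axis assignment (yaw about y, pitch about x, roll about z) and the composition order are assumptions chosen for illustration; the convention used by the actual head pose network may differ, so check the model documentation before relying on it.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def head_rotation(yaw_deg, pitch_deg, roll_deg):
    """Build R = Rz(roll) * Ry(yaw) * Rx(pitch); this order is an assumed convention."""
    y, p, r = (math.radians(a) for a in (yaw_deg, pitch_deg, roll_deg))
    rot_x = [[1, 0, 0],
             [0, math.cos(p), -math.sin(p)],
             [0, math.sin(p), math.cos(p)]]
    rot_y = [[math.cos(y), 0, math.sin(y)],
             [0, 1, 0],
             [-math.sin(y), 0, math.cos(y)]]
    rot_z = [[math.cos(r), -math.sin(r), 0],
             [math.sin(r), math.cos(r), 0],
             [0, 0, 1]]
    return _matmul(rot_z, _matmul(rot_y, rot_x))


def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix to a 3-element vector."""
    return tuple(sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3))
```

With all three angles at zero the result is the identity matrix, which corresponds to a head facing straight ahead under this convention.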

Facial Landmark Detection Model

This model estimates the coordinates of facial landmarks for detected faces. Keypoints at the corners of the eyes are used to locate the eye regions required for the gaze estimation model. You can choose from:
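
One way to derive a square eye region from the two corner keypoints of an eye is sketched below. The function name `eye_crop_box` and the `scale` factor are hypothetical; the sketch only assumes that each corner is an `(x, y)` pixel coordinate.

```python
import math


def eye_crop_box(corner_a, corner_b, scale=1.8):
    """Return (left, top, right, bottom) of a square centred between two eye corners.

    The side length is `scale` times the corner-to-corner distance; the default
    of 1.8 is an illustrative guess, not a value taken from VisionGuard.
    """
    ax, ay = corner_a
    bx, by = corner_b
    cx, cy = (ax + bx) / 2, (ay + by) / 2        # midpoint of the eye
    half = scale * math.hypot(bx - ax, by - ay) / 2
    return (round(cx - half), round(cy - half),
            round(cx + half), round(cy + half))
```

For example, `eye_crop_box((10, 20), (30, 20), scale=2.0)` yields the box `(0, 0, 40, 40)`. A real pipeline would also clamp the box to the image bounds before cropping.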

Eye State Estimation Model

This model determines the open or closed state of the eyes in detected faces. The following model can be used:

Gaze Estimation Model

This model takes three inputs: square crops of the left and right eye images, and three head pose angles (yaw, pitch, and roll). It outputs a 3-D vector representing the direction of a person’s gaze in a Cartesian coordinate system. The following network is used:
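
The 3-D gaze vector is easier to threshold once it is expressed as two angles. The sketch below does that conversion and applies a simple angular test for "looking at the screen". It assumes a coordinate system in which the z-axis points from the eyes toward the camera and the y-axis is vertical; the limits `max_yaw_deg` and `max_pitch_deg` are hypothetical placeholders that calibration would normally supply.

```python
import math


def gaze_angles(gaze):
    """Convert a gaze vector (x, y, z) to (yaw, pitch) in degrees.

    Assumes z points from the eyes toward the camera and y is vertical.
    """
    x, y, z = gaze
    if x == 0 and y == 0 and z == 0:
        raise ValueError("gaze vector must be non-zero")
    yaw = math.degrees(math.atan2(x, z))                    # left/right deviation
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))   # up/down deviation
    return yaw, pitch


def is_on_screen(gaze, max_yaw_deg=25.0, max_pitch_deg=20.0):
    """Return True if the gaze falls inside the assumed angular window."""
    yaw, pitch = gaze_angles(gaze)
    return abs(yaw) <= max_yaw_deg and abs(pitch) <= max_pitch_deg
```

The vector does not need to be normalised first, because `atan2` depends only on the ratio of the components.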

Pipeline Diagram

graph TD
A[Image Input] --> B[Face Detection]
B --> |Face Image| C[Facial Landmark Detection]
B --> |Face Image| D[Head Pose Estimation]
C --> E[Eye State Estimation]
D --> |Head Pose Angles| F[Gaze Estimation]
C --> |Eye Image| F
E --> |Eye State| F
F --> |Gaze Vector| G[Gaze Time Estimation]
G --> H[Accumulate Screen Gaze Time]

%% Styling
style B fill:#FFDDC1,stroke:#333,stroke-width:2px
style C fill:#FFDDC1,stroke:#333,stroke-width:2px
style D fill:#FFDDC1,stroke:#333,stroke-width:2px
style E fill:#FFDDC1,stroke:#333,stroke-width:2px
style F fill:#FFDDC1,stroke:#333,stroke-width:2px
style G fill:#FFDDC1,stroke:#333,stroke-width:2px

style A fill:#C1E1FF,stroke:#333,stroke-width:2px
style H fill:#C1E1FF,stroke:#333,stroke-width:2px
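
The last two stages of the pipeline, gaze time estimation and accumulation, can be sketched as a small per-frame tracker. The class `GazeTimeTracker`, its `max_gap_s` parameter, and the rule of crediting an interval to the state observed at its start are assumptions made for this example, not a description of the VisionGuard implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GazeTimeTracker:
    """Hypothetical accumulator for on-screen gaze time, fed once per frame."""

    max_gap_s: float = 1.0  # assumed: skip gaps longer than this (dropped frames, pauses)
    total_s: float = 0.0
    _last_ts: Optional[float] = field(init=False, default=None)
    _last_on_screen: bool = field(init=False, default=False)

    def update(self, timestamp_s: float, eyes_open: bool, gaze_on_screen: bool) -> float:
        """Record one frame and return the accumulated on-screen time in seconds."""
        if self._last_ts is not None and self._last_on_screen:
            dt = timestamp_s - self._last_ts
            # Credit the interval only if the previous frame was on screen
            # and the gap between frames is plausible.
            if 0 < dt <= self.max_gap_s:
                self.total_s += dt
        self._last_ts = timestamp_s
        # Closed eyes never count as looking at the screen.
        self._last_on_screen = eyes_open and gaze_on_screen
        return self.total_s
```

Here `eyes_open` stands in for the output of the eye state stage and `gaze_on_screen` for a decision derived from the gaze vector; both are supplied by the caller.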