Technical-Mavle/DataIngestionFrontend_SAGAR

SAGAR Data Ingestion Portal

A modern, interactive web application for uploading and ingesting datasets into the SAGAR data lakehouse. Built with React, Vite, and Supabase, it pairs a glassmorphism UI and animated 3D globe background with the proprietary SAGAR-QC quality control system for automated data validation and comprehensive quality reporting.

🎯 Overview

The SAGAR Data Ingestion Portal is a secure, user-friendly interface for the CMLRE (Centre for Marine Living Resources and Ecology) to upload datasets. The application provides:

  • Secure Authentication: Simple username/password login with session persistence
  • File Upload: Drag-and-drop or browse file selection
  • Real-time Processing: Visual feedback during data ingestion
  • Automated Quality Control: Proprietary SAGAR-QC module with intelligent test selection
  • Comprehensive Quality Reports: Interactive JSON reports and formal PDF downloads
  • Automated Pipeline: Automatic triggering of backend ingestion services
  • Modern UI: Glassmorphism design with animated 3D globe background

✨ Features

Authentication System

  • Login Page: Secure username/password authentication
  • Session Persistence: Login state saved in localStorage
  • Credential Display: Temporary credentials shown on login page (Username: admin, Password: admin123)
  • Logout Functionality: Secure logout with state cleanup
  • Error Handling: Clear error messages for invalid credentials

File Upload System

  • Multiple File Selection: Upload up to 100 files at once
  • File Selection: Browse button with multiple file selection support
  • File List Display: Shows all selected files with individual file sizes
  • File Management: Remove individual files from the selection before upload
  • Multi-format Support: Automatically converts supported file types to CSV:
    • CSV files (no conversion needed)
    • TSV/TXT files (converts tabs to commas)
    • Excel files (XLSX, XLS) - converts first sheet to CSV
    • JSON files (converts objects/arrays to CSV)
    • Other text files (auto-detects delimiter)
  • Sequential Processing: Files are processed one by one to ensure quality
  • Backend Processing: All cleaning and processing handled by Data Processing Engine API
  • Real-time Progress Tracking: Individual status updates for each file being processed
  • Upload Status: Real-time status messages showing current file and progress
  • Error Handling: Comprehensive error messages per file; one failure doesn't stop other files

Processing Pipeline

  • Client-side Processing:
    • File Conversion: Converts any file type (Excel, JSON, TSV, TXT, etc.) to CSV for each selected file
    • Sends CSV files to backend API sequentially (one at a time)
  • Backend Processing (Data Processing Engine) - Per File:
    1. CSV Cleaning:
      • Removes lines above header
      • Ensures proper CSV format
      • Handles both comma and tab-separated files
    2. SAGAR-QC Quality Control:
      • Data analysis and intelligent test selection
      • QC test execution and flag assignment
      • Quality report generation
    3. Parquet Conversion: Converts cleaned CSV (with QC flags) to Parquet using pandas/pyarrow
    4. Storage Upload: Automatic upload to processed-data bucket
    5. Metadata Storage: Stores metadata and quality report in metadata_sagar table
  • Visual Feedback:
    • Real-time progress tracking for each file
    • Individual status indicators (converting, processing, completed, failed)
    • Animated spinner during processing
    • Success confirmation with summary
    • List of processed files with individual quality report access
  • State Management: Smooth transitions between upload, processing, and completion states
  • Batch Summary: Shows total successful and failed files after batch completion
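The backend's CSV cleaning step can be sketched roughly as follows. This is a simplified, hypothetical version in stdlib Python; the actual logic lives in DataProcessingEngine/processing.py and may locate the header differently:

```python
def clean_csv(raw_text, header_field):
    """Drop preamble lines above the header row and normalize tabs to commas.

    `header_field` is a column name expected in the header row -- a
    hypothetical heuristic used here to find where the real data starts.
    """
    lines = raw_text.splitlines()
    # Find the first line that mentions the expected header field.
    start = next(i for i, line in enumerate(lines) if header_field in line)
    # Convert tab-separated lines to comma-separated on the way out.
    return "\n".join(line.replace("\t", ",") for line in lines[start:])

raw = "Sensor export v2\n\ntime\tvalue\n2024-01-01\t3.5\n"
print(clean_csv(raw, "time"))
```

The instrument preamble line is dropped and the tab-separated body comes back as plain CSV.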

Quality Control System (SAGAR-QC)

  • Intelligent Test Selection: AI-powered (Gemini 2.5 Flash) or rule-based test selection based on data characteristics
  • Comprehensive QC Tests:
    • IOOS-QC/QARTOD Standards: Gross range, spike detection, flat line, rate of change, climatology, temporal consistency
    • SAGAR-Specific Tests: Location validation, duplicate detection, missing data analysis
  • Data Type Awareness: Automatically detects occurrence data vs. sensor data and applies appropriate tests
  • GPS Format Support: Multi-format GPS coordinate parsing (Decimal Degrees, NMEA 0183, DDM, DMS, UTM)
  • Row-wise & Column-wise Testing: Intelligent application based on data type (occurrence data uses row-wise checks)
  • QC Flagging System: Standard flags (GOOD, SUSPECT, FAIL, MISSING, UNKNOWN) added to data
  • Quality Reports:
    • Interactive JSON Reports: Detailed metrics, test results, and recommendations
    • Formal PDF Reports: Academic-style downloadable reports with charts and detailed analysis
  • Test Rationale: Explains why specific tests were selected for each dataset

Quality Report Display

  • Individual Reports: Each processed file has its own quality report
  • Report Access: View quality reports for any processed file from the results list
  • Interactive Dashboard: Visual charts (Pie and Bar charts) showing flag distribution
  • Expandable Test Results: Column-specific details with expandable dropdowns
  • Test Metrics: Detailed statistics for each QC test executed
  • Download Options:
    • Download JSON quality report for each file
    • Download formal PDF report (academic-style) for each file
  • Visual Indicators: Color-coded flags and quality scores
  • Batch Overview: See all processed files with their individual status and report access

User Interface

  • 3D Globe Background: Interactive rotating globe with decorative points
  • Glassmorphism Design: Modern frosted glass effect with backdrop blur
  • Responsive Layout: Works on desktop and mobile devices
  • Smooth Animations: CSS keyframe animations for loading states
  • Color Scheme: Dark theme with cyan/green gradient accents

πŸ—οΈ Architecture

┌─────────────────┐
│   React App     │
│   (Frontend)    │
│  Multiple Files │
│  (up to 100)    │
└────────┬────────┘
         │
         │ For each file (sequential):
         │ 1. Convert to CSV (client-side)
         │ 2. Send cleaned CSV
         ▼
┌─────────────────┐
│ Data Processing │
│ Engine API      │
│ (FastAPI)       │
│ Per File:       │
└────────┬────────┘
         │
         │ 3. CSV Cleaning
         │ 4. SAGAR-QC Processing
         │    ├─ Data Analysis
         │    ├─ Intelligent Test Selection (Gemini AI)
         │    ├─ QC Test Execution
         │    └─ Flag Assignment
         │ 5. Convert CSV → Parquet (with flags)
         │ 6. Generate Quality Report (JSON)
         │ 7. Upload Parquet
         │ 8. Store Metadata + QC Report
         │ 9. Return QC Report to Frontend
         ▼
┌─────────────────┐
│  Supabase       │
│  Storage        │
│  (processed-    │
│   data bucket)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Supabase DB    │
│  (metadata_     │
│   sagar table)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Quality Report │
│  Display & PDF  │
│  Generation     │
│  (Per File)     │
└─────────────────┘

Note: Files are processed sequentially (one at a time) to ensure quality control and prevent resource conflicts. Each file receives its own quality report and is stored independently.

πŸ› οΈ Tech Stack

Frontend

  • React 18.3.1: UI library
  • Vite 5.4.0: Build tool and dev server
  • react-globe.gl 2.36.0: 3D globe visualization
  • Three.js 0.180.0: 3D graphics library
  • @supabase/supabase-js 2.45.4: Supabase client library
  • recharts 2.15.4: Interactive charts for quality reports
  • html2canvas 1.4.1: HTML to canvas conversion for PDF
  • jspdf 2.5.2: PDF generation library

Backend/Infrastructure

  • Data Processing Engine: FastAPI service for CSV to Parquet conversion
  • SAGAR-QC Module: Proprietary quality control system
    • Google Gemini AI 2.5 Flash: Intelligent test selection
    • IOOS-QC/QARTOD Tests: Standard oceanographic QC tests
    • Custom QC Tests: SAGAR-specific quality checks
  • Supabase:
    • Storage buckets for processed data
    • Database for metadata storage

Styling

  • Inline Styles: React inline styles for component styling
  • CSS Animations: Keyframe animations for loading states
  • Glassmorphism: Backdrop blur effects

πŸ“ Project Structure

data-ingestion/
├── src/
│   ├── App.jsx                    # Main application component
│   ├── main.jsx                   # React entry point
│   ├── components/
│   │   ├── GlobeBackground.jsx    # 3D globe background component
│   │   ├── SimpleFilePicker.jsx   # File selection component
│   │   ├── QualityReport.jsx      # Quality report display component
│   │   └── ui/
│   │       └── file-upload.jsx    # Alternative file upload component
│   └── lib/
│       ├── utils.js               # Utility functions
│       ├── fileProcessing.js      # File conversion to CSV utilities
│       └── pdfGenerator.js        # PDF report generation
├── DataProcessingEngine/          # Backend processing API
│   ├── main.py                    # FastAPI application
│   ├── processing.py              # CSV to Parquet conversion + QC integration
│   ├── config.py                  # Supabase & Gemini configuration
│   ├── requirements.txt           # Python dependencies
│   ├── .env.example               # Environment variables template
│   ├── SAGAR_QC/                  # Proprietary QC module
│   │   ├── __init__.py            # Module initialization
│   │   ├── qc_flags.py            # QC flag definitions
│   │   ├── qc_tests.py            # QC test implementations
│   │   ├── qc_analyzer.py         # Data analysis & test selection
│   │   └── qc_pipeline.py         # QC pipeline orchestration
│   └── README.md                  # Processing engine documentation
├── supabase/
│   ├── config.toml                # Supabase configuration
│   └── functions/
│       └── trigger-ingestion/
│           ├── index.ts           # Edge Function (legacy)
│           └── deno.json          # Deno configuration
├── index.html                     # HTML entry point
├── vite.config.js                 # Vite configuration
├── package.json                   # Dependencies and scripts
├── env.js.example                 # Environment variables template
└── README.md                      # This file

🚀 Setup Instructions

Prerequisites

  • Node.js (v16 or higher)
  • npm or yarn
  • Supabase account and project
  • (Optional) Netlify account for deployment

Installation Steps

  1. Clone the repository

    git clone <repository-url>
    cd data-ingestion
  2. Install dependencies

    npm install
  3. Set up environment variables

    • Copy env.js.example to .env (or create a .env file)
    • Fill in your Supabase credentials and login credentials
  4. Configure Supabase

    • Create a storage bucket named processed-data in your Supabase project
    • Create a table named metadata_sagar for storing file metadata
  5. Run development server

    npm run dev
  6. Open in browser

    • Navigate to http://localhost:5173 (or the port shown in terminal)

πŸ” Environment Variables

Create a .env file in the root directory with the following variables:

VITE_LOGIN_USERNAME=admin
VITE_LOGIN_PASSWORD=admin123
VITE_PROCESSING_API_URL=http://localhost:8000

Note:

  • VITE_PROCESSING_API_URL is the URL of the Data Processing Engine API. Defaults to http://localhost:8000 for development. For production, set this to your deployed API URL (e.g., https://dataprocessingengine-sagar.onrender.com).
  • Supabase credentials are only needed in the backend (DataProcessingEngine/.env), not in the frontend.
  • Gemini API key is configured in the backend (DataProcessingEngine/.env) for intelligent QC test selection.

Environment Variable Descriptions

  • VITE_LOGIN_USERNAME: Username for portal login
  • VITE_LOGIN_PASSWORD: Password for portal login
  • VITE_PROCESSING_API_URL: URL of the Data Processing Engine API (defaults to http://localhost:8000 in development; set to your deployed API URL, e.g. https://dataprocessingengine-sagar.onrender.com, in production)

Backend Environment Variables (DataProcessingEngine/.env)

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key-here
GEMINI_API_KEY=your-gemini-api-key-here  # Optional: for AI-powered test selection

Note: For production, use secure environment variable management. Never commit .env files to version control.

🧩 Components

App.jsx

Main application component that handles:

  • Authentication state management
  • File cleaning (client-side)
  • API communication with Data Processing Engine
  • UI state transitions (login → upload → processing → complete)
  • LocalStorage session persistence

Key Features:

  • Login/logout functionality
  • Client-side CSV cleaning
  • Direct API calls to processing backend
  • Error handling and status messages
  • Responsive glassmorphism UI

GlobeBackground.jsx

3D interactive globe component using react-globe.gl:

  • Rotating 3D Earth visualization
  • Decorative points at specific coordinates (Bangalore, Delhi, Mumbai)
  • Atmosphere effect with white glow
  • Auto-rotation enabled
  • Responsive to window resize

Coordinates Displayed:

  • Bangalore: 12.97°N, 77.59°E (Green)
  • Delhi: 28.61°N, 77.20°E (Cyan)
  • Mumbai: 19.07°N, 72.87°E (Gold)

SimpleFilePicker.jsx

Multiple file selection component:

  • Browse button for multiple file selection (up to 100 files)
  • File list display with individual file names and sizes
  • Remove individual files from selection
  • File count display
  • Glassmorphism styling
  • Scrollable file list for large selections

QualityReport.jsx

Comprehensive quality report display component:

  • Interactive Charts: Pie chart for flag distribution, bar chart for quality metrics
  • Test Results Display: Detailed results for each QC test executed
  • Expandable Columns: Click to expand column-specific test results
  • Quality Metrics: Overall quality score, flag percentages, test statistics
  • Download Options:
    • Download JSON report
    • Download formal PDF report (academic-style)
  • Test Rationale: Displays why specific tests were selected
  • Visual Indicators: Color-coded flags and status indicators

🔄 Workflow

1. User Login

  1. User enters username and password
  2. Credentials validated against environment variables
  3. On success:
    • Login state set to true
    • Session saved to localStorage
    • User redirected to upload interface
  4. On failure:
    • Error message displayed
    • User remains on login page

2. File Upload & Processing

  1. User selects files via browse button (supports up to 100 files, any file type)
  2. Selected files displayed in picker with file names and sizes
  3. User can remove individual files before upload
  4. User clicks "Upload [N] Files" button
  5. Sequential Processing (one file at a time). For each file:
    a. Client-side Processing:
      • File Conversion: File is converted to CSV format (if not already CSV)
        • Excel files (XLSX, XLS) → CSV (first sheet)
        • JSON files → CSV
        • TSV/TXT files → CSV (tabs converted to commas)
        • CSV files → no conversion needed
    b. Backend Processing (Data Processing Engine API):
      • CSV Cleaning:
        • Removes lines above the header
        • Ensures proper CSV format
        • Handles both comma- and tab-separated files
      • SAGAR-QC Quality Control:
        • Data Analysis: Analyzes CSV structure and data characteristics
        • Intelligent Test Selection:
          • Uses Gemini AI (if available) to analyze headers and select appropriate tests
          • Falls back to rule-based selection if AI is unavailable
          • Logs which system was used (AI or rule-based)
        • QC Test Execution: Runs the selected tests:
          • For Occurrence Data (species records, biodiversity): missing_data (row-wise), location (if coordinates present)
          • For Sensor Data (time-series): gross_range, spike, flat_line, rate_of_change, temporal_consistency, climatology, missing_data (column-wise), duplicate_detection
        • Flag Assignment: Adds a flag column to the DataFrame with QC flags (GOOD, SUSPECT, FAIL, MISSING, UNKNOWN)
        • Quality Report Generation: Creates a comprehensive JSON report with:
          • Summary statistics
          • Detailed metrics per test
          • Test rationale
          • Recommendations
      • Parquet Conversion: Converts the CSV with QC flags to Parquet format
      • Storage Upload: Parquet file uploaded to the processed-data bucket
      • Metadata Storage: Metadata + quality report stored in the metadata_sagar table
    c. Progress Tracking:
      • Real-time status updates for each file (converting, processing, completed, failed)
      • Individual progress indicators
      • Error messages displayed per file if processing fails
  6. Quality Report Display:
    • List of all processed files displayed after batch completion
    • Individual "View Report" button for each file
    • Interactive charts showing flag distribution for selected file
    • Expandable test results with column-specific details
    • Download options for JSON and PDF reports per file
  7. Completion:
    • Success animation with checkmark
    • Summary message: "[N] files processed and stored to lakehouse"
    • List of processed files with status indicators
    • Individual quality reports available for viewing and download
    • Return to upload interface

3. Error Handling

  • Upload Errors: Displayed in red with error message per file
  • Network Errors: Caught and displayed to user; doesn't stop processing of other files
  • Validation Errors: File selection validation before upload (max 100 files)
  • QC Errors: QC test failures are logged but don't stop processing
  • Individual File Errors: One file failure doesn't affect other files in the batch
  • Error Recovery: Failed files are clearly marked with error details in the results list
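The per-file error isolation described above can be sketched like this (illustrative Python; the real frontend implements the equivalent loop in JavaScript in App.jsx, and `process_one` stands in for the convert/QC/upload pipeline):

```python
def process_batch(files, process_one):
    """Process files sequentially; one failure must not stop the rest."""
    results = []
    for name in files:
        try:
            report = process_one(name)
            results.append({"file": name, "status": "completed", "report": report})
        except Exception as exc:
            # Record the failure and continue with the next file.
            results.append({"file": name, "status": "failed", "error": str(exc)})
    summary = {
        "successful": sum(1 for r in results if r["status"] == "completed"),
        "failed": sum(1 for r in results if r["status"] == "failed"),
    }
    return results, summary

def fake_pipeline(name):
    # Stand-in for the real per-file pipeline.
    if name.endswith(".bad"):
        raise ValueError("malformed file")
    return {"quality_status": "GOOD"}

results, summary = process_batch(["a.csv", "b.bad", "c.csv"], fake_pipeline)
print(summary)
```

The failed file is marked with its error message while the other two complete normally, matching the batch summary behaviour described above.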

⚡ Data Processing Engine

Overview

The Data Processing Engine is a FastAPI service located in the DataProcessingEngine/ folder that handles CSV to Parquet conversion and storage.

Setup

  1. Navigate to the DataProcessingEngine directory:

    cd DataProcessingEngine
  2. Install dependencies:

    pip install -r requirements.txt
  3. Configure environment variables:

    • Copy .env.example to .env
    • Fill in your Supabase credentials and optional Gemini API key:
      SUPABASE_URL=https://your-project.supabase.co
      SUPABASE_KEY=your-service-role-key-here
      GEMINI_API_KEY=your-gemini-api-key-here  # Optional: enables AI-powered test selection
      
    • Note: If GEMINI_API_KEY is not provided, the system will use rule-based test selection
  4. Run the server:

    uvicorn main:app --reload --port 8000

API Endpoint

POST /process-csv

Processes a cleaned CSV file:

  • Runs SAGAR-QC quality control tests
  • Adds QC flags to data
  • Converts CSV to Parquet format using pandas/pyarrow
  • Uploads to processed-data bucket in Supabase Storage
  • Stores metadata and quality report in metadata_sagar table

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body: File upload (cleaned CSV file)

Response:

{
  "status": "success",
  "processed_file": "filename.parquet",
  "metadata": {
    "columns": [...],
    "inferred_types": {...},
    "total_rows": 1000,
    "quality_control": {
      "summary": {...},
      "detailed_metrics": {...},
      "test_results": {...}
    },
    "quality_report_json": {
      "summary": {...},
      "detailed_metrics": {...},
      "test_results": {...},
      "test_rationale": "...",
      "recommendations": [...]
    }
  }
}
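The `columns`, `inferred_types`, and `total_rows` fields of the response can be derived along these lines (a stdlib sketch; the real engine uses pandas and its type inference is more sophisticated):

```python
import csv
import io

def infer_metadata(csv_text):
    """Naive metadata extraction: column names, per-column types, row count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    inferred_types = {}
    for col in columns:
        values = [r[col] for r in rows if r[col] not in ("", None)]
        try:
            # If every non-empty value parses as a float, call it numeric.
            [float(v) for v in values]
            inferred_types[col] = "numeric"
        except ValueError:
            inferred_types[col] = "text"
    return {"columns": columns, "inferred_types": inferred_types, "total_rows": len(rows)}

meta = infer_metadata("station,temp\nA,21.5\nB,19.8\n")
print(meta)
```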

For more details, see DataProcessingEngine/README.md.

🔬 Quality Control (SAGAR-QC)

Overview

The SAGAR-QC module is a proprietary quality control system integrated into the Data Processing Engine. It provides comprehensive data validation using both IOOS-QC/QARTOD standards and custom SAGAR-specific tests.

Key Features

Intelligent Test Selection

  • AI-Powered (Gemini 2.5 Flash): Analyzes CSV headers and data characteristics to intelligently select appropriate QC tests
  • Rule-Based Fallback: Uses rule-based logic if Gemini AI is unavailable
  • Data Type Detection: Automatically distinguishes between:
    • Occurrence Data: Species records, biodiversity data, checklists (row-wise testing)
    • Sensor Data: Time-series measurements, real-time sensor data (column-wise testing)
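The rule-based fallback can be imagined as a simple header heuristic. The hint sets and logic below are hypothetical; the real selection lives in SAGAR_QC/qc_analyzer.py and uses more signals (and Gemini replaces this path when an API key is configured):

```python
OCCURRENCE_HINTS = {"species", "scientificname", "occurrence", "taxon", "individualcount"}
COORD_HINTS = {"latitude", "longitude", "decimallatitude", "decimallongitude", "lat", "lon"}

def select_tests(headers):
    """Pick QC tests from column headers: occurrence data gets row-wise
    checks only, sensor data gets the full time-series suite."""
    lowered = {h.strip().lower() for h in headers}
    tests = ["missing_data"]
    if lowered & COORD_HINTS:
        tests.append("location")
    if lowered & OCCURRENCE_HINTS:
        data_type = "occurrence"  # row-wise checks only
    else:
        data_type = "sensor"
        tests += ["gross_range", "spike", "flat_line", "rate_of_change",
                  "temporal_consistency", "climatology", "duplicate_detection"]
    return data_type, tests

print(select_tests(["scientificName", "decimalLatitude", "decimalLongitude"]))
```

A biodiversity header set selects only the row-wise occurrence tests, while a plain time/temperature header set would fall through to the sensor suite.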

QC Tests Available

IOOS-QC/QARTOD Standard Tests:

  • Gross Range Test: Validates data within acceptable physical/biological ranges
  • Spike Test: Detects sudden, unrealistic value changes
  • Flat Line Test: Identifies constant values (sensor malfunction)
  • Rate of Change Test: Validates rate of change between consecutive values
  • Climatology Test: Compares against historical climatological ranges
  • Temporal Consistency Test: Validates temporal ordering and gaps
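Two of these tests can be sketched in a few lines each. These are deliberately simplified QARTOD-style checks with made-up thresholds, not the module's actual implementations:

```python
GOOD, SUSPECT, FAIL = 1, 3, 4

def gross_range_test(values, fail_min, fail_max):
    """Flag values outside the acceptable physical range."""
    return [GOOD if fail_min <= v <= fail_max else FAIL for v in values]

def spike_test(values, threshold):
    """Flag a point whose deviation from the mean of its neighbours exceeds
    `threshold`. Endpoints cannot be evaluated and stay GOOD here."""
    flags = [GOOD] * len(values)
    for i in range(1, len(values) - 1):
        reference = (values[i - 1] + values[i + 1]) / 2
        if abs(values[i] - reference) > threshold:
            flags[i] = FAIL
    return flags

temps = [21.0, 21.2, 35.0, 21.1, 21.3]
print(gross_range_test(temps, -2.0, 40.0))
print(spike_test(temps, 10.0))
```

The 35.0 reading passes the gross range check (it is physically possible) but is caught by the spike test because it departs sharply from its neighbours.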

SAGAR-Specific Tests:

  • Location Test: Validates GPS coordinates in multiple formats:
    • Decimal Degrees (DD)
    • NMEA 0183 (DDMM.MMMM, DDMMSS.SSSS)
    • Degrees Decimal Minutes (DDM)
    • Degrees Minutes Seconds (DMS)
    • UTM coordinates
  • Missing Data Test:
    • Row-wise for occurrence data (checks critical identifier fields)
    • Column-wise for sensor data (flags columns with excessive missing values)
  • Duplicate Detection: Identifies duplicate records
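The row-wise vs. column-wise missing data behaviour can be sketched as follows (illustrative only; the 50% threshold and critical-field logic are hypothetical defaults, not necessarily what qc_tests.py uses):

```python
def missing_rowwise(rows, critical_fields):
    """Occurrence-style check: flag rows missing any critical identifier
    field with MISSING (9); complete rows get GOOD (1)."""
    MISSING, GOOD = 9, 1
    return [MISSING if any(not row.get(f) for f in critical_fields) else GOOD
            for row in rows]

def missing_columnwise(rows, columns, max_missing_pct=50.0):
    """Sensor-style check: report columns whose missing-value percentage
    exceeds `max_missing_pct`."""
    flagged = {}
    for col in columns:
        missing = sum(1 for row in rows if not row.get(col))
        pct = 100.0 * missing / len(rows)
        if pct > max_missing_pct:
            flagged[col] = pct
    return flagged

rows = [{"species": "Thunnus albacares", "temp": "27.1"},
        {"species": "", "temp": ""},
        {"species": "Sardinella longiceps", "temp": ""}]
print(missing_rowwise(rows, ["species"]))
print(missing_columnwise(rows, ["species", "temp"]))
```

The second row is flagged MISSING because its species identifier is empty, and the temp column is reported column-wise because two of three values are absent.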

QC Flag System

Standard flags applied to data:

  • GOOD (1): Data passes all applicable tests
  • UNKNOWN (2): Insufficient information to determine quality
  • SUSPECT (3): Data may be questionable but not definitively bad
  • FAIL (4): Data fails quality tests
  • MISSING (9): Data value is missing
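These flag values can be represented as an IntEnum, roughly what SAGAR_QC/qc_flags.py presumably defines. The roll-up function below uses a common precedence convention (MISSING, then FAIL, SUSPECT, UNKNOWN, GOOD); the module's actual aggregation may differ:

```python
from enum import IntEnum

class QCFlag(IntEnum):
    """Standard QARTOD-style flag values listed above."""
    GOOD = 1
    UNKNOWN = 2
    SUSPECT = 3
    FAIL = 4
    MISSING = 9

def worst_flag(flags):
    """Aggregate several per-test flags for one value into a single flag."""
    precedence = [QCFlag.MISSING, QCFlag.FAIL, QCFlag.SUSPECT,
                  QCFlag.UNKNOWN, QCFlag.GOOD]
    for f in precedence:
        if f in flags:
            return f
    return QCFlag.UNKNOWN

print(worst_flag([QCFlag.GOOD, QCFlag.SUSPECT, QCFlag.GOOD]))
```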

Quality Reports

JSON Report Structure:

{
  "summary": {
    "quality_status": "GOOD|SUSPECT|FAIL",
    "flag_summary": {...},
    "total_rows": 1000,
    "tests_executed": [...]
  },
  "detailed_metrics": {
    "overall_quality_score": 95.5,
    "good_percentage": 90.0,
    "suspect_percentage": 8.0,
    "fail_percentage": 2.0
  },
  "test_results": {
    "test_name": {
      "rows_flagged": 50,
      "columns_checked": [...],
      "column_results": {...}
    }
  },
  "test_rationale": "Explanation of why tests were selected",
  "recommendations": [...]
}
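From a column of per-row flags, the percentage fields above can be derived roughly like this. The scoring weights are illustrative guesses, not the module's actual formula:

```python
from collections import Counter

def detailed_metrics(flags):
    """Derive the report's percentage fields from per-row flags
    (1=GOOD, 3=SUSPECT, 4=FAIL, 9=MISSING)."""
    counts = Counter(flags)
    total = len(flags)

    def pct(flag):
        return 100.0 * counts.get(flag, 0) / total

    good, suspect, fail = pct(1), pct(3), pct(4)
    # Hypothetical weighting: suspect rows count half, failed rows count zero.
    score = good + 0.5 * suspect
    return {
        "overall_quality_score": round(score, 1),
        "good_percentage": round(good, 1),
        "suspect_percentage": round(suspect, 1),
        "fail_percentage": round(fail, 1),
    }

m = detailed_metrics([1] * 90 + [3] * 8 + [4] * 2)
print(m)
```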

PDF Report Features:

  • Academic-style formal report
  • Title page with dataset information
  • Executive summary
  • Data characteristics analysis
  • Quality control methodology
  • Test selection rationale
  • Detailed test results with charts
  • Recommendations for data improvement

GPS Format Support

The location test supports multiple GPS coordinate formats:

  1. Decimal Degrees (DD): 12.9716, 77.5946
  2. NMEA 0183 DDMM.MMMM: 958.217, 7614.599 (variable length)
  3. NMEA 0183 DDMMSS.SSSS: 095821.7, 0761459.9
  4. Degrees Decimal Minutes (DDM): 12°58.217', 77°14.599'
  5. Degrees Minutes Seconds (DMS): 12°58'13", 77°14'36"
  6. UTM: Universal Transverse Mercator coordinates

The system automatically detects the format and converts to decimal degrees for validation.
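Two of the conversions above are simple enough to sketch directly. This is an illustrative fragment, not the location test's actual parser, which also handles hemisphere letters, signs, DMS, and UTM:

```python
def nmea_to_dd(value):
    """Convert NMEA 0183 DDMM.MMMM (e.g. 958.217 = 9 degrees 58.217') to
    decimal degrees. The degree part is whatever precedes the last two
    integer digits, so the input length can vary."""
    degrees = int(value // 100)
    minutes = value - degrees * 100
    return degrees + minutes / 60.0

def ddm_to_dd(degrees, decimal_minutes):
    """Degrees Decimal Minutes (e.g. 12 degrees 58.217') to decimal degrees."""
    return degrees + decimal_minutes / 60.0

print(round(nmea_to_dd(958.217), 4))   # latitude example from above
print(round(nmea_to_dd(7614.599), 4))  # longitude example from above
print(round(ddm_to_dd(12, 58.217), 4))
```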

Usage in Processing Pipeline

The SAGAR-QC module is automatically executed during the data processing pipeline:

  1. CSV is cleaned and parsed
  2. Data structure is analyzed
  3. Appropriate tests are selected (AI or rule-based)
  4. Tests are executed and flags assigned
  5. Quality report is generated
  6. DataFrame with flags is converted to Parquet
  7. Report is stored in metadata and returned to frontend

Configuration

QC test behavior can be configured via test-specific parameters:

  • Range bounds for gross range test
  • Spike detection thresholds
  • Missing data percentage limits
  • Climatology reference data
  • And more...

See DataProcessingEngine/SAGAR_QC/ for detailed implementation.

💻 Development

Available Scripts

# Start development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

Development Server

  • Runs on http://localhost:5173 by default
  • Hot Module Replacement (HMR) enabled
  • Fast refresh for React components

Code Structure

  • Component-based: React functional components with hooks
  • State Management: React useState and useEffect hooks
  • Styling: Inline styles with glassmorphism effects
  • Error Handling: Try-catch blocks and error state management

📦 Build & Preview

Build for Production

npm run build

This creates an optimized production build in the dist/ directory:

  • Minified JavaScript
  • Optimized assets
  • Tree-shaking for smaller bundle size

Preview Production Build

npm run preview

Starts a local server to preview the production build before deployment.

🚢 Deployment

Netlify Deployment

  1. Create Netlify Site

    • Connect your Git repository
    • Or drag and drop the dist folder after building
  2. Configure Build Settings

    • Build command: npm run build
    • Publish directory: dist
    • Node version: 18.x or higher
  3. Set Environment Variables: In the Netlify dashboard → Site settings → Environment variables:

    VITE_LOGIN_USERNAME=admin
    VITE_LOGIN_PASSWORD=admin123
    VITE_PROCESSING_API_URL=https://your-deployed-api-url
    
  4. Deploy

    • Push to main branch (auto-deploy)
    • Or trigger manual deploy from Netlify dashboard

Other Deployment Options

Vercel:

  • Similar to Netlify
  • Set environment variables in project settings
  • Auto-detects Vite configuration

Supabase Hosting:

  • Can host static sites
  • Environment variables configured in Supabase dashboard

Traditional Hosting:

  • Build locally: npm run build
  • Upload dist/ folder contents to web server
  • Configure environment variables on server

🔒 Security Considerations

  1. Authentication: Currently uses client-side validation. For production:

    • Move authentication to server-side
    • Use Supabase Auth for proper user management
    • Implement JWT tokens
  2. Environment Variables:

    • Never commit .env files
    • Use secure secret management in production
    • Rotate keys regularly
  3. Storage Permissions:

    • Configure Supabase Storage bucket policies
    • Restrict upload permissions appropriately
    • Enable RLS (Row Level Security) if needed
  4. API Endpoints:

    • Secure backend ingestion API
    • Use API keys or authentication tokens
    • Implement rate limiting

πŸ› Troubleshooting

Common Issues

Upload fails:

  • Check that the Data Processing Engine API is running
  • Verify VITE_PROCESSING_API_URL is set correctly in .env (defaults to http://localhost:8000 for development)
  • Check browser console for detailed error messages
  • Ensure backend has proper Supabase credentials configured
  • For multiple files: Check individual file status in the processing list to identify which files failed

Login not working:

  • Verify environment variables are set correctly
  • Check that VITE_ prefix is used (required for Vite)
  • Restart dev server after changing .env

Globe not displaying:

  • Check internet connection (globe uses external image URLs)
  • Verify react-globe.gl and three are installed
  • Check browser console for WebGL errors

Backend API not responding:

  • Verify the Data Processing Engine is running (uvicorn main:app --reload --port 8000)
  • Check backend logs for errors
  • Ensure backend .env has correct Supabase credentials
  • Verify CORS is enabled in the backend for your frontend URL

Quality Control not working:

  • Check backend logs for QC test execution messages
  • Verify SAGAR_QC module is properly installed
  • Check if Gemini AI is being used (look for "Using Gemini AI" or "Using rule-based" in logs)
  • If using Gemini AI, ensure GEMINI_API_KEY is set in backend .env
  • Review test selection rationale in quality report to understand which tests were selected

Quality Report not displaying:

  • Check browser console for errors
  • Verify quality_report_json is present in API response for the specific file
  • Ensure recharts and PDF libraries are installed (npm install)
  • Check that QualityReport component is properly imported in App.jsx
  • For multiple files: Click "View Report" button for the specific file you want to view
  • Verify the file was successfully processed (check status indicator)

πŸ“ License

[Add your license information here]

👥 Contributors

[Add contributor information here]

📞 Support

For issues, questions, or contributions, please create an issue or contact the development team.


Built with ❤️ for CMLRE SAGAR Data Lakehouse
