Bio-Block is a decentralized document management system that uses blockchain, IPFS, and vector databases to securely store and verify healthcare records.

Amank1412/bio-block

Bio-Block: Secure Document Management System

Bio-Block is a decentralized document management system that leverages blockchain technology, IPFS (InterPlanetary File System), and vector databases to provide secure, verifiable, and privacy-preserving document storage and management for healthcare data.

Key Features

🔐 Advanced Security & Privacy

  • Streaming Encryption: Memory-safe encryption for large files (>5MB) with real-time progress tracking
  • PHI Anonymization: Automatic anonymization of Protected Health Information (PHI) in spreadsheet and image files
  • Blockchain Verification: Document hashes stored on Ethereum for tamper-proof verification
  • Decentralized Storage: IPFS-based storage with encryption and secure access controls
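
The verification idea above can be sketched in a few lines. This is only an illustration, not the project's implementation: it assumes SHA-256 as the hash (the source does not say which hash is stored on Ethereum), and `hash_stream`, `verify_stream`, and `CHUNK_SIZE` are hypothetical names. Reading in chunks keeps memory use flat for large files, in the same spirit as the streaming encryption feature.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed value, not taken from the project


def hash_stream(stream, chunk_size=CHUNK_SIZE):
    """Return the hex SHA-256 digest of a binary stream, read chunk by chunk."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def verify_stream(stream, expected_hex):
    """Compare a freshly computed digest with one recorded earlier (for example on-chain)."""
    return hash_stream(stream) == expected_hex


if __name__ == "__main__":
    original = b"anonymized,records\n" * 1000
    recorded = hash_stream(io.BytesIO(original))
    tampered = original[:-1] + b"!"
    print(verify_stream(io.BytesIO(original), recorded))  # True
    print(verify_stream(io.BytesIO(tampered), recorded))  # False
```

Any change to the file, even a single byte, yields a different digest, which is what makes a stored hash usable as a tamper check.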

🏥 Healthcare Data Management

  • Multi-format Support: Excel (.xlsx, .xls), CSV, ODS, TSV, and other spreadsheet formats (.xlsm, .xlsb), plus medical images (.jpg, .jpeg, .png)
  • Smart Anonymization: Wallet-based hashing for personal data, OCR+NLP for medical images
  • Preview System: Free 5% preview of Excel data for evaluation before purchase
  • Metadata Collection: Comprehensive tagging with disease types, demographics, and data sources
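
As a rough illustration of wallet-based hashing, the sketch below replaces personal values with a keyed hash derived from the uploader's wallet address. The README does not document the actual scheme, so HMAC-SHA-256, the 16-character truncation, and the names `pseudonymize` and `anonymize_row` are all assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymize(value, wallet_address):
    """Replace a personal value with a keyed hash derived from the wallet address."""
    key = wallet_address.strip().lower().encode("utf-8")
    message = value.strip().encode("utf-8")
    # Truncated for readability in spreadsheets; the length is an arbitrary choice.
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


def anonymize_row(row, personal_columns, wallet_address):
    """Return a copy of the row with the named personal columns pseudonymized."""
    return {
        column: pseudonymize(str(cell), wallet_address) if column in personal_columns else cell
        for column, cell in row.items()
    }


if __name__ == "__main__":
    row = {"name": "Jane Doe", "age": 54, "diagnosis": "Type 2 diabetes"}
    # "0x1234" is a placeholder, not a real wallet address.
    print(anonymize_row(row, {"name"}, "0x1234"))
```

Because the hash is deterministic for a given wallet, the same person maps to the same pseudonym across one uploader's files, while different wallets produce unrelated pseudonyms.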

🔍 Intelligent Search & Discovery

  • Vector Search: Natural language queries using ChromaDB for semantic document discovery
  • Advanced Filtering: Filter by data type, gender, source, file type, and other metadata
  • Combined Search: Semantic search enhanced with metadata filters for precise results
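
To make the combined search concrete, here is a small, dependency-free sketch of the idea: filter by metadata first, then rank what remains by vector similarity. It does not use ChromaDB; the toy embeddings, the document layout, and the function names are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_filter(query_vector, documents, filters=None, n_results=5):
    """Keep documents whose metadata matches every filter, then rank by similarity."""
    filters = filters or {}
    scored = [
        (cosine_similarity(query_vector, doc["embedding"]), doc["id"])
        for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in filters.items())
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:n_results]]


# Toy 3-dimensional embeddings; a real embedding model produces far longer vectors.
DOCUMENTS = [
    {"id": "doc-1", "embedding": [1.0, 0.0, 0.0],
     "metadata": {"dataType": "Institution", "dataSource": "Hospital"}},
    {"id": "doc-2", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"dataType": "Personal", "dataSource": "Hospital"}},
    {"id": "doc-3", "embedding": [0.0, 1.0, 0.0],
     "metadata": {"dataType": "Institution", "dataSource": "Hospital"}},
]

if __name__ == "__main__":
    print(search_with_filter([1.0, 0.0, 0.0], DOCUMENTS, {"dataType": "Institution"}))
```

With the filter applied, `doc-2` is excluded despite being close to the query, which is the practical difference between plain semantic search and the combined mode.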

💰 Marketplace & Economics

  • Document Marketplace: Set prices and earn from document sales
  • Earnings Tracking: Real-time earnings display and withdrawal functionality
  • Preview Downloads: Free evaluation of data quality before purchase
  • Wallet Integration: Seamless Ethereum wallet connectivity

Architecture

Bio-Block follows a microservices architecture with separate frontend and backend services:

[Figure: architecture diagram]

Project Structure

healthy/
├── prototype/                 # React frontend application
│   ├── src/
│   │   ├── App.js            # Main application with navigation
│   │   ├── contractService.js # Smart contract interactions
│   │   ├── upload_data.js    # Document upload interface with streaming encryption
│   │   ├── search_data.js    # Document search interface with smart decryption
│   │   ├── Dashboard.js      # User dashboard with earnings and document management
│   │   ├── encryptionUtils.js # Traditional document encryption utilities
│   │   ├── utils/
│   │   │   └── streamingEncryption.js # Memory-safe streaming encryption for large files
│   │   └── DocumentStorage.sol # Smart contract source
│   └── package.json
├── python_backend/           # FastAPI service
│   ├── main.py               # ChromaDB, search endpoints, and image PHI anonymization
│   ├── requirements.txt
│   ├── vercel.json           # Vercel deployment config
│   ├── tests/                # Python API test suite
│   │   ├── test_api.py       # Comprehensive API tests with unittest
│   │   └── test.jpg          # Test image for anonymization tests
│   └── chroma_db/            # Local ChromaDB storage
├── javascript_backend/        # Express.js API server
│   ├── controllers/          # Business logic controllers
│   │   ├── anonymizeController.js # Excel file anonymization logic
│   │   ├── ipfsController.js      # IPFS interaction logic
│   │   └── healthController.js    # Health check logic
│   ├── routes/              # API route definitions
│   │   ├── anonymize.js     # Excel anonymization routes
│   │   ├── ipfs.js          # IPFS routes
│   │   └── health.js        # Health check routes
│   ├── tests/               # JavaScript API test suite
│   │   ├── api.test.js      # Mocha/Chai/SuperTest API tests
│   │   └── test.xlsx        # Test Excel file with PHI data
│   ├── server.js            # Main server file
│   ├── vercel.json          # Vercel deployment config
│   └── package.json
└── README.md

Frontend (React)

  • Modern UI built with React.js and Tailwind CSS
  • Interactive progress tracking for uploads and encryption
  • Wallet integration for Ethereum connectivity
  • Document marketplace and earnings dashboard

JavaScript Backend (Express.js - Port 3001)

  • Excel file processing and PHI anonymization
  • IPFS file upload handling
  • Preview generation for Excel files
  • RESTful API with MVC architecture

Python Backend (FastAPI - Port 3002)

  • Vector database operations using ChromaDB
  • Image PHI anonymization using Presidio/OCR
  • Semantic search and document filtering
  • Advanced ML-based text processing

Smart Contracts (Solidity)

  • Document verification on Ethereum blockchain
  • Marketplace functionality for document sales
  • Earnings tracking and withdrawal system

Quick Start

Prerequisites

  • Node.js (v14+)
  • Python (v3.8+)
  • MetaMask or Ethereum wallet
  • Git

Installation

  1. Clone and set up

    git clone https://github.com/yourusername/bio-block.git
    cd bio-block
  2. Backend setup

    # Python backend
    cd python_backend
    pip install -r requirements.txt
    python -m spacy download en_core_web_lg
    
    # JavaScript backend
    cd ../javascript_backend
    npm install
    
    # Frontend
    cd ../prototype
    npm install
  3. Environment configuration

    Create .env in prototype/:

    REACT_APP_PINATA_JWT=your_pinata_jwt_key
    REACT_APP_ENCRYPTION_KEY=your_32_byte_encryption_key
    REACT_APP_PYTHON_BACKEND_URL=http://localhost:3002
    REACT_APP_JS_BACKEND_URL=http://localhost:3001
  4. Run the application

    # Terminal 1: Python backend
    cd python_backend && uvicorn main:app --reload --port 3002
    
    # Terminal 2: JavaScript backend  
    cd javascript_backend && node server.js
    
    # Terminal 3: Frontend
    cd prototype && npm start

Access the application at http://localhost:3000

API Endpoints

JavaScript Backend (Express.js)

🔧 Local URL: http://localhost:3001

  • GET / - Root endpoint with API information
  • GET /api/health - Health check endpoint to verify server status
  • POST /api/anonymize - Anonymize PHI (Protected Health Information) in spreadsheet files with optional preview generation
    • Input: Spreadsheet file (.xlsx, .xls, .csv, .ods, .tsv, .xlsm, .xlsb) via multipart form data
    • Optional: Wallet address for personal data anonymization
    • Optional: generatePreview=true parameter to create 5% sample preview
    • Output: Full anonymized spreadsheet file, and preview file (if requested) containing first 5% of rows (min 5, max 50)
  • POST /api/ipfs/upload - Upload a file to IPFS
    • Input: file via multipart form data
    • Output: IPFS hash of the uploaded file
  • Organized with MVC architecture (controllers and routes)
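
The preview rule described for /api/anonymize (first 5% of rows, at least 5, at most 50) can be expressed as a small function. This is a sketch, not the backend's code: rounding up and never returning more rows than the file contains are assumptions, and `preview_row_count` is a hypothetical name.

```python
def preview_row_count(total_rows, percent=5, minimum=5, maximum=50):
    """Number of rows to include in a preview of a sheet with total_rows data rows."""
    if total_rows <= 0:
        return 0
    # Integer ceiling of total_rows * percent / 100 (rounding up is an assumption).
    target = (total_rows * percent + 99) // 100
    clamped = max(minimum, min(maximum, target))
    # A file with fewer rows than the minimum cannot yield more rows than it has.
    return min(clamped, total_rows)


if __name__ == "__main__":
    for rows in (3, 40, 200, 5000):
        print(rows, preview_row_count(rows))
```

For example, a 200-row sheet yields a 10-row preview, while a 5000-row sheet is capped at 50 rows.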

Python Backend (FastAPI)

🔧 Local URL: http://localhost:3002

  • GET / - Health check and API information
  • POST /store - Store document summaries and metadata in ChromaDB
  • POST /search - Search documents using natural language queries
  • POST /filter - Filter documents by metadata criteria (data type, gender, data source, file type)
  • POST /search_with_filter - Combined semantic search with metadata filtering
  • POST /anonymize_image - Anonymize PHI in medical images using Presidio ML models with OCR+spaCy fallback
    • Input: Image file (.jpg, .jpeg, .png) via multipart form data
    • Output: Anonymized image with advanced ML-based PHI redaction
    • Method: Presidio (primary), Tesseract OCR + spaCy NLP (fallback)
  • Search responses include similarity scores, document metadata, and summaries
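
The primary/fallback arrangement described for /anonymize_image follows a common pattern, sketched below with stand-in callables. How the backend actually decides to fall back is not documented here; this example assumes it does so whenever the primary method raises an exception, and `anonymize_with_fallback` is a hypothetical name.

```python
def anonymize_with_fallback(image_bytes, primary, fallback):
    """Run the primary redactor; if it raises, run the fallback instead.

    Returns a (result, method_name) pair so callers can tell which path was used.
    """
    try:
        return primary(image_bytes), "primary"
    except Exception:
        # Assumption: any failure in the primary path triggers the fallback.
        return fallback(image_bytes), "fallback"


if __name__ == "__main__":
    def failing_primary(data):
        # Stand-in for an ML-based redactor whose model is unavailable.
        raise RuntimeError("model not loaded")

    def simple_fallback(data):
        # Stand-in for an OCR + NLP redactor; here it only tags the bytes.
        return b"redacted:" + data

    print(anonymize_with_fallback(b"scan", failing_primary, simple_fallback))
```

Returning the method name alongside the result makes it easy to log how often the fallback path is taken.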

Example API Usage

# Health check - JavaScript backend
curl http://localhost:3001/api/health

# Health check - Python backend
curl http://localhost:3002/

# Search documents (POST request)
curl -X POST http://localhost:3002/search \
  -H "Content-Type: application/json" \
  -d '{"query": "patient information", "k": 5}'

# Filter documents by metadata
curl -X POST http://localhost:3002/filter \
  -H "Content-Type: application/json" \
  -d '{"filters": {"dataType": "Personal", "gender": "Male"}, "n_results": 10}'

# Combined search with filters
curl -X POST http://localhost:3002/search_with_filter \
  -H "Content-Type: application/json" \
  -d '{"query": "diabetes research", "filters": {"dataType": "Institution", "dataSource": "Hospital"}, "n_results": 5}'

# Anonymize medical image using Presidio ML models
curl -X POST http://localhost:3002/anonymize_image \
  -F "file=@medical_scan.jpg"

# Test spreadsheet anonymization (JavaScript backend)
curl -X POST http://localhost:3001/api/anonymize \
  -F "file=@sample_data.xlsx" \
  -F "generatePreview=true"

# Test with different spreadsheet formats
curl -X POST http://localhost:3001/api/anonymize \
  -F "file=@test_sample.csv" \
  -F "generatePreview=true"

curl -X POST http://localhost:3001/api/anonymize \
  -F "file=@test_sample.tsv" \
  -F "generatePreview=true"
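
The JSON requests above can also be issued from Python using only the standard library. The sketch below mirrors the curl calls; the helper names `build_json_post` and `post_json` are hypothetical, and it assumes the endpoints return JSON bodies, which the source does not spell out.

```python
import json
import urllib.request

PYTHON_BACKEND_URL = "http://localhost:3002"  # local URL from the section above


def build_json_post(path, payload, base_url=PYTHON_BACKEND_URL):
    """Build a POST request with a JSON body, equivalent to the curl examples."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=PYTHON_BACKEND_URL, timeout=10):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_json_post(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Python backend running, `post_json("/search_with_filter", {"query": "diabetes research", "filters": {"dataType": "Institution", "dataSource": "Hospital"}, "n_results": 5})` sends the same request as the combined-search curl example.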

Testing

Bio-Block includes comprehensive test suites for both backend services:

Running Tests

# Python backend tests (6 tests)
cd python_backend && python tests/test_api.py

# JavaScript backend tests (3 tests)
cd javascript_backend && npm test

Test Coverage

  • Python Backend: Store, search, filter, anonymize image endpoints
  • JavaScript Backend: Health check, Excel anonymization, IPFS upload
  • Automated Test Data: Dynamic generation of test files with sample data

Contributing

We welcome contributions! Here's how to get started:

Development Setup

  1. Fork the repository
  2. Follow the Quick Start guide
  3. Create a feature branch: git checkout -b feature/your-feature
  4. Make your changes and test thoroughly
  5. Submit a pull request

Code Style

  • Follow existing code patterns
  • Add tests for new features
  • Update documentation as needed
  • Ensure all tests pass before submitting

Project Structure

healthy/
├── prototype/                 # React frontend
├── python_backend/           # FastAPI service
├── javascript_backend/       # Express.js API
└── README.md                 # This file

For detailed setup instructions, see the Quick Start section above.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Built with these amazing technologies:


Bio-Block - Secure, decentralized healthcare document management for the Web3 era.
