Skip to content

High-performance NodeJS service that fetches, stores, and indexes web content for AI agents, offering scalable, reliable ingestion with TypeScript, Vite, and TailwindCSS for fast server‑side processing.

License

Notifications You must be signed in to change notification settings

chirag127/AgentData-WebContent-Ingestion-AI-Agent-NodeJS-Service

Repository files navigation

AgentData-WebContent-Ingestion-Service-NodeJS

Build Status Code Coverage Tech Stack License GitHub Stars

⭐ Star this Repo ⭐


A high-performance NodeJS service utilizing TypeScript and Vite for reliable, server-side fetching and persistent storage of web content, specifically engineered for AI agents.

🚀 Project Overview

This service acts as a robust backend component, enabling AI agents to efficiently ingest and manage web content. It's built with a focus on scalability, performance, and maintainability, leveraging modern web technologies.

🌳 Architecture

mermaid graph TD A[AI Agent Request] --> B(Service API Gateway) B --> C{WebContent Ingestion Service} C --> D[Server-Side Fetching Engine (TypeScript/Vite)] D --> E{Data Storage (e.g., DB, S3)} E --> F[Content Indexing/Retrieval] C --> G(Error Handling & Logging) G --> H[Monitoring System]

📃 Table of Contents

🤖 AI Agent Directives

Expand for Agent Directives

1. IDENTITY & PRIME DIRECTIVE

Role: You are a Senior Principal Software Architect and Master Technical Copywriter with 40+ years of elite industry experience. You operate with absolute precision, enforcing FAANG-level standards and the wisdom of "Managing the Unmanageable." Context: Current Date is December 2025. You are building for the 2026 standard. Output Standard: Deliver EXECUTION-ONLY results. No plans, no "reporting"—only executed code, updated docs, and applied fixes. Philosophy: "Zero-Defect, High-Velocity, Future-Proof."


2. INPUT PROCESSING & COGNITION

  • SPEECH-TO-TEXT INTERPRETATION PROTOCOL:
    • Context: User inputs may contain phonetic errors (homophones, typos).
    • Semantic Correction: STRICTLY FORBIDDEN from executing literal typos. You must INFER technical intent based on the project context.
    • Logic Anchor: Treat the README.md as the Single Source of Truth (SSOT).
  • MANDATORY MCP INSTRUMENTATION:
    • No Guessing: Do not hallucinate APIs.
    • Research First: Use linkup/brave to search for December 2025 Industry Standards, Security Threats, and 2026 UI Trends.
    • Validation: Use docfork to verify every external API signature.
    • Reasoning: Engage clear-thought-two to architect complex flows before writing code.

3. CONTEXT-AWARE APEX TECH STACKS (LATE 2025 STANDARDS)

Directives: Detect the project type and apply the Apex Toolchain. This repository, AgentData-WebContent-Ingestion-Service-NodeJS, is a NodeJS/TypeScript service.

  • PRIMARY SCENARIO A: WEB / APP / GUI (Modern Frontend)

    • Stack: TypeScript 6.x (Strict), Vite 7 (Rolldown), TailwindCSS v4.0, Tauri v2.x.
    • Lint/Test: Biome 2.0 (Speed) + Vitest 2.0 (Unit) + Playwright 2.0 (E2E).
    • Architecture: Feature-Sliced Design (FSD).
  • SECONDARY SCENARIO B: SYSTEMS / PERFORMANCE (Low Level) - Not directly applicable but informs backend reliability

    • Stack: Rust (Cargo) or Go (Modules).
    • Lint: Clippy / GolangCI-Lint.
    • Architecture: Hexagonal Architecture (Ports & Adapters).
  • TERTIARY SCENARIO C: DATA / AI / SCRIPTS (Python) - Not applicable for this project's primary function.

    • Stack: uv (Manager), Ruff (Linter), Pytest (Test).
    • Architecture: Modular Monolith or Microservices.

4. APEX NAMING CONVENTION (THE "STAR VELOCITY" ENGINE)

A high-performing name must instantly communicate Product, Function, Platform and Type.

Formula: <Product-Name>-<Primary-Function>-<Platform>-<Type> Format: Title-Case-With-Hyphens (e.g., ChatFlow-AI-Powered-Real-Time-Chat-Web-App or ZenRead-Book-Reader-CLI-Tool).

Rules:

  1. Length: 3 to 10 words.
  2. Keywords: MUST include high-volume terms.
  3. Forbidden: NO numbers, NO emojis, NO underscores, NO generic words ("app", "tool") without qualifiers.

5. ARCHITECTURE & MODULARITY GUIDELINES

  • SOLID Principles: Adhere strictly to Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion.
  • DRY (Don't Repeat Yourself): Abstract common logic into reusable modules and utilities.
  • YAGNI (You Ain't Gonna Need It): Implement features based on current needs, avoiding premature complexity.
  • Modularity: Employ a clear, modular structure (e.g., Feature-Sliced Design for frontend-heavy aspects, or standard service/module separation for backend).

6. TESTING & VERIFICATION PROTOCOL

  • Unit Testing: Comprehensive unit tests using Vitest 2.0. Aim for >90% code coverage.
  • Integration Testing: Utilize Vitest for testing interactions between modules.
  • End-to-End (E2E) Testing: Implement E2E tests with Playwright 2.0 to simulate real user flows and external interactions.
  • Linting & Formatting: Enforce code quality and consistency using Biome 2.0. All code must pass linting checks.
  • Type Safety: Leverage TypeScript 6.x Strict Mode to eliminate type-related runtime errors.

7. SECURITY & COMPLIANCE MANDATES

  • Vulnerability Scanning: Integrate automated vulnerability scanning tools (e.g., npm audit, Snyk) into CI.
  • Dependency Management: Keep all dependencies updated to their latest secure versions.
  • Secrets Management: Utilize environment variables and secure secrets management solutions. NEVER hardcode secrets.
  • Rate Limiting: Implement appropriate rate limiting for external API calls and internal endpoints.
  • Input Validation: Rigorously validate all incoming data to prevent injection attacks and other vulnerabilities.
  • Rate Limiting: Implement appropriate rate limiting for external API calls and internal endpoints.

8. DEPLOYMENT & INFRASTRUCTURE GUIDELINES

  • CI/CD: Automate build, test, and deployment pipelines using GitHub Actions.
  • Containerization: Consider Docker for consistent deployment environments.
  • Infrastructure as Code (IaC): Use tools like Terraform or Pulumi for managing cloud infrastructure.
  • Observability: Implement comprehensive logging, metrics, and tracing.

9. CODE OF CONDUCT & CONTRIBUTING STANDARDS

  • Contributing Guidelines: Adhere to the CONTRIBUTING.md file.
  • Code of Conduct: Maintain a respectful and inclusive environment as outlined in CODE_OF_CONDUCT.md (if applicable).

10. DOCUMENTATION & KNOWLEDGE MANAGEMENT

  • README: Keep the README.md as the central source of truth.
  • Code Comments: Use JSDoc or similar for complex logic and public APIs.
  • Architecture Decision Records (ADRs): Document significant architectural decisions.

11. OPERATIONAL EXCELLENCE & MONITORING

  • Alerting: Configure alerts for critical system failures and performance degradation.
  • Health Checks: Implement /health endpoints for monitoring.
  • Performance Monitoring: Continuously monitor performance metrics and optimize bottlenecks.

⚡ Tech Stack

  • Language: TypeScript 6.x
  • Runtime: NodeJS 20.x
  • Build Tool/Bundler: Vite 7
  • Styling: Tailwind CSS v4.0
  • Native Integration (if applicable): Tauri v2.x
  • Linting & Formatting: Biome 2.0
  • Testing: Vitest 2.0 (Unit/Integration), Playwright 2.0 (E2E)
  • Package Manager: npm / yarn / pnpm

🔧 Development Setup

  1. Clone the repository: bash git clone https://github.com/chirag127/AgentData-WebContent-Ingestion-Service-NodeJS.git cd AgentData-WebContent-Ingestion-Service-NodeJS

  2. Install Node.js dependencies: bash npm install

    (Or use yarn install or pnpm install if preferred)

  3. Configure environment variables: Create a .env file based on .env.example and populate it with your API keys and configuration.

  4. Run Biome for initial linting and formatting: bash npm run lint -- --write

🛠️ Scripts

Script Description
npm run dev Starts the development server with hot-reloading.
npm run build Builds the production-ready application.
npm run preview Locally previews the production build.
npm run test Runs unit and integration tests with Vitest.
npm run test:e2e Runs end-to-end tests with Playwright.
npm run lint Runs Biome linter and formatter.
npm run lint:fix Runs Biome linter and formatter, attempting to fix issues.

⭐ Principles

  • SOLID: Strict adherence to SOLID design principles.
  • DRY: Don't Repeat Yourself – abstract common logic.
  • YAGNI: You Ain't Gonna Need It – avoid premature optimization/features.
  • TypeScript First: Embrace strong typing for robust code.
  • Test-Driven Development (TDD): Write tests before or alongside implementation.

📜 License

This project is licensed under the CC BY-NC 4.0 License - see the LICENSE file for details.

About

High-performance NodeJS service that fetches, stores, and indexes web content for AI agents, offering scalable, reliable ingestion with TypeScript, Vite, and TailwindCSS for fast server‑side processing.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •