Skip to content

calypr/path_hierarchy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Path Aggregation API

:experimental: This project is experimental and may change significantly.

A small FastAPI service that returns aggregate file counts and total sizes for any (resource, path) pair. Backed by PostgreSQL tables and triggers that maintain an aggregate table path_agg_res from index_record changes.

Endpoints

  • GET /{resource}/path/{path}

Response:

{
  "resource": "<extracted_resource>",
  "path": "<extracted_path>",
  "size": <extracted_size>,
  "file_count": <calculated_file_count>
}

Quickstart (dev)

# 1) Install Poetry if needed
# https://python-poetry.org/docs/#installation

poetry install
poetry run uvicorn app.main:app --reload
# then open http://127.0.0.1:8000//programs/a/project/b/path//data

Environment

Create a .env file with:

DATABASE_URL=postgresql://user:pass@localhost:5432/mydb

If DATABASE_URL is missing or the DB call fails, the service will return zeros (and log the error).

Database / Migration

Run migrations/001_init.sql against your Postgres instance. It will:

  • create index_record, index_record_authz (if not present),
  • create helper functions,
  • create path_agg_res aggregate table,
  • create triggers to keep path_agg_res updated on inserts/updates/deletes to index_record and index_record_authz.

Tests

poetry run pytest

Integration tests with Postgres (SQLAlchemy)

These tests spin up a disposable Postgres in Docker via Testcontainers, apply the SQL migration using SQLAlchemy, then run API tests against the live database.

Requirements:

  • Postgres

Run:

poetry install
poetry run pytest -q

Architecture

See migrations/001_init.sql for database schema, functions, and triggers. See openapi.json for API schema.

sequenceDiagram
    autonumber

    actor Client
    participant API as FastAPI (GET /{resource}/path/{path})
    participant DB as PostgreSQL
    participant TR as Triggers (PL/pgSQL)
    participant SRC as Source Tables<br/>index_record, index_record_authz
    participant AGG as Aggregate Table<br/>path_agg_res

    %% ---- Query path aggregate ----
    Client->>API: GET /{resource}/path/{path}
    API->>API: normalize(resource, path)
    API->>DB: SELECT file_count, total_size<br/>FROM path_agg_res<br/>WHERE resource=$1 AND path=$2
    DB-->>API: {file_count, total_size} or NULL
    API->>API: default to {0,0} if NULL
    API-->>Client: 200 JSON {resource, path, file_count, size}

    %% ---- Background maintenance via triggers ----
    rect rgb(245,245,245)
      note over SRC,TR: Any data change that affects paths/resources
      Client->>API: (ingest/update via upstream app)*
      API->>DB: INSERT/UPDATE/DELETE on index_record<br/>and/or index_record_authz
      DB->>TR: AFTER INSERT/UPDATE/DELETE fire
      TR->>TR: compute path prefixes (/a, /a/b, …)
      TR->>AGG: UPSERT/UPDATE per (path, resource)<br/>file_count ±1, total_size ±Δ
      AGG-->>TR: ok
      TR-->>DB: done
    end

    note over Client,API:This service exposes the read endpoint. Writes typically come from indexd workflows/services.
Loading

Database Trigger

A PostgreSQL trigger is used to automatically update the path_agg_res aggregation table whenever indexd tables are modified.
This ensures that aggregated counts and sizes remain consistent without manual intervention.

  • Trigger Function:
    The trigger function listens for INSERT, UPDATE, or DELETE events on the source data table. On each event, it recalculates affected aggregates and updates the path_agg_res table.

  • Benefits:

    • Keeps aggregation data up-to-date.
    • Reduces the need for expensive queries at runtime.
    • Ensures data integrity between source and aggregate tables.

Technologies

  • Python 3.12: Main programming language.
  • FastAPI: REST API framework.
  • SQLAlchemy / psycopg: Database access.
  • PostgreSQL: Relational database.
  • Pytest: Testing framework.
  • Poetry: Dependency management.

Components

  • API Layer (app/):
    Exposes REST endpoints for resource and path aggregation. Handles request validation and response formatting.

  • Database Layer (app/db.py):
    Contains functions for querying and aggregating data from PostgreSQL. Uses raw SQL and connection helpers.

  • Models (app/models.py):
    Defines data structures for resources, paths, and aggregation results.

  • Tests (tests/):
    Unit and integration tests for API and database logic.

Data Flow

  1. Client Request:
    Client sends HTTP request to API endpoint with resource, path, and depth parameters.

  2. API Processing:
    FastAPI validates input, normalizes paths, and calls aggregation logic.

  3. Database Query:
    Aggregation function constructs SQL query to fetch matching paths and children up to specified depth.

  4. Response:
    Results are formatted as JSON and returned to the client.

Directory Structure

app/
  ├── main.py         # FastAPI app entrypoint
  ├── db.py           # Database access functions
  ├── models.py       # Data models
  └── ...
tests/
  └── ...             # Test cases and fixtures
pyproject.toml        # Project configuration
README.md             # Project overview

Notes

  • All database queries are parameterized to prevent SQL injection.
  • API endpoints are documented with OpenAPI via FastAPI.
  • Integration tests require a running PostgreSQL instance.

About

experimental: indexd path hierarchy service

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published