manbradcalf/graphlint

graphlint

An ontology linter for labeled property graphs.

Define your graph schema in SHACL. Validate it against any LPG database.

How it works

flowchart LR
    subgraph input [" "]
        SHACL["movies.shacl.ttl<br/>(SHACL schema)"]
    end

    subgraph shacl_parser ["shacl_parser.py"]
        RDFLib["rdflib<br/>Turtle → RDF graph"]
        SHWalk["Walk SHACL shapes<br/>extract constraints"]
        IR["Validation Plan<br/>(Check objects)"]
        RDFLib --> SHWalk --> IR
    end

    subgraph mapping ["Mapping"]
        direction TB
        M1["URI → node label<br/>movies#Movie → :Movie"]
        M2["URI → property<br/>movies#title → title"]
        M3["URI → relationship<br/>movies#hasActor → :HAS_ACTOR"]
    end

    subgraph backends ["backends/"]
        Cypher["cypher.py<br/>Neo4j, Memgraph"]
        GQL["gql.py<br/>ISO GQL"]
    end

    subgraph runner ["runner.py"]
        Compile["compile_plan()<br/>Check → query string"]
        Execute["execute_plan()<br/>run against live DB"]
        DryRun["dry_run()<br/>print queries only"]
    end

    subgraph output [" "]
        Report["Validation Report<br/>✓ pass / ✗ violation<br/>per node, per check"]
    end

    SHACL --> RDFLib
    SHWalk -.-> mapping
    IR --> Compile
    Compile --> Cypher & GQL
    Cypher & GQL --> Execute & DryRun
    Execute --> Report
  1. Write shapes in SHACL — human-readable, formally grounded schema language
  2. Parser compiles shapes into a vendor-neutral validation plan (list of Check objects)
  3. Mapping converts RDF URIs to LPG names (labels, properties, relationship types) using conventions or explicit overrides
  4. Backends translate each check into an executable query (Cypher or GQL)
  5. Runner executes queries against your database; violations are collected into a report
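
The mapping step (step 3) can be illustrated with a small sketch. This is not graphlint's actual implementation: the function names and the `overrides` parameter are hypothetical, and the conventions are inferred only from the three examples in the diagram (`movies#Movie → :Movie`, `movies#title → title`, `movies#hasActor → :HAS_ACTOR`).

```python
import re


def local_name(uri):
    """Return the part after '#', or the last path segment if there is no '#'."""
    if "#" in uri:
        return uri.rsplit("#", 1)[1]
    return uri.rstrip("/").rsplit("/", 1)[-1]


def to_label(uri, overrides=None):
    """Assumed convention: the node label is the URI's local name, unchanged."""
    return (overrides or {}).get(uri, local_name(uri))


def to_property(uri, overrides=None):
    """Assumed convention: the property key is the URI's local name, unchanged."""
    return (overrides or {}).get(uri, local_name(uri))


def to_relationship_type(uri, overrides=None):
    """Assumed convention: camelCase local name becomes UPPER_SNAKE_CASE."""
    if overrides and uri in overrides:
        return overrides[uri]
    # Insert '_' at every lowercase/digit -> uppercase boundary, then upper-case.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", local_name(uri)).upper()
```

With these assumed conventions, `to_relationship_type("http://example.org/movies#hasActor")` yields `HAS_ACTOR`, and passing an `overrides` dict keyed by full URI replaces the convention for that one name.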

Quick start

uv run main.py

main.py

from graphlint.parser import parse_schema
from graphlint.backends.cypher import CypherBackend
from graphlint.runner import dry_run, execute_plan

with open("examples/movies.shacl.ttl") as f:
    schema = f.read()

plan = parse_schema(schema, source="movies.shacl.ttl")
# Or with strict mode for closed-world coverage checks:
# plan = parse_schema(schema, source="movies.shacl.ttl", strict=True)

# Dry run — see the generated queries without a database
print(dry_run(plan, CypherBackend()))

# Or execute against a live Neo4j instance
from neo4j import GraphDatabase
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
report = execute_plan(plan, CypherBackend(), driver, target_uri="neo4j://localhost:7687")
print(report.print_table())

Example schema (SHACL)

@prefix ex:  <http://example.org/movies#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:MovieShape
    a sh:NodeShape ;
    sh:targetClass ex:Movie ;
    sh:property [ sh:path ex:title ;     sh:datatype xsd:string ;  sh:minCount 1 ] ;
    sh:property [ sh:path ex:released ;  sh:datatype xsd:integer ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:tagline ;   sh:datatype xsd:string ] ;
    sh:property [ sh:path ex:hasActor ;  sh:nodeKind sh:IRI ; sh:node ex:PersonShape ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:hasDirector ; sh:nodeKind sh:IRI ; sh:node ex:PersonShape ; sh:minCount 1 ; sh:maxCount 1 ] .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:datatype xsd:string ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:born ; sh:datatype xsd:integer ] .

This compiles into validation checks covering property existence, type constraints, allowed values, and relationship cardinality.
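
To give a feel for what such checks might look like as queries, here is a minimal sketch that builds Cypher strings for two of the constraints above: a required property (`sh:minCount 1` on `ex:title`) and a relationship cardinality (`sh:minCount 1 ; sh:maxCount 1` on `ex:hasDirector`). The function names and the exact query text are assumptions for illustration; the queries graphlint actually generates may differ, and the sketch assumes relationships point outward from the shape's target node.

```python
def property_required_query(label, prop):
    """Find nodes with the given label where the property is missing."""
    return f"MATCH (n:`{label}`) WHERE n.`{prop}` IS NULL RETURN n"


def relationship_cardinality_query(label, rel_type, min_count=None, max_count=None):
    """Find nodes whose outgoing relationship count falls outside the bounds."""
    bounds = []
    if min_count is not None:
        bounds.append(f"c < {int(min_count)}")
    if max_count is not None:
        bounds.append(f"c > {int(max_count)}")
    if not bounds:
        raise ValueError("at least one of min_count or max_count is required")
    return (
        f"MATCH (n:`{label}`) "
        # OPTIONAL MATCH keeps nodes with zero relationships so count(r) can be 0.
        f"OPTIONAL MATCH (n)-[r:`{rel_type}`]->() "
        "WITH n, count(r) AS c "
        f"WHERE {' OR '.join(bounds)} "
        "RETURN n, c"
    )


if __name__ == "__main__":
    print(property_required_query("Movie", "title"))
    print(relationship_cardinality_query("Movie", "HAS_DIRECTOR", 1, 1))
```

Each query returns the violating nodes, so an empty result means the check passes.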

Strict mode

By default, graphlint only validates what your schema declares (open-world assumption). Enable strict mode to also check for things your schema doesn't mention:

| Check | Severity | What it catches |
| --- | --- | --- |
| Undeclared labels | warning | Node labels in the database not declared as shapes |
| Undeclared relationship types | warning | Relationship types not referenced by any shape |
| Undeclared properties | warning | Properties on declared node types not mentioned in the schema |
| Empty shapes | warning | Shapes declared in the schema with zero matching nodes |

plan = parse_schema(schema, source="movies.shacl.ttl", strict=True)

In the playground, toggle the strict checkbox in the connection bar.
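
Conceptually, two of the strict-mode checks reduce to comparing what the schema declares with what the database contains. The sketch below shows that idea for undeclared labels and empty shapes. It is an illustration only: `strict_findings` and its inputs are hypothetical, and graphlint itself compiles these checks into database queries rather than comparing Python collections.

```python
def strict_findings(declared_labels, label_counts):
    """Compare schema labels against observed labels.

    declared_labels: set of labels that have a shape in the schema.
    label_counts: mapping of label -> node count observed in the database.
    Returns a list of (severity, check, label) tuples.
    """
    findings = []
    # Labels present in the database but with no shape in the schema.
    for label in sorted(set(label_counts) - set(declared_labels)):
        findings.append(("warning", "undeclared_label", label))
    # Shapes in the schema that match zero nodes in the database.
    for label in sorted(declared_labels):
        if label_counts.get(label, 0) == 0:
            findings.append(("warning", "empty_shape", label))
    return findings
```

For example, with shapes for `Movie` and `Person` and a database holding only `Movie` and `Genre` nodes, this reports `Genre` as an undeclared label and `Person` as an empty shape.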

Playground

Interactive web UI for testing schemas against a live database:

uv run python playground.py
# Open http://127.0.0.1:8420

Features:

  • Live editing — SHACL editor with auto-compile on keystroke
  • Database connection — Connect to any Neo4j/Memgraph instance via Bolt
  • Strict mode toggle — Enable closed-world coverage checks
  • Four output tabs:
    • Checks — validation plan grouped by shape, color-coded by severity
    • Cypher — generated queries with syntax highlighting
    • Results — validation report with pass/fail/warning cards, violating node details, and vacuous check detection (skips checks when no data exists for a shape)
    • JSON — raw validation plan
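
The vacuous check detection mentioned under Results can be pictured as follows. This is a sketch of the idea only, with hypothetical function names and status strings: a check is marked as skipped when no nodes exist for its shape, since a pass in that case would say nothing about the data.

```python
def check_status(shape_node_count, violation_count):
    """Classify one check result; vacuous checks are skipped, not passed."""
    if shape_node_count == 0:
        return "skipped"
    return "fail" if violation_count > 0 else "pass"


def summarize(results):
    """results: iterable of (check_name, shape_node_count, violation_count)."""
    summary = {"pass": 0, "fail": 0, "skipped": 0}
    for _name, node_count, violations in results:
        summary[check_status(node_count, violations)] += 1
    return summary
```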

Backends

| Backend | Status | Target databases |
| --- | --- | --- |
| Cypher | | Neo4j, Memgraph |
| GQL | | ISO GQL-compliant databases |
| Gremlin | planned | Amazon Neptune, JanusGraph |

Project structure

graphlint/
├── graphlint/
│   ├── __init__.py          # Package metadata
│   ├── parser.py            # Shared types, unified entry point
│   ├── shacl_parser.py      # SHACL/Turtle → Validation Plan (IR)
│   ├── runner.py            # Execute plan, produce reports
│   └── backends/
│       ├── __init__.py      # Backend protocol
│       ├── cypher.py        # Cypher query generation
│       └── gql.py           # GQL query generation
├── examples/
│   └── movies.shacl.ttl     # Example schema (SHACL)
├── templates/
│   └── playground.html      # Playground UI template
├── playground.py             # Interactive web playground
└── tests/
    └── test_shacl_pipeline.py  # SHACL pipeline tests

Dependencies

  • rdflib — RDF graph library (SHACL parser)
  • neo4j — Neo4j driver (optional, only needed for execution)
  • fastapi, uvicorn, jinja2 — playground web UI (optional)

How is this different from neosemantics (n10s)?

Neosemantics is a Neo4j plugin that bridges RDF and property graphs — importing/exporting RDF, loading ontologies, inferencing, and validating against SHACL. Validation is roughly 15% of its surface area. Graphlint is 100% focused on schema validation.

| Facet | neosemantics (n10s) | graphlint |
| --- | --- | --- |
| Schema language | SHACL | SHACL |
| Deployment | Neo4j server plugin (Java JAR) | External Python tool |
| Database support | Neo4j only | Neo4j, Memgraph, ISO GQL (Gremlin planned) |
| RDF import/export | Full (Turtle, N-Triples, RDF/XML) | None |
| Ontology/inferencing | OWL, RDFS, SKOS with class/property hierarchy reasoning | None |
| Transactional enforcement | Yes: can roll back writes that violate constraints | No: read-only audit |
| Dry-run / CI mode | No: requires running Neo4j | Yes: generates queries without a database |
| Interactive tooling | No | Web playground for live schema exploration |
| Target audience | Semantic Web practitioners adopting Neo4j | Graph DB developers who want schema linting |

n10s assumes you're coming from the RDF world into Neo4j. Graphlint assumes you're already in the LPG world and want to borrow SHACL's rigor without adopting the full Semantic Web stack.

Scope: what graphlint does and doesn't validate

Graphlint validates your labeled property graph against schema constraints. It answers: "does my graph data conform to these shapes?"

It does not validate that the schema you provide is itself well-formed or idiomatic. If you hand it a SHACL document with misspelled predicates or unusual patterns, graphlint will silently produce fewer checks rather than reject the input. This is a deliberate tradeoff for a proof of concept: schema authoring validation is a solved problem (for example, pySHACL for SHACL), and graphlint assumes your schema has already been validated with such a tool or through your own review.

In short:

  • In scope: LPG data ↔ schema constraint checking
  • Out of scope: schema document validation, SHACL meta-validation

Roadmap

graphlint aims to be a practical bridge between formal graph schemas and real-world graph databases. Some features are not yet implemented:

Planned

  • Gremlin backend (Amazon Neptune, JanusGraph)
  • SPARQL backend for RDF stores
  • Schema-level validation (meta-SHACL, ShExC syntax checking)
  • Complex SHACL paths (sh:alternativePath, sequence paths, sh:zeroOrMorePath, sh:oneOrMorePath)

LPG–RDF gap

Some SHACL/ShEx features assume RDF semantics that don't exist natively in labeled property graphs:

| Feature | RDF Concept | LPG Status |
| --- | --- | --- |
| sh:uniqueLang | Language-tagged literals | No equivalent in LPG; acknowledged but not enforced |
| rdfs:subClassOf traversal | Class hierarchies | Supported when declared in the SHACL file; no runtime inference |
| Blank node shapes | Anonymous resources | LPG nodes always have identity |
| Named graphs / sh:shapesGraph | RDF datasets | Single-graph validation only |

These gaps reflect fundamental modeling differences between RDF and LPG, not missing features. graphlint documents them transparently so users from either community can make informed decisions.

Status

Early prototype. The core pipeline works: parse SHACL, compile to Cypher/GQL, execute against Neo4j, produce validation reports.

License

TBD
