
Miskler/genschema


🔍 genschema


A powerful, intelligent library for generating JSON Schema from multiple JSON instances with smart merging, advanced inference, and modular refinements.


⭐ Star us on GitHub | 📚 Read the Docs | 🐛 Report Bug

✨ Features

  • 🎯 Intelligent Merging – Combines multiple JSON instances into a single schema
  • 🔗 Configurable Combinators – Use anyOf or oneOf for conflicting types/properties
  • 🧠 Advanced Inference – Automatic format detection (email, uuid, date-time, etc.)
  • 📍 Required & Empty Handling – Smart inference of required, minProperties, minItems, etc.
  • 🔍 Pseudo-Array Detection – Treats inhomogeneous arrays as object-like structures when needed
  • Modular Pipeline – Chain of configurable comparators for full control
  • 🛠️ CLI & Python API – Flexible usage from command line or code
  • 📝 Rich Output – Colored console feedback with timing and instance count
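To make the merging and required-inference bullets concrete, here is a minimal, stdlib-only sketch of the general idea. It is not genschema's implementation or API: `json_type` and `merge_instances` are hypothetical helpers, and the sketch ignores formats, pseudo-arrays, and the other refinements genschema applies.

```python
from typing import Any


def json_type(value: Any) -> str:
    """Map a value produced by json.loads to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def merge_instances(instances: list[Any], base_of: str = "anyOf") -> dict[str, Any]:
    """Build one schema covering every instance (hypothetical helper)."""
    if not instances:
        return {}
    # Group observed values by their JSON type, preserving first-seen order.
    by_type: dict[str, list[Any]] = {}
    for value in instances:
        by_type.setdefault(json_type(value), []).append(value)

    variants: list[dict[str, Any]] = []
    for type_name, values in by_type.items():
        schema: dict[str, Any] = {"type": type_name}
        if type_name == "object":
            keys = sorted({key for obj in values for key in obj})
            # Merge each property across all objects that contain it.
            schema["properties"] = {
                key: merge_instances([obj[key] for obj in values if key in obj], base_of)
                for key in keys
            }
            # A key is required only if every observed object contains it.
            required = [key for key in keys if all(key in obj for obj in values)]
            if required:
                schema["required"] = required
        elif type_name == "array":
            # Merge the elements of all observed arrays into one item schema.
            items = [item for arr in values for item in arr]
            if items:
                schema["items"] = merge_instances(items, base_of)
        variants.append(schema)

    # A single type stays flat; conflicting types become anyOf/oneOf branches.
    return variants[0] if len(variants) == 1 else {base_of: variants}


schema = merge_instances([
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "age": 42},
])
print(schema["required"])  # ['name']
print(merge_instances([1, "x"], base_of="oneOf"))  # {'oneOf': [{'type': 'integer'}, {'type': 'string'}]}
```

Only "name" appears in both objects, so it is the only required key; "email" and "age" stay optional properties.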

🚀 Quick Start

Installation

pip install genschema

30-Second Python Example

from genschema import Converter, PseudoArrayHandler
from genschema.comparators import (
    FormatComparator,
    RequiredComparator,
    EmptyComparator,
    DeleteElement,
)

conv = Converter(
    pseudo_handler=PseudoArrayHandler(),
    base_of="anyOf",  # or "oneOf"
)

# Add JSON data (files, dicts, or existing schemas)
conv.add_json("example1.json")
conv.add_json("example2.json")
conv.add_json({"name": "Alice", "email": "alice@example.com"})

# Register optional refinements
conv.register(FormatComparator())
conv.register(RequiredComparator())
conv.register(EmptyComparator())
conv.register(DeleteElement())
conv.register(DeleteElement("isPseudoArray"))

# Generate schema
result = conv.run()

print(result)  # Pretty-printed JSON Schema
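The FormatComparator registered above adds `format` keywords such as email, uuid, and date-time. The sketch below only illustrates the kind of per-string check such a step might run; `guess_format` is a hypothetical helper, the patterns are deliberately loose, and genschema's actual detection rules may differ.

```python
import re
import uuid
from datetime import datetime
from typing import Optional

# Deliberately loose e-mail pattern: something@something.tld, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def guess_format(value: str) -> Optional[str]:
    """Return a JSON Schema 'format' name for a string, or None (hypothetical helper)."""
    # uuid: accept only the canonical 8-4-4-4-12 hyphenated form.
    try:
        if str(uuid.UUID(value)) == value.lower():
            return "uuid"
    except ValueError:
        pass
    if EMAIL_RE.fullmatch(value):
        return "email"
    # date-time: require a 'T' separator, then let fromisoformat validate the rest.
    # (Loose: RFC 3339 also requires a UTC offset, which this check does not enforce.)
    if "T" in value:
        try:
            datetime.fromisoformat(value)
            return "date-time"
        except ValueError:
            pass
    return None


for sample in ["alice@example.com", "123e4567-e89b-12d3-a456-426614174000",
               "2024-01-15T10:30:00+00:00", "hello"]:
    print(sample, "->", guess_format(sample))
```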

CLI Usage

# Basic: single or multiple files
genschema input1.json input2.json -o schema.json

# Use oneOf instead of anyOf
genschema *.json --base-of oneOf -o schema.json

# Disable refinements
genschema data.json --no-format --no-required --no-pseudo-array

# Read from stdin
cat data.json | genschema - -o schema.json

📊 Comparison with GenSON

| Feature | genschema | GenSON |
|---|---|---|
| Multiple Instance Merging | Yes | Yes |
| Variant Type Handling | Configurable: anyOf or oneOf | anyOf only |
| Format Inference | Yes (email, date-time, uuid, uri, etc.) | No |
| Required Properties | Configurable inference | Yes (keys present in all objects) |
| Empty/Min-Max Handling | Yes (minProperties, minItems, etc.) | Limited |
| Pseudo-Array Detection | Yes | No |
| Modular Extensions | Comparator pipeline (easy to add/remove steps) | SchemaStrategy subclasses |
| CLI Support | Full-featured, with rich output | Basic (genson command) |
| Performance (avg. benchmark) | ~2.1× slower than GenSON | Faster |

Note: Performance was measured on static datasets of varying complexity. genschema trades raw speed for richer inference and flexibility.
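The anyOf/oneOf choice in the table matters because the two keywords mean different things in JSON Schema: anyOf passes when at least one branch matches, oneOf only when exactly one does. If generated branches overlap, oneOf can reject data that anyOf accepts. The sketch below evaluates type-only branches with hypothetical helpers (`matches_type`, `combinator_accepts`); it is not a full validator and not part of genschema.

```python
from typing import Any


def matches_type(value: Any, type_name: str) -> bool:
    """Check a value against a primitive JSON Schema 'type' (arrays/objects omitted)."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return type_name == "boolean"
    if type_name == "integer":
        # JSON Schema (draft 6+) treats numbers with a zero fractional part as integers.
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if type_name == "number":
        return isinstance(value, (int, float))
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "null":
        return value is None
    return False  # "boolean" for a non-bool value, or a type this sketch skips


def combinator_accepts(schema: dict[str, Any], value: Any) -> bool:
    """Evaluate {"anyOf": [...]} or {"oneOf": [...]} whose branches are type-only."""
    keyword, branches = next(iter(schema.items()))
    hits = sum(matches_type(value, branch["type"]) for branch in branches)
    if keyword == "anyOf":
        return hits >= 1  # at least one branch must match
    if keyword == "oneOf":
        return hits == 1  # exactly one branch must match
    raise ValueError(f"unsupported keyword: {keyword}")


branches = [{"type": "number"}, {"type": "integer"}]
print(combinator_accepts({"anyOf": branches}, 5))    # True: both branches match
print(combinator_accepts({"oneOf": branches}, 5))    # False: 5 matches two branches
print(combinator_accepts({"oneOf": branches}, 2.5))  # True: only "number" matches
```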

🏗️ Architecture

Modular pipeline design for clean, extensible code:

┌─────────────────┐      ┌─────────────────┐
│   Input JSONs   │      │  Input Schemas  │
└─────────────────┘      └─────────────────┘
         │                       │
         └──────────┬────────────┘
                    ▼
            ┌───────────────┐
            │ Pipeline Run  │
            └───────────────┘
                    ▼
         ┌───────────────────┐
         │  Process Layer    │◀─────┐
         └───────────────────┘      │
                    │               │
                    ▼               │
        ┌─────────────────────┐     │
        │ Comparators Chain   │─────┘
        └─────────────────────┘
                    │
                    ▼
            ┌───────────────┐
            │    Result     │
            └───────────────┘
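The loop between the Process Layer and the Comparators Chain can be pictured as a list of small functions applied to every schema node. This is a hypothetical, stdlib-only sketch of that idea, not genschema's internals: `run_pipeline`, `drop_key`, and `sort_required` are illustrative names, and `"x-marker"` stands in for an internal bookkeeping key (much as `DeleteElement("isPseudoArray")` in the quick-start example appears to remove one).

```python
from typing import Any, Callable

# A pipeline step takes one schema node and returns a (possibly modified) copy.
Step = Callable[[dict[str, Any]], dict[str, Any]]


def drop_key(key: str) -> Step:
    """Build a step that removes a bookkeeping key from a node (hypothetical)."""
    def step(node: dict[str, Any]) -> dict[str, Any]:
        return {k: v for k, v in node.items() if k != key}
    return step


def sort_required(node: dict[str, Any]) -> dict[str, Any]:
    """Keep 'required' lists in a stable, sorted order."""
    if "required" in node:
        node = {**node, "required": sorted(node["required"])}
    return node


def run_pipeline(node: dict[str, Any], steps: list[Step]) -> dict[str, Any]:
    """Apply every step to this node, then recurse into nested schemas."""
    for step in steps:
        node = step(node)
    if "properties" in node:
        node = {**node, "properties": {
            name: run_pipeline(sub, steps) for name, sub in node["properties"].items()
        }}
    if isinstance(node.get("items"), dict):
        node = {**node, "items": run_pipeline(node["items"], steps)}
    for keyword in ("anyOf", "oneOf"):
        if keyword in node:
            node = {**node, keyword: [run_pipeline(b, steps) for b in node[keyword]]}
    return node


raw = {
    "type": "object",
    "x-marker": True,
    "required": ["b", "a"],
    "properties": {"a": {"type": "string", "x-marker": True}, "b": {"type": "integer"}},
}
clean = run_pipeline(raw, [drop_key("x-marker"), sort_required])
print(clean)
```

Because each step is a pure function of one node, steps can be added, removed, or reordered without touching the traversal, which is the kind of extensibility the diagram describes.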

🛠️ Development

Setup

git clone https://github.com/Miskler/genschema.git
cd genschema
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[dev]"    # or make install-dev if Makefile exists

Common Commands

make test          # Run tests with coverage
make lint          # Lint code
make type-check    # mypy checking
make format        # Format with black
make docs          # Build documentation

📚 Documentation

🤝 Contributing

We welcome contributions!

Fork the repository, create a feature branch, and submit a pull request.
Make sure the tests pass and the code is formatted with black and passes mypy:

make test
make lint
make type-check

📄 License

AGPL-3.0 License – see LICENSE file for details.

Made with ❤️ for developers working with evolving JSON data
