Lamb-Project/DiagramLens

Diagram Annotator for Technical Documentation

A powerful Python tool that automatically identifies, categorizes, and generates detailed technical descriptions for diagrams in markdown documentation using vision-capable Large Language Models (LLMs) through Ollama.

Because the software is intended for academic settings, it uses Ollama to run multimodal LLMs locally, such as qwen3-vl:32b (the tested model).

The system works well with .md files produced by PDF OCR tools such as https://github.com/granludo/deepseekocr-mlx (the one we tested).

By Marc Alier & Juanan Pereira https://lamb-project.org

🎯 Overview

This tool processes markdown files containing software engineering diagrams and:

  • Automatically categorizes diagrams into 35+ types (UML, C4, ERD, flowcharts, etc.)
  • Generates detailed technical descriptions tailored to each diagram type
  • Uses context-aware analysis to improve categorization accuracy
  • Produces annotated documentation with inline technical descriptions
  • Creates comprehensive summaries of all diagrams found

✨ Key Features

Context-Aware Categorization

  • Analyzes surrounding text to predict diagram types before visual inspection
  • Combines textual context with visual analysis for higher accuracy
  • Tracks prediction accuracy to measure context usefulness
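
As a rough illustration of the context step, the sketch below takes a window of text around an image reference and scores it against per-category indicator words. The names `IMAGE_PATTERN`, `extract_context`, and `predict_from_context`, the regular expression, and the word-count scoring rule are all assumptions made for this example, not the tool's actual implementation. Only the ideas of a character-based context window and per-category context indicators come from this README.

```python
import re
from typing import Dict, List, Optional

# Matches markdown image references such as ![alt](path/to/image.png)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def extract_context(markdown: str, match: "re.Match", context_size: int = 500) -> str:
    """Return up to context_size characters before and after an image reference."""
    start = max(0, match.start() - context_size)
    end = min(len(markdown), match.end() + context_size)
    # Leave out the image reference itself; keep only the surrounding text.
    return markdown[start:match.start()] + " " + markdown[match.end():end]


def predict_from_context(context: str, indicators: Dict[str, List[str]]) -> Optional[str]:
    """Pick the category whose indicator words occur most often, or None if none occur."""
    text = context.lower()
    scores = {
        category: sum(text.count(word.lower()) for word in words)
        for category, words in indicators.items()
    }
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```

A prediction made this way can later be compared with the label assigned after visual inspection, which is one way the prediction accuracy mentioned above could be tracked.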

Extensive Diagram Support

Supports 35+ diagram types including:

  • UML Diagrams: Class, Sequence, Use Case, State, Activity, Component, etc.
  • Architecture: C4 Model, System Architecture, Cloud Architecture, Microservices
  • Data Modeling: ERD, Database Schema, Data Flow Diagrams
  • Process: Flowcharts, BPMN, Gantt Charts
  • Technical: Network Diagrams, Git Workflows, API Specifications
  • Design: UI Mockups, Wireframes
  • Analysis: Decision Trees, Fault Trees, Mind Maps

Intelligent Description Generation

  • Custom prompts for each diagram type focusing on relevant details
  • Structured analysis based on diagram-specific elements
  • Technical accuracy in terminology and notation identification
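
For orientation, here is a minimal sketch of how a type-specific prompt and an image could be sent to a local Ollama server. It relies on Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. The helper names `build_payload` and `describe_image`, the timeout value, and the use of the standard library instead of `requests` are assumptions for this example; the script itself may be organized differently.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(model: str, prompt: str, image_path: str, timeout: float = 300.0) -> str:
    """Send the image and prompt to Ollama and return the generated text."""
    with open(image_path, "rb") as handle:
        payload = build_payload(model, prompt, handle.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Keeping payload construction separate from the network call makes it possible to check the request shape without a running model.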

📋 Requirements

  • Python 3.8+
  • Ollama installed and running locally
  • A vision-capable model installed in Ollama (e.g., qwen2-vl:7b, llava, bakllava)
  • uv for dependency management (recommended)

🚀 Installation

1. Install Ollama

# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve

2. Pull a Vision Model

# Recommended: Qwen2-VL (good balance of quality and speed)
ollama pull qwen2-vl:7b

# Alternative options:
# ollama pull llava:13b
# ollama pull bakllava

3. Install Python Dependencies

# Using uv (recommended)
uv add requests pillow rich

# Or using pip
pip install requests pillow rich

💻 Usage

Basic Usage

uv run annotate_images_enhanced.py \
    --input docs/architecture.md \
    --output docs/architecture_annotated.md \
    --summary docs/diagram_summary.md \
    --categories image_categories_enhanced.json \
    --model qwen3-vl:8b

Advanced Options

# --context-size sets how much surrounding text (in characters) to analyze
# --verbose shows detailed progress
uv run annotate_images_enhanced.py \
    --input docs/architecture.md \
    --output docs/architecture_annotated.md \
    --summary docs/diagram_summary.md \
    --categories image_categories_enhanced.json \
    --model qwen3-vl:32b \
    --context-size 750 \
    --verbose

Command-Line Arguments

| Argument | Description | Required | Default |
|---|---|---|---|
| --input | Path to source markdown file | Yes | - |
| --output | Path for annotated markdown output | Yes | - |
| --summary | Path for diagram summary output | Yes | - |
| --categories | JSON file with diagram categories | Yes | - |
| --model | Ollama vision model to use | No | qwen2-vl:7b |
| --context-size | Characters of context to analyze | No | 500 |
| --verbose | Show detailed progress | No | False |
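
The arguments listed above map naturally onto an `argparse` parser. The following is a sketch of what such a parser could look like, using the names and defaults from the list; the function name `build_parser` and the help strings are assumptions, and the real script may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(description="Annotate diagrams in a markdown file")
    parser.add_argument("--input", required=True, help="Path to source markdown file")
    parser.add_argument("--output", required=True, help="Path for annotated markdown output")
    parser.add_argument("--summary", required=True, help="Path for diagram summary output")
    parser.add_argument("--categories", required=True, help="JSON file with diagram categories")
    parser.add_argument("--model", default="qwen2-vl:7b", help="Ollama vision model to use")
    parser.add_argument("--context-size", type=int, default=500,
                        help="Characters of context to analyze")
    parser.add_argument("--verbose", action="store_true", help="Show detailed progress")
    return parser
```

Note that `argparse` exposes `--context-size` as `args.context_size`, with the hyphen turned into an underscore.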

📁 Project Structure

diagram-annotator/
├── annotate_images_enhanced.py    # Main script
├── image_categories_enhanced.json # Diagram categories & prompts
├── README.md                       # This file
├── examples/                       # Example documents
│   ├── input/                     # Sample markdown files
│   └── output/                    # Generated outputs
└── tests/                         # Test documents

🔧 Configuration

Customizing Categories

Edit image_categories_enhanced.json to:

  • Add new diagram types
  • Modify categorization prompts
  • Adjust context indicators
  • Customize description generation prompts

Example structure:

{
  "categories": ["class diagram", "sequence diagram", ...],
  "category_prompts": {
    "class diagram": {
      "prompt": "Describe this UML Class Diagram...",
      "focus_areas": ["classes", "methods", ...],
      "keywords": ["class", "inheritance", ...]
    }
  },
  "context_indicators": {
    "class diagram": ["UML", "inheritance", "class", ...]
  }
}
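
When editing the categories file, it can help to check that every listed category has a matching prompt entry. The sketch below assumes the key names shown in the example structure (`categories`, `category_prompts`, `prompt`); the helper names `load_categories`, `missing_prompts`, and `prompt_for`, as well as the fallback prompt text, are hypothetical and not part of the tool.

```python
import json
from typing import List


def load_categories(path: str) -> dict:
    """Read the categories JSON file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_prompts(config: dict) -> List[str]:
    """Return the categories that have no entry under category_prompts."""
    prompts = config.get("category_prompts", {})
    return [name for name in config.get("categories", []) if name not in prompts]


def prompt_for(config: dict, category: str,
               fallback: str = "Describe this diagram in technical detail.") -> str:
    """Return the description prompt for a category, or a generic fallback."""
    entry = config.get("category_prompts", {}).get(category, {})
    return entry.get("prompt", fallback)
```

For example, `missing_prompts(load_categories("image_categories_enhanced.json"))` would list any category added to `categories` without a corresponding prompt.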

Model Selection

Different models offer different trade-offs:

| Model | Quality | Speed | Memory | Best For |
|---|---|---|---|---|
| qwen2-vl:7b | Good | Fast | 8GB | General use |
| qwen2-vl:72b | Excellent | Slow | 40GB+ | High accuracy |
| llava:13b | Good | Medium | 16GB | Balanced |
| bakllava | Fair | Fast | 8GB | Quick processing |

📊 Output Examples

Annotated Markdown

The tool inserts technical descriptions after each diagram:

![System Architecture](diagrams/architecture.png)

**Diagram Type:** Architecture Diagram

**Technical Description:**
This architecture diagram shows a microservices-based system with:
1. API Gateway serving as the entry point
2. Three microservices: User Service, Order Service, Payment Service
3. PostgreSQL database for User Service
4. MongoDB for Order Service
5. Redis cache layer
6. RabbitMQ message broker for inter-service communication
7. All services deployed in Docker containers
...
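
The insertion step shown in this example can be sketched as follows. This is an illustrative approximation: the function name `annotate`, the `IMAGE_LINE` pattern, and the assumption that each image sits alone on its own line are choices made here, not details confirmed by the tool.

```python
import re
from typing import List, Tuple

# An image reference that occupies a whole line, e.g. ![alt](path.png)
IMAGE_LINE = re.compile(r"^\s*!\[[^\]]*\]\([^)]+\)\s*$")


def annotate(markdown: str, descriptions: List[Tuple[str, str]]) -> str:
    """Insert a (diagram_type, description) block after each image line, in order."""
    remaining = iter(descriptions)
    output = []
    for line in markdown.splitlines():
        output.append(line)
        if IMAGE_LINE.match(line):
            item = next(remaining, None)
            if item is None:
                continue  # no description available for this image
            diagram_type, description = item
            output.extend([
                "",
                f"**Diagram Type:** {diagram_type}",
                "",
                "**Technical Description:**",
                description,
            ])
    return "\n".join(output)
```

Descriptions are consumed in document order, so the list passed in must follow the order in which the images appear.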

Summary Document

Generates a comprehensive summary with:

  • Total diagram count
  • Category distribution statistics
  • Context prediction accuracy
  • Detailed entry for each diagram with description
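
The statistics in the summary could be computed along these lines. The record layout (a `category` key for the final label and a `predicted` key for the context-based guess) and the function name `summarize` are assumptions for this sketch; the tool's internal data structures are not documented here.

```python
from collections import Counter
from typing import List


def summarize(results: List[dict]) -> dict:
    """Compute the diagram count, category distribution, and context accuracy.

    Each result holds 'category' (the final label) and 'predicted' (the label
    guessed from context, or None when no guess was made).
    """
    distribution = Counter(item["category"] for item in results)
    predicted = [item for item in results if item.get("predicted") is not None]
    correct = sum(1 for item in predicted if item["predicted"] == item["category"])
    # Accuracy is measured only over diagrams that received a context prediction.
    accuracy = correct / len(predicted) if predicted else None
    return {
        "total": len(results),
        "distribution": dict(distribution),
        "context_accuracy": accuracy,
    }
```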

🎯 Use Cases

  • Documentation Generation: Automatically document existing diagrams
  • Documentation Validation: Verify diagrams match their descriptions
  • Knowledge Extraction: Extract technical details from visual documentation
  • Accessibility: Generate text descriptions for screen readers
  • Documentation Migration: Convert visual-heavy docs to text-searchable format
  • Quality Assurance: Ensure diagram completeness and clarity

License

This project is licensed under the GNU General Public License v3.0. See the full license text in the LICENSE file.

For a concise summary of the GPL‑3.0 terms, you can also refer to the SPDX license identifier.

🐛 Troubleshooting

Common Issues

Ollama Connection Error

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

Model Not Found

# List available models
ollama list

# Pull the required model
ollama pull qwen2-vl:7b
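
The same two checks can be done from Python before starting a long run. The sketch below queries Ollama's `/api/tags` endpoint, which returns the installed models under a `models` key. The helper names `fetch_tags`, `installed_models`, and `model_available` are hypothetical, and the example uses the standard library rather than `requests`.

```python
import json
import urllib.request
from typing import List

TAGS_URL = "http://localhost:11434/api/tags"  # default local Ollama address


def fetch_tags(timeout: float = 5.0) -> dict:
    """Fetch the list of installed models; raises URLError if Ollama is not reachable."""
    with urllib.request.urlopen(TAGS_URL, timeout=timeout) as response:
        return json.loads(response.read())


def installed_models(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def model_available(name: str, tags_payload: dict) -> bool:
    """Report whether the named model appears among the installed models."""
    return name in installed_models(tags_payload)
```

A call such as `model_available("qwen2-vl:7b", fetch_tags())` distinguishes the two failure modes: an exception means the server is not running, while `False` means the model still needs to be pulled.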

Image Processing Errors

  • Ensure images are in supported formats (PNG, JPG, GIF, WebP)
  • Check image file sizes (default limit: 5MB)
  • Verify image paths are relative to the markdown file

Low Accuracy

  • Try a larger model (e.g., qwen2-vl:72b or qwen3-vl:32b)
  • Increase context size with --context-size 1000
  • Ensure diagram images are clear and high-resolution

🀝 Contributing

Contributions are welcome! Areas for improvement:

  1. Additional Diagram Types: Add support for more specialized diagrams
  2. Improved Prompts: Refine categorization and description prompts
  3. Performance Optimization: Batch processing, caching
  4. Output Formats: Support for different output formats (HTML, PDF)
  5. Integration: GitHub Actions, documentation pipelines

📄 License

This software is licensed under GPL 3.0. (c) Marc Alier, Juanan Pereira, LAMB project (https://lamb-project.org), Universitat Politècnica de Catalunya (www.upc.edu), Universidad del País Vasco / Euskal Herriko Unibertsitatea (www.ehu.eus).

🙏 Acknowledgments

  • Built with Ollama for local LLM inference
  • Uses vision models like Qwen3-VL
  • Grial Research Group - Universidad de Salamanca

📧 Support

For issues, questions, or suggestions:

  • Open an issue on GitHub
  • Check existing issues for solutions
  • Consult the troubleshooting section

Note: This tool requires significant computational resources for vision model inference. Performance will vary based on your hardware capabilities and chosen model size.

About

Examines and describes the technical diagrams referenced in a .md document
