A demonstration project integrating Spring AI with Docling for intelligent PDF parsing and conversational AI capabilities. This project showcases how to build a modern RAG (Retrieval-Augmented Generation) application that can extract, process, and chat about PDF document content.
For a detailed walkthrough and explanation of this project, read the accompanying blog post, "A Practical Walkthrough for Parsing PDFs and Chatting About Their Content".
Features:
- PDF Processing: Extract and parse PDF documents using Docling's MCP server
- Spring AI Integration: Leverage Spring AI's MCP client for seamless tool integration
- Conversational AI: Chat about document content using OpenAI models
- REST API: Simple HTTP endpoints for document interaction
- React Frontend: Modern UI built with React and Chakra UI
- Docker Support: Easy deployment with Docker Compose
The project consists of three main components:
- Spring Boot Backend (Java 21)
  - Spring AI MCP client integration
  - REST API for document queries
  - OpenAI model integration
- Docling MCP Server (Docker)
  - PDF parsing and conversion
  - Document indexing and retrieval
  - Exposed via MCP protocol
- React Frontend
  - Chat interface for document interaction
  - Built with Chakra UI
Prerequisites:
- Java 21+
- Maven 3.6+
- Node.js 16+ (for frontend)
- Docker and Docker Compose
- OpenAI API key
Setup steps:
- Clone the repository

      git clone <repository-url>
      cd spring-ai-docling

- Set up OpenAI API key

  Create a src/main/resources/application.properties file:

      spring.ai.openai.api-key=${OPENAI_API_KEY}
      spring.ai.openai.chat.options.model=gpt-4

  Export your API key:

      export OPENAI_API_KEY=your-api-key-here

- Start Docling MCP server

      docker-compose up -d

  The Docling server will be available at http://localhost:8000

- Place PDF documents

  Add your PDF files to the documents/ directory. These will be automatically mounted into the Docker container.

- Run the Spring Boot application

      ./mvnw spring-boot:run

  The backend API will start on http://localhost:8080

- Run the frontend (optional)

      cd frontend
      npm install
      npm start

  The React app will open at http://localhost:3000
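
Once the setup steps are done, it can be handy to confirm that each service is actually listening before sending queries. The following is a minimal sketch using only the Python standard library. The host and port values are the defaults mentioned in the steps above; the helper names `is_listening` and `check_services` are hypothetical and not part of this project.

```python
import socket

# Default ports taken from the setup steps; adjust if your configuration differs.
SERVICES = {
    "Docling MCP server": ("localhost", 8000),
    "Spring Boot backend": ("localhost", 8080),
    "React frontend (optional)": ("localhost", 3000),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_services(services=SERVICES) -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that something is listening there; it does not prove that the service behind it is healthy or correctly configured.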
Send a POST request to /ai with your query:

    curl -X POST http://localhost:8080/ai \
      -H "Content-Type: application/json" \
      -d '{
        "input": "Can you extract the contents of this pdf: /data/your-document.pdf"
      }'

Example queries:
- Extract document content: "Can you extract the contents of this pdf: /data/business-plan.pdf"
- Ask about document: "Can you translate the summary to Dutch for document <document-id>?"
- List available tools: "Can you tell me which tools are available for the MCP server docling?"

See test-calls.http for more examples.
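
The same request can be sent from a script. Below is a minimal sketch with the Python standard library that mirrors the curl call: a POST to /ai with a JSON body containing an `input` field. Two things are assumptions here: that files in documents/ appear under /data inside the container (inferred from the example paths), and that the response can be read as plain text, since its format is not described here. The names `container_path`, `build_request`, and `ask` are hypothetical helpers.

```python
import json
import urllib.request
from pathlib import PurePosixPath

BACKEND_URL = "http://localhost:8080/ai"  # endpoint used in the curl example
CONTAINER_DIR = "/data"  # assumption: documents/ is mounted here in the container


def container_path(filename: str) -> str:
    """Map a file in documents/ to its assumed path inside the Docling container."""
    return str(PurePosixPath(CONTAINER_DIR) / PurePosixPath(filename).name)


def build_request(prompt: str, url: str = BACKEND_URL) -> urllib.request.Request:
    """Build the POST request with the same JSON shape as the curl example."""
    body = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, url: str = BACKEND_URL, timeout: float = 120.0) -> str:
    """Send the prompt to the backend and return the raw response body as text."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the backend running, `ask("Can you extract the contents of this pdf: " + container_path("business-plan.pdf"))` would send the first example query. The generous timeout is a guess, on the assumption that PDF conversion can take a while.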
A Python virtual environment is included for experimenting with Docling directly:

    source .venv/bin/activate
    python

Then, at the Python prompt:

    from docling.document_converter import DocumentConverter
    converter = DocumentConverter()
    result = converter.convert("documents/your-document.pdf")
    print(result.document.export_to_markdown())

Technology stack:
- Backend: Spring Boot 3.5.8, Spring AI 1.1.0
- Frontend: React 18, Chakra UI
- AI/ML: OpenAI GPT-4, Docling
- Infrastructure: Docker, Docker Compose
- Build Tools: Maven, npm
Project structure:

    spring-ai-docling/
    ├── src/main/java/          # Spring Boot application
    │   └── org/rag4j/docling/
    ├── frontend/               # React application
    ├── documents/              # PDF files (mounted in Docker)
    ├── .venv/                  # Python virtual environment
    ├── docker-compose.yml      # Docling server configuration
    ├── pom.xml                 # Maven dependencies
    └── test-calls.http         # Example API calls

Contributions are welcome! This is a demonstration project, so feel free to experiment and extend it.
This project is provided as-is for educational and demonstration purposes.