This directory contains examples demonstrating the ThemisDB NLP Text Analyzer capabilities introduced in PR #317.
The NLP Text Analyzer is a lightweight, CPU-efficient component of ThemisDB that provides text analysis tailored to AQL query optimization and execution plan orchestration.
- ✅ Lightweight - No heavy ML frameworks required
- ✅ CPU-Only - Works without GPU
- ✅ Fast - Millisecond latency for typical queries
- ✅ Thread-Safe - Parallelizable for multi-query scenarios
- ✅ AQL-Optimized - Specifically designed for database query analysis
- ThemisDB server running (v1.5.0+)
- Python 3.8+ or C++ compiler (GCC 11+, Clang 15+, MSVC 19.30+)
- ThemisDB client library (for Python examples)
```cpp
#include "analytics/nlp_text_analyzer.h"

using namespace themis::analytics;

// Initialize analyzer
NlpTextAnalyzer::Config config;
config.enable_stemming = true;
config.enable_stopwords = true;
config.max_keywords = 10;

NlpTextAnalyzer analyzer(config);

// Detect language
auto lang = analyzer.detectLanguage("Der schnelle braune Fuchs...");
// Returns: Language::GERMAN

// Extract keywords
auto keywords = analyzer.extractKeywords("Database query optimization...");
// Returns: ["database", "query", "optimization"]

// Analyze query complexity
auto complexity = analyzer.analyzeQueryComplexity(aql_query);
```

```cpp
// In query optimizer
auto analyzer = std::make_unique<NlpTextAnalyzer>();
auto analysis = analyzer->analyzeAqlQuery(query_text);

if (analysis.complexity > 0.8) {
    // Use advanced optimization strategies
    optimizer->enableAdvancedOptimizations();
}
```

Currently, this directory serves as a placeholder for future Python and C++ examples. The NLP analyzer is integrated into ThemisDB core and used automatically by:
- AQL Query Engine - Query complexity estimation
- Query Optimizer - Semantic hints for optimization
- Index Selection - Suggesting appropriate indexes
- Query Orchestration - Planning multi-step queries
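
To make the idea of query complexity estimation more concrete, here is a minimal standalone sketch of a keyword-weighted score in the range 0 to 1. It is not the ThemisDB implementation: the function names `estimateComplexity` and `needsAdvancedOptimizations`, the keyword weights, and the ArangoDB-style query syntax in the example are all assumptions chosen for illustration. Only the 0.8 threshold is taken from the integration snippet in this README.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical heuristic: sum fixed points per query keyword and
// scale the total into [0, 1]. The weights are illustrative only.
double estimateComplexity(const std::string& query) {
    static const std::map<std::string, int> points = {
        {"FOR", 4}, {"FILTER", 2}, {"SORT", 3}, {"COLLECT", 4}, {"LET", 1}};

    int total = 0;
    std::string word;
    // A trailing space guarantees the last word is flushed by the loop.
    for (unsigned char c : query + " ") {
        if (std::isalnum(c) || c == '_') {
            word.push_back(static_cast<char>(std::toupper(c)));
        } else if (!word.empty()) {
            const auto it = points.find(word);
            if (it != points.end()) total += it->second;
            word.clear();
        }
    }
    return std::min(1.0, total / 20.0);
}

// Mirrors the threshold used in the integration snippet of this README.
bool needsAdvancedOptimizations(const std::string& query) {
    return estimateComplexity(query) > 0.8;
}

// Example usage (call from your own main or test harness).
inline void exampleComplexityUsage() {
    assert(!needsAdvancedOptimizations("FOR u IN users RETURN u"));
}
```

This sketch also counts keywords that appear inside string literals, so a real estimator would need to work on parsed queries rather than raw text.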
The analyzer supports the following languages:
- 🇩🇪 German (de)
- 🇬🇧 English (en)
- 🇫🇷 French (fr)
- 🇪🇸 Spanish (es)
- 🇮🇹 Italian (it)
- 🇳🇱 Dutch (nl)
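
This README does not describe how `detectLanguage` decides between these languages. One common lightweight technique is counting stopword hits per language, sketched below as a standalone example. The names `splitWords` and `guessLanguage` and the tiny stopword lists are hypothetical and are not part of the ThemisDB API.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: split on non-letters and lowercase ASCII letters.
// Bytes >= 0x80 (UTF-8 multibyte sequences) are kept inside the word.
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalpha(c) || c >= 0x80) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Returns the ISO code with the most stopword hits, or "unknown" if
// no stopword matched. Ties go to the alphabetically first code.
std::string guessLanguage(const std::string& text) {
    // Deliberately tiny, illustrative stopword lists.
    static const std::map<std::string, std::set<std::string>> stopwords = {
        {"de", {"der", "die", "das", "und", "ist", "nicht"}},
        {"en", {"the", "and", "is", "of", "to", "not"}},
        {"fr", {"le", "la", "les", "et", "est", "pas"}},
        {"es", {"el", "los", "y", "es", "que", "no"}},
        {"it", {"il", "lo", "gli", "e", "che", "non"}},
        {"nl", {"de", "het", "een", "en", "is", "niet"}},
    };

    std::string best = "unknown";
    int bestHits = 0;
    const std::vector<std::string> words = splitWords(text);
    for (const auto& entry : stopwords) {
        int hits = 0;
        for (const auto& w : words) {
            hits += static_cast<int>(entry.second.count(w));
        }
        if (hits > bestHits) {
            bestHits = hits;
            best = entry.first;
        }
    }
    return best;
}

// Example usage (call from your own main or test harness).
inline void exampleLanguageUsage() {
    assert(guessLanguage("The index is not used and the query is slow") == "en");
}
```

Stopword counting is cheap and needs no model files, which fits a CPU-only component, but it is unreliable on very short inputs that contain no stopwords at all.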
| Function | Description | Use Case |
|---|---|---|
| `detectLanguage(text)` | Identifies text language | Multi-language query support |
| `extractKeywords(text, max)` | Extracts key terms | Index selection hints |
| `analyzeQueryComplexity(query)` | Estimates query cost | Optimization decisions |
| `tokenize(text)` | Splits text into tokens | Query parsing |
| `removeStemming(text)` | Normalizes word forms | Semantic matching |
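
As a rough illustration of what tokenization and keyword extraction involve, the standalone sketch below splits text into lowercase tokens and ranks the non-stopword tokens by frequency. It does not call the ThemisDB API: `tokenizeSketch`, `topKeywords`, and the stopword list are hypothetical names and data, and the real analyzer may rank terms differently and apply stemming.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tokenizer: lowercase runs of ASCII letters and digits.
std::vector<std::string> tokenizeSketch(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical keyword extraction: drop stopwords and one-letter tokens,
// then return up to maxKeywords terms, most frequent first.
std::vector<std::string> topKeywords(const std::string& text,
                                     std::size_t maxKeywords) {
    static const std::set<std::string> stopwords = {
        "a", "an", "and", "the", "of", "to", "in", "is", "for"};

    std::map<std::string, int> counts;
    for (const auto& tok : tokenizeSketch(text)) {
        if (tok.size() > 1 && stopwords.count(tok) == 0) ++counts[tok];
    }

    // The map iterates alphabetically, so a stable sort by descending
    // count keeps ties in alphabetical order.
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });

    std::vector<std::string> keywords;
    for (const auto& entry : ranked) {
        if (keywords.size() == maxKeywords) break;
        keywords.push_back(entry.first);
    }
    return keywords;
}

// Example usage (call from your own main or test harness).
inline void exampleKeywordUsage() {
    const auto kw =
        topKeywords("Database query optimization improves query speed", 3);
    assert((kw == std::vector<std::string>{"query", "database", "improves"}));
}
```

Breaking ties alphabetically keeps the output deterministic, which makes keyword-based index hints easier to test.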
For comprehensive documentation including architecture, API reference, performance metrics, and implementation details, see:
📖 NLP Text Analyzer Documentation
To add examples to this directory:
- Create example files (Python or C++ preferred)
- Add documentation explaining the use case
- Include sample queries and expected outputs
- Submit PR referencing the NLP feature
- Version: 1.0
- Status: ✅ Production Ready
- PR Reference: #317
- Examples: 📝 Coming soon
- Query Optimization Examples - Vector search with NLP
- Analytics Examples - Advanced analytics features
- AQL Documentation - AQL query language guide