
ThemisDB Query Expert Certification (TQE)

Certification Overview

The ThemisDB Query Expert Certification (TQE) is an advanced certification that validates your expertise in writing complex, optimized queries across ThemisDB's multi-model architecture. This certification demonstrates that you can design efficient query strategies, optimize performance, implement advanced graph traversals, and leverage vector similarity search for AI/ML workloads.

Certification Details

  • Certification Code: TQE
  • Level: Advanced
  • Duration: 120 minutes (exam) + project submission
  • Question Count: 30-35 questions + hands-on project
  • Question Types: Multiple choice, scenario-based, and practical coding
  • Passing Score: 75% (23/30 minimum on exam + passing project)
  • Validity: 2 years
  • Prerequisites: ThemisDB Fundamentals Certification (TDF)
  • Exam Fee: $250 USD
  • Retake Fee: $125 USD
  • Language: English

Target Audience

This certification is ideal for:

  • Senior Software Developers building complex applications
  • Data Engineers designing data pipelines
  • Backend Developers optimizing database queries
  • Application Architects designing query strategies
  • Analytics Engineers building reporting systems
  • Performance Engineers tuning query performance
  • ML Engineers implementing vector search
  • API Developers optimizing data access layers

Prerequisites

Required Certification

  • ThemisDB Fundamentals Certification (TDF) - Must be current (not expired)

Technical Prerequisites

  • 6-12 months hands-on experience with ThemisDB
  • Strong understanding of AQL fundamentals
  • Experience with query optimization concepts
  • Familiarity with index structures
  • Understanding of graph algorithms (basic)
  • Knowledge of vector embeddings (for AI/ML sections)

Recommended Experience

  • Built at least one production application using ThemisDB
  • Optimized queries for performance
  • Worked with multi-model data patterns
  • Experience with query profiling and debugging

Learning Objectives

Upon completing this certification, you will be able to:

1. Advanced AQL Techniques (25%)

  • Write complex nested queries and subqueries
  • Use window functions and CTEs
  • Implement recursive queries
  • Apply advanced aggregation patterns
  • Master query composition and reusability
  • Utilize dynamic query generation

2. Graph Traversal and Algorithms (20%)

  • Implement shortest path algorithms
  • Perform depth-first and breadth-first traversals
  • Calculate graph metrics (centrality, clustering)
  • Use pattern matching in graphs
  • Optimize graph query performance
  • Handle large-scale graph analytics

3. Vector Similarity Search (15%)

  • Understand vector embedding concepts
  • Implement k-NN searches
  • Use approximate nearest neighbor (ANN) algorithms
  • Optimize vector index configuration
  • Apply similarity metrics (cosine, euclidean)
  • Integrate with ML/AI pipelines

4. Query Optimization (20%)

  • Analyze query execution plans
  • Identify performance bottlenecks
  • Apply optimization techniques
  • Understand optimizer hints
  • Optimize joins and subqueries
  • Reduce query complexity

5. Index Strategies (15%)

  • Design effective indexing strategies
  • Use composite and partial indexes
  • Implement full-text search indexes
  • Configure vector indexes
  • Understand index trade-offs
  • Monitor index performance

6. Performance Tuning (5%)

  • Profile query performance
  • Use caching strategies
  • Optimize batch operations
  • Handle large result sets
  • Minimize network latency
  • Apply query result pagination

Advanced AQL Topics

Complex Queries and Subqueries

Correlated Subqueries

-- Find customers who ordered above their average
SELECT c.customer_id, c.name, o.order_total
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_total > (
    SELECT AVG(o2.order_total)
    FROM orders o2
    WHERE o2.customer_id = c.customer_id
)
ORDER BY o.order_total DESC;

Common Table Expressions (CTEs)

WITH regional_sales AS (
    SELECT region, SUM(amount) as total_sales
    FROM orders
    GROUP BY region
),
top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT AVG(total_sales) FROM regional_sales)
)
SELECT r.region, r.total_sales, p.product_name, SUM(o.amount) as product_sales
FROM top_regions t
INNER JOIN regional_sales r ON t.region = r.region
INNER JOIN orders o ON r.region = o.region
INNER JOIN products p ON o.product_id = p.product_id
GROUP BY r.region, r.total_sales, p.product_name
ORDER BY r.region, product_sales DESC;

Recursive Queries

-- Find organizational hierarchy
WITH RECURSIVE emp_hierarchy AS (
    -- Base case: top-level managers
    SELECT employee_id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: subordinates
    SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    INNER JOIN emp_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM emp_hierarchy
ORDER BY level, name;
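The same recursive pattern can be tried locally with Python's built-in sqlite3 module, which supports `WITH RECURSIVE`. This is a minimal sketch with an invented four-person org chart, not ThemisDB itself:

```python
# Recursive CTE demo: walk an org chart from top-level managers down,
# tracking each employee's depth (sample data is invented).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
    (1, 'Ada',  NULL),  -- top-level manager
    (2, 'Bob',  1),
    (3, 'Cleo', 1),
    (4, 'Dina', 2);
""")

rows = conn.execute("""
WITH RECURSIVE emp_hierarchy AS (
    SELECT employee_id, name, manager_id, 1 AS level
    FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    JOIN emp_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT name, level FROM emp_hierarchy ORDER BY level, name
""").fetchall()

print(rows)  # [('Ada', 1), ('Bob', 2), ('Cleo', 2), ('Dina', 3)]
```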

Window Functions

-- Calculate running totals and rankings
SELECT 
    order_date,
    customer_id,
    order_total,
    -- Running total
    SUM(order_total) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
    ) as running_total,
    -- Ranking within customer
    ROW_NUMBER() OVER (
        PARTITION BY customer_id 
        ORDER BY order_total DESC
    ) as order_rank,
    -- Moving average (last 3 orders)
    AVG(order_total) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg
FROM orders
ORDER BY customer_id, order_date;
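SQLite also implements standard window functions, so the running-total and moving-average frames above can be checked on a toy dataset (the orders here are invented):

```python
# Window function demo: running total and 3-row moving average
# per customer, ordered by date.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, order_total REAL);
INSERT INTO orders VALUES
    (1, '2024-01-01', 10.0),
    (1, '2024-01-02', 20.0),
    (1, '2024-01-03', 30.0);
""")

rows = conn.execute("""
SELECT order_date,
       SUM(order_total) OVER (PARTITION BY customer_id
                              ORDER BY order_date) AS running_total,
       AVG(order_total) OVER (PARTITION BY customer_id
                              ORDER BY order_date
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM orders ORDER BY order_date
""").fetchall()

print(rows)
# [('2024-01-01', 10.0, 10.0), ('2024-01-02', 30.0, 15.0), ('2024-01-03', 60.0, 20.0)]
```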

Advanced Aggregations

-- Multi-level aggregation with GROUPING SETS
SELECT 
    region,
    category,
    product_name,
    SUM(sales) as total_sales,
    COUNT(*) as order_count
FROM sales_data
GROUP BY GROUPING SETS (
    (region, category, product_name),  -- Detailed
    (region, category),                -- By category in region
    (region),                          -- By region
    ()                                 -- Grand total
)
ORDER BY region, category, product_name;

-- Using ROLLUP for hierarchical totals
SELECT 
    year,
    quarter,
    month,
    SUM(revenue) as total_revenue
FROM financial_data
GROUP BY ROLLUP(year, quarter, month);

-- Using CUBE for all combinations
SELECT 
    product_category,
    customer_segment,
    sales_channel,
    SUM(revenue) as total_revenue
FROM sales
GROUP BY CUBE(product_category, customer_segment, sales_channel);

Dynamic Query Construction

-- Build queries dynamically with parameters
LET filters = {
    min_price: 100,
    max_price: 500,
    categories: ["electronics", "computers"],
    in_stock: true
};

FOR product IN products
    FILTER product.price >= filters.min_price
    FILTER product.price <= filters.max_price
    FILTER product.category IN filters.categories
    FILTER filters.in_stock ? product.stock > 0 : true
    SORT product.price ASC
    RETURN product;

Graph Traversal Algorithms

Shortest Path

-- Find shortest path between two nodes
FOR vertex, edge, path IN OUTBOUND SHORTEST_PATH
    'persons/alice' TO 'persons/bob'
    GRAPH 'social_network'
    RETURN {
        vertices: path.vertices[*].name,
        edges: path.edges[*].type,
        distance: LENGTH(path.edges)
    };

-- All shortest paths with weights
FOR path IN OUTBOUND K_SHORTEST_PATHS
    'locations/start' TO 'locations/end'
    GRAPH 'road_network'
    OPTIONS {
        weightAttribute: 'distance',
        defaultWeight: 1
    }
    LIMIT 5
    RETURN {
        path: path.vertices[*].name,
        total_distance: SUM(path.edges[*].distance)
    };

Graph Traversal

-- Depth-first traversal with pruning
FOR vertex, edge, path IN 1..5 OUTBOUND
    'products/laptop'
    GRAPH 'recommendations'
    OPTIONS {
        order: "dfs",
        uniqueVertices: "global"
    }
    PRUNE edge.strength < 0.5
    FILTER vertex.category == "electronics"
    RETURN DISTINCT vertex;

-- Breadth-first with path tracking
FOR vertex, edge, path IN 1..3 ANY
    'persons/john'
    GRAPH 'social_network'
    OPTIONS {
        order: "bfs",
        uniqueVertices: "path"
    }
    RETURN {
        person: vertex.name,
        path_length: LENGTH(path.edges),
        connection_types: path.edges[*].type
    };
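The `order: "bfs"` / `uniqueVertices: "global"` behavior above can be sketched in plain Python on a toy adjacency list (graph data invented), which makes the visit order explicit:

```python
# Breadth-first traversal with a depth cap, visiting each vertex at most
# once globally (analogous to uniqueVertices: "global").
from collections import deque

def bfs_paths(graph, start, max_depth):
    """Yield (vertex, depth) pairs in breadth-first order, up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth > 0:
            yield vertex, depth
        if depth < max_depth:
            for neighbor in graph.get(vertex, []):
                if neighbor not in seen:  # global uniqueness
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))

graph = {"john": ["amy", "bo"], "amy": ["cat"], "bo": ["cat", "dee"]}
print(list(bfs_paths(graph, "john", 3)))
# [('amy', 1), ('bo', 1), ('cat', 2), ('dee', 2)]
```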

Pattern Matching

-- Find triangles in graph (A->B, B->C, C->A)
FOR v1 IN persons
    FOR v2 IN OUTBOUND v1 knows
        FOR v3 IN OUTBOUND v2 knows
            FILTER v3._id == v1._id
            RETURN DISTINCT {
                triangle: [v1.name, v2.name, v3.name]
            };

-- Find influential nodes (high degree centrality)
FOR vertex IN persons
    LET out_degree = LENGTH(
        FOR v IN OUTBOUND vertex knows RETURN 1
    )
    LET in_degree = LENGTH(
        FOR v IN INBOUND vertex knows RETURN 1
    )
    SORT (out_degree + in_degree) DESC
    LIMIT 10
    RETURN {
        name: vertex.name,
        connections: out_degree + in_degree,
        out_connections: out_degree,
        in_connections: in_degree
    };

Graph Analytics

-- Collect adjacency data as the input to PageRank
-- (the iterative rank computation itself typically runs client-side)
LET graph_data = (
    FOR vertex IN vertices
        LET outbound = (
            FOR v IN OUTBOUND vertex edges
                RETURN v._id
        )
        RETURN {
            id: vertex._id,
            outbound: outbound
        }
)
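Given adjacency data like the result above, the PageRank iteration can run in application code. A minimal sketch on an invented three-node cycle:

```python
# Power-iteration PageRank over an outbound-adjacency dict.
# Dangling nodes (no outbound edges) spread their rank evenly.
def pagerank(outbound, damping=0.85, iterations=50):
    nodes = list(outbound)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in outbound.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank

ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
print(ranks)  # each node converges to 1/3 in this symmetric cycle
```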

-- Clustering coefficient
FOR vertex IN social_network_vertices
    LET neighbors = (
        FOR v IN ANY vertex social_network_edges
            RETURN v._id
    )
    LET neighbor_connections = (
        FOR n1 IN neighbors
            FOR n2 IN neighbors
                FILTER n1 != n2
                FOR edge IN social_network_edges
                    FILTER (edge._from == n1 AND edge._to == n2)
                        OR (edge._from == n2 AND edge._to == n1)
                    RETURN 1
    )
    LET possible_connections = LENGTH(neighbors) * (LENGTH(neighbors) - 1)
    RETURN {
        vertex: vertex._id,
        clustering_coefficient: possible_connections > 0 
            ? LENGTH(neighbor_connections) / possible_connections 
            : 0
    };
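For comparison, here is the same local clustering coefficient in plain Python, using the undirected form (divide by k(k-1)/2 rather than the directed k(k-1) used above); the toy graph is invented:

```python
# Local clustering coefficient: fraction of a vertex's neighbor pairs
# that are themselves connected (undirected variant).
def clustering_coefficient(adj, vertex):
    neighbors = adj.get(vertex, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a in neighbors for b in neighbors
                if a < b and b in adj.get(a, set()))
    return links / (k * (k - 1) / 2)

# 'a' has neighbors b and c, and b-c are connected: coefficient 1.0
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(clustering_coefficient(adj, "a"))  # 1.0
```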

Vector Similarity Search

Vector Embeddings Basics

-- Store document with vector embedding
INSERT INTO documents {
    title: "Machine Learning Basics",
    content: "Introduction to ML concepts...",
    vector: [0.23, 0.45, 0.67, 0.12, 0.89, ...],  // 768-dim embedding
    created_at: NOW()
};

-- Create vector index
CREATE INDEX vector_idx ON documents (vector)
OPTIONS {
    type: "vector",
    dimensions: 768,
    metric: "cosine",
    algorithm: "hnsw",
    ef_construction: 200,
    m: 16
};

K-Nearest Neighbors Search

-- Find similar documents
LET query_vector = [0.25, 0.43, 0.65, 0.15, 0.87, ...]

FOR doc IN documents
    LET similarity = COSINE_SIMILARITY(doc.vector, query_vector)
    SORT similarity DESC
    LIMIT 10
    RETURN {
        title: doc.title,
        similarity: similarity,
        content: SUBSTRING(doc.content, 0, 200)
    };

-- Using vector index for faster search
FOR doc IN documents
    VECTOR_SEARCH(doc.vector, query_vector, 10)
    RETURN {
        title: doc.title,
        distance: VECTOR_DISTANCE(doc.vector, query_vector, "cosine"),
        content: doc.content
    };

Similarity Metrics

-- Cosine similarity (normalized dot product)
LET cosine_sim = COSINE_SIMILARITY(vec1, vec2)

-- Euclidean distance
LET euclidean_dist = EUCLIDEAN_DISTANCE(vec1, vec2)

-- Manhattan distance
LET manhattan_dist = MANHATTAN_DISTANCE(vec1, vec2)

-- Dot product
LET dot_prod = DOT_PRODUCT(vec1, vec2)

-- Combined similarity with filters
FOR doc IN documents
    LET similarity = COSINE_SIMILARITY(doc.vector, query_vector)
    FILTER similarity > 0.8
    FILTER doc.category == "technology"
    FILTER doc.published_date > DATE_SUB(NOW(), INTERVAL 30 DAY)
    SORT similarity DESC
    LIMIT 20
    RETURN doc;
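The four metrics above reduce to a few lines of arithmetic; this dependency-free Python sketch shows what each one computes:

```python
# Vector similarity metrics written out explicitly (no external libraries).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

v1, v2 = [1.0, 0.0], [0.0, 1.0]
print(cosine_similarity(v1, v1))   # 1.0 (same direction)
print(cosine_similarity(v1, v2))   # 0.0 (orthogonal)
print(manhattan_distance(v1, v2))  # 2.0
```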

Hybrid Search (Vector + Full-Text)

-- Combine vector similarity with keyword search
LET query_vector = EMBED_TEXT("machine learning algorithms")
LET keyword_ids = (
    FOR doc IN FULLTEXT(documents, "content", "machine learning")
        RETURN doc._id
)
LET vector_results = (
    FOR doc IN documents
        VECTOR_SEARCH(doc.vector, query_vector, 50)
        RETURN {doc: doc, vector_score: COSINE_SIMILARITY(doc.vector, query_vector)}
)

-- Combine and re-rank: boost vector hits that also match the keywords
FOR result IN vector_results
    LET keyword_score = result.doc._id IN keyword_ids ? 1 : 0
    LET combined_score = 0.6 * result.vector_score + 0.4 * keyword_score
    SORT combined_score DESC
    LIMIT 10
    RETURN {
        title: result.doc.title,
        combined_score: combined_score,
        vector_score: result.vector_score,
        has_keywords: keyword_score > 0
    };
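The re-ranking step (0.6 * vector + 0.4 * keyword) is a simple weighted fusion, shown here in plain Python with invented sample scores:

```python
# Weighted score fusion for hybrid search re-ranking.
def combined_score(vector_score, has_keywords, w_vector=0.6, w_keyword=0.4):
    return w_vector * vector_score + w_keyword * (1 if has_keywords else 0)

candidates = [
    {"title": "Intro to ML",   "vector_score": 0.92, "has_keywords": False},
    {"title": "ML Algorithms", "vector_score": 0.80, "has_keywords": True},
]
ranked = sorted(candidates,
                key=lambda c: combined_score(c["vector_score"], c["has_keywords"]),
                reverse=True)
print([c["title"] for c in ranked])
# ['ML Algorithms', 'Intro to ML'] -- the keyword match outweighs
# the slightly higher raw vector score
```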

Query Optimization

Execution Plan Analysis

-- View execution plan
EXPLAIN
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.region = "North"
GROUP BY c.id, c.name
HAVING order_count > 10;

-- Returns plan showing:
-- - Index usage
-- - Join strategy
-- - Filter application order
-- - Estimated rows at each stage

Optimization Techniques

Index Usage

-- Bad: Full table scan
SELECT * FROM orders WHERE customer_id = 123;

-- Good: Uses index
CREATE INDEX idx_customer ON orders (customer_id);
SELECT * FROM orders WHERE customer_id = 123;

-- Better: Covering index
CREATE INDEX idx_customer_date ON orders (customer_id, order_date, total);
SELECT order_date, total FROM orders WHERE customer_id = 123;
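The covering-index effect is easy to observe in SQLite, whose `EXPLAIN QUERY PLAN` output says so explicitly when a query is answered from the index alone (toy schema stands in for the orders example):

```python
# When every selected column is in the index, the plan reports a
# COVERING INDEX scan: no table lookup is needed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, total REAL, note TEXT);
CREATE INDEX idx_customer_date ON orders (customer_id, order_date, total);
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT order_date, total FROM orders WHERE customer_id = 123
""").fetchall()
print(plan)  # the plan detail mentions "COVERING INDEX idx_customer_date"
```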

Join Optimization

-- Bad: Multiple separate queries
LET customer = DOCUMENT("customers/123")
LET customer_orders = (FOR o IN orders FILTER o.customer_id == 123 RETURN o)
LET ordered_products = (FOR o IN customer_orders FOR p IN products
                        FILTER p.id == o.product_id RETURN p)

-- Good: Single optimized query with proper joins
FOR c IN customers
    FILTER c._id == "customers/123"
    FOR o IN orders
        FILTER o.customer_id == c.id
        FOR p IN products
            FILTER p.id == o.product_id
            RETURN {customer: c, order: o, product: p}

Subquery Optimization

-- Bad: Correlated subquery in SELECT
SELECT 
    c.name,
    (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) as order_count
FROM customers c;

-- Good: Use JOIN and GROUP BY
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;

Filter Early

-- Bad: Filter after joins
SELECT *
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id
WHERE c.region = "North" AND o.order_date > '2024-01-01';

-- Good: Filter before joins
SELECT *
FROM (SELECT * FROM customers WHERE region = "North") c
JOIN (SELECT * FROM orders WHERE order_date > '2024-01-01') o 
    ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id;

Optimizer Hints

-- Force index usage
SELECT /*+ INDEX(orders idx_customer_date) */
    customer_id, order_date, total
FROM orders
WHERE customer_id = 123 AND order_date > '2024-01-01';

-- Control join order
SELECT /*+ LEADING(c, o, p) */
    c.name, o.total, p.name
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN products p ON o.product_id = p.id;

-- Disable specific optimization
SELECT /*+ NO_MERGE(subq) */
    *
FROM (SELECT * FROM large_table WHERE condition) subq;

Index Strategies

Index Types

B-Tree Index (Default)

-- Single column
CREATE INDEX idx_email ON users (email);

-- Composite index (order matters!)
CREATE INDEX idx_region_status ON customers (region, status);

-- Best for: equality, range queries, sorting

Hash Index

-- Optimized for equality only
CREATE INDEX idx_user_hash ON sessions (user_id)
OPTIONS {type: "hash"};

-- Best for: exact match lookups
-- Not suitable for: range queries, sorting

Full-Text Index

-- Text search index
CREATE INDEX idx_content_fulltext ON articles (title, content)
OPTIONS {
    type: "fulltext",
    analyzer: "text_en",
    features: ["frequency", "position"]
};

-- Usage
FOR doc IN FULLTEXT(articles, "content", "query terms")
    RETURN doc;

Geospatial Index

-- For location queries
CREATE INDEX idx_location ON stores (location)
OPTIONS {type: "geo"};

-- Find nearby stores
FOR store IN stores
    FILTER GEO_DISTANCE(store.location, [lat, lng]) < 5000  // 5km
    RETURN store;

Vector Index

-- For similarity search
CREATE INDEX idx_embedding ON documents (embedding)
OPTIONS {
    type: "vector",
    dimensions: 768,
    metric: "cosine",
    algorithm: "hnsw",  // Hierarchical Navigable Small World
    ef_construction: 200,
    m: 16
};

Partial Indexes

-- Index only subset of data
CREATE INDEX idx_active_users ON users (last_login)
WHERE active = true;

-- Smaller index, faster queries for active users
SELECT * FROM users 
WHERE active = true AND last_login > '2024-01-01';
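SQLite supports the same partial-index construct, so the planner's behavior can be verified locally: the index is used only when the query's WHERE clause implies the index predicate (toy schema, invented data):

```python
# Partial index demo: index only active users, then confirm the
# planner uses it for a query that filters on active = 1.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, active INTEGER, last_login TEXT);
CREATE INDEX idx_active_users ON users (last_login) WHERE active = 1;
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE active = 1 AND last_login > '2024-01-01'
""").fetchall()
print(plan)  # the plan detail mentions idx_active_users
```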

Index Maintenance

-- Analyze index usage
SHOW INDEX STATISTICS FOR TABLE orders;

-- Rebuild fragmented index
REBUILD INDEX idx_customer_date ON orders;

-- Drop unused index
DROP INDEX idx_old_unused ON orders;

Index Trade-offs

| Aspect            | Benefit                      | Cost                         |
|-------------------|------------------------------|------------------------------|
| Write Performance | -                            | Slower inserts/updates       |
| Read Performance  | Much faster queries          | -                            |
| Storage           | -                            | Additional disk space        |
| Memory            | Cached indexes speed queries | Higher memory usage          |
| Maintenance       | -                            | Requires monitoring & tuning |

Hands-On Labs

Lab 1: Complex Query Optimization (4 hours)

Objective: Optimize a slow-running analytics query

Scenario: E-commerce platform with 10M orders, 1M customers

Tasks:

  1. Analyze the provided slow query execution plan
  2. Identify bottlenecks (missing indexes, inefficient joins)
  3. Create appropriate indexes
  4. Rewrite query for better performance
  5. Measure improvement (target: 10x faster)

Deliverables:

  • Original vs optimized query comparison
  • Execution plan analysis
  • Index creation scripts
  • Performance metrics

Lab 2: Graph Traversal (3 hours)

Objective: Implement recommendation engine using graph traversals

Scenario: Social network with friend connections and product purchases

Tasks:

  1. Build friend-of-friend recommendation query
  2. Implement collaborative filtering (bought together)
  3. Calculate influence scores
  4. Find shortest connection path between users
  5. Optimize graph queries for 1M+ nodes

Deliverables:

  • Graph traversal queries
  • Performance benchmarks
  • Recommendation results

Lab 3: Vector Similarity Search (4 hours)

Objective: Build semantic search for document collection

Scenario: 100K technical documents with embeddings

Tasks:

  1. Create and configure vector index
  2. Implement k-NN search
  3. Build hybrid search (vector + full-text)
  4. Optimize for query latency (<100ms)
  5. Handle relevance ranking

Deliverables:

  • Vector search implementation
  • Hybrid search queries
  • Performance tuning results
  • Search quality metrics

Lab 4: Multi-Model Integration (5 hours)

Objective: Build complex application using all models

Scenario: IoT platform with devices, time-series data, relationships

Tasks:

  1. Design schema across document, graph, time-series
  2. Write queries joining multiple models
  3. Implement real-time analytics
  4. Optimize for high write throughput
  5. Build dashboard queries

Deliverables:

  • Schema design document
  • Multi-model queries
  • Performance optimization report

Sample Exam Questions

Section 1: Advanced AQL

Question 1: What is the main advantage of using a CTE over a subquery?

  • A) CTEs are always faster
  • B) CTEs can be referenced multiple times
  • C) CTEs use less memory
  • D) CTEs are required for joins

Answer: B


Question 2: Which window function would you use to calculate a running total?

  • A) ROW_NUMBER()
  • B) RANK()
  • C) SUM() OVER()
  • D) COUNT() OVER()

Answer: C


Question 3: What does GROUPING SETS allow that GROUP BY doesn't?

  • A) Faster aggregation
  • B) Multiple grouping combinations in one query
  • C) Better index usage
  • D) Parallel execution

Answer: B


Question 4: In a recursive CTE, what prevents infinite loops?

  • A) Automatic depth limit
  • B) Cycle detection in the recursive term
  • C) Maximum iteration configuration
  • D) All of the above

Answer: D


Question 5: When should you use a correlated subquery?

  • A) Never, they're always slow
  • B) When the subquery depends on the outer query
  • C) Only for EXISTS checks
  • D) When joining is not possible

Answer: B


Section 2: Graph Algorithms

Question 6: Which algorithm finds the shortest path in a weighted graph?

  • A) Depth-First Search
  • B) Breadth-First Search
  • C) Dijkstra's Algorithm
  • D) PageRank

Answer: C


Question 7: What is the time complexity of BFS in a graph with V vertices and E edges?

  • A) O(V)
  • B) O(E)
  • C) O(V + E)
  • D) O(V * E)

Answer: C


Question 8: What does the PRUNE keyword do in a graph traversal?

  • A) Deletes vertices
  • B) Stops traversal down a path
  • C) Removes duplicate results
  • D) Optimizes the query

Answer: B


Question 9: Which centrality measure counts the shortest paths through a vertex?

  • A) Degree Centrality
  • B) Closeness Centrality
  • C) Betweenness Centrality
  • D) PageRank

Answer: C


Question 10: What is the purpose of uniqueVertices: "path" option?

  • A) Ensures global uniqueness
  • B) Allows vertex repetition in different paths
  • C) Prevents cycles within a single path
  • D) Improves performance

Answer: C


Section 3: Vector Search

Question 11: What does cosine similarity measure?

  • A) Euclidean distance between vectors
  • B) Angle between vectors (direction similarity)
  • C) Manhattan distance
  • D) Vector magnitude difference

Answer: B


Question 12: Why use approximate nearest neighbor (ANN) instead of exact k-NN?

  • A) Better accuracy
  • B) Simpler implementation
  • C) Much faster for large datasets
  • D) Uses less memory

Answer: C


Question 13: What is HNSW in vector indexing?

  • A) A distance metric
  • B) An approximate nearest neighbor algorithm
  • C) A vector dimension
  • D) A similarity function

Answer: B


Question 14: When would you use Manhattan distance over cosine similarity?

  • A) For normalized vectors
  • B) For directional similarity
  • C) For coordinate-based data
  • D) Never, cosine is always better

Answer: C


Question 15: What is the main trade-off when increasing the 'ef' parameter in HNSW?

  • A) Accuracy vs build time
  • B) Accuracy vs query time
  • C) Memory vs accuracy
  • D) Build time vs query time

Answer: B


Section 4: Query Optimization

Question 16: What does EXPLAIN show you?

  • A) Query results
  • B) Query execution plan
  • C) Index definitions
  • D) Table schema

Answer: B


Question 17: Why are covering indexes beneficial?

  • A) They use less disk space
  • B) They're faster to create
  • C) Query doesn't need to access table data
  • D) They work for all queries

Answer: C


Question 18: What is a join selectivity?

  • A) Which join algorithm is used
  • B) The order of tables in the join
  • C) The percentage of rows that match
  • D) The number of joins in a query

Answer: C


Question 19: When should you use query hints?

  • A) Always, they improve all queries
  • B) Never, the optimizer knows best
  • C) When optimizer makes suboptimal choices
  • D) Only for simple queries

Answer: C


Question 20: What is predicate pushdown?

  • A) Moving filters before joins
  • B) Pushing queries to replicas
  • C) Distributing queries across shards
  • D) Caching filter results

Answer: A


Section 5: Index Strategies

Question 21: For a composite index on (A, B, C), which queries can use it?

  • A) Only queries filtering on all three
  • B) Queries filtering on A, or A+B, or A+B+C
  • C) Any queries using A, B, or C
  • D) Only queries filtering on A

Answer: B


Question 22: When should you use a hash index?

  • A) For range queries
  • B) For exact equality lookups
  • C) For sorting
  • D) For full-text search

Answer: B


Question 23: What is a partial index?

  • A) An incomplete index
  • B) An index on part of a column
  • C) An index with a WHERE clause
  • D) A temporary index

Answer: C


Question 24: Why might you have multiple indexes on the same column?

  • A) Faster queries
  • B) Redundancy for safety
  • C) Different index types for different query patterns
  • D) It's not allowed

Answer: C


Question 25: What happens when you have too many indexes?

  • A) Queries become faster
  • B) Writes become slower
  • C) Better memory usage
  • D) Automatic optimization

Answer: B


Scenario-Based Questions

Question 26: You have a query joining 4 large tables that takes 30 seconds. The execution plan shows sequential scans on all tables. What should you do first?

  • A) Add more RAM
  • B) Rewrite the query
  • C) Add appropriate indexes on join columns
  • D) Use query hints

Answer: C


Question 27: Your graph traversal query for 3-hop friend recommendations times out. How do you optimize it?

  • A) Increase the traversal depth
  • B) Add PRUNE conditions and use uniqueVertices
  • C) Remove all filters
  • D) Use a different graph

Answer: B


Question 28: Vector similarity search returns results in 2 seconds but you need <100ms. What's the best approach?

  • A) Buy faster hardware
  • B) Reduce vector dimensions
  • C) Use approximate nearest neighbor (ANN) index
  • D) Cache all results

Answer: C


Question 29: A query performs well with 1000 rows but slows dramatically with 1M rows. Why?

  • A) Bad code
  • B) Algorithmic complexity (e.g., O(n²) operation)
  • C) Network latency
  • D) Disk failure

Answer: B


Question 30: Your analytics query needs data from document, graph, and time-series models. How do you structure it?

  • A) Three separate queries
  • B) Single query with multi-model joins
  • C) Materialize views for each model first
  • D) Use only one model

Answer: B


Practical Coding Questions

Question 31: Write a query to find the top 10 customers by total order value in the last 30 days, including their order count and average order value.

Solution:

SELECT 
    c.customer_id,
    c.name,
    COUNT(o.id) as order_count,
    SUM(o.total) as total_value,
    AVG(o.total) as avg_order_value
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY c.customer_id, c.name
ORDER BY total_value DESC
LIMIT 10;

Question 32: Write a graph traversal query to find all products that customers who bought product X also bought, ranked by frequency.

Solution:

FOR order1 IN orders
    FILTER order1.product_id == "products/X"
    FOR order2 IN orders
        FILTER order2.customer_id == order1.customer_id
        FILTER order2.product_id != "products/X"
        COLLECT product_id = order2.product_id 
        WITH COUNT INTO frequency
        SORT frequency DESC
        LIMIT 10
        FOR product IN products
            FILTER product._id == product_id
            RETURN {
                product: product,
                frequency: frequency
            };

Question 33: Optimize this slow query:

SELECT * FROM users WHERE email LIKE '%@gmail.com';

Solution:

-- Add computed column and index
ALTER TABLE users ADD COLUMN email_domain VARCHAR GENERATED ALWAYS AS (
    SUBSTRING(email, POSITION('@' IN email) + 1)
);

CREATE INDEX idx_email_domain ON users (email_domain);

-- Optimized query
SELECT * FROM users WHERE email_domain = 'gmail.com';

Question 34: Write a query using vector similarity to find documents similar to a given document, but only in the same category and published in the last year.

Solution:

LET target_doc = DOCUMENT("documents/123")

FOR doc IN documents
    FILTER doc.category == target_doc.category
    FILTER doc.published_date >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
    FILTER doc._id != target_doc._id
    LET similarity = COSINE_SIMILARITY(doc.embedding, target_doc.embedding)
    FILTER similarity > 0.7
    SORT similarity DESC
    LIMIT 10
    RETURN {
        document: doc,
        similarity: similarity
    };

Question 35: Write a recursive CTE to calculate the total size of a folder including all subfolders.

Solution:

WITH RECURSIVE folder_sizes AS (
    -- Base case: files
    SELECT 
        folder_id,
        file_size as size
    FROM files
    
    UNION ALL
    
    -- Recursive case: subfolders
    SELECT 
        f.parent_folder_id as folder_id,
        fs.size
    FROM folders f
    INNER JOIN folder_sizes fs ON f.folder_id = fs.folder_id
)
SELECT 
    folder_id,
    SUM(size) as total_size
FROM folder_sizes
WHERE folder_id = 'root_folder_id'
GROUP BY folder_id;
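The folder-size CTE can be verified end to end on a tiny invented tree using SQLite: each file's size propagates upward through every ancestor, so summing the rows tagged with the root folder yields the grand total:

```python
# Verify the recursive folder-size rollup on a small tree:
#   root (1 byte) -> docs (10), pics (100) -> raw (1000)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folders (folder_id TEXT, parent_folder_id TEXT);
CREATE TABLE files (folder_id TEXT, file_size INTEGER);
INSERT INTO folders VALUES ('docs', 'root'), ('pics', 'root'), ('raw', 'pics');
INSERT INTO files VALUES ('root', 1), ('docs', 10), ('pics', 100), ('raw', 1000);
""")

total = conn.execute("""
WITH RECURSIVE folder_sizes AS (
    SELECT folder_id, file_size AS size FROM files
    UNION ALL
    SELECT f.parent_folder_id, fs.size
    FROM folders f JOIN folder_sizes fs ON f.folder_id = fs.folder_id
)
SELECT SUM(size) FROM folder_sizes WHERE folder_id = 'root'
""").fetchone()[0]
print(total)  # 1111
```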

Project Assignment

Overview

Build a complete query optimization solution for a multi-model e-commerce application.

Scenario

You're given a ThemisDB database with:

  • 5M products (document model)
  • 2M customers (document model)
  • 20M orders (document model)
  • Product categories graph (graph model)
  • Clickstream events (time-series model)
  • Product embeddings for search (vector model)

Requirements

Part 1: Query Analysis (20%)

  1. Profile 10 provided slow queries
  2. Document execution plans
  3. Identify bottlenecks
  4. Prioritize optimization efforts

Part 2: Index Strategy (25%)

  1. Design comprehensive index strategy
  2. Create required indexes
  3. Justify each index
  4. Document trade-offs

Part 3: Query Optimization (30%)

  1. Optimize all 10 queries
  2. Rewrite for better performance
  3. Use advanced AQL features
  4. Achieve 10x performance improvement minimum

Part 4: Multi-Model Integration (25%)

  1. Build product recommendation system using graph traversal
  2. Implement semantic search using vectors
  3. Create real-time analytics using time-series
  4. Integrate all models in unified queries

Part 5: Documentation (10%)

  1. Performance comparison report
  2. Query optimization techniques used
  3. Index strategy rationale
  4. Best practices guide

Deliverables

  1. Code Repository

    • All optimized queries
    • Index creation scripts
    • Test data generators
    • Performance benchmarks
  2. Documentation

    • Optimization report (5-10 pages)
    • Before/after comparisons
    • Lessons learned
    • Recommendations
  3. Presentation

    • 10-minute video walkthrough
    • Key findings and achievements
    • Demo of optimizations

Evaluation Criteria

| Criterion               | Weight | Description                       |
|-------------------------|--------|-----------------------------------|
| Performance Improvement | 30%    | Actual query speedup achieved     |
| Technical Correctness   | 25%    | Queries produce correct results   |
| Index Strategy          | 20%    | Appropriate and justified indexes |
| Code Quality            | 15%    | Clean, readable, maintainable     |
| Documentation           | 10%    | Clear explanations and analysis   |

Passing Criteria

  • Minimum 70% overall score
  • All queries must function correctly
  • At least 5x average performance improvement
  • Complete documentation

Submission

  • Submit within 2 weeks of exam completion
  • Upload to certification portal
  • Include GitHub repository link

Certification Criteria

Overall Requirements

To pass the TQE certification, you must:

  1. Score 75% or higher on written exam (23/30 minimum)
  2. Pass the project assignment (70% or higher)
  3. Complete within time limits
  4. Demonstrate practical competency

Evaluation Breakdown

Written Exam (60%)

| Topic Area         | Questions | Weight |
|--------------------|-----------|--------|
| Advanced AQL       | 8-10      | 25%    |
| Graph Algorithms   | 6-7       | 20%    |
| Vector Search      | 4-5       | 15%    |
| Query Optimization | 6-7       | 20%    |
| Index Strategies   | 4-5       | 15%    |
| Performance Tuning | 2-3       | 5%     |
| Total              | 30-35     | 100%   |

Project Assignment (40%)

  • Must achieve 70% or higher
  • All sections must be completed
  • Code must execute successfully
  • Performance targets must be met

Retake Policy

If you fail the exam:

  • 14-day waiting period
  • $125 retake fee
  • Must retake entire exam

If you fail the project:

  • Can resubmit within 30 days
  • $75 resubmission fee
  • Address evaluator feedback

Certification Benefits

Professional Recognition

  • Advanced-level digital badge
  • Query Expert designation
  • Portfolio-ready project
  • Industry recognition

Career Opportunities

  • Average 20% salary increase
  • Senior developer roles
  • Database specialist positions
  • Performance engineering roles

Community Access

  • Expert forum access
  • Quarterly expert webinars
  • Early access to query features
  • Influence product roadmap

Continuing Education

  • 25% off advanced training
  • Free query optimization workshops
  • Conference speaker opportunities
  • Mentorship program eligibility

Next Steps

After earning TQE certification:

  1. Apply Your Skills: Use in production projects
  2. Share Knowledge: Blog, present, teach
  3. Contribute: ThemisDB query optimization
  4. Advance: Pursue TOC or TSC certification
  5. Specialize: Become query optimization consultant

Support and Resources

Certification Support: certification@themisdb.com
Technical Questions: query-experts@themisdb.com
Project Help: project-support@themisdb.com

Study Materials: https://learn.themisdb.com/query-expert


Ready to become a ThemisDB Query Expert?

Register for TQE Certification →


Last Updated: April 2026
Version: 1.0
© 2025 ThemisDB. All rights reserved.