Enhanced Virtual Tables

Last Updated: September 23, 2025 1:48 PM EST

The SQLite MCP Server provides intelligent data import capabilities with automatic schema detection, type inference, and seamless conversion of CSV and JSON files into queryable SQLite tables.

πŸ“š See Real-World Use Cases for complete CSV/JSON import workflows, schema inference examples, and data processing pipelines.


πŸ”§ Available Enhanced Virtual Table Tools

Tool                           Description
create_enhanced_csv_table      Create CSV tables with automatic data type inference
create_json_collection_table   Create virtual tables for JSON file collections (JSONL, arrays)
analyze_csv_schema             Analyze CSV files and infer data types without creating tables
analyze_json_schema            Analyze JSON file collections and infer schema

πŸ“Š Smart CSV Import

Automatic Schema Detection

// 1. Analyze CSV structure
analyze_csv_schema({
  "csv_file_path": "./data/sales.csv"
})

// 2. Create enhanced virtual table
create_enhanced_csv_table({
  "table_name": "sales_data",
  "csv_file_path": "./data/sales.csv",
  "has_header": true
})

// 3. Query the data
read_query({
  "query": "SELECT * FROM sales_data WHERE amount > 1000"
})

Advanced CSV Processing

// Enhanced CSV table with custom options
create_enhanced_csv_table({
  "table_name": "customer_data",
  "csv_file_path": "./data/customers.csv",
  "has_header": true,
  "delimiter": ",",
  "auto_detect_types": true,
  "sample_size": 1000,  // Rows to sample for type inference
  "null_values": ["", "NULL", "N/A", "null"],
  "date_formats": ["YYYY-MM-DD", "MM/DD/YYYY", "DD-MM-YYYY"],
  "numeric_separators": {
    "decimal": ".",
    "thousands": ","
  }
})
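
Once the table exists, you can confirm which column types the inference settled on. A quick sketch, assuming the server's read_query accepts SELECTs over SQLite's built-in pragma table-valued functions:

// Inspect the inferred column types (pragma_table_info is a
// standard SQLite table-valued function, queried as a plain SELECT)
read_query({
  "query": "SELECT name, type FROM pragma_table_info('customer_data')"
})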

Schema Analysis Without Import

// Analyze CSV without creating table
analyze_csv_schema({
  "csv_file_path": "./data/large_dataset.csv",
  "sample_size": 5000,
  "detect_encoding": true,
  "analyze_quality": true
})

Returns:

  • Inferred column types (INTEGER, REAL, TEXT, DATE, BOOLEAN)
  • Data quality metrics (null percentage, unique values)
  • Encoding detection
  • Delimiter detection
  • Header detection
  • Sample data preview
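
As a sketch of what this looks like in practice (the field names below are illustrative assumptions, not a guaranteed response shape):

// Hypothetical analysis result
{
  "delimiter": ",",
  "encoding": "utf-8",
  "has_header": true,
  "columns": [
    { "name": "order_id", "type": "INTEGER", "null_pct": 0.0, "unique_values": 9842 },
    { "name": "amount",   "type": "REAL",    "null_pct": 1.2, "unique_values": 3120 },
    { "name": "region",   "type": "TEXT",    "null_pct": 0.0, "unique_values": 4 }
  ]
}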

πŸ“„ JSON Collection Processing

JSON Lines (JSONL) Files

// Create table from JSONL file
create_json_collection_table({
  "table_name": "api_logs",
  "json_file_path": "./logs/api_requests.jsonl",
  "format": "jsonl",
  "auto_flatten": true,
  "max_depth": 3
})

// Query flattened JSON data
read_query({
  "query": `
    SELECT 
      timestamp,
      method,
      url,
      status_code,
      response_time_ms
    FROM api_logs 
    WHERE status_code >= 400
  `
})
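
To make auto_flatten concrete: nested objects are expanded into columns up to max_depth. A sketch, assuming underscore-joined column names (the record below is hypothetical):

// Hypothetical JSONL record
// {"timestamp": "2025-09-23T14:30:00Z", "user": {"id": 42, "plan": "pro"}, "status_code": 200}
//
// With auto_flatten and max_depth 3, this yields columns such as:
// timestamp, user_id, user_plan, status_code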

JSON Array Files

// Import JSON array file
create_json_collection_table({
  "table_name": "product_catalog",
  "json_file_path": "./data/products.json",
  "format": "array",
  "schema_inference": true,
  "nested_objects": "flatten",  // flatten, json_column, separate_tables
  "arrays": "json_column"       // json_column, separate_rows, ignore
})
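
To see what the nested_objects and arrays options imply, consider a hypothetical product record and how each mode would represent it (column naming is illustrative):

// Hypothetical record: {"sku": "A1", "dims": {"w": 10, "h": 4}, "tags": ["sale", "new"]}
//
// nested_objects: "flatten"     → columns like dims_w, dims_h
// nested_objects: "json_column" → one dims column holding the raw JSON object
// arrays: "json_column"         → one tags column holding '["sale","new"]'
// arrays: "separate_rows"       → one output row per tags element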

Complex JSON Schema Analysis

// Analyze complex JSON structure
analyze_json_schema({
  "json_file_path": "./data/complex_data.json",
  "format": "array",
  "max_sample": 1000,
  "analyze_nested": true,
  "suggest_normalization": true
})

Returns:

  • Inferred schema with data types
  • Nested object structure analysis
  • Array handling suggestions
  • Normalization recommendations
  • Sample data extraction
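
A sketch of what the analysis might return (field names are illustrative assumptions):

// Hypothetical analysis result
{
  "records_sampled": 1000,
  "schema": {
    "id":   "INTEGER",
    "name": "TEXT",
    "attributes": { "color": "TEXT", "size": "TEXT" }   // nested object
  },
  "array_fields": ["tags"],
  "normalization_hint": "attributes repeat per record; consider a separate table"
}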

🎯 Intelligent Type Inference

Data Type Detection

The enhanced virtual tables automatically detect and convert:

Numeric Types:

// Automatically detected as INTEGER or REAL
"123"        β†’ INTEGER
"123.45"     β†’ REAL
"1,234.56"   β†’ REAL (handles thousands separators)
"$1,234.56"  β†’ REAL (removes currency symbols)

Date/Time Types:

// Various date formats automatically detected
"2025-09-23"           β†’ DATE
"09/23/2025"           β†’ DATE
"23-Sep-2025"          β†’ DATE
"2025-09-23 14:30:00"  β†’ DATETIME

Boolean Types:

// Boolean value detection
"true", "false"  → BOOLEAN
"yes", "no"      → BOOLEAN
"1", "0"         → BOOLEAN (contextual: typically only when a column holds solely these values)
"Y", "N"         → BOOLEAN

Text Types:

// Default fallback for complex data
"Mixed content 123"    β†’ TEXT
"Complex string data"  β†’ TEXT

Custom Type Mapping

// Override automatic type detection
create_enhanced_csv_table({
  "table_name": "custom_types",
  "csv_file_path": "./data/data.csv",
  "has_header": true,
  "type_overrides": {
    "id": "TEXT",           // Force ID as text
    "amount": "REAL",       // Ensure decimal precision
    "date": "DATE",         // Force date parsing
    "active": "BOOLEAN"     // Convert to boolean
  },
  "custom_parsers": {
    "amount": "CAST(REPLACE(REPLACE($1, '$', ''), ',', '') AS REAL)",
    "date": "date($1)"
  }
})
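
A quick sanity query after import can confirm the custom parsers yielded numeric values (column and table names taken from the example above):

// Spot-check that amount parsed to REAL rather than falling back to TEXT
read_query({
  "query": "SELECT MIN(amount) AS min_amount, MAX(amount) AS max_amount, COUNT(*) AS row_count FROM custom_types"
})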

πŸ” Data Quality Analysis

Comprehensive Data Profiling

// Analyze data quality during import
create_enhanced_csv_table({
  "table_name": "quality_analyzed",
  "csv_file_path": "./data/messy_data.csv",
  "has_header": true,
  "quality_analysis": true,
  "generate_report": true
})

Quality Metrics Provided:

  • Null value percentages per column
  • Unique value counts
  • Data type consistency
  • Outlier detection
  • Pattern analysis
  • Completeness scores
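
As an illustration only (the report shape is an assumption, not a documented format), a per-column entry might look like:

// Hypothetical per-column quality entry
{
  "column": "email",
  "null_pct": 4.7,            // percentage of null/empty values
  "unique_values": 9520,
  "type_consistency": 0.99,   // fraction of values matching the inferred type
  "completeness": 0.953
}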

Data Cleaning Suggestions

// Get cleaning recommendations
analyze_csv_schema({
  "csv_file_path": "./data/dirty_data.csv",
  "suggest_cleaning": true,
  "detect_duplicates": true,
  "validate_constraints": true
})

Cleaning Suggestions:

  • Duplicate row identification
  • Inconsistent formatting detection
  • Missing value patterns
  • Constraint violations
  • Normalization opportunities
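
Illustrative output (shape and wording are assumptions):

// Hypothetical cleaning suggestions
[
  { "kind": "duplicates",  "detail": "38 exact duplicate rows found" },
  { "kind": "formatting",  "detail": "phone mixes '(555) 123-4567' and '555-123-4567'" },
  { "kind": "constraints", "detail": "12 rows violate age >= 0" }
]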

πŸ’‘ Real-World Use Cases

E-commerce Data Import

// 1. Analyze product data structure
const schema = await analyze_csv_schema({
  "csv_file_path": "./import/products.csv",
  "analyze_quality": true,
  "suggest_normalization": true
});

// 2. Create enhanced table with optimizations
create_enhanced_csv_table({
  "table_name": "products_import",
  "csv_file_path": "./import/products.csv",
  "has_header": true,
  "auto_detect_types": true,
  "quality_analysis": true,
  "type_overrides": {
    "sku": "TEXT",
    "price": "REAL",
    "in_stock": "BOOLEAN"
  }
});

// 3. Validate and clean data
read_query({
  "query": `
    SELECT 
      COUNT(*) as total_rows,
      COUNT(DISTINCT sku) as unique_skus,
      COUNT(CASE WHEN price > 0 THEN 1 END) as valid_prices,
      COUNT(CASE WHEN name IS NOT NULL AND name != '' THEN 1 END) as named_products
    FROM products_import
  `
});

// 4. Import clean data to permanent table
write_query({
  "query": `
    INSERT INTO products (sku, name, price, category, in_stock)
    SELECT sku, name, price, category, in_stock
    FROM products_import
    WHERE sku IS NOT NULL 
    AND price > 0 
    AND name IS NOT NULL
  `
});

Log File Analysis

// 1. Analyze log file structure
analyze_json_schema({
  "json_file_path": "./logs/application.jsonl",
  "format": "jsonl",
  "max_sample": 10000,
  "analyze_nested": true
});

// 2. Create structured log table
create_json_collection_table({
  "table_name": "app_logs",
  "json_file_path": "./logs/application.jsonl",
  "format": "jsonl",
  "auto_flatten": true,
  "nested_objects": "flatten",
  "timestamp_columns": ["timestamp", "created_at"],
  "index_columns": ["level", "component", "timestamp"]
});

// 3. Analyze error patterns
read_query({
  "query": `
    SELECT 
      component,
      level,
      COUNT(*) as count,
      MIN(timestamp) as first_occurrence,
      MAX(timestamp) as last_occurrence
    FROM app_logs
    WHERE level IN ('ERROR', 'WARN')
    GROUP BY component, level
    ORDER BY count DESC
  `
});

Survey Data Processing

// 1. Import survey responses from JSON
create_json_collection_table({
  "table_name": "survey_responses",
  "json_file_path": "./data/survey_results.json",
  "format": "array",
  "auto_flatten": true,
  "nested_objects": "separate_tables",
  "arrays": "separate_rows"
});

// 2. Analyze response patterns
read_query({
  "query": `
    SELECT 
      question_id,
      response_value,
      COUNT(*) as frequency,
      COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY question_id) as percentage
    FROM survey_responses
    WHERE question_type = 'multiple_choice'
    GROUP BY question_id, response_value
    ORDER BY question_id, frequency DESC
  `
});

🎯 Best Practices

1. Always Analyze Before Import

// Analyze first to understand your data
const analysis = await analyze_csv_schema({
  "csv_file_path": "./data/unknown_data.csv",
  "analyze_quality": true,
  "suggest_cleaning": true
});

console.log(`Detected ${analysis.columns.length} columns`);
console.log(`Data quality score: ${analysis.quality_score}`);

2. Use Appropriate Sample Sizes

// Large files: use sampling for performance
create_enhanced_csv_table({
  "table_name": "large_dataset",
  "csv_file_path": "./data/huge_file.csv",
  "sample_size": 10000,  // Sample for type inference
  "streaming_mode": true  // Process in chunks
});

// Small files: analyze everything
create_enhanced_csv_table({
  "table_name": "small_dataset",
  "csv_file_path": "./data/small_file.csv",
  "sample_size": -1  // Analyze all rows
});

3. Handle Nested JSON Appropriately

// Simple flattening for shallow objects
create_json_collection_table({
  "table_name": "simple_json",
  "json_file_path": "./data/simple.json",
  "format": "array",
  "auto_flatten": true,
  "max_depth": 2
});

// Preserve structure for complex objects
create_json_collection_table({
  "table_name": "complex_json",
  "json_file_path": "./data/complex.json",
  "format": "array",
  "nested_objects": "json_column",
  "preserve_structure": true
});

4. Validate After Import

// Always validate imported data
read_query({
  "query": `
    SELECT 
      'Data Import Validation' as check_type,
      COUNT(*) as total_rows,
      COUNT(CASE WHEN primary_key IS NULL THEN 1 END) as missing_keys,
      COUNT(DISTINCT primary_key) as unique_keys,
      CASE 
        WHEN COUNT(*) = COUNT(DISTINCT primary_key) THEN 'PASS'
        ELSE 'FAIL'
      END as uniqueness_check
    FROM imported_table
  `
});

πŸ“š Related Pages


πŸ“Š Smart Import Tip: Enhanced virtual tables combine the flexibility of virtual tables with intelligent schema detection. Use them to quickly explore and import external data sources with minimal manual configuration.
