
KortexDL Enhanced Python API Documentation

Overview

This document describes the enhanced Python API for KortexDL, which provides comprehensive machine learning capabilities including accuracy monitoring, data preprocessing, cross-validation, model evaluation, and diagnostic features.

Table of Contents

  1. Core Enhanced Classes
  2. Enhanced Network Methods
  3. Evaluation and Metrics
  4. Memory Management
  5. Thread Safety
  6. Health Monitoring
  7. Examples and Usage

Core Enhanced Classes

TrainingMetrics

The TrainingMetrics class exposes performance metrics collected during model training and evaluation.

from kortexdl import TrainingMetrics

# Access metrics during training
metrics = network.get_current_metrics()
print(f"Loss: {metrics.loss}")
print(f"R² Score: {metrics.r2_score}")
print(f"MAPE: {metrics.mape}%")        # Mean Absolute Percentage Error
print(f"MAE: {metrics.mae}")          # Mean Absolute Error  
print(f"RMSE: {metrics.rmse}")        # Root Mean Square Error
print(f"Correlation: {metrics.correlation}")

Attributes:

  • loss (float): Current loss value from training
  • r2_score (float): R² coefficient of determination (0.0 to 1.0, higher is better)
  • mape (float): Mean Absolute Percentage Error (lower is better)
  • mae (float): Mean Absolute Error (lower is better)
  • rmse (float): Root Mean Square Error (lower is better)
  • correlation (float): Correlation coefficient between predictions and targets
  • epoch (int): Current epoch number
  • step (int): Current training step
  • learning_rate (float): Current learning rate
  • training_time (float): Training time in milliseconds
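For reference, the regression attributes above follow their conventional statistical definitions. The sketch below (pure NumPy, illustrative only, not KortexDL's internal implementation) shows how each one is typically computed:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Conventional definitions of the metrics listed above (illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    # MAPE is undefined where y_true == 0; mask those entries
    nonzero = y_true != 0
    mape = 100.0 * np.mean(np.abs(residuals[nonzero] / y_true[nonzero]))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    return {"mae": mae, "rmse": rmse, "mape": mape,
            "r2_score": r2, "correlation": corr}

m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Note the R² formula can go negative for models worse than predicting the mean, even though well-fit models land in the 0.0 to 1.0 range described above.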

AccuracyConfig

The AccuracyConfig class configures advanced training options and monitoring settings.

from kortexdl import AccuracyConfig

config = AccuracyConfig()
config.enable_feature_scaling = True
config.enable_target_normalization = True
config.enable_early_stopping = True
config.early_stopping_patience = 20
config.early_stopping_delta = 0.001
config.l2_regularization = 0.0001
config.cross_validation_folds = 5

network.configure_accuracy(config)

Configuration Options:

  • enable_feature_scaling (bool): Enable automatic feature scaling normalization
  • enable_target_normalization (bool): Enable automatic target normalization
  • enable_outlier_detection (bool): Enable statistical outlier detection
  • enable_early_stopping (bool): Enable early stopping to prevent overfitting
  • enable_cross_validation (bool): Enable cross-validation during training
  • early_stopping_patience (int): Number of epochs to wait for improvement before stopping
  • early_stopping_delta (float): Minimum improvement threshold for early stopping
  • l2_regularization (float): L2 regularization coefficient (weight decay)
  • gradient_clip_threshold (float): Gradient clipping threshold for stable training
  • cross_validation_folds (int): Number of folds for cross-validation (typically 3-10)
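The interaction between early_stopping_patience and early_stopping_delta can be seen in this standalone sketch of the standard early-stopping rule (illustrative of the convention; KortexDL's internal stopping logic is not shown in this document):

```python
def early_stopping_trace(val_losses, patience=3, delta=0.001):
    """Return the epoch at which training would stop under the usual
    patience/delta rule (illustrative, not KortexDL internals)."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > delta:   # improvement larger than delta resets the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # no meaningful improvement for `patience` epochs
                return epoch
    return len(val_losses) - 1    # ran to completion

stop = early_stopping_trace([1.0, 0.8, 0.79, 0.79, 0.79, 0.79], patience=3)
```

Here the loss plateaus at 0.79 from epoch 2 onward, so training halts after three epochs without an improvement larger than delta.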

ModelEvaluator

The ModelEvaluator class provides comprehensive model evaluation with statistical metrics and detailed reports.

from kortexdl import ModelEvaluator

# Create evaluator
evaluator = ModelEvaluator()

# Regression evaluation
result = evaluator.evaluate_regression(network, X_test, y_test, LossType.MSE)
print(result.r2_score)
print(result.mae)
evaluator.print_report(result)

# Classification evaluation  
result = evaluator.evaluate_classification(network, X_test, y_test, LossType.BINARY_CE)
print(result.accuracy)
print(result.precision)
print(result.recall)
print(result.f1_score)
evaluator.print_report(result)

# Multiclass evaluation
result = evaluator.evaluate_multiclass(network, X_test, y_test, LossType.CATEGORICAL_CE)
print(result.accuracy)
print(result.macro_f1_score)
print(result.macro_precision)
print(result.macro_recall)
evaluator.print_report(result)

Evaluation Results:

  • Regression: r2_score, mape, mae, rmse, correlation
  • Binary Classification: accuracy, precision, recall, f1_score, specificity
  • Multiclass Classification: accuracy, macro_f1_score, macro_precision, macro_recall, weighted_f1_score
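The binary classification fields above follow the standard confusion-matrix definitions. As a reference sketch (pure Python, illustrative, not the evaluator's internal code):

```python
def binary_classification_metrics(y_true, y_pred):
    """Standard confusion-matrix definitions of the binary metrics above."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1_score": f1, "specificity": specificity}

m = binary_classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

The macro-averaged multiclass variants are the unweighted mean of these per-class values; the weighted variants weight each class by its support.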

Enhanced Network Methods

train_with_monitoring()

Train network with comprehensive accuracy monitoring and early stopping.

# Basic training
results = network.train_with_monitoring(
    inputs=X_train,
    targets=y_train, 
    loss_type=bn.MSE,
    learning_rate=0.01,
    batch_size=32,
    epochs=100
)

# Enhanced training with monitoring
config = AccuracyConfig()
config.enable_early_stopping = True
config.early_stopping_patience = 20
network.configure_accuracy(config)

results = network.train_with_monitoring(
    inputs=X_train,
    targets=y_train,
    loss_type=bn.MSE,
    learning_rate=0.01,
    batch_size=64,
    epochs=200
)

Parameters:

  • inputs (numpy.ndarray): Training input data
  • targets (numpy.ndarray): Training target data
  • loss_type (LossType): Loss function type (MSE, MAE, BINARY_CE, etc.)
  • learning_rate (float): Learning rate for optimization
  • batch_size (int): Batch size for mini-batch training
  • epochs (int): Number of training epochs

Returns: Training results including comprehensive metrics
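The batch_size parameter controls how the training set is chunked into shuffled mini-batches each epoch. A minimal sketch of that chunking (pure NumPy, illustrative only, not KortexDL's internal loop):

```python
import numpy as np

def minibatches(X, y, batch_size, seed=0):
    """Yield shuffled (X, y) mini-batches; the last batch may be smaller."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

X = np.arange(10).reshape(10, 1).astype(np.float32)
y = np.arange(10, dtype=np.float32)
sizes = [len(xb) for xb, _ in minibatches(X, y, batch_size=4)]
```

With 10 samples and batch_size=4 this yields batches of 4, 4, and 2 samples, which is why per-epoch step counts are ceil(n_samples / batch_size).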

evaluate_comprehensive()

Evaluate network performance with detailed metrics.

# Comprehensive evaluation
test_metrics = network.evaluate_comprehensive(
    inputs=X_test,
    targets=y_test,
    loss_type=bn.MSE
)

print(f"Test Loss: {test_metrics.loss}")
print(f"Test R²: {test_metrics.r2_score}")
print(f"Test MAPE: {test_metrics.mape}%")
print(f"Test MAE: {test_metrics.mae}")
print(f"Test RMSE: {test_metrics.rmse}")

cross_validate()

Perform k-fold cross-validation with accuracy monitoring.

# Cross-validation
cv_results = network.cross_validate(
    inputs=X_train,
    targets=y_train,
    loss_type=bn.MSE,
    learning_rate=0.01,
    batch_size=64,
    epochs=100,
    folds=5
)

# Analyze cross-validation results
avg_r2 = np.mean([r.r2_score for r in cv_results])
avg_mape = np.mean([r.mape for r in cv_results])
print(f"Average R²: {avg_r2}")
print(f"Average MAPE: {avg_mape}%")
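Under the hood, k-fold cross-validation partitions the data into folds and trains once per fold, holding one fold out for validation each time. A sketch of that index bookkeeping (pure NumPy, illustrative of the idea rather than cross_validate's internals):

```python
import numpy as np

def kfold_indices(n_samples, folds):
    """Yield (train_idx, val_idx) pairs for contiguous k-fold splits."""
    fold_parts = np.array_split(np.arange(n_samples), folds)
    for k in range(folds):
        val_idx = fold_parts[k]
        train_idx = np.concatenate(
            [fold_parts[j] for j in range(folds) if j != k])
        yield train_idx, val_idx

splits = list(kfold_indices(10, folds=5))
```

Each fold's metrics then feed the averaging shown above; reporting the mean together with the standard deviation gives a sense of variance across folds.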

preprocess_data()

Automatically preprocess data with scaling and normalization.

# Automatic preprocessing
X_processed, y_processed = network.preprocess_data(X_train, y_train)

# Configure preprocessing, then inspect the effect
config = AccuracyConfig()
config.enable_feature_scaling = True
config.enable_target_normalization = True
network.configure_accuracy(config)

X_scaled, y_normalized = network.preprocess_data(X_raw, y_raw)
print(f"Mean before scaling: {np.mean(X_raw, axis=0)}")
print(f"Mean after scaling: {np.mean(X_scaled, axis=0)}")
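Feature scaling of this kind is commonly z-score standardization: subtract each feature's mean and divide by its standard deviation. A reference sketch (pure NumPy; KortexDL's exact scaling scheme may differ):

```python
import numpy as np

def standard_scale(X):
    """Z-score scaling: each column ends up with mean 0 and std 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_scaled = standard_scale(X)
```

When scaling is enabled, remember that predictions and targets live in the transformed space, so statistics computed on the raw arrays will not match the scaled ones.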

get_training_history()

Access complete training history with all metrics.

import matplotlib.pyplot as plt

# Get training history
history = network.get_training_history()

# Plot training progression
epochs = [h.epoch for h in history]
losses = [h.loss for h in history]
r2_scores = [h.r2_score for h in history]

plt.plot(epochs, losses, label='Loss')
plt.plot(epochs, r2_scores, label='R² Score')
plt.legend()
plt.show()

get_current_metrics()

Get current training metrics and performance indicators.

# Get current metrics during training
metrics = network.get_current_metrics()

print(f"Current Loss: {metrics.loss}")
print(f"Learning Rate: {metrics.learning_rate}")
print(f"Training Time: {metrics.training_time}ms")

configure_accuracy()

Configure network with accuracy and monitoring settings.

# Configure accuracy monitoring
config = AccuracyConfig()

# Set advanced training options
config.enable_feature_scaling = True
config.enable_target_normalization = True
config.enable_early_stopping = True
config.early_stopping_patience = 30
config.early_stopping_delta = 0.001
config.l2_regularization = 0.0001
config.gradient_clip_threshold = 1.0

# Apply configuration
network.configure_accuracy(config)
print("Accuracy monitoring configured!")

Evaluation and Metrics

Basic Computation Functions

Calculate individual metrics on data:

# Compute basic metrics
mse = bn.compute_mse(y_true, y_pred)
rmse = bn.compute_rmse(y_true, y_pred)
mae = bn.compute_mae(y_true, y_pred)
r2 = bn.compute_r2_score(y_true, y_pred)

Advanced Statistics

Access comprehensive statistical analysis:

# Multiple metrics at once
from kortexdl import ModelEvaluator

evaluator = ModelEvaluator()
result = evaluator.evaluate_regression(network, X_test, y_test, bn.MSE)

# Access detailed statistics
print(f"R² Score: {result.r2_score:.4f}")
print(f"MAPE: {result.mape:.2f}%")
print(f"MAE: {result.mae:.6f}")
print(f"RMSE: {result.rmse:.6f}")
print(f"Correlation: {result.correlation:.4f}")

# Generate detailed report
evaluator.print_report(result)

Classification Metrics

Comprehensive classification evaluation:

# Binary classification
result = evaluator.evaluate_classification(
    network, X_test, y_test, bn.BINARY_CE
)

print(f"Accuracy: {result.accuracy:.4f}")
print(f"Precision: {result.precision:.4f}")
print(f"Recall: {result.recall:.4f}")
print(f"F1-Score: {result.f1_score:.4f}")

# Detailed report with confusion matrix
evaluator.print_report(result)

# Multiclass evaluation
result = evaluator.evaluate_multiclass(
    network, X_test, y_test, bn.CATEGORICAL_CE
)

print(f"Macro F1: {result.macro_f1_score:.4f}")
print(f"Weighted F1: {result.weighted_f1_score:.4f}")

Memory Management

Memory Monitoring

Track and optimize memory usage during training:

# Get current memory usage
memory_info = bn.get_memory_info()
print(f"Total memory: {memory_info.get('total_bytes', 'unknown')} bytes")
print(f"Used memory: {memory_info.get('usage_bytes', 'unknown')} bytes")

# Check if sufficient memory available
if bn.check_memory_available():
    print("✓ Sufficient memory available")
else:
    print("⚠️ Memory constraints detected")

# Get detailed memory statistics
memory_stats = bn.get_memory_stats()
print(f"Peak usage: {memory_stats.get('peak_usage', 'unknown')} bytes")

# Reset memory statistics
bn.reset_memory_stats()

Memory Validation

Validate memory requirements before large models:

# Check memory requirements
if bn.validate_memory_requirements():
    print("✓ System can handle this model")
    large_network = bn.Network([1000, 512, 256, 128, 1], [bn.ReLU, bn.ReLU, bn.ReLU, bn.Linear])
else:
    print("⚠️ Consider reducing model size")

Thread Safety

Thread Safety Configuration

Enable thread-safe concurrent operations:

# Enable thread safety
network.enable_thread_safety(True)
is_enabled = bn.is_thread_safety_enabled()

# Configure OpenMP settings
bn.require_thread_safety()

# Get thread status
thread_status = bn.get_thread_status()
print(f"Thread count: {thread_status.get('thread_count', 'unknown')}")

Parallel Training

Configure parallel processing with thread safety:

# Enable parallel training
network.enable_thread_safety(True)
if bn.is_thread_safety_enabled():
    print("✓ Thread safety enabled for parallel operations")

# Train with thread safety
results = network.train_with_monitoring(
    inputs=X_train, targets=y_train, 
    loss_type=bn.MSE, learning_rate=0.01, 
    batch_size=128, epochs=100
)

# Check thread safety statistics
safety_stats = network.get_thread_safety_stats()
print(safety_stats)

Health Monitoring

Numerical Health Checks

Monitor for numerical stability issues during training:

# Check for numerical issues
if network.has_numerical_issues():
    print("⚠️ Numerical issues detected")
    warnings = network.get_health_warnings()
    for warning in warnings:
        print(f"Warning: {warning}")
else:
    print("✓ No numerical issues detected")

# Get diagnostic summary
summary = network.get_diagnostic_summary()
print(f"Network health: {summary}")

Diagnostic Monitoring

Enable advanced diagnostic monitoring:

# Enable diagnostics
network.enable_diagnostics(True)

# Reset diagnostic counters
network.reset_diagnostics()

# Later: check diagnostic results
if network.has_numerical_issues():
    print("⚠️ Check numerical stability")
    warnings = network.get_health_warnings()
    for warning in warnings:
        print(f"  - {warning}")

Examples and Usage

Basic Regression Example

import kortexdl as bn
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Configure accuracy monitoring
config = bn.AccuracyConfig()
config.enable_feature_scaling = True
config.enable_early_stopping = True
config.early_stopping_patience = 15

# Create and configure network
network = bn.Network([10, 128, 64, 32, 1], [bn.ReLU, bn.ReLU, bn.ReLU, bn.Linear])
network.configure_accuracy(config)

# Train with monitoring
results = network.train_with_monitoring(
    inputs=X_train, targets=y_train,
    loss_type=bn.MSE, learning_rate=0.01,
    batch_size=64, epochs=100
)

# Evaluate comprehensively
test_metrics = network.evaluate_comprehensive(
    inputs=X_test, targets=y_test, 
    loss_type=bn.MSE
)

print(f"R² Score: {test_metrics.r2_score:.4f}")
print(f"MAPE: {test_metrics.mape:.2f}%")

Classification Example

import kortexdl as bn
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate classification data
X, y = make_classification(n_samples=2000, n_features=20, n_classes=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create classification network
network = bn.Network([20, 128, 64, 1], [bn.ReLU, bn.ReLU, bn.Sigmoid])

# Train network
network.train_with_monitoring(
    inputs=X_train, targets=y_train,
    loss_type=bn.BINARY_CE, learning_rate=0.01,
    batch_size=128, epochs=50
)

# Evaluate classification
from kortexdl import ModelEvaluator
evaluator = ModelEvaluator()

result = evaluator.evaluate_classification(
    network=network, inputs=X_test, targets=y_test,
    loss_type=bn.BINARY_CE
)

print(f"Accuracy: {result.accuracy:.4f}")
print(f"Precision: {result.precision:.4f}")
print(f"Recall: {result.recall:.4f}")

evaluator.print_report(result)

Cross-Validation Example

import kortexdl as bn
import numpy as np
from sklearn.datasets import make_regression

# Generate regression data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1)

# Configure cross-validation
config = bn.AccuracyConfig()
config.enable_cross_validation = True
config.cross_validation_folds = 5

network = bn.Network([10, 64, 32, 1], [bn.ReLU, bn.ReLU, bn.Linear])
network.configure_accuracy(config)

# Perform cross-validation
results = network.cross_validate(
    inputs=X, targets=y,
    loss_type=bn.MSE, learning_rate=0.01,
    batch_size=64, epochs=50, folds=5
)

# Analyze results
avg_r2 = np.mean([r.r2_score for r in results])
avg_mape = np.mean([r.mape for r in results])
std_r2 = np.std([r.r2_score for r in results])

print(f"Average R²: {avg_r2:.4f} ± {std_r2:.4f}")
print(f"Average MAPE: {avg_mape:.2f}%")

Data Preprocessing Example

import kortexdl as bn
import numpy as np

# X_raw, y_raw: raw feature and target arrays loaded elsewhere

# Enable preprocessing
config = bn.AccuracyConfig()
config.enable_feature_scaling = True
config.enable_target_normalization = True

network = bn.Network([10, 64, 1], [bn.ReLU, bn.Linear])
network.configure_accuracy(config)

# Apply preprocessing
X_scaled, y_normalized = network.preprocess_data(X_raw, y_raw)

print(f"Original mean: {np.mean(X_raw, axis=0)}")
print(f"Scaled mean: {np.mean(X_scaled, axis=0)}")
print(f"Original target std: {np.std(y_raw)}")
print(f"Normalized target std: {np.std(y_normalized)}")

Memory and Diagnostic Example

import kortexdl as bn

# X_train, y_train: preprocessed training arrays from the earlier examples

# Check memory before training
memory_info = bn.get_memory_info()
print(f"Available memory: {memory_info.get('available_bytes', 'unknown')} bytes")

if bn.validate_memory_requirements():
    print("✓ Can handle this model size")
    
    # Enable health monitoring
    network = bn.Network([100, 64, 32, 1], [bn.ReLU, bn.ReLU, bn.Linear])
    network.enable_diagnostics(True)
    
    # Train with monitoring
    results = network.train_with_monitoring(
        inputs=X_train, targets=y_train,
        loss_type=bn.MSE, learning_rate=0.01,
        batch_size=128, epochs=100
    )
    
    # Check health status
    if network.has_numerical_issues():
        warnings = network.get_health_warnings()
        print("⚠️ Numerical stability issues:")
        for warning in warnings:
            print(f"  - {warning}")
    
else:
    print("⚠️ Consider reducing model size")

# Final memory usage
print(f"Peak memory usage: {bn.get_memory_stats().get('peak_usage', 'unknown')} bytes")

Best Practices

1. Accuracy Configuration

config = AccuracyConfig()
config.early_stopping_patience = max(10, int(epochs * 0.2))  # ~20% of the epoch budget
config.early_stopping_delta = 0.001
config.l2_regularization = 0.0001  # Prevent overfitting

2. Memory Management

# Check memory before large models
if bn.check_memory_available(requested_bytes=model_memory_estimate):
    network = create_large_model()
else:
    network = create_smaller_model()

3. Thread Safety for Production

# Enable thread safety for concurrent operations
network.enable_thread_safety(True)
if bn.is_thread_safety_enabled():
    print("✓ Thread safety active")

4. Health Monitoring

# Enable diagnostics for production
network.enable_diagnostics(True)  # Enable before training

# Check after training
if network.has_numerical_issues():
    warnings = network.get_health_warnings()
    print(f"Found {len(warnings)} warnings")

Performance Tips

1. Data Types

Always use numpy.float32 for optimal performance:

X_train = X_train.astype(np.float32)
y_train = y_train.astype(np.float32)

2. Batch Sizes

Use powers of 2 for better memory alignment:

batch_size = 64  # or 128, 256, 512

3. Network Architecture

Keep architectures balanced:

# Good: width decreasing gradually
network = bn.Network([input_size, 128, 64, 32, output_size],
                     [bn.ReLU, bn.ReLU, bn.ReLU, bn.Linear])

# Avoid: dramatic width changes (wide-narrow-wide bottlenecks)
network = bn.Network([input_size, 512, 16, 512, output_size], ...)

4. Regularization

Use appropriate regularization to prevent overfitting:

config.l2_regularization = 0.0001  # for most cases
config.gradient_clip_threshold = 1.0  # for stable training
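For intuition, the two knobs above correspond to standard techniques: L2 regularization adds a lam * sum(w²) penalty to the loss, and gradient clipping rescales any gradient whose norm exceeds the threshold. A minimal sketch of both (pure NumPy, illustrative, not KortexDL's implementation):

```python
import numpy as np

def l2_penalty(weights, lam):
    """L2 (weight-decay) term added to the loss: lam * sum of squared weights."""
    return lam * sum(np.sum(w ** 2) for w in weights)

def clip_gradient(grad, threshold):
    """Scale the gradient down so its L2 norm never exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = clip_gradient(np.array([3.0, 4.0]), threshold=1.0)
penalty = l2_penalty([np.array([1.0, 2.0])], lam=0.0001)
```

Clipping preserves the gradient's direction while bounding its magnitude, which is why it helps with exploding gradients without biasing the descent direction.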

Troubleshooting

Common Issues

  1. Loss not decreasing: Check learning rate, try reducing by 10x
  2. Accuracy not improving: Ensure data is properly scaled/normalized
  3. NaN/Inf values: Reduce learning rate, check for zero division, enable gradient clipping
  4. Slow convergence: Increase learning rate, reduce network complexity
  5. Memory issues: Reduce batch size, simplify network architecture

Health Check Script

def health_check(network, X_train, y_train):
    """Comprehensive network health check."""
    
    # Check training metrics
    metrics = network.get_current_metrics()
    if metrics.loss == 0.0:
        print("⚠️ Zero loss detected")
    
    # Check for numerical issues
    if network.has_numerical_issues():
        print("⚠️ Numerical instability detected")
        warnings = network.get_health_warnings()
        for warning in warnings:
            print(f"  - {warning}")
    
    # Check input data quality
    if np.isnan(X_train).any() or np.isnan(y_train).any():
        print("⚠️ NaN values in input data")
    
    # Check for infinite values
    if np.isinf(X_train).any() or np.isinf(y_train).any():
        print("⚠️ Infinite values in input data")
    
    # Basic statistics
    print(f"Input range: [{X_train.min():.3f}, {X_train.max():.3f}]")
    print(f"Target range: [{y_train.min():.3f}, {y_train.max():.3f}]")

This documentation provides comprehensive guidance for using all enhanced features of the KortexDL Python API. All features are fully tested and available through the standard package installation.