
[TMVA][SOFIE] Add error handling and logging to PyTorch/Keras model generation scripts #21190

@arpittkhandelwal

Explain what you would like to see improved and how.

Summary
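The PyTorch and Keras model generation scripts in TMVA SOFIE (tmva/sofie/test/) have no error handling, output validation, or logging. This can lead to silent failures (for example, a missing or empty model file that is only noticed later, when a SOFIE test tries to parse it) and makes issues hard to debug in CI/CD pipelines and during local development.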

Affected Files
tmva/sofie/test/generatePyTorchModels.py
tmva/sofie/test/generateKerasModels.py
tmva/sofie/test/Conv1dModelGenerator.py
tmva/sofie/test/Conv2dModelGenerator.py
tmva/sofie/test/Conv3dModelGenerator.py
tmva/sofie/test/ConvTrans2dModelGenerator.py
tmva/sofie/test/LinearModelGenerator.py
tmva/sofie/test/RecurrentModelGenerator.py
Current Behavior
The scripts execute model generation functions sequentially without any error handling:

```python
# tmva/sofie/test/generatePyTorchModels.py (lines 107-109)
generateSequentialModel()
generateModuleModel()
generateConvolutionModel()
```
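Problems:

❌ No try-except blocks to catch exceptions
❌ No validation that model files were created successfully and are non-empty
❌ No logging to indicate progress or failures
❌ If one model fails, the uncaught exception aborts the script: the remaining models are never generated and there is no summary of which ones succeeded
❌ No explicit exit code: a missing or empty model file that does not raise an exception still exits with status 0, so CI/CD pipelines cannot detect it
❌ Failed model files may be left in inconsistent states (for example, partially written)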
Expected Behavior
The scripts should:

✅ Wrap model generation in try-except blocks
✅ Log progress and success/failure for each model
✅ Validate that output files exist and are non-empty
✅ Return appropriate exit codes for CI/CD integration
✅ Provide clear error messages for debugging
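The last point under Problems (model files left in an inconsistent state) is not covered by the list above. A minimal sketch of one way to handle it, using a hypothetical shared helper `save_atomically` (not existing ROOT code): the model is written to a temporary file in the same directory and only renamed onto the final name with `os.replace` once the save succeeded and the file is non-empty, so a failed save never leaves a partial file under the real name.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def save_atomically(output_path: Path, save_fn: Callable[[str], None]) -> int:
    """Save a model so that a failed or empty write never replaces output_path.

    save_fn receives a temporary file name in the same directory as output_path.
    The temporary file is renamed onto output_path only if save_fn returns and
    the file is non-empty. Returns the final file size in bytes.
    """
    output_path = Path(output_path)
    # Same directory, so os.replace is a rename on one filesystem (atomic on POSIX).
    # Keep the original suffix: some savers (e.g. Keras) choose the format from it.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(output_path.parent), prefix=".tmp-", suffix=output_path.suffix
    )
    os.close(fd)
    try:
        save_fn(tmp_name)
        size = os.path.getsize(tmp_name)
        if size == 0:
            raise RuntimeError(f"Model file is empty: {output_path}")
        os.replace(tmp_name, output_path)
        return size
    except BaseException:
        # Drop the partial temporary file; any previous output_path is left untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

In a PyTorch generator this could be called as `save_atomically(output_path, lambda p: torch.jit.save(m, p))`, and a Keras script would pass `lambda p: model.save(p)`. Note that the temporary file already exists (empty) when `save_fn` runs, so the saver must be willing to overwrite it.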
Proposed Solution
```python
import logging
import sys
from pathlib import Path

import torch
import torch.nn as nn

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


def generateSequentialModel() -> bool:
    """Generate the sequential PyTorch model with error handling.

    Returns:
        bool: True if successful, False otherwise
    """
    output_path = Path("PyTorchModelSequential.pt")
    try:
        logger.info("Generating Sequential model...")

        # Existing model generation code
        model = nn.Sequential(
            nn.Linear(4, 8),
            nn.ReLU(),
            nn.Linear(8, 6),
            nn.SELU()
        )

        # ... training code ...

        # Remove a file left over from a previous run so the checks below
        # cannot pass on a stale model (missing_ok requires Python 3.8+).
        output_path.unlink(missing_ok=True)

        # torch.jit.save expects a ScriptModule. Scripting here is an assumption;
        # keep whatever conversion (script or trace) the existing code uses.
        m = torch.jit.script(model)
        torch.jit.save(m, str(output_path))

        # Save with validation
        if not output_path.exists():
            raise RuntimeError(f"Model file was not created: {output_path}")

        size = output_path.stat().st_size
        if size == 0:
            raise RuntimeError(f"Model file is empty: {output_path}")

        logger.info(f"✓ Sequential model saved successfully ({size} bytes)")
        return True

    except Exception as e:
        logger.error(f"✗ Failed to generate Sequential model: {e}", exc_info=True)
        return False


# generateModuleModel() and generateConvolutionModel() follow the same pattern.

if __name__ == "__main__":
    results = {
        "Sequential": generateSequentialModel(),
        "Module": generateModuleModel(),
        "Convolution": generateConvolutionModel()
    }

    # Report results
    successful = [name for name, success in results.items() if success]
    failed = [name for name, success in results.items() if not success]

    logger.info(f"\n{'='*50}")
    logger.info(f"Results: {len(successful)}/{len(results)} models generated successfully")

    if successful:
        logger.info(f"✓ Successful: {', '.join(successful)}")

    if failed:
        logger.error(f"✗ Failed: {', '.join(failed)}")
        sys.exit(1)
    else:
        logger.info("All models generated successfully!")
        sys.exit(0)
```
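Copying the try/except block and the file checks into every generator across the eight affected scripts would duplicate the same logic many times. A minimal sketch of factoring it into one hypothetical helper, `run_generators` (the name, and the convention that each generator returns the path it wrote, are assumptions rather than existing ROOT code): every generator runs even if an earlier one fails, the output file is validated in one place, and the function returns an exit code instead of calling `sys.exit` itself.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, Union

logger = logging.getLogger(__name__)

PathLike = Union[str, Path]


def run_generators(generators: Dict[str, Callable[[], PathLike]]) -> int:
    """Run every generator, validate the file it reports, and return an exit code.

    Each generator is expected to return the path of the model file it wrote.
    A failure is logged and the remaining generators still run.
    Returns 0 if every model was generated, 1 otherwise.
    """
    failed = []
    for name, generate in generators.items():
        logger.info("Generating %s model...", name)
        try:
            path = Path(generate())
            if not path.is_file():
                raise RuntimeError(f"Model file was not created: {path}")
            size = path.stat().st_size
            if size == 0:
                raise RuntimeError(f"Model file is empty: {path}")
            logger.info("%s model saved to %s (%d bytes)", name, path, size)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("Failed to generate %s model", name)
            failed.append(name)

    succeeded = len(generators) - len(failed)
    logger.info("Results: %d/%d models generated successfully", succeeded, len(generators))
    if failed:
        logger.error("Failed: %s", ", ".join(failed))
        return 1
    return 0


# Hypothetical wiring in generatePyTorchModels.py, assuming each generate*Model()
# is changed to return the path it saved and the script configures logging:
#
#     if __name__ == "__main__":
#         logging.basicConfig(level=logging.INFO)
#         sys.exit(run_generators({
#             "Sequential": generateSequentialModel,
#             "Module": generateModuleModel,
#             "Convolution": generateConvolutionModel,
#         }))
```

Returning the exit code rather than exiting keeps the helper testable and lets each script decide how to terminate.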

Benefits
Easier debugging: Clear error messages and stack traces
CI/CD integration: Proper exit codes for automated testing
Better user experience: Progress indication and clear success/failure status
Robustness: Validation ensures model files are actually created
Maintainability: Easier to identify which model generation failed
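To illustrate the CI/CD and maintainability points, here is a sketch of a hypothetical driver, `run_scripts` (not part of ROOT; ROOT's own test setup may invoke these scripts differently), that runs each generator script with the current interpreter and reports which ones exited with a non-zero status. It relies only on exit codes, so it works both for scripts that call `sys.exit(1)` explicitly and for ones that stop on an uncaught exception.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def run_scripts(scripts: Iterable[Path], cwd: Path) -> List[str]:
    """Run each generator script in cwd and return the names of those that failed.

    A script counts as failed if it exits with a non-zero status, which covers both
    an explicit sys.exit(1) and an uncaught exception (Python exits with status 1).
    """
    failed = []
    for script in scripts:
        script = Path(script).resolve()
        # Run with the same interpreter; the model files are written into cwd.
        result = subprocess.run([sys.executable, str(script)], cwd=str(cwd))
        if result.returncode == 0:
            print(f"{script.name}: ok")
        else:
            print(f"{script.name}: FAILED (exit status {result.returncode})")
            failed.append(script.name)
    return failed


# Hypothetical usage from tmva/sofie/test in a ROOT checkout:
#     scripts = [Path("generatePyTorchModels.py"), Path("Conv1dModelGenerator.py")]
#     sys.exit(1 if run_scripts(scripts, Path.cwd()) else 0)
```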
Impact
Low risk: Changes are isolated to test scripts
High value: Improves developer experience and CI/CD reliability
Good first issue: Clear scope, well-defined requirements
Additional Context
This improvement would align with modern Python best practices and make the TMVA SOFIE test suite more robust and maintainable. Similar error handling patterns are used throughout the ROOT codebase.

Environment
ROOT version: master branch (as of February 2026)
Component: TMVA SOFIE
Python version: 3.x

ROOT version

master branch (commit 4d0fb3b)

Installation method

Cloned from the GitHub repository (https://github.com/root-project/root). The issue was found through static code analysis of the TMVA SOFIE test scripts.

Operating system

macOS (issue affects all platforms)
