@tonyreina tonyreina commented Oct 14, 2025

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR adds Contrast-Limited Adaptive Histogram Equalization (CLAHE) to the DALI image operators.

CLAHE performs local histogram equalization with clipping and bilinear blending of lookup tables (LUTs) between neighboring tiles. This technique enhances local contrast while preventing over-amplification of noise. The implementation maintains exact algorithmic compatibility with OpenCV's cv::createCLAHE() while providing significant GPU performance optimizations.
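
To make the description above concrete, here is a minimal pure-Python sketch of the two steps it names: building a clipped-histogram LUT for one tile, and bilinearly blending the LUTs of four neighboring tiles for one pixel. It is not the code from this PR. The names `clahe_tile_lut` and `blend_luts`, the absolute per-bin clip count, and the simple even redistribution of the clipped excess are assumptions made for illustration; OpenCV and the operator may round and redistribute differently.

```python
def clahe_tile_lut(pixels, clip_limit, bins=256):
    """Build an equalization LUT for one tile (illustrative sketch).

    pixels: flat list of integer intensities in [0, bins).
    clip_limit: maximum count allowed per bin (an absolute count here,
    which is an assumption of this sketch).
    """
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(bins):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess evenly; the remainder goes to the first bins.
    share, remainder = divmod(excess, bins)
    for i in range(bins):
        hist[i] += share + (1 if i < remainder else 0)

    # Scale the cumulative histogram to the output range to get the LUT.
    scale = (bins - 1) / len(pixels)
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append(min(bins - 1, round(running * scale)))
    return lut


def blend_luts(value, lut_tl, lut_tr, lut_bl, lut_br, fx, fy):
    """Bilinearly blend four neighboring tile LUTs for one pixel value.

    fx, fy in [0, 1] are the pixel's fractional position between the
    tile centers (0 = left/top tile, 1 = right/bottom tile).
    """
    top = (1 - fx) * lut_tl[value] + fx * lut_tr[value]
    bottom = (1 - fx) * lut_bl[value] + fx * lut_br[value]
    return round((1 - fy) * top + fy * bottom)
```

Clipping before the cumulative sum is what limits contrast amplification: a bin cannot contribute more than `clip_limit` to the slope of the mapping, and the blending step avoids visible seams at tile borders.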

Additional information:

Affected modules and functionalities:

  • Added clahe_op.cc and clahe_op.cu for GPU implementation with CUDA kernels
  • Added clahe_cpu.cc for CPU implementation using OpenCV
  • Added comprehensive operator schema with detailed documentation
  • Added Jupyter Notebook example

Key points relevant for the review:

  • Algorithmic Compatibility: Follows the algorithm used by OpenCV
  • Performance Optimizations: Includes automatic optimizations:
    • Kernel fusion (RGB→LAB + histogram computation)
    • Warp-privatized histograms for larger tiles (≥1024 pixels)
    • Vectorized memory access for larger images (≥8192 pixels)
    • Adaptive algorithm selection based on image size and tile configuration
  • Feature Support:
    • Supports grayscale (1-channel) and RGB (3-channel) uint8 images in HWC layout
    • Two RGB processing modes: luminance-only (preserves color relationships) and per-channel
    • Configurable tile grid, clip limit, and histogram bins
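
The adaptive selection described in the list above could be sketched roughly as follows. This is a hypothetical illustration in Python, not the dispatch code from `clahe_op.cu`: only the two thresholds (1024 pixels per tile, 8192 pixels per image) come from this description, while `KernelPlan`, `choose_kernel_plan`, and the ceiling-division tile size are assumed names and choices.

```python
from dataclasses import dataclass

# Thresholds quoted in the PR description; how the operator applies them
# internally is an assumption of this sketch.
WARP_PRIVATIZED_MIN_TILE_PIXELS = 1024
VECTORIZED_MIN_IMAGE_PIXELS = 8192


@dataclass(frozen=True)
class KernelPlan:  # hypothetical name
    warp_privatized_histograms: bool
    vectorized_access: bool


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def choose_kernel_plan(height, width, tiles_y, tiles_x):
    """Pick optimizations from image size and tile grid (illustrative only)."""
    if tiles_y <= 0 or tiles_x <= 0:
        raise ValueError("tile grid dimensions must be positive")
    # Assumed tile size: image extent divided by the grid, rounded up.
    tile_pixels = ceil_div(height, tiles_y) * ceil_div(width, tiles_x)
    image_pixels = height * width
    return KernelPlan(
        warp_privatized_histograms=tile_pixels >= WARP_PRIVATIZED_MIN_TILE_PIXELS,
        vectorized_access=image_pixels >= VECTORIZED_MIN_IMAGE_PIXELS,
    )
```

Under these assumptions, a 512x512 image with an 8x8 grid has 4096-pixel tiles and would enable both optimizations, while a 64x64 image with the same grid would enable neither.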

Tests:

  • New tests added
    • Python tests: test_clahe.py with multiple parameter combinations, device testing (CPU/GPU), and API validation
    • GTests: clahe_test.cc with CPU vs GPU equivalence testing, different tile sizes, clip limits, and error handling
    • Example: clahe_example.py demonstrating usage patterns and parameter effects
    • Benchmark: Performance benchmarking could be added in future work

Checklist

Documentation

  • Documentation updated
    • Docstring: Comprehensive operator schema documentation with parameter descriptions and usage examples
    • Doxygen: C++ code documentation following DALI conventions
    • RST: Added CLAHE optimizations section to performance tuning guide
    • Other: Performance optimization details documented in PERFORMANCE_NOTES.md

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Contributor

JanuszL commented Oct 15, 2025

@tonyreina, thank you for your contribution. We appreciate the time you spent diving into DALI and extending it.

I haven't delved deeply into the code yet, as I focused more on general remarks - mostly regarding testing, examples, and memory management. Please let us know if you need any guidance in applying the suggestions.

@tonyreina tonyreina force-pushed the main branch 2 times, most recently from 084b2f1 to 4e087d2 Compare October 15, 2025 18:01
@tonyreina tonyreina closed this Oct 15, 2025
@mzient mzient assigned mzient and unassigned klecki Oct 15, 2025
@tonyreina tonyreina reopened this Oct 15, 2025
@tonyreina tonyreina requested review from JanuszL and mzient October 15, 2025 22:59
@jantonguirao
Contributor

!build

@dali-automaton
Collaborator

CI MESSAGE: [36715300]: BUILD STARTED

@jantonguirao
Contributor

Would be nice to add images to the example, to see the effect of this transformation visually

review-notebook-app bot commented Oct 31, 2025

mzient commented on 2025-10-31T12:25:19Z
----------------------------------------------------------------

📚1️⃣

Welcome to this hands-on tutorial!

Those pictures and the opening sentence are nice on their own, but stand out among our examples.


review-notebook-app bot commented Oct 31, 2025

mzient commented on 2025-10-31T12:25:20Z
----------------------------------------------------------------

I think it would be better to demonstrate the processing on an actual image (you can grab one from DALI_extra repository). The image here stands in stark opposition to the claim made in the introduction "CLAHE is a powerful technique that improves contrast in images without overamplifying noise" - here we can see mostly noise amplification and the actual contrast between the squares is diminished. This doesn't look like an effective demonstration, since the useful features (the squares) lose contrast and noise gains contrast.


Signed-off-by: Tony Reina <tbreina@bu.edu>
Signed-off-by: Tony Reina <tbreina@bu.edu>
@tonyreina
Author

I think I made all of the recommended changes. The notebook and test examples now use the MRI knee image from DALI_extra.

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The latest changes focus primarily on formatting corrections and code cleanup addressing previous review feedback. The main updates include: copyright year standardization to 2025 across multiple files, Black formatting improvements throughout test files to adhere to the 100-character line limit, and proper handling of signed/unsigned comparison issues in CUDA code. Additionally, the PR includes improvements to include statements and removes debug output from test files to maintain clean test logs.

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| dali/operators/image/clahe/clahe_op.cu | 4/5 | Fixed signed/unsigned comparison warning in clip redistribution loop and improved numeric constants precision |
| dali/test/python/test_clahe.py | 4/5 | Added .as_cpu() calls for proper GPU tensor conversion to numpy arrays and removed debug print statements |
| docs/examples/image_processing/clahe_example.ipynb | 5/5 | Updated copyright year from 2022-2023 to 2022-2025 following repository standards |
| dali/operators/image/clahe/clahe_op.cc | 5/5 | Updated copyright year and maintained existing functionality without changes |
| dali/operators/image/clahe/clahe_cpu.cc | 5/5 | Updated copyright year and maintained existing CPU implementation without changes |
| dali/operators/image/clahe/clahe_test.cc | 5/5 | Updated copyright year and maintained existing test logic without changes |
| dali/test/python/test_dali_cpu_only.py | 5/5 | Applied Black formatting for better code readability while maintaining functionality |
| dali/test/python/test_eager_coverage.py | 5/5 | Applied Black formatting improvements for consistent code style across parameter lists |
| dali/test/python/checkpointing/test_dali_checkpointing.py | 5/5 | Applied Black formatting improvements for multi-line function parameters |
| dali/test/python/test_dali_variable_batch_size.py | 5/5 | Applied Black formatting improvements for better parameter organization |
| docs/examples/image_processing/index.py | 5/5 | Updated copyright year and maintained proper documentation index structure |
| dali/operators/image/CMakeLists.txt | 5/5 | Maintained existing build configuration without changes |
| dali/operators/image/clahe/CMakeLists.txt | 5/5 | Maintained existing build configuration without changes |

Confidence score: 4/5

  • This PR addresses formatting and compilation issues from previous reviews with minimal risk
  • Score reflects successful resolution of signed/unsigned comparison warnings and proper tensor handling improvements
  • Pay close attention to the CUDA implementation file due to the complexity of performance-critical GPU kernels and ensure all compiler warnings are resolved

13 files reviewed, 2 comments

Signed-off-by: Tony Reina <tbreina@bu.edu>
Signed-off-by: Tony Reina <tbreina@bu.edu>

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The most recent changes include:

  1. Implementation refinements: Improved CUDA optimization patterns with proper C++20 bit_cast for OpenCV compatibility constants, replaced division operations with multiplication by reciprocals for better performance, and enhanced memory coalescing patterns in vectorized kernels.

  2. Test infrastructure expansion: Added comprehensive test coverage across DALI's testing framework including eager execution tests (test_eager_coverage.py), CPU-only validation (test_dali_cpu_only.py), variable batch size support (test_dali_variable_batch_size.py), and checkpointing exclusions (test_dali_checkpointing.py).

  3. Documentation integration: Added the CLAHE example notebook to the image processing documentation index with proper operator references, and applied consistent code formatting across multiple test files following the project's Black formatting standards.

  4. Build system integration: Added CMake configuration for the CLAHE module following DALI's standard patterns for operator integration.

The changes demonstrate good integration of the new CLAHE operator into DALI's established infrastructure, with appropriate test coverage and documentation updates. The GPU implementation shows sophisticated optimization strategies while maintaining OpenCV algorithmic compatibility.

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| dali/operators/image/clahe/clahe_op.cu | 3/5 | GPU CLAHE implementation with extensive CUDA optimizations but contains performance concerns around expensive powf operations and sequential algorithms |
| dali/operators/image/clahe/clahe_op.cc | 4/5 | GPU operator backend with proper memory management and comprehensive schema documentation |
| dali/operators/image/clahe/clahe_cpu.cc | 4/5 | CPU implementation using OpenCV with thread safety and proper input validation |
| dali/test/python/operator_1/test_clahe.py | 4/5 | Comprehensive Python test suite with OpenCV validation and device consistency checks |
| docs/examples/image_processing/clahe_example.ipynb | 4/5 | Educational Jupyter notebook with practical examples, but contains executed outputs that should be cleared |
| dali/operators/image/clahe/clahe_test.cc | 4/5 | C++ unit tests comparing CPU/GPU implementations with appropriate tolerances |
| dali/test/python/test_eager_coverage.py | 5/5 | Adds CLAHE to eager execution test coverage with proper integration |
| dali/test/python/test_dali_cpu_only.py | 5/5 | Adds CPU-only testing for CLAHE operator |
| dali/test/python/test_dali_variable_batch_size.py | 5/5 | Validates CLAHE works with dynamic batch sizes |
| dali/test/python/checkpointing/test_dali_checkpointing.py | 5/5 | Properly excludes CLAHE from checkpointing as expected for new operators |
| dali/operators/image/CMakeLists.txt | 5/5 | Simple and correct integration of CLAHE subdirectory into build system |
| dali/operators/image/clahe/CMakeLists.txt | 5/5 | Standard DALI CMake configuration following established patterns |
| docs/examples/image_processing/index.py | 5/5 | Minimal documentation index update adding CLAHE example |

Confidence score: 3/5

  • This PR requires careful review due to complex GPU kernel optimizations and potential performance issues in the CUDA implementation
  • Score reflects concerns about expensive floating-point operations (powf calls) in device code, sequential algorithms that could benefit from parallelization, and potential memory access inefficiencies
  • Pay close attention to the CUDA kernel implementations in clahe_op.cu, particularly the color space conversion routines and CDF redistribution logic which may impact performance

13 files reviewed, no comments

Signed-off-by: Tony Reina <tbreina@bu.edu>
Signed-off-by: Tony Reina <tbreina@bu.edu>
@tonyreina
Copy link
Author

I replaced the powf calls with a LUT to improve speed. I think the other Greptile comment was about parallelizing the CDF calculation. Happy to add that if it is really necessary.
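
For readers following along, the idea of swapping a per-pixel power function for a lookup table can be sketched as below. This is a hedged Python illustration, not the CUDA code in this PR: it assumes the `powf` calls in question are the sRGB gamma decoding step of the RGB→LAB conversion, and the names `srgb_to_linear`, `SRGB_TO_LINEAR_LUT`, and `decode_pixels` are made up for the example.

```python
def srgb_to_linear(c):
    """Reference sRGB decoding for c in [0, 1]; the upper branch needs a pow."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# uint8 input has only 256 possible values, so the pow is evaluated
# 256 times up front instead of once per pixel.
SRGB_TO_LINEAR_LUT = [srgb_to_linear(v / 255.0) for v in range(256)]


def decode_pixels(pixels):
    """Map uint8 sRGB values to linear light with a table lookup per pixel."""
    return [SRGB_TO_LINEAR_LUT[p] for p in pixels]
```

Because the table is indexed by the exact uint8 input, the lookup returns the same values as the direct formula; the trade-off is 256 floats of table storage per conversion curve.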

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The most recent changes show comprehensive progress in addressing previous review feedback including notebook conversion, memory management improvements, error handling standardization, and formatting fixes. The PR continues to implement CLAHE (Contrast-Limited Adaptive Histogram Equalization) as a new image processing operator in DALI with both CPU (OpenCV-based) and GPU (CUDA-optimized) backends. Key recent improvements include migrating from a Python script to a Jupyter notebook for documentation, replacing manual CUDA memory allocation with DALI's DynamicScratchpad API, standardizing error handling to use DALI_ENFORCE macros, and addressing various CUDA kernel optimization issues.

Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| dali/operators/image/CMakeLists.txt | 5/5 | Simple addition of clahe subdirectory to build system |
| docs/examples/image_processing/index.py | 5/5 | Updates copyright year and adds CLAHE example to documentation index |
| dali/operators/image/clahe/CMakeLists.txt | 5/5 | Standard CMake configuration for new CLAHE operator module |
| dali/test/python/test_eager_coverage.py | 5/5 | Adds CLAHE to eager execution test coverage with formatting improvements |
| dali/test/python/checkpointing/test_dali_checkpointing.py | 5/5 | Adds CLAHE to unsupported checkpointing operators list with formatting updates |
| dali/test/python/test_dali_cpu_only.py | 4/5 | Extends CPU-only testing to include CLAHE operator |
| dali/test/python/test_dali_variable_batch_size.py | 4/5 | Integrates CLAHE into variable batch size testing framework |
| dali/operators/image/clahe/clahe_cpu.cc | 4/5 | Well-structured CPU implementation using OpenCV with thread-safety |
| dali/operators/image/clahe/clahe_op.cc | 4/5 | Comprehensive GPU operator implementation with proper DALI patterns |
| docs/examples/image_processing/clahe_example.ipynb | 4/5 | Educational notebook with comprehensive examples but contains executed output |
| dali/operators/image/clahe/clahe_test.cc | 4/5 | C++ unit tests with good coverage but could benefit from more comprehensive validation |
| dali/test/python/operator_1/test_clahe.py | 3/5 | Extensive Python test suite but has non-deterministic elements and incomplete GPU feature parity |
| dali/operators/image/clahe/clahe_op.cu | 3/5 | Complex CUDA implementation with performance optimizations but several implementation issues |

Confidence score: 3/5

  • This PR requires careful review due to complex CUDA kernel implementations and some remaining issues in the GPU backend
  • Score reflects unresolved issues in CUDA kernels including undefined behavior, potential out-of-bounds access, and warp divergence problems that could affect correctness and performance
  • Pay close attention to the CUDA kernel implementation in clahe_op.cu which contains several optimization patterns that need validation for correctness

13 files reviewed, no comments
