
@shaneahmed (Member) commented Sep 20, 2024

Summary of Changes

Major Additions

  • Dask Integration:

    • Added dask as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code.
    • Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs.
  • Zarr Output Support:

    • Added support for saving model predictions and intermediate results directly to Zarr format.
    • Added new CLI options and internal logic for Zarr output, including memory thresholding and chunked writes.
  • SemanticSegmentor Engine:

    • Added a new SemanticSegmentor engine with Dask/Zarr support and new test coverage (test_semantic_segmentor.py).
    • Added CLI entrypoint for semantic_segmentor and removed the old semantic_segment CLI.
  • Enhanced CLI and Config:

    • Added CLI options for memory threshold, unified worker options, and improved mask handling.
    • Updated YAML configs and sample data for new models and test images.
  • Utilities and Validation:

    • Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., DimensionMismatchError).
    • Improved annotation store conversion for Dask arrays and Zarr-backed outputs.
  • Changes to Keyword Arguments:

    • Added memory-threshold.
    • Unified num-loader-workers and num-postproc-workers into a single num-workers option.
    • Removed cache_mode, as caching is now handled automatically.
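
The memory-threshold option replaces the explicit cache mode. A minimal sketch of how such a threshold could decide between keeping an output in memory and writing it to Zarr, assuming (not confirmed by this PR) that the threshold is a percentage of currently available memory. `needs_disk_cache` is a hypothetical name; in practice `available_bytes` could come from something like `psutil.virtual_memory().available`.

```python
from __future__ import annotations

import math


def needs_disk_cache(
    shape: tuple[int, ...],
    itemsize: int,
    available_bytes: int,
    memory_threshold: float = 80,
) -> bool:
    """Decide whether an output array should be written to Zarr on disk.

    Hypothetical helper: ``memory_threshold`` is interpreted as a percentage
    of the currently available memory (an assumption; default is illustrative).
    """
    required_bytes = math.prod(shape) * itemsize
    budget_bytes = available_bytes * memory_threshold / 100
    return required_bytes > budget_bytes


# A single-channel uint8 mask of 9512 x 13562 pixels (~129 MB) fits in 8 GiB.
print(needs_disk_cache((9512, 13562), itemsize=1, available_bytes=8 * 1024**3))  # False
# Three float32 channels of the same size (~1.5 GB) exceed 80% of 1 GiB.
print(needs_disk_cache((9512, 13562, 3), itemsize=4, available_bytes=1024**3))  # True
```

Comparing the estimated size against a fraction of free memory, rather than a fixed byte count, lets the same setting behave sensibly on machines with very different RAM.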

Major Removals/Refactors

  • Removed Old CLI and Redundant Code:

    • Deleted the old semantic_segment.py CLI and replaced it with semantic_segmentor.py.
    • Removed legacy cache mode and patch prediction Zarr store tests.
  • Refactored Model and Dataset APIs:

    • Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs.
    • Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic.
  • Test Cleanup:

    • Removed or updated tests that relied on old APIs or cache mode.
    • Refactored test assertions for new output types and Dask array handling.
  • API Consistency:

    • Standardized function and argument names across engines, CLI, and utility modules.
    • Updated docstrings and type hints for clarity and consistency.
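
The dataset refactor enforces patch shape validation, and the utilities add patch/stride validation. A minimal sketch of such a check; the function name and the specific rules (two positive integers, stride defaulting to the patch shape, stride not exceeding the patch) are assumptions for illustration, not the rules implemented in this PR, and a plain `ValueError` is used instead of the PR's new exception types.

```python
from __future__ import annotations


def _is_positive_int_pair(shape) -> bool:
    # Exactly two entries (height, width), each a positive integer.
    return (
        isinstance(shape, (tuple, list))
        and len(shape) == 2
        and all(isinstance(s, int) and s > 0 for s in shape)
    )


def validate_patch_and_stride(
    patch_shape: tuple[int, int],
    stride_shape: tuple[int, int] | None = None,
) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return validated (patch_shape, stride_shape); hypothetical rules."""
    if not _is_positive_int_pair(patch_shape):
        msg = f"Invalid patch_shape {patch_shape!r}: expected two positive ints."
        raise ValueError(msg)
    if stride_shape is None:
        # Default to non-overlapping tiling.
        stride_shape = patch_shape
    if not _is_positive_int_pair(stride_shape):
        msg = f"Invalid stride_shape {stride_shape!r}: expected two positive ints."
        raise ValueError(msg)
    if any(s > p for s, p in zip(stride_shape, patch_shape)):
        # A stride larger than the patch would skip pixels between patches.
        msg = f"stride_shape {stride_shape} exceeds patch_shape {patch_shape}."
        raise ValueError(msg)
    return tuple(patch_shape), tuple(stride_shape)


print(validate_patch_and_stride((256, 256), (128, 128)))  # ((256, 256), (128, 128))
```

Failing early in the dataset constructor surfaces a bad configuration before any slide is read.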

Notable File Changes

  • New:

    • tiatoolbox/cli/semantic_segmentor.py
    • tests/engines/test_semantic_segmentor.py
  • Removed:

    • tiatoolbox/cli/semantic_segment.py
    • Old cache mode and patch Zarr store tests
  • Heavily Modified:

    • engine_abc.py, patch_predictor.py, semantic_segmentor.py
    • CLI modules and test suites
    • Dataset and utility modules for Dask/Zarr compatibility

Impact

  • Enables scalable, parallel, and memory-efficient inference and output saving for large images.
  • Simplifies downstream analysis by supporting Zarr as a native output format.
  • Lays the groundwork for further Dask-based optimizations in TIAToolbox.

@shaneahmed self-assigned this Sep 20, 2024
@shaneahmed added the enhancement (New feature or request) label Sep 20, 2024
@shaneahmed added this to the Release v2.0.0 milestone Sep 20, 2024

codecov bot commented Sep 20, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.58%. Comparing base (ca49b18) to head (3a47fe1).

Additional details and impacted files
@@                    Coverage Diff                     @@
##           dev-define-engines-abc     #866      +/-   ##
==========================================================
+ Coverage                   91.19%   94.58%   +3.38%     
==========================================================
  Files                          73       73              
  Lines                        9374     9234     -140     
  Branches                     1230     1209      -21     
==========================================================
+ Hits                         8549     8734     +185     
+ Misses                        792      468     -324     
+ Partials                       33       32       -1     
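
As a sanity check on the report, the counts above reproduce the headline percentages, assuming Codecov's coverage ratio is hits / (hits + misses + partials) and that displayed values are cut to two decimals rather than rounded. The helper names are illustrative.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # Covered lines as a percentage of all tracked lines.
    return 100 * hits / (hits + misses + partials)


def truncate2(value: float) -> float:
    # Keep two decimals without rounding up (valid for non-negative values).
    return math.floor(value * 100) / 100


base = coverage_pct(8549, 792, 33)  # 9374 tracked lines
head = coverage_pct(8734, 468, 32)  # 9234 tracked lines
print(truncate2(base), truncate2(head), truncate2(head - base))
# 91.19 94.58 3.38
```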



@shaneahmed marked this pull request as draft February 5, 2025 16:27
- Use `input_resolutions` instead of `resolution` to make engine outputs compatible with `ioconfig`.
- Input resolutions are given as a list of dictionaries, each with `units` and `resolution` keys.
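
A minimal sketch of the list-of-dictionaries form described in the commits above, with a hypothetical validator. The set of accepted unit names is an assumption modelled on the units WSIReader reads at ("mpp", "power", "level", "baseline"); the PR may validate differently or not at all.

```python
from __future__ import annotations

# Assumed set of unit names, mirroring the units WSIReader accepts.
SUPPORTED_UNITS = {"mpp", "power", "level", "baseline"}


def validate_input_resolutions(input_resolutions: list[dict]) -> list[dict]:
    """Check the list-of-dicts form of ``input_resolutions`` (illustrative only)."""
    if not isinstance(input_resolutions, list) or not input_resolutions:
        msg = "input_resolutions must be a non-empty list of dicts."
        raise TypeError(msg)
    for index, entry in enumerate(input_resolutions):
        if not isinstance(entry, dict):
            msg = f"Entry {index} must be a dict, got {type(entry).__name__}."
            raise TypeError(msg)
        missing = {"units", "resolution"} - set(entry)
        if missing:
            msg = f"Entry {index} is missing key(s): {sorted(missing)}."
            raise ValueError(msg)
        if entry["units"] not in SUPPORTED_UNITS:
            msg = f"Entry {index} has unsupported units {entry['units']!r}."
            raise ValueError(msg)
    return input_resolutions


# One dict per model input; a single-input model uses a one-element list.
config = validate_input_resolutions([{"units": "mpp", "resolution": 0.5}])
```

Using a list even for a single input keeps patch-level and multi-input engines on the same `ioconfig` shape.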
…mentor

# Conflicts:
#	tests/engines/test_engine_abc.py
#	tests/engines/test_patch_predictor.py
#	tiatoolbox/models/engine/engine_abc.py
#	tiatoolbox/models/engine/io_config.py
#	tiatoolbox/models/engine/patch_predictor.py
@Jiaqi-Lv requested a review from Copilot August 28, 2025 15:25
Copilot AI (Contributor) left a comment

Pull Request Overview

This pull request implements a comprehensive refactor of the TIAToolbox engine system, introducing a new abstract base class, EngineABC, and reimplementing SemanticSegmentor as an extension of PatchPredictor. The refactor improves memory management, integrates Dask arrays, and separates concerns more cleanly.

Key changes include:

  • New EngineABC base class providing a unified interface for deep learning engines
  • Complete rewrite of SemanticSegmentor, extending PatchPredictor with WSI-specific functionality
  • Integration of Dask arrays for memory-efficient processing and caching
  • Enhanced error handling and validation with new exception types
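
The hierarchy described above (a shared engine base class, a patch predictor, and a segmentor that adds WSI-style merging) can be illustrated with a toy sketch. All class names, method names and the 1-D "patches" are hypothetical stand-ins, not TIAToolbox's API; the `canvas`/`count` averaging only mirrors the variable names that appear in the review below.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class EngineSketch(ABC):
    """Hypothetical stand-in for an engine base class (not TIAToolbox's EngineABC)."""

    def run(self, patches: list[list[float]], locations: list[int]):
        # Template method: a shared driver that calls subclass-specific steps.
        raw = [self.infer_patch(patch) for patch in patches]
        return self.post_process(raw, locations)

    @abstractmethod
    def infer_patch(self, patch: list[float]) -> list[float]:
        """Run the model on a single patch."""

    def post_process(self, raw, locations):
        # Patch-level engines return per-patch outputs unchanged.
        return raw


class PatchPredictorSketch(EngineSketch):
    def infer_patch(self, patch):
        # Toy "model": threshold each value at 0.5.
        return [1.0 if v > 0.5 else 0.0 for v in patch]


class SemanticSegmentorSketch(PatchPredictorSketch):
    def post_process(self, raw, locations):
        # Merge overlapping patch outputs onto one canvas, averaging overlaps.
        length = max(loc + len(out) for out, loc in zip(raw, locations))
        canvas = [0.0] * length
        count = [0] * length
        for out, loc in zip(raw, locations):
            for i, value in enumerate(out):
                canvas[loc + i] += value
                count[loc + i] += 1
        return [c / n if n else 0.0 for c, n in zip(canvas, count)]


patches, locations = [[0.9, 0.9, 0.1], [0.9, 0.8, 0.8]], [0, 2]
print(PatchPredictorSketch().run(patches, locations))     # [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(SemanticSegmentorSketch().run(patches, locations))  # [1.0, 1.0, 0.5, 1.0, 1.0]
```

Because the segmentor only overrides the merging step, it reuses the predictor's inference unchanged, which is the kind of separation of concerns the overview describes.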

Reviewed Changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Summary per file:

  • tiatoolbox/utils/transforms.py: Added int type annotation to interpolation parameter
  • tiatoolbox/utils/misc.py: Enhanced utility functions with Dask integration, memory optimization, and new helper functions
  • tiatoolbox/utils/exceptions.py: Added DimensionMismatchError exception class
  • tiatoolbox/models/models_abc.py: Updated abstract method signatures for improved type safety
  • tiatoolbox/models/engine/semantic_segmentor.py: Complete rewrite implementing new EngineABC architecture
  • tiatoolbox/models/engine/patch_predictor.py: Refactored to extend EngineABC with simplified interface
  • tiatoolbox/models/engine/engine_abc.py: New abstract base class for all TIAToolbox engines
  • tiatoolbox/models/dataset/dataset_abc.py: Enhanced dataset classes with output location tracking and validation

Comments suppressed due to low confidence (1)
Comments suppressed due to low confidence (1)

tiatoolbox/models/engine/semantic_segmentor.py:535

  • As with the previous issue, `self.dataloader` should be `dataloader` in the `else` clause on the following line.
                canvas, count, canvas_zarr, count_zarr


@adamshephard self-requested a review September 12, 2025 09:39
@measty (Collaborator) left a comment

I've done an initial review. It looks nice overall, but I found a few issues that need addressing before it can be merged.

mode: str,
ioconfig: IOSegmentorConfig,
save_dir: str | Path,
images: list[os.PathLike | Path | WSIReader] | np.ndarray,
Collaborator

The accepted types should include `str` alongside `os.PathLike` (which does not cover plain strings), and the example at the top of the file passes string paths. However, passing a string raises an error, because the code uses expressions such as `save_dir / (get_path(image).stem + suffix)` without first converting to `Path`.

We probably need to make `get_path` return a `Path`.
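
A minimal sketch of the suggested fix: a hypothetical version of `get_path` that always returns a `Path`, plus normalising `save_dir` before using `/`. Treating reader objects by duck typing on an `input_path` attribute is an assumption made to keep the sketch self-contained; `output_path` is a hypothetical helper name.

```python
from __future__ import annotations

import os
from pathlib import Path


def get_path(image: str | os.PathLike | object) -> Path:
    """Return the input location of ``image`` as a ``Path`` (hypothetical sketch).

    Accepts ``str``, any ``os.PathLike``, or a reader object assumed to expose
    an ``input_path`` attribute.
    """
    if isinstance(image, (str, os.PathLike)):
        return Path(image)
    input_path = getattr(image, "input_path", None)
    if input_path is None:
        msg = f"Cannot determine a path for object of type {type(image).__name__}."
        raise TypeError(msg)
    return Path(input_path)


def output_path(save_dir: str | os.PathLike, image, suffix: str) -> Path:
    # Normalise save_dir as well, so `/` works even if the caller passed a str.
    return Path(save_dir) / (get_path(image).stem + suffix)


print(output_path("results", "slides/sample.svs", ".zarr"))  # results/sample.zarr on POSIX
```

Converting once at the boundary means the rest of the engine can rely on `Path` methods such as `.stem` regardless of what the caller passed.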

keys_to_compute = [k for k in keys_to_compute if k not in zarr_group]
write_tasks = []
for key in keys_to_compute:
    dask_array = processed_predictions[key].rechunk("auto")
Collaborator

Leaving the rechunk as `"auto"` doesn't always work.

I ran the semantic segmentor for tissue masking on a random slide, and this line produced a `dask_array` with:
chunks = ((1862, 1800, 1800, 1800, 1800, 450), (13562,))

Zarr needs regular chunks: every chunk along an axis must be the same size, except the last one, which may be smaller.
For example, specifying `dask_array = processed_predictions[key].rechunk((1800, -1))` here instead made `to_zarr` work, giving chunks = ((1800, 1800, 1800, 1800, 1800, 512), (13562,)), which Zarr accepts.

This probably passes in the tests because the test slides have round dimensions (e.g. 4k x 4k).
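
A minimal sketch of the kind of chunk helper discussed here, written in plain Python so it runs without Dask. `is_regular_chunking` checks the rule described above, and `regular_chunks` builds an explicit chunk layout; both names are hypothetical, and the 1800-row chunk size is simply the value from this comment, not a recommended default.

```python
from __future__ import annotations


def is_regular_chunking(chunks: tuple[tuple[int, ...], ...]) -> bool:
    """Per axis: every chunk but the last has the same size, and the last
    chunk is no larger than the others."""
    for axis_chunks in chunks:
        body, last = axis_chunks[:-1], axis_chunks[-1]
        if body and (len(set(body)) > 1 or last > body[0]):
            return False
    return True


def regular_chunks(
    shape: tuple[int, ...], chunk_shape: tuple[int, ...]
) -> tuple[tuple[int, ...], ...]:
    """Split each axis into equal chunks plus one smaller remainder chunk.

    A chunk size of -1 means one chunk spanning the whole axis, matching the
    -1 convention accepted by dask's ``rechunk``.
    """
    result = []
    for size, step in zip(shape, chunk_shape):
        if step != -1 and step <= 0:
            msg = f"Chunk size must be positive or -1, got {step}."
            raise ValueError(msg)
        if size == 0:
            result.append((0,))
            continue
        step = size if step == -1 else min(step, size)
        full, remainder = divmod(size, step)
        result.append((step,) * full + ((remainder,) if remainder else ()))
    return tuple(result)


observed = ((1862, 1800, 1800, 1800, 1800, 450), (13562,))
print(is_regular_chunking(observed))  # False: 1862 != 1800
fixed = regular_chunks((9512, 13562), (1800, -1))
print(fixed)  # ((1800, 1800, 1800, 1800, 1800, 512), (13562,))
print(is_regular_chunking(fixed))  # True
```

With Dask available, the result could be passed as explicit chunks, e.g. `dask_array.rechunk(regular_chunks(dask_array.shape, (1800, -1)))`; checking the layout before calling `to_zarr` turns an irregular-chunk failure on odd-sized slides into an explicit, testable step.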

Member Author

TODO: add a helper function to calculate Zarr chunks.

@shaneahmed linked an issue Oct 3, 2025 that may be closed by this pull request

Labels

enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shifted patches when merging patch predictions!

3 participants