Open
45 commits
3cfd2cb
Add `Dicom` datatype and sniffer to datatypes_conf.xml.sample
kostrykin Dec 2, 2025
24acfd2
Add dicom format, example file
hexylena Sep 26, 2024
9d526a2
Implement `Dicom.set_meta`
kostrykin Dec 2, 2025
b741270
Fix formatting
kostrykin Dec 2, 2025
1f1edb0
Extend and fix `Dicom.set_meta`
kostrykin Dec 2, 2025
1984b52
Add test for `Dicom` metadata
kostrykin Dec 2, 2025
e941b08
Employ `pydicom` to identify DICOM files
kostrykin Dec 2, 2025
edb798c
Fix `Dicom.set_meta`
kostrykin Dec 2, 2025
0adfb90
Add pydicom dependency
kostrykin Dec 2, 2025
be6bdcd
Fix images.py
kostrykin Dec 3, 2025
c82854c
Make `import pydicom` optional
kostrykin Dec 3, 2025
72d39a0
Add `mimetype` and set `display_in_upload` to true for `Dicom`
kostrykin Dec 3, 2025
ddba90d
Add `description_url` for `Dicom`
kostrykin Dec 3, 2025
6045517
Fix `Dicom` and tests
kostrykin Dec 3, 2025
88b9906
Employ `build_sniff_from_prefix` and add test
kostrykin Dec 3, 2025
481ede4
Restrict pydicom to Python >=3.10 in pyproject.toml
kostrykin Dec 3, 2025
3b96806
Update deps with `lib/galaxy/dependencies/update.sh -p pydicom`
kostrykin Dec 3, 2025
df96139
Fix linting
kostrykin Dec 3, 2025
dcf5683
Fix reference to TotalPixelMatrix in images.py
kostrykin Dec 3, 2025
3ff63e7
Revert 3b96806556737039d7998c0271d2d534b81ad603 for unrelated packages
kostrykin Dec 3, 2025
602d0ba
Fix linting
kostrykin Dec 3, 2025
a56aee3
Fix linting
kostrykin Dec 3, 2025
270643a
Fix linting
kostrykin Dec 3, 2025
bdfb096
Fix linting (now for real)
kostrykin Dec 3, 2025
2758853
Add `pydicom` as a package dependency
kostrykin Dec 3, 2025
3e28990
Fix bug in `Dicom.set_meta` that made integration tests fail
kostrykin Dec 4, 2025
802a2aa
Employ more diverse DICOM test data
kostrykin Dec 4, 2025
0272751
Migrate tests to new test data and fix `Dicom.set_meta`
kostrykin Dec 4, 2025
9f15412
Fix tests and `Dicom.set_meta` implementation for tiled data
kostrykin Dec 4, 2025
7e23152
Add `is_tiled` metadata element for DICOM
kostrykin Dec 4, 2025
2dd31e9
Refactor test/unit/data/datatypes/test_images.py
kostrykin Dec 4, 2025
6a21206
Refactor test/unit/data/datatypes/test_images.py
kostrykin Dec 4, 2025
0188ab9
Fix type hints
kostrykin Dec 4, 2025
8d1247b
Fix linting
kostrykin Dec 4, 2025
a1c28d9
Adopt formatting for black to pass
kostrykin Dec 4, 2025
93724f0
Revert import order to what isort likes but flake8 doesn't
kostrykin Dec 4, 2025
df5fe20
Fix docstring example
kostrykin Dec 4, 2025
b4a8d6c
Reproduce bug (integration test failure) as unit test
kostrykin Dec 4, 2025
5515f18
Employ `build_sniff_from_prefix` for `Tiff` to solve ambiguity with `…
kostrykin Dec 4, 2025
dc7ba53
Bring back `channels` metadata element (deleted by accident)
kostrykin Dec 4, 2025
ab12b60
Fix formatting for black
kostrykin Dec 4, 2025
177f9b5
Clean up `OMETiff.sniff` and add tests
kostrykin Dec 4, 2025
d8375ee
Fix formatting for black
kostrykin Dec 4, 2025
aa17319
Ignore `sniff_prefix` in `OMETiff` derived from `Tiff`
kostrykin Dec 4, 2025
ea5eff3
Revert aa1731941eaba07b1028b266ca11075d280a4879 for sniff.py, reimple…
kostrykin Dec 4, 2025
2 changes: 2 additions & 0 deletions lib/galaxy/config/sample/datatypes_conf.xml.sample
@@ -313,6 +313,7 @@
<datatype extension="ome.tiff" type="galaxy.datatypes.images:OMETiff" display_in_upload="true">
<display file="image/avivator.xml"/>
</datatype>
<datatype extension="dcm" type="galaxy.datatypes.images:Dicom" mimetype="application/dicom" subclass="true" display_in_upload="true" description_url="https://formats.kaitai.io/dicom"/>
<datatype extension="vms" type="galaxy.datatypes.images:Hamamatsu" mimetype="image/hamamatsu"/>
<datatype extension="vmu" type="galaxy.datatypes.images:Hamamatsu" subclass="true" display_in_upload="false"/>
<datatype extension="ndpi" type="galaxy.datatypes.images:Hamamatsu" subclass="true" display_in_upload="false"/>
@@ -1445,6 +1446,7 @@
<sniffer type="galaxy.datatypes.images:Png"/>
<sniffer type="galaxy.datatypes.images:OMETiff"/>
<sniffer type="galaxy.datatypes.images:Tiff"/>
<sniffer type="galaxy.datatypes.images:Dicom"/>
<sniffer type="galaxy.datatypes.images:Bmp"/>
<sniffer type="galaxy.datatypes.images:Gif"/>
<sniffer type="galaxy.datatypes.images:Im"/>
149 changes: 140 additions & 9 deletions lib/galaxy/datatypes/images.py
@@ -3,6 +3,7 @@
"""

import base64
import io
import json
import logging
import math
@@ -17,6 +18,7 @@
import mrcfile
import numpy as np
import png
import pydicom
import tifffile
from typing_extensions import Literal

@@ -225,10 +227,12 @@ def set_meta(
dataset.metadata.num_unique_values = len(unique_values)


@build_sniff_from_prefix
class Tiff(Image):
    edam_format = "format_3591"
    file_ext = "tiff"
    display_behavior = "download"  # TIFF files trigger browser downloads

    MetadataElement(
        name="offsets",
        desc="Offsets File",
@@ -239,6 +243,27 @@ class Tiff(Image):
        optional=True,
    )

    def sniff_prefix(self, file_prefix: FilePrefix) -> bool:
        """
        Determine if the file is in TIFF format by checking the file header.

        For a successful check, the first 4 bytes must be the TIFF magic number. See [1] for a list of magic numbers.

        Manual checking of the file header, as opposed to trying to read the file with tifffile, is required due to an
        ambiguity with DICOM files. This is because the DICOM standard allows *any content* for the first 128 bytes of
        the file, followed by the DICOM prefix (see §7.1 in [2] for details).

        [1] https://gist.github.com/leommoore/f9e57ba2aa4bf197ebc5
        [2] https://dicom.nema.org/medical/dicom/current/output/html/part10.html
        """
        return file_prefix.contents_header_bytes[:4] in (
            b"\x4d\x4d\x00\x2a",  # TIFF format (Motorola - big endian)
            b"\x49\x49\x2a\x00",  # TIFF format (Intel - little endian)
        ) and (
            len(file_prefix.contents_header_bytes) < 132  # file is too short to be a DICOM
            or file_prefix.contents_header_bytes[128:132] != b"DICM"  # file does not contain the DICOM prefix
        )

    def set_meta(
        self, dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: Optional[str] = None, **kwd
    ) -> None:
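
The ambiguity described in the `Tiff.sniff_prefix` docstring can be reproduced with a few synthetic bytes. The sketch below is illustrative only: `looks_like_tiff` and `looks_like_dicom` are hypothetical stand-ins that take raw `bytes` instead of Galaxy's `FilePrefix`, and both sample buffers are made up for the demonstration.

```python
TIFF_MAGIC = (
    b"\x4d\x4d\x00\x2a",  # big endian ("MM")
    b"\x49\x49\x2a\x00",  # little endian ("II")
)


def looks_like_dicom(header: bytes) -> bool:
    # DICOM Part 10: 128-byte preamble with arbitrary content, then b"DICM"
    return len(header) >= 132 and header[128:132] == b"DICM"


def looks_like_tiff(header: bytes) -> bool:
    # TIFF magic number, accepted only if the DICOM prefix is absent
    return header[:4] in TIFF_MAGIC and not looks_like_dicom(header)


# An ordinary TIFF-like header (zeros after the magic number)
plain_tiff = b"\x49\x49\x2a\x00" + bytes(200)

# A DICOM-like header whose free-form preamble starts with the TIFF magic number
dual = b"\x49\x49\x2a\x00" + bytes(124) + b"DICM" + bytes(68)

print(looks_like_tiff(plain_tiff), looks_like_dicom(plain_tiff))
print(looks_like_tiff(dual), looks_like_dicom(dual))
```

A check that looks only at the first four bytes would classify `dual` as TIFF, which is why the second condition on bytes 128 to 131 is needed.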
@@ -385,19 +410,14 @@ def _read_segments(page: Union[tifffile.TiffPage, tifffile.TiffFrame]) -> Iterat

yield segment

    def sniff(self, filename: str) -> bool:
        with tifffile.TiffFile(filename):
            return True


class OMETiff(Tiff):
    file_ext = "ome.tiff"

    def sniff(self, filename: str) -> bool:
        with tifffile.TiffFile(filename) as tif:
            if tif.is_ome:
                return True
        return False
    def sniff_prefix(self, file_prefix: FilePrefix) -> bool:
        buf = io.BytesIO(file_prefix.contents_header_bytes)
        with tifffile.TiffFile(buf) as tif:
            return tif.is_ome
Comment on lines +418 to +420
Member

I have a feeling this won't work for larger files where the image is larger than the contents_header_bytes.

Maybe you can use the @disable_parent_class_sniffing decorator from lib/galaxy/datatypes/sniff.py and restore the previous sniff method?

Contributor Author

@kostrykin kostrykin Dec 4, 2025

I tried it (see kostrykin#4), but when I put the @disable_parent_class_sniffing decorator on the OMETiff class, the sniff method does not get called. The decorator seems to disable sniffing.

The test/unit/data/datatypes/test_images.py::test_ome_tiff_sniff fails with

>       assert OMETiff().sniff(fname)
E       AssertionError: assert False
E        +  where False = <lambda>('lib/galaxy/datatypes/test/1.ome.tiff')
E        +    where <lambda> = <galaxy.datatypes.images.OMETiff object at 0x7f99946c3190>.sniff
E        +      where <galaxy.datatypes.images.OMETiff object at 0x7f99946c3190> = OMETiff()

This happens even though OMETiff.sniff is implemented as

    def sniff(self, filename: str) -> bool:
        raise ValueError('OMETiff.sniff called')  # <-- !!! this error is *not* reported !!!
        with tifffile.TiffFile(filename) as tif:
            return tif.is_ome

Thus, OMETiff.sniff is not called when the @disable_parent_class_sniffing decorator is in place.

Here is the error from running the test: https://github.com/kostrykin/galaxy/actions/runs/19942920690/job/57185121231?pr=4#step:8:11238

That said, I think the current solution should work, because tifffile.TiffFile only reads the headers/metadata of the file. The pixel content is only read when the corresponding methods are used (they are not used here).

On the other hand, since we do not know how large metadata can be, or how it is organized, if there is a possibility to get this working without sniff_prefix, I'd go for that.
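
To make the concern about metadata placement concrete: in a classic TIFF, bytes 4 to 7 of the header hold the offset of the first image file directory (IFD), and that offset may point anywhere in the file, including past the pixel data. The sketch below only inspects the 8-byte header with the standard library; `first_ifd_offset`, `ifd_starts_within`, and `PREFIX_SIZE` are hypothetical names and values chosen for illustration, not Galaxy's or tifffile's API.

```python
import struct


def first_ifd_offset(header: bytes) -> int:
    """Return the offset of the first IFD declared in a classic TIFF header."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    byte_order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if byte_order is None:
        raise ValueError("not a TIFF byte-order mark")
    magic, offset = struct.unpack(byte_order + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    return offset


def ifd_starts_within(header: bytes, prefix_size: int) -> bool:
    # True if the first IFD begins inside the first `prefix_size` bytes
    return first_ifd_offset(header) < prefix_size


PREFIX_SIZE = 1024  # hypothetical prefix length, for illustration only

early = b"II" + struct.pack("<HI", 42, 8)  # IFD directly after the header
late = b"MM" + struct.pack(">HI", 42, 5_000_000)  # IFD placed far into the file

print(ifd_starts_within(early, PREFIX_SIZE))
print(ifd_starts_within(late, PREFIX_SIZE))
```

If the first IFD starts beyond the sniffed prefix, a parser that is handed only the prefix may not be able to reach the tags it needs, which is one way the prefix-based approach could fail for some files.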



class OMEZarr(data.ZarrDirectory):
@@ -519,6 +539,117 @@ def sniff(self, filename: str) -> bool:
return fh.read(4) == b"%PDF"


@build_sniff_from_prefix
class Dicom(Image):
    """
    DICOM medical imaging format (.dcm)

    >>> from galaxy.datatypes.sniff import get_test_fname
    >>> fname = get_test_fname('ct_image.dcm')
    >>> Dicom().sniff(fname)
    True
    """

    MetadataElement(
        name="is_tiled",
        desc="Is this a WSI DICOM?",
        readonly=True,
        visible=True,
        optional=True,
    )

    edam_format = "format_3548"
    file_ext = "dcm"

    def sniff_prefix(self, file_prefix: FilePrefix) -> bool:
        """
        Determine if the file is in DICOM format according to §7.1 in [1].

        [1] https://dicom.nema.org/medical/dicom/current/output/html/part10.html
        """
        return len(file_prefix.contents_header_bytes) >= 132 and file_prefix.contents_header_bytes[128:132] == b"DICM"

    def get_mime(self) -> str:
        """
        Returns the mime type of the datatype.
        """
        return "application/dicom"

    def set_meta(
        self, dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: Optional[str] = None, **kwd
    ) -> None:
        """
        Populate the metadata of the DICOM file using the pydicom library.

        The following metadata fields are populated, if possible:
        - `width`
        - `height`
        - `channels`
        - `dtype`
        - `num_unique_values` in some cases
        - `is_tiled`

        Currently, `frames` and `depth` are not populated. This is because "frames" in DICOM are a generic entity
        that can be used for different purposes, including slices in 3-D images, frames in temporal sequences, and
        tiles of a mosaic or pyramid (WSI DICOM). Distinguishing these cases is not straightforward (and, as a
        consequence, neither is determining the `axes` of the image). This can be implemented in the future.
        """
        try:
            dcm = pydicom.dcmread(dataset.get_file_name(), stop_before_pixels=True)
        except pydicom.errors.InvalidDicomError:
            return  # Ignore errors if metadata cannot be read

        # Determine the number of channels (0 if no channel info is present)
        dataset.metadata.channels = dcm.get("SamplesPerPixel", 0)

        # Determine if the DICOM file is tiled (likely WSI DICOM)
        dataset.metadata.is_tiled = hasattr(dcm, "TotalPixelMatrixColumns") and hasattr(dcm, "TotalPixelMatrixRows")

        # Determine the width and height of the dataset. If the DICOM file is not tiled, `Columns` and `Rows` give
        # the width and height directly. For tiled DICOM, these values correspond to the size of the tiles, so the
        # size of the total pixel matrix is used instead.
        if dataset.metadata.is_tiled:
            dataset.metadata.width = dcm.TotalPixelMatrixColumns
            dataset.metadata.height = dcm.TotalPixelMatrixRows
        else:
            dataset.metadata.width = dcm.get("Columns")
            dataset.metadata.height = dcm.get("Rows")

        # Try to infer the `dtype` from metadata
        if dcm.BitsAllocated == 1:
            dataset.metadata.dtype = "bool"  # 1bit
        else:
            dtype_lut = [
                ["uint8", "int8"],
                ["uint16", "int16"],
                ["uint32", "int32"],
            ]
            dtype_lut_pos = (
                round(math.log2(dcm.BitsAllocated) - 3),  # 8bit -> 0, 16bit -> 1, 32bit -> 2
                dcm.PixelRepresentation,
            )
            if 0 <= dtype_lut_pos[0] < len(dtype_lut):
                dataset.metadata.dtype = dtype_lut[dtype_lut_pos[0]][dtype_lut_pos[1]]
            else:
                dataset.metadata.dtype = None  # unknown `dtype`

        # Try to infer `num_unique_values` from metadata
        try:
            if dcm.SOPClassUID == "1.2.840.10008.5.1.4.1.1.66.4":  # https://www.dicomlibrary.com/dicom/sop

                # The DICOM file contains segmentation, count +1 for the image background
                dataset.metadata.num_unique_values = 1 + len(dcm.SegmentSequence)

            else:

                # Otherwise, `num_unique_values` is not available from metadata
                dataset.metadata.num_unique_values = None

        except AttributeError:

            # Ignore errors if metadata cannot be read
            dataset.metadata.num_unique_values = None


@build_sniff_from_prefix
class Tck(Binary):
"""
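
The size and `dtype` rules used by `Dicom.set_meta` can be tried out without pydicom or test files. The sketch below is a simplified restatement, not the code from the diff: `infer_dtype` and `infer_size` are hypothetical helpers, a plain mapping stands in for the pydicom dataset, and an explicit dictionary replaces the `log2`-based table index, so bit depths other than 1, 8, 16, and 32 simply yield `None`.

```python
from typing import Mapping, Optional, Tuple

# (BitsAllocated, PixelRepresentation) -> dtype name
# PixelRepresentation: 0 = unsigned integer, 1 = two's complement
_DTYPES = {
    (8, 0): "uint8",
    (8, 1): "int8",
    (16, 0): "uint16",
    (16, 1): "int16",
    (32, 0): "uint32",
    (32, 1): "int32",
}


def infer_dtype(bits_allocated: int, pixel_representation: int = 0) -> Optional[str]:
    # 1-bit data is reported as boolean; unknown combinations give None
    if bits_allocated == 1:
        return "bool"
    return _DTYPES.get((bits_allocated, pixel_representation))


def infer_size(attrs: Mapping[str, int]) -> Tuple[Optional[int], Optional[int], bool]:
    """Return (width, height, is_tiled) from a mapping of DICOM attribute names."""
    is_tiled = "TotalPixelMatrixColumns" in attrs and "TotalPixelMatrixRows" in attrs
    if is_tiled:
        # Columns/Rows describe a single tile, so use the total pixel matrix
        return attrs["TotalPixelMatrixColumns"], attrs["TotalPixelMatrixRows"], True
    return attrs.get("Columns"), attrs.get("Rows"), False


print(infer_dtype(16, 1))
print(infer_size({"Columns": 512, "Rows": 256}))
print(
    infer_size(
        {"Columns": 10, "Rows": 10, "TotalPixelMatrixColumns": 50, "TotalPixelMatrixRows": 40}
    )
)
```

Keeping these rules in small pure functions would make them easy to unit-test separately from file I/O, though whether that fits Galaxy's datatype conventions is a design choice for the authors.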
1 change: 1 addition & 0 deletions lib/galaxy/datatypes/test/ct_image.dcm
1 change: 1 addition & 0 deletions lib/galaxy/datatypes/test/seg_image_ct_binary.dcm
1 change: 1 addition & 0 deletions lib/galaxy/datatypes/test/sm_image.dcm
2 changes: 2 additions & 0 deletions lib/galaxy/dependencies/pinned-requirements.txt
@@ -163,6 +163,8 @@ pycryptodome==3.23.0
pydantic==2.12.5
pydantic-core==2.41.5
pydantic-tes==0.2.0
pydicom==2.4.4 ; python_full_version < '3.10'
pydicom==3.0.1 ; python_full_version >= '3.10'
pydot==4.0.1
pyeventsystem==0.1.0
pyfaidx==0.9.0.3
1 change: 1 addition & 0 deletions packages/data/setup.cfg
@@ -55,6 +55,7 @@ install_requires =
parsley
pycryptodome
pydantic[email]>=2.7.4
pydicom
pylibmagic
pypng
python-magic
1 change: 1 addition & 0 deletions pyproject.toml
@@ -75,6 +75,7 @@ dependencies = [
"pulsar-galaxy-lib>=0.15.10",
"pycryptodome",
"pydantic[email]>=2.7.4", # https://github.com/pydantic/pydantic/pull/9639
"pydicom",
"PyJWT",
"pykwalify",
"pylibmagic",
7 changes: 7 additions & 0 deletions test-data/highdicom/LICENSE
@@ -0,0 +1,7 @@
Copyright 2020 MGH Computational Pathology

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Binary file added test-data/highdicom/ct_image.dcm
Binary file not shown.
Binary file added test-data/highdicom/seg_image_ct_binary.dcm
Binary file not shown.
Binary file added test-data/highdicom/sm_image.dcm
Binary file not shown.