Expose Image and friends to allow for Multithreading implementations #246

feefladder · 2024-09-19T01:04:14Z

Tiff is used for reading GeoTiff and, by extension, COGs. All of these require reading tiffs, and my friends and I think image/tiff is the place to ask for features that would make our lives easier. Otherwise, we would each end up implementing our own decoders on top of preloaded images. This matters especially for reading COGs, where concurrent, partial reads of a tiff are needed. This is mainly a discussion starter, but I think tiff::decoder::Image should be polished a bit more and then made public.

Summary

Make Image and other types public to allow for easily extending the Decoder. Also provide examples and implementation of an extended reader.
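To make the target use case concrete, here is a minimal sketch of concurrent, partial reads using only the standard library. It is not the ChunkReader from this PR: `read_range` and `read_chunks_concurrently` are hypothetical names, and the shared in-memory buffer stands in for a file or an HTTP range source.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};
use std::sync::Arc;
use std::thread;

/// Reads `len` bytes starting at `offset`, using a fresh reader over the
/// shared bytes so that no seek position is shared between threads.
fn read_range(source: &Arc<[u8]>, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    let mut reader = Cursor::new(&source[..]);
    reader.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Fetches every `(offset, byte_count)` range on its own thread and returns
/// the raw (still compressed) chunk bytes in the order of `ranges`.
/// Hypothetical helper for illustration only.
pub fn read_chunks_concurrently(
    source: Arc<[u8]>,
    ranges: &[(u64, usize)],
) -> std::io::Result<Vec<Vec<u8>>> {
    let handles: Vec<_> = ranges
        .iter()
        .map(|&(offset, len)| {
            let source = Arc::clone(&source);
            thread::spawn(move || read_range(&source, offset, len))
        })
        .collect();
    // Join in spawn order; the first I/O error, if any, is returned.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("reader thread panicked"))
        .collect()
}
```

The ranges would come from the chunk offset and byte count tags. Decoding each chunk needs the image metadata, which is why having Image available outside the Decoder matters here.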

Motivation

The GeoTiff landscape is currently quite fragmented, and most implementations are stuck on decoding unusual tiff/compression types. Those issues actually belong in this crate. I think this crate should either:

provide async/multithreaded support
have an extensible API to be able to implement said support.

Then all geo-related functionality (and only that) can be put into georust/geotiff#7. Also, libtiff has a multilayered API and, based on this comment, I'd assume this library is trying to be somewhat analogous to libtiff. Therefore, exposing the API at multiple levels of abstraction should fit within this crate?

Currently remaining ugliness

The Image struct may need some polishing before being published as a pub struct. Below are some ramblinations on what could be improved before exposing the Image struct.

Currently, creating a Decoder from an Image is not hassle-free: byte_order lives in the reader and bigtiff in the decoder (although bigtiff is no longer needed once the metadata is loaded), so the implementation of ChunkReader has to use all of those. This info (byte_order and bigtiff) could be added to the Image struct.
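As a rough sketch of why these two values belong together with the image metadata: they are all that is needed to decode a raw offset. Everything below is hypothetical. `FileLayout` is a made-up name, `ByteOrder` is a local stand-in for the crate's type, and the sketch assumes offsets are stored as 4-byte values in classic TIFF and 8-byte values in BigTIFF, whereas real files may use other field types.

```rust
/// Local stand-in for the crate's byte order type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ByteOrder {
    LittleEndian,
    BigEndian,
}

/// Hypothetical bundle of the two header facts needed to read chunk tags.
/// In the proposal these would simply be fields on `Image`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileLayout {
    pub byte_order: ByteOrder,
    pub bigtiff: bool,
}

impl FileLayout {
    /// Assumed width of one stored offset: 4 bytes (classic) or 8 (BigTIFF).
    pub fn offset_size(&self) -> usize {
        if self.bigtiff { 8 } else { 4 }
    }

    /// Decodes the `index`-th offset from the raw bytes of an offsets array.
    /// Returns `None` if the buffer does not contain that element.
    pub fn decode_offset(&self, raw: &[u8], index: usize) -> Option<u64> {
        let size = self.offset_size();
        let start = index.checked_mul(size)?;
        let end = start.checked_add(size)?;
        let bytes = raw.get(start..end)?;
        let mut buf = [0u8; 8];
        match self.byte_order {
            ByteOrder::LittleEndian => {
                // Low-order bytes come first; the rest stays zero.
                buf[..size].copy_from_slice(bytes);
                Some(u64::from_le_bytes(buf))
            }
            ByteOrder::BigEndian => {
                // Right-align the value so the high-order padding is zero.
                buf[8 - size..].copy_from_slice(bytes);
                Some(u64::from_be_bytes(buf))
            }
        }
    }
}
```

With something like this stored next to the tags, a reader running on another thread would not need access to the original reader or decoder just to interpret offsets.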

Reading in the chunk offset/length fields in a large COG still takes quite some time, as shown here:

[Recording: epic recording showing fast multithreaded stuff]

This could be circumvented by having an enum like:

use std::collections::HashMap;
use std::io::{Read, Seek};
// Entry, ByteOrder and TiffResult are this crate's types; AsyncRead and
// AsyncSeek would come from an async I/O crate.

/// Enum for partially-loaded tags for chunk_offset and chunk_bytes
pub enum ChunkInfos {
    /// This tag has not been loaded yet, please read in the chunk you need
    Uninitialized(Entry), // Entry field for ergonomic initialization
    /// This tag has a minority of values read that are not necessarily close to each other
    Sparse(Entry, HashMap<u32, u64>),
    /// This tag has chunks read that form a sub-rectangle in the larger tiff.
    /// Assumes a rectangle from topleft to botright, where the x and y difference
    /// (or rather I and J according to the
    /// [GeoTiff Spec](https://docs.ogc.org/is/19-008r4/19-008r4.html#_device_space_and_geotiff))
    /// is calculated from TileAttributes
    Rect {
        entry: Entry,
        topleft: u32,
        botright: u32,
        data: Vec<u64>,
    },
    /// This tag is either entirely loaded, or has loaded enough data to be dense.
    /// `0` indicates a missing value.
    Dense(Vec<u64>),
}

impl ChunkInfos {
    /// Look up an already-loaded value.
    fn get(&self, chunk_index: u32) -> TiffResult<u64> {
        todo!() // logics
    }

    /// Read the value for one chunk from the file.
    fn retrieve<R: Read + Seek>(
        &mut self,
        chunk_index: u32,
        reader: R,
        byte_order: ByteOrder,
        bigtiff: bool, /* limits: Limits, */
    ) {
        todo!() // more logics?
    }

    /// Not sure
    async fn retrieve_async<R: AsyncRead + AsyncSeek + Unpin>(/* bla */) {
        todo!() // more logics.await?
    }
}

That would allow for partial reads of these tags. Reading and representation of these tags is directly embedded in this crate, so an extender could not implement this by themselves.
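The lookup side of such an enum could look roughly like the following self-contained sketch. It deliberately simplifies the proposal: the `Entry` fields are left out, `get` returns an `Option` instead of a `TiffResult`, and `Rect` gains a hypothetical `chunks_across` field (chunks per image row) in place of deriving it from TileAttributes.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the proposed enum (no `Entry`, no file access).
pub enum ChunkInfos {
    Uninitialized,
    Sparse(HashMap<u32, u64>),
    /// `data` holds the rectangle's values row by row.
    Rect { topleft: u32, botright: u32, chunks_across: u32, data: Vec<u64> },
    Dense(Vec<u64>),
}

impl ChunkInfos {
    /// Returns the value for `chunk_index` if it is already in memory.
    /// `None` means the caller would have to read it from the file first.
    pub fn get(&self, chunk_index: u32) -> Option<u64> {
        match self {
            ChunkInfos::Uninitialized => None,
            ChunkInfos::Sparse(map) => map.get(&chunk_index).copied(),
            ChunkInfos::Rect { topleft, botright, chunks_across, data } => {
                let (tl, br, across) = (*topleft, *botright, *chunks_across);
                if across == 0 {
                    return None;
                }
                // Convert linear chunk indices to (column, row) positions.
                let (x, y) = (chunk_index % across, chunk_index / across);
                let (x0, y0) = (tl % across, tl / across);
                let (x1, y1) = (br % across, br / across);
                if x < x0 || x > x1 || y < y0 || y > y1 {
                    return None;
                }
                let width = x1 - x0 + 1;
                let local = (y - y0) * width + (x - x0);
                data.get(local as usize).copied()
            }
            // A stored 0 marks a value that has not been loaded.
            ChunkInfos::Dense(values) => values
                .get(chunk_index as usize)
                .copied()
                .filter(|&value| value != 0),
        }
    }
}
```

A `retrieve` counterpart would read the missing value from the file and move the enum towards the `Dense` variant as more chunks are touched. That part needs the crate's internal tag reading, so it is not sketched here.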

Possibly, tiff::Image and image-rs::Image overlap in naming, calling for tiff::TiffImage.
I could probably think of more possible objections, but would need actual feedback first.
The name ChunkReader was co-developed with buzz from https://nurdspace.nl


fintelia · 2024-10-06T00:53:10Z

Those types were intentionally not made public. Making a public API requires design work to ensure that it is something we're happy with supporting long term.

feefladder force-pushed the multithread branch from 49c16cd to 54c357f Compare September 19, 2024 04:12

feefladder added 5 commits September 30, 2024 16:57

feat:async implemented async

d30d3b9

feat:async made tests work

1eadd6c

refactored folder structure

78e7df9

added async http example

4ae8f82

commented out unused functions in ~EndianAsyncReader~ AsyncEndianReader

da8f183

feefladder force-pushed the multithread branch from 54c357f to 92d4f4f Compare September 30, 2024 15:34

This was referenced Sep 30, 2024

WIP Partial tag #248

Closed

Partial tag #249

Draft

feat:multithread made necessary modules public for using in mutithrea…

a5c87b7

…ded envs, implemented ChunkReader and provided example

feefladder force-pushed the multithread branch from 92d4f4f to a5c87b7 Compare September 30, 2024 19:36

This was referenced Oct 2, 2024

Exif support attempt number 2 #242

Draft

Extensibility: mid-level API and harmonizing encoder/decoder structs. #250

Open

feefladder mentioned this pull request Oct 15, 2024

Asynchronous GeoTIFF reader georust/geotiff#13

Open
