Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Image and friends to allow for Multithreading implementations #246

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

feefladder
Copy link

@feefladder feefladder commented Sep 19, 2024

Tiff is used for reading GeoTiff and - by extension - COGs. Those all require reading tiffs and me and friends think image/tiff is the place to nag for adding cool features to make our lives easier. Otherwise, we'd all be implementing our own decoders on top of preloaded images and stuff. Especially in the use-case of reading COGs, where concurrent, partial reads of a tiff are needed. This is mainly a discussion starter, but I think tiff::decoder::Image should be polished a bit more and then made public.

Summary

Make Image and other types public to allow for easily extending the Decoder. Also provide examples and implementation of an extended reader.

Motivation

GeoTiff landscape is currently quite fragmented and most implementations are stuck on being able to decode weird tiff/compression types. Those issues actually belong in this crate. I think this crate should either:

  1. provide async/multithreaded support
  2. have an extensible API to be able to implement said support.

then all geo-related functionality (and only that) can be put into georust/geotiff#7. Also libtiff has a multilayered api and based on this comment, I'd assume this library is trying to be somewhat analogous to libtiff. Therefore, exposing the API at multiple levels of abstraction should fit within this crate?

Currently remaining uglyness

The Image struct may need some polishing before being published as a pub struct. Below are some ramblinations on what could be improved before exposing the Image struct.

  • Currently, creating a Decoder from an Image is not hassle-free: byte_order is in the reader, bigtiff in the decoder (but that is also not a necessary field one the metadata is loaded) and so the implementation of ChunkReader uses all those. This info (byte_order and bigtiff) could be added to the Image struct.
  • Reading in the chunk offset/length fields in a large COG still takes quite some time, as shown here: epic recording showing fast multithreaded stuff This could be circumvented by having an enum like:
    /// Enum for partially-loaded tags for chunk_offset and chunk_bytes
    pub enum ChunkInfos {
       /// This tag has not been loaded yet, please read in the chunk you need
      Uninitialized(Entry), // Entry field for ergonomic initialization
      /// This tag has a minority of values read that are not necessarily close to each other
      Sparse(Entry, HashMap<u32, u64>),
      /// This tag has chunks read that form a sub-rectangle in the larger tiff
      /// assumes a rectangle from topleft-botright, where x and y difference (or rather I and J according to [GeoTiff Spec](https://docs.ogc.org/is/19-008r4/19-008r4.html#_device_space_and_geotiff)) is calculated from TileAttributes
      Rect{
        entry: Entry,
        topleft: u32,
        botright: u32,
        data: Vec<u64>
      },
      /// This tag is either entirely loaded, or has loaded enough data to be dense. `0` indicates a missing value.
      Dense(Vec<u64>)
    }
    
    impl ChunkInfos {
      fn get(chunk_index: u32) -> TiffResult<u64> {
        // logics
      }
    
      fn retrieve<R: Read + Seek>(chunk_index: u32, reader: R, byte_order: ByteOrder, bigtiff: bool) {//limits: Limits, 
        // more logics?
      }
    
      /// Not sure
      fn retrieve_async<R: AsyncRead + AsyncSeek + Unpin>(//bla) {
        //more logics.await?
      }
    }
    That would allow for partial reads of these tags. Reading and representation of these tags is directly embedded in this crate, so an extender could not implement this by themselves.
  • possibly that tiff::Image and image-rs::Image overlap in naming, callign for tiff::TiffImage.
  • I could probably think of more possible objections, but would need actual feedback first.
  • The name ChunkReader was co-developed with buzz from https://nurdspace.nl

This was referenced Sep 30, 2024
…ded envs, implemented ChunkReader and provided example
@fintelia
Copy link
Contributor

fintelia commented Oct 6, 2024

Those types were intentionally not made public. Making a public API requires design work to ensure that it is something we're happy with supporting long term

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants