Cloud compatibility #85
Comments
I'm not sure if there is a crate for it, but it should be possible to make your own struct that takes a reader and implements Read + Seek by buffering all data that is read. Something like:
use std::io::{self, Read, Seek, SeekFrom};

struct BufferCursor<R: Read> {
    r: R,
    data: Vec<u8>,
    cursor: usize,
}
impl<R: Read> Read for BufferCursor<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        while self.cursor >= self.data.len() {
            // read data from self.r into self.data (and stop at end of stream)
        }
        // copy data from self.data into buf
    }
}
impl<R: Read> Seek for BufferCursor<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        // Update cursor, but don't modify self.data or self.r
    }
}
If you do implement something like this, you should totally release it on crates.io though! The bigger question is whether |
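For anyone who wants to try the idea, here is one way the outline above could be filled in. It is only a sketch using the standard library: the `fill_to` helper, the 4096-byte chunk size and the error message are my own choices, not something from an existing crate, and everything read so far stays in memory.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Wraps a forward-only reader and keeps every byte read so far, so that
/// seeking never has to touch the underlying source again.
pub struct BufferCursor<R: Read> {
    r: R,
    data: Vec<u8>,
    cursor: usize,
}

impl<R: Read> BufferCursor<R> {
    pub fn new(r: R) -> Self {
        Self { r, data: Vec::new(), cursor: 0 }
    }

    /// Hypothetical helper: pulls bytes from the source until `data` holds
    /// at least `end` bytes or the source is exhausted.
    fn fill_to(&mut self, end: usize) -> io::Result<()> {
        let mut chunk = [0u8; 4096];
        while self.data.len() < end {
            let n = self.r.read(&mut chunk)?;
            if n == 0 {
                break; // end of the underlying stream
            }
            self.data.extend_from_slice(&chunk[..n]);
        }
        Ok(())
    }
}

impl<R: Read> Read for BufferCursor<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let end = self.cursor.saturating_add(buf.len());
        self.fill_to(end)?;
        // Empty if the cursor sits at or beyond the end of the stream.
        let available = self.data.get(self.cursor..).unwrap_or(&[]);
        let n = available.len().min(buf.len());
        buf[..n].copy_from_slice(&available[..n]);
        self.cursor += n;
        Ok(n)
    }
}

impl<R: Read> Seek for BufferCursor<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let target = match pos {
            SeekFrom::Start(off) => Some(off),
            SeekFrom::Current(delta) => (self.cursor as u64).checked_add_signed(delta),
            SeekFrom::End(delta) => {
                // The length is only known once the source has been drained.
                self.fill_to(usize::MAX)?;
                (self.data.len() as u64).checked_add_signed(delta)
            }
        };
        let target = target
            .and_then(|t| usize::try_from(t).ok())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "invalid seek position"))?;
        // Only the cursor moves; `data` and `r` are left untouched.
        self.cursor = target;
        Ok(target as u64)
    }
}
```

Note that `SeekFrom::End` forces the whole source to be read, which defeats the purpose for remote files, so a decoder using this should stick to absolute and relative seeks.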
After some investigation I found that the best way was to add a new limit variable:
/// Decoding limits
#[derive(Clone, Debug)]
pub struct Limits {
    /// The maximum size of any `DecodingResult` in bytes, the default is
    /// 256MiB. If the entire image is decoded at once, then this will
    /// be the maximum size of the image. If it is decoded one strip at a
    /// time, this will be the maximum size of a strip.
    pub decoding_buffer_size: usize,
    /// The maximum size of any ifd value in bytes, the default is
    /// 1MiB.
    pub ifd_value_size: usize,
    /// Optional file limit: an error is raised when the data to decode lies
    /// beyond it (for cloud range requests).
    pub file_limit: Option<u64>,
    /// The purpose of this is to prevent all the fields of the struct from
    /// being public, as this would make adding new fields a major version
    /// bump.
    _non_exhaustive: (),
}
Then on each call of the
#[inline]
pub fn goto_offset_u64(&mut self, offset: u64) -> TiffResult<()> {
    if let Some(file_limit) = self.limits.file_limit {
        if file_limit < offset {
            return Err(TiffError::DataUnreachable(offset));
        }
    }
    self.reader.seek(io::SeekFrom::Start(offset))?;
    Ok(())
}
I do the same in the
let num_tags = if self.bigtiff { self.read_long8()? } else { self.read_short()?.into() };
if let Some(file_limit) = self.limits.file_limit {
    // Size of one IFD entry: 20 bytes in BigTIFF, 12 bytes in classic TIFF
    let tag_size = if self.bigtiff { 20 } else { 12 };
    if file_limit < offset + num_tags * tag_size {
        return Err(TiffError::DataUnreachable(offset + num_tags * tag_size));
    }
}
And I will add something on the |
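One detail worth noting about the IFD check: `offset + num_tags * tag_size` can overflow when `num_tags` comes from a corrupt BigTIFF header. Below is a standalone sketch of the same comparison with checked arithmetic. The function name `check_ifd_reachable` and the `DataUnreachable` struct are hypothetical stand-ins for the decoder method and the proposed `TiffError::DataUnreachable` variant; only the 12 and 20 byte entry sizes are taken from the snippet above.

```rust
/// Hypothetical stand-in for the proposed `TiffError::DataUnreachable(offset)`:
/// carries the first offset that lies beyond the limit.
#[derive(Debug, PartialEq, Eq)]
pub struct DataUnreachable(pub u64);

/// Returns an error if `num_tags` IFD entries starting at `offset` would
/// extend past `file_limit`. `None` means that no limit is configured.
pub fn check_ifd_reachable(
    file_limit: Option<u64>,
    offset: u64,
    num_tags: u64,
    bigtiff: bool,
) -> Result<(), DataUnreachable> {
    let limit = match file_limit {
        Some(limit) => limit,
        None => return Ok(()),
    };
    // Size of one IFD entry: 20 bytes in BigTIFF, 12 bytes in classic TIFF.
    let entry_size: u64 = if bigtiff { 20 } else { 12 };
    // Checked arithmetic, so that a garbage tag count cannot wrap around
    // and slip under the limit.
    let end = num_tags
        .checked_mul(entry_size)
        .and_then(|len| offset.checked_add(len))
        .ok_or(DataUnreachable(u64::MAX))?;
    if limit < end {
        Err(DataUnreachable(end))
    } else {
        Ok(())
    }
}
```

Reporting the end of the region in the error gives the caller a hint about how far the next range request has to reach.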
Could you explain how this error has to differ from the one already produced by any seek or read beyond the end? What doesn't exist is recovery after such an error. In other words, reading a strip or IFD isn't transactional, and any error may leave the decoder in an improper state for resumption.
There are two issues. We could solve the first issue by using a custom reader and if the |
Also, for better cloud support we should provide a step between parsing the tag types and decoding their values.
// tile_offset { index: 1253, value_size: 4 }
let tiles_per_row = (level.width as f32 / level.tile_width as f32).ceil() as u32;
let index = tile_offset.index + (y * tiles_per_row + x) * tile_offset.value_size;
If we just add a method |
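To make the arithmetic above concrete, here is a small self-contained sketch. I am assuming that `index` is the byte position of the tile offsets array in the file and `value_size` the size of one element. The `Level` and `TileOffsetEntry` structs and the function name are hypothetical, made up for this example. It also uses integer ceiling division instead of `f32`, since `f32` cannot represent every integer above 2^24 exactly.

```rust
/// Hypothetical description of where a tile offsets array lives in the file.
pub struct TileOffsetEntry {
    pub index: u64,      // assumed: byte position of the first array element
    pub value_size: u64, // assumed: size of one element in bytes
}

/// Hypothetical image level, reduced to the two fields used here.
pub struct Level {
    pub width: u32,
    pub tile_width: u32,
}

/// Byte position of the array element that holds the offset of tile (x, y),
/// or `None` if the tile column is out of range or the arithmetic overflows.
pub fn tile_offset_position(
    level: &Level,
    entry: &TileOffsetEntry,
    x: u32,
    y: u32,
) -> Option<u64> {
    if level.tile_width == 0 {
        return None;
    }
    let width = u64::from(level.width);
    let tile_width = u64::from(level.tile_width);
    // Integer ceiling division: a partial tile at the right edge still counts.
    let tiles_per_row = (width + tile_width - 1) / tile_width;
    if u64::from(x) >= tiles_per_row {
        return None;
    }
    let tile_index = u64::from(y)
        .checked_mul(tiles_per_row)?
        .checked_add(u64::from(x))?;
    entry.index.checked_add(tile_index.checked_mul(entry.value_size)?)
}
```

With a position like this, a client would only need a range request of `value_size` bytes to learn where a single tile starts, instead of downloading the whole offsets array.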
When jogging around in this code once in a blue moon, I added the
I'd propose an enum of the shape:
use std::collections::HashMap;

enum ChunkData {
    // The entry is kept for easy retrieval of the value; it's only 12 bytes,
    // so data duplication is not really a problem I'd say
    Uninitialized(Entry),
    Sparse(Entry, HashMap<u64, u64>),
    Dense(Vec<u64>),
}
impl ChunkData {
    /// Tries to get the value, returning None if not cached
    fn get(&self, index: u64) -> Option<u64> {
        match self {
            Self::Uninitialized(_) => None,
            Self::Sparse(_, hm) => hm.get(&index).copied(),
            Self::Dense(v) => {
                // A zero entry means the value has not been cached yet
                let i = usize::try_from(index).ok()?;
                v.get(i).copied().filter(|&value| value != 0)
            }
        }
    }
    fn retrieve_single(reader, byte_order, bigtiff, index: u64) -> TiffResult<u64> {
        // matching based adding logic,
    }
}
I put the rough outline in #249 with superfluous features for rumination.
I'm opening this issue to discuss the best way to support cloud compatibility.
What does that mean? Simply that we should be able to decode the header step by step. Because an IFD can be anywhere in the file, we should be able to decode the IFDs one by one, setting the data to decode each time.
Example:
If there is a way to dynamically add data to a cursor and fake its position, we should already be able to get something working, but I didn't find any way to do that. I'll wait for your advice on this use case before making a PR 😉
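To illustrate what I mean by "add data to a cursor and fake its position", here is a rough sketch using only the standard library. `WindowCursor` and `replace_window` are names I made up. The reader holds one downloaded byte range and reports absolute file offsets, so a decoder can seek as if it had the whole file. Whether the decoder can resume cleanly after the out-of-range error is a separate question that this sketch does not answer.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Holds one downloaded byte range of a remote file and presents it at its
/// absolute position in that file.
pub struct WindowCursor {
    start: u64,      // absolute offset of window[0] in the remote file
    window: Vec<u8>, // bytes fetched by the last range request
    pos: u64,        // absolute position of the cursor
}

impl WindowCursor {
    pub fn new(start: u64, window: Vec<u8>) -> Self {
        Self { start, window, pos: start }
    }

    /// Swaps in a newly fetched range without moving the cursor.
    pub fn replace_window(&mut self, start: u64, window: Vec<u8>) {
        self.start = start;
        self.window = window;
    }

    fn end(&self) -> u64 {
        self.start.saturating_add(self.window.len() as u64)
    }
}

impl Read for WindowCursor {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        if self.pos < self.start || self.pos >= self.end() {
            // Report an error instead of Ok(0), so that missing data is not
            // mistaken for the real end of the file.
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("offset {} is outside the loaded range", self.pos),
            ));
        }
        let from = (self.pos - self.start) as usize;
        let n = buf.len().min(self.window.len() - from);
        buf[..n].copy_from_slice(&self.window[from..from + n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl Seek for WindowCursor {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let target = match pos {
            SeekFrom::Start(off) => Some(off),
            SeekFrom::Current(delta) => self.pos.checked_add_signed(delta),
            // The total file length is unknown when only ranges are loaded.
            SeekFrom::End(_) => None,
        };
        self.pos = target.ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "unsupported seek position")
        })?;
        Ok(self.pos)
    }
}
```

The caller would catch the error, fetch the range around the reported offset, call `replace_window` and retry the step that failed.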