Add/experiment with lazy reading of structures #127

@Schamper

Ever since the conception of cstruct, we've had the idea to add lazy reading to structures. However, for numerous reasons (laziness itself probably being one of them, but also technical ones), we never fully experimented with it.

With the new architecture of 4.0, I think it's a bit easier to try this out.

The idea is basically as follows: for static structures (so no dynamic fields), we can eagerly read all the bytes (we already do this in the compiler) and store them in the Structure object. Then, upon access of a field (struct.field_name), we parse and store only that field's value.

Some implementation ideas:

  • Upon creation of a statically sized structure in StructureMetaType, change the class to LazyStructure
  • LazyStructure has a different __init__, which basically amounts to self.__buf = io.BytesIO(fh.read(len(self)))
  • For every field in the structure, generate a @cached_property def field_name and attach it to the class
  • This generated property will seek into self.__buf and parse the type of that field
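The steps above can be sketched roughly as follows. This is a standalone illustration and not cstruct code: `make_lazy_structure` and `_make_field` are hypothetical helpers, field types are described by plain `struct` format characters instead of cstruct types, and the buffer is kept as `bytes` and read with `unpack_from` rather than wrapped in `io.BytesIO` and seeked. The namespace is passed to `type()` so that `cached_property.__set_name__` gets called; attaching the property with `setattr` afterwards would skip that step.

```python
import io
import struct
from functools import cached_property


def _make_field(parser, offset):
    """Return a cached property that parses one field at a fixed offset."""

    def getter(self):
        # Runs only on first access; cached_property then stores the
        # result in the instance __dict__ under the field name.
        return parser.unpack_from(self._buf, offset)[0]

    return cached_property(getter)


def make_lazy_structure(name, fields, byte_order="<"):
    """Build a class whose fields are parsed lazily (hypothetical helper).

    `fields` is a list of (field_name, struct_format_char) pairs. All
    fields are fixed-size, so every offset is known up front.
    """
    namespace = {}
    offset = 0
    for field_name, fmt in fields:
        parser = struct.Struct(byte_order + fmt)
        namespace[field_name] = _make_field(parser, offset)
        offset += parser.size
    total_size = offset

    def __init__(self, fh):
        # Eagerly read the raw bytes, but parse nothing yet.
        self._buf = fh.read(total_size)
        if len(self._buf) != total_size:
            raise EOFError(f"expected {total_size} bytes, got {len(self._buf)}")

    namespace["__init__"] = __init__
    namespace["_size"] = total_size
    # type() calls __set_name__ on each cached_property in the namespace.
    return type(name, (), namespace)


Header = make_lazy_structure("Header", [("magic", "I"), ("version", "H"), ("flags", "H")])
hdr = Header(io.BytesIO(struct.pack("<IHH", 0xCAFEBABE, 4, 1)))
print(hex(hdr.magic), hdr.version)
```

Only `magic` and `version` are parsed in the usage lines; `flags` stays as raw bytes in `_buf` until it is first accessed.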

We could also make the IO fully lazy (only keep a reference to the passed-in fh inside of LazyStructure), but then we run the risk of it disappearing from underneath us (e.g. the file being closed).

This will only really work with static structures, because we can calculate the offsets of all fields beforehand. Technically you could support dynamic structures by allowing this technique on fields up to and including the first dynamic field, but that's probably a lot more complicated to implement.
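As a rough illustration of the offset calculation, the sketch below walks a field list and stops at the first dynamic field. The `(name, size)` representation, with `None` marking a dynamically sized field, and the function name `static_prefix_offsets` are assumptions made for this example, not how cstruct describes fields.

```python
def static_prefix_offsets(fields):
    """Return offsets for fields up to and including the first dynamic one.

    `fields` is a list of (name, size) pairs, where size is an int for
    fixed-size fields and None for a dynamically sized field.
    """
    offsets = {}
    offset = 0
    for name, size in fields:
        # The start of this field is known as long as every field
        # before it had a fixed size.
        offsets[name] = offset
        if size is None:
            # Everything after a dynamic field has an unknown offset.
            break
        offset += size
    return offsets


fields = [("magic", 4), ("length", 2), ("data", None), ("crc", 4)]
print(static_prefix_offsets(fields))
```

Here `data` still gets an offset because its start is known even though its length is not; `crc` is left out because its position depends on parsing `data` first.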

I wonder if this will really be any faster, given the extra per-field parsing overhead. We've optimized the compiler quite a bit already, with optimized struct.unpack calls. With this method we will instead go through the (much slower) _read implementations of each type. The "initial" parsing of a structure will be much faster, yes, but I think you'll quickly lose out if you end up reading every field of the structure anyway.
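One way to get a feel for this trade-off is a small timing harness like the one below. It is only a stand-in: per-field `struct.Struct.unpack_from` calls are used in place of cstruct's `_read` implementations (which likely cost more), the field layout is made up, and no particular outcome is implied.

```python
import struct
import timeit

# Hypothetical fixed-size layout, little-endian, no padding.
FORMATS = ["I", "H", "H", "Q", "I", "I"]
WHOLE = struct.Struct("<" + "".join(FORMATS))

# One (parser, offset) pair per field, as a lazy structure would need.
PER_FIELD = []
_offset = 0
for _fmt in FORMATS:
    _parser = struct.Struct("<" + _fmt)
    PER_FIELD.append((_parser, _offset))
    _offset += _parser.size


def parse_eager(buf):
    """Parse every field with a single unpack call."""
    return WHOLE.unpack(buf)


def parse_per_field(buf, count):
    """Parse only the first `count` fields, one call per field."""
    return tuple(p.unpack_from(buf, off)[0] for p, off in PER_FIELD[:count])


def compare(number=100_000):
    """Time eager parsing against touching one field and all fields."""
    buf = bytes(WHOLE.size)
    results = {"eager": timeit.timeit(lambda: parse_eager(buf), number=number)}
    for count in (1, len(PER_FIELD)):
        results[f"per_field_{count}"] = timeit.timeit(
            lambda: parse_per_field(buf, count), number=number
        )
    return results


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label}: {seconds:.4f}s")
```

Comparing `per_field_1` and `per_field_6` against `eager` gives a rough idea of how many fields can be touched before lazy parsing stops paying off on a given machine.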

Anyway, this ticket is for experimentation. I personally think #126 is the safer bet to gain more performance.

Labels: enhancement
