Description
Ever since the conception of cstruct, we've had the idea to add lazy reading to structures. However, for numerous reasons (laziness itself probably being one of them, but also technical ones), we never fully experimented with it.
With the new architecture of 4.0, I think it's a bit easier to try this out.
The idea is basically as follows: for static structures (so no dynamic fields), we can eagerly read all the bytes (we already do this in the compiler) and store them in the Structure object. Then, upon access of a field (struct.field_name), we parse and store only that field's value.
Some implementation ideas:
- Upon creation of a statically sized structure in `StructureMetaType`, change the class to `LazyStructure`
- `LazyStructure` has a different `__init__`, which basically amounts to `self.__buf = io.BytesIO(fh.read(len(self)))`
- For every field in the structure, generate a `@cached_property def field_name` and attach it to the class
- This generated property will seek into `self.__buf` and parse the type of that field
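The implementation ideas above could be sketched roughly as follows. This is a standalone illustration and not cstruct code: `make_lazy_structure`, `_make_getter`, and the `(name, format)` field list are hypothetical stand-ins for what `StructureMetaType` would do, and plain `struct` formats stand in for cstruct types. Two details are worth noting. The sketch uses `_buf` rather than `__buf`, because a getter generated outside the class body would not get the name mangling that `self.__buf` receives inside it. It also builds the class through `type()`, so that `cached_property.__set_name__` is called for each generated property.

```python
import io
import struct
from functools import cached_property


class LazyStructure:
    """Hypothetical base class: reads the raw bytes eagerly, parses fields lazily."""

    _size = 0  # total size in bytes, overridden by make_lazy_structure

    def __init__(self, fh):
        # Only the raw bytes are read here; no field is parsed yet.
        self._buf = io.BytesIO(fh.read(self._size))


def _make_getter(offset, fmt):
    """Build a getter that seeks to a fixed offset and parses a single value."""
    size = struct.calcsize(fmt)

    def getter(self):
        self._buf.seek(offset)
        return struct.unpack(fmt, self._buf.read(size))[0]

    return getter


def make_lazy_structure(name, fields):
    """Create a LazyStructure subclass from (field_name, struct_format) pairs."""
    namespace = {}
    offset = 0
    for field_name, fmt in fields:
        # Offsets are known up front because every field has a static size.
        namespace[field_name] = cached_property(_make_getter(offset, fmt))
        offset += struct.calcsize(fmt)
    namespace["_size"] = offset
    # type() calls __set_name__ on each cached_property in the namespace.
    return type(name, (LazyStructure,), namespace)


Header = make_lazy_structure(
    "Header", [("magic", "<I"), ("version", "<H"), ("flags", "<H")]
)
obj = Header(io.BytesIO(b"\x01\x00\x00\x00\x02\x00\x03\x00"))
print(obj.version)  # parses only "version"; "magic" and "flags" stay unparsed
```

After the first access, the parsed value lives in the instance `__dict__`, so later reads of the same field skip the buffer entirely.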
We could also make the IO fully lazy (only keep a reference to the passed-in fh inside of LazyStructure), but then we run the risk of it disappearing from underneath us (e.g. a file closing).
This will only really work with static structures, because we can calculate the offsets of all fields beforehand. Technically you could support dynamic structures by allowing this technique on fields up to and including the first dynamic field, but that's probably a lot more complicated to implement.
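To make the offset argument concrete, here is a minimal sketch of computing field offsets for the lazily accessible prefix of a structure. The `(name, size)` representation, with `None` marking a dynamically sized field, and the function name `lazy_prefix_offsets` are assumptions made for illustration only; cstruct represents fields differently.

```python
def lazy_prefix_offsets(fields):
    """Return {name: offset} for fields up to and including the first dynamic one.

    `fields` is a list of (name, size) pairs; size is None for a field whose
    size is only known at parse time.
    """
    offsets = {}
    offset = 0
    for name, size in fields:
        # The start of this field is known as long as all earlier fields are static.
        offsets[name] = offset
        if size is None:
            # Everything after a dynamic field has an unknown offset.
            break
        offset += size
    return offsets


fields = [("magic", 4), ("length", 2), ("data", None), ("crc", 4)]
print(lazy_prefix_offsets(fields))  # {'magic': 0, 'length': 4, 'data': 6}
```

For a fully static structure the loop never breaks, so every field gets an offset; `crc` in the example is excluded because its position depends on the parsed size of `data`.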
I wonder if this will really be any faster due to the extra overhead in parsing. We've optimized the compiler quite a bit already, with optimized struct.unpack calls. With this method we instead will go through the (much slower) _read implementations of each type. The "initial" parsing of a structure will be much faster, yes, but I think you'll quickly lose out if you end up reading every field of the structure anyway.
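One way to get a feel for this trade-off is a small micro-benchmark. The sketch below does not use cstruct at all: a single precompiled `struct.Struct` stands in for the compiler's optimized unpack call, and per-field `unpack_from` calls stand in for going through each type's `_read`. The real `_read` implementations carry more overhead than this, so any numbers are only a rough indication.

```python
import struct
import timeit

DATA = bytes(range(16))

# Stand-in for the compiled path: one unpack call for the whole structure.
WHOLE = struct.Struct("<IHHQ")

# Stand-in for the lazy path: one (offset, parser) pair per field.
PARTS = [
    (0, struct.Struct("<I")),
    (4, struct.Struct("<H")),
    (6, struct.Struct("<H")),
    (8, struct.Struct("<Q")),
]


def parse_eager(data=DATA):
    """Parse every field with a single unpack call."""
    return WHOLE.unpack(data)


def parse_all_fields_lazily(data=DATA):
    """Parse every field separately, as happens if all lazy fields get accessed."""
    return tuple(s.unpack_from(data, off)[0] for off, s in PARTS)


def parse_one_field_lazily(data=DATA):
    """Parse a single field, the case where lazy reading should pay off."""
    off, s = PARTS[1]
    return s.unpack_from(data, off)[0]


if __name__ == "__main__":
    for fn in (parse_eager, parse_all_fields_lazily, parse_one_field_lazily):
        print(fn.__name__, timeit.timeit(fn, number=100_000))
```

Comparing the timings of the first two functions shows the cost of touching every field, while the third shows the best case for lazy access.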
Anyway, this ticket is for experimentation. I personally think #126 is the safer bet to gain more performance.