Python 3 compatibility and t-batch caching. #9
The first 3 commits address Python 3 compatibility and remove unnecessary imports.
The final commit is an incomplete tbatching optimization. We don't need to recompute tbatches for every epoch, so some form of caching makes sense. We also don't need to recompute them for every run, provided we can load the entire dict of tbatches into memory and do random access on each dict (needed to account for user changes to timespan).
However, the current code isn't set up to incorporate these changes easily, because chunks of t-batches are computed on-the-fly, trading off with the corresponding chunk of the epoch. So one would have to compute the "start" and "end" points of each tbatch chunk so that the epoch chunk can access the right tbatches.
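As a rough illustration of the bookkeeping this would need, here is a minimal sketch that turns per-chunk tbatch counts into (start, end) index pairs. The function name `chunk_bounds` and the idea of tracking a count per chunk are assumptions for illustration, not part of the existing code.

```python
from itertools import accumulate


def chunk_bounds(chunk_sizes):
    """Return (start, end) index pairs, end exclusive, one per chunk.

    chunk_sizes is a hypothetical list holding the number of tbatches
    produced by each chunk, in the order the chunks were computed.
    """
    ends = list(accumulate(chunk_sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Three chunks that produced 3, 2 and 4 tbatches respectively.
bounds = chunk_bounds([3, 2, 4])
print(bounds)
```

With bounds like these stored alongside the cached tbatches, each epoch chunk could look up only its own slice instead of recomputing it.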
The code in the 4th commit is not just unoptimized, but buggy. In the first epoch, the tbatch dicts keep growing, because args.cache_tbatches=True removes tbatch reinitialization, but the epoch still iterates over the full length of the tbatch dicts. There are two competing ways one could fix this:
1. Revert to reinitializing the tbatch chunk every time, but save the tbatch chunks to disk.
2. Tell the epoch where to access the tbatch dict, instead of starting from the beginning.
Option 2 seems easier, but I will leave that to the code maintainers' judgement :)
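For what it's worth, the second option could look roughly like the sketch below. `TBatchCache`, `begin_chunk`, `add` and `current_chunk` are all hypothetical names, and the real tbatch dicts are structured differently; the only point is that the cache keeps growing while the epoch iterates from a remembered start offset rather than from index 0.

```python
class TBatchCache:
    """Hypothetical growing cache of tbatches with a per-chunk start offset."""

    def __init__(self):
        self.tbatches = {}    # tbatch id -> list of interaction ids
        self.chunk_start = 0  # first tbatch id belonging to the current chunk

    def begin_chunk(self):
        # Remember where the new chunk starts instead of clearing the dict.
        self.chunk_start = len(self.tbatches)

    def add(self, interactions):
        self.tbatches[len(self.tbatches)] = list(interactions)

    def current_chunk(self):
        # Yield only the tbatches added since the last begin_chunk() call.
        for tbatch_id in range(self.chunk_start, len(self.tbatches)):
            yield tbatch_id, self.tbatches[tbatch_id]


cache = TBatchCache()

cache.begin_chunk()
cache.add([0, 1])
cache.add([2])
first = list(cache.current_chunk())

cache.begin_chunk()
cache.add([3, 4])
second = list(cache.current_chunk())  # earlier tbatches are kept but skipped
```

Nothing is discarded here, so later epochs or runs could reuse the same dict as long as the start offsets are saved with it.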