
Python 3 compatibility and t-batch caching. #9

Open

wants to merge 4 commits into master
Conversation

jpalowitch

The first 3 commits address Python 3 compatibility and remove unnecessary imports.

The final commit is an incomplete t-batching optimization. We don't need to recompute t-batches for every epoch, so it makes sense to do some type of caching. We also don't need to recompute them for every run, assuming we can load the entire dict of t-batches into memory and do random access on each dict (needed to account for user changes to the timespan).
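The cross-run part of the idea could be sketched as a small load-or-compute helper. This is a hypothetical sketch, not code from the PR: `cache_path` and `compute_fn` are assumed names, and `compute_fn` stands in for whatever function builds the full dict of t-batches for the current timespan settings.

```python
import os
import pickle

def load_or_compute_tbatches(cache_path, compute_fn):
    """Load t-batches from a cache file if present; otherwise compute
    and persist them. `compute_fn` is a hypothetical callable that
    builds the full dict of t-batches for the current settings."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    tbatches = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(tbatches, f)
    return tbatches
```

The cache key (here just a file path) would need to encode anything that changes the t-batches, e.g. the dataset and the timespan arguments, so a stale cache is never reused.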

However, the current code isn't set up to incorporate these changes easily, because chunks of t-batches are computed on the fly, interleaved with the corresponding chunk of the epoch. So one would have to compute the "start" and "end" points of each t-batch chunk so that the epoch chunk can access the right t-batches.

The code in the 4th commit is not just unoptimized, but buggy. In the first epoch, the t-batch dicts keep growing, because args.cache_tbatches=True disables t-batch reinitialization, yet the epoch still iterates over the full length of the t-batch dicts. There are two competing ways one could fix this:

  1. Revert to reinitializing the tbatch chunk every time, but save the tbatch chunks to disk.

  2. Tell the epoch where to start accessing the t-batch dict, instead of always starting from the beginning.

Option 2 seems easier, but I will leave that to the code maintainers' judgement :)

@pmixer
Contributor

pmixer commented Jul 7, 2020

Hi @jpalowitch, I can help finish and test the cached t-batch feature, as synced up with Prof. Kumar last week. Are you still interested in / in need of it?

@jpalowitch
Author

> Hi @jpalowitch, I can help finish and test the cached t-batch feature, as synced up with Prof. Kumar last week. Are you still interested in / in need of it?

Yes, thanks!
