Data point discontinuous in sampling with --load_fast (aka Rust board) implementation #6796
Comments
Correct, data is sampled, and the behavior can be overridden by the flag that you mentioned. This is working as intended. The sampling algorithm has a few attributes that influenced the design choice:
Due to this, we use a reservoir sampling implementation that keeps the last value. You can find it here. Unfortunately, as the population size grows larger than the sample size, it is likely that the algorithm will just keep replacing the latest read value.
It was interesting to think about this. I came up with an implementation that attempts to be fairer in keeping a representative sample, at the cost of higher memory usage. I think we would need to put more thought into this before submitting the change to the actual implementation, but you're welcome to fork our repo and change the implementation in the meantime, if you'd like. Changing the code to something like this:
To compare, the view with the
And the sampled view with this implementation (still using the default sample size) looks like this:
Having said that, here are a few notes to consider:
First of all, thank you for your detailed explanation of sampling. I would like to add some information.
Judging from this Python code, it seems unlikely that the last value is always the one replaced, so that alone should not cause such a long interruption.
tensorboard/tensorboard/backend/event_processing/reservoir.py Lines 223 to 226 in cf27fe0
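To make the point about the Python reservoir concrete, here is a minimal sketch of reservoir sampling with an "always keep last" rule. It is an illustration written for this thread, not a copy of reservoir.py: the class name ReservoirSketch, the helper largest_gap, and the seed parameter are all hypothetical. When the random index falls inside the bucket, a random kept item is evicted and the new item is appended; only otherwise is the last slot overwritten, so the kept steps stay spread over the whole run.

```python
import random


class ReservoirSketch:
    """Illustrative reservoir bucket (hypothetical, not the TensorBoard class)."""

    def __init__(self, max_size, seed=0, always_keep_last=True):
        self.max_size = max_size
        self.always_keep_last = always_keep_last
        self.items = []
        self.num_seen = 0
        self._rng = random.Random(seed)

    def add(self, item):
        if len(self.items) < self.max_size:
            # Bucket not full yet: keep everything.
            self.items.append(item)
        else:
            # randint is inclusive on both ends.
            r = self._rng.randint(0, self.num_seen)
            if r < self.max_size:
                # Evict a random kept item, then append the new one,
                # which preserves the insertion order of the bucket.
                self.items.pop(r)
                self.items.append(item)
            elif self.always_keep_last:
                # Otherwise only the most recent slot is overwritten.
                self.items[-1] = item
        self.num_seen += 1


def largest_gap(steps):
    """Largest difference between consecutive kept steps."""
    return max(b - a for a, b in zip(steps, steps[1:]))


if __name__ == "__main__":
    bucket = ReservoirSketch(max_size=1000)
    for step in range(100_000):
        bucket.add(step)
    print(len(bucket.items), bucket.items[-1], largest_gap(bucket.items))
```

With this scheme a long run of missing steps would be very unlikely, which is consistent with the observation that the Python path alone does not explain the interruption.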
Ah, you are correct! The Python algorithm should work. It is the Rust implementation that has the issue. I didn't think of the Rust implementation at the beginning, and then I guess I was trying to fit an explanation of what happened to the Python code that I was looking at. Anyway, I'll reopen this issue and rename it to emphasize that the problem is in the Rust implementation. Honestly, though, we haven't touched that code in a while, and the people who wrote it are no longer working with the team, so it's unlikely that we will pick this up any time soon.
Environment information (required)
Diagnostics
Diagnostics output
For browser-related issues, please additionally specify:
Issue description
There is a significant interruption in data point sampling when using TensorBoard.
Using EventAccumulator, I verified that the data file is complete. Passing --samples_per_plugin=scalars=10000 also works, but slowly.
Data file: events.out.tfevents.zip
Steps to reproduce: open the tfevents file with TensorBoard.
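The completeness check mentioned above can be sketched roughly as follows. This is an assumption about how such a check might look, not the reporter's actual script: the helper names find_step_gaps and load_scalar_steps and the example path are hypothetical, and it assumes the values are stored as scalar summaries. A size_guidance of 0 for scalars asks EventAccumulator to keep every event instead of sampling.

```python
def find_step_gaps(steps, max_gap):
    """Return (previous, current) step pairs that are more than max_gap apart."""
    ordered = sorted(steps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def load_scalar_steps(event_path):
    """Load every scalar step from an event file or run directory, unsampled."""
    # Imported lazily so find_step_gaps can be used without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    # size_guidance of 0 means "keep all events" for that category.
    acc = EventAccumulator(event_path, size_guidance={"scalars": 0})
    acc.Reload()
    return {
        tag: [event.step for event in acc.Scalars(tag)]
        for tag in acc.Tags()["scalars"]
    }


if __name__ == "__main__":
    # Hypothetical path: point this at the unzipped event file.
    for tag, steps in load_scalar_steps("path/to/events.out.tfevents").items():
        print(tag, len(steps), find_step_gaps(steps, max_gap=1000))
```

If the unsampled steps show no large gaps while the dashboard does, the gap comes from sampling rather than from the data file.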