Improving performance when performing summary over a day and over a time period #49
Comments
This is problematic because the start and end of a "day" are not well defined across timezones. One could imagine summarizing over each hour-long period, but I'd really like to find another solution, as that comes with a lot of complexity. We've discussed caching like this before, but I think the proper way to improve performance would be to profile the relevant query and find the bottleneck instead of looking into caching straight away.
What you can do is save the summary of each day's events at 00:00, without taking the timezone into account, so each day is written to the database only once. When summarizing over weeks, months, or years, all the days in between stay the same regardless of the timezone, since you're essentially summarizing the same chunk of time. The difference is in the first and last day, where the timezone does matter. So you only have to summarize the first and the last day again, taking the timezone into account, add the result to the rest of the summarized days, and do the analysis from there. This would not reduce the time needed to calculate the summary for the current day, but it would have a significant impact on summaries over long timelines, which will become even more important as more features are added to analyze productivity.
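A minimal sketch of that idea, assuming the cached days are whole UTC days and that events are plain `(key, start, end)` tuples with timezone-aware datetimes. The function names and the in-memory `cache` dict are hypothetical and are not part of ActivityWatch. A real cache should only store days that are already over.

```python
from collections import Counter
from datetime import timedelta, timezone

DAY = timedelta(days=1)

def summarize(events, start, end):
    """Sum, per key, the seconds of each event that fall inside [start, end)."""
    totals = Counter()
    for key, ev_start, ev_end in events:
        overlap = (min(ev_end, end) - max(ev_start, start)).total_seconds()
        if overlap > 0:
            totals[key] += overlap
    return totals

def utc_midnight(dt):
    """Return 00:00 UTC of the UTC day that contains dt."""
    return dt.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)

def summarize_range(events, start, end, cache):
    """Summarize [start, end), reusing cached summaries for whole UTC days."""
    start, end = start.astimezone(timezone.utc), end.astimezone(timezone.utc)
    first_full = utc_midnight(start)
    if first_full < start:
        first_full += DAY           # the first UTC day is only partially covered
    last_full = utc_midnight(end)   # exclusive end of the last whole UTC day
    if first_full >= last_full:
        return summarize(events, start, end)  # no whole day inside the range
    # Only the two partial edges depend on the requested timezone.
    totals = summarize(events, start, first_full) + summarize(events, last_full, end)
    day = first_full
    while day < last_full:
        if day not in cache:        # computed once, then reused by later queries
            cache[day] = summarize(events, day, day + DAY)
        totals += cache[day]
        day += DAY
    return totals
```

The result should match a direct `summarize(events, start, end)` over the same range. The only difference is that the whole days in the middle come from the cache on later calls.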
@nicolae-stroncea That is true, and we've thought about that before. But since long-term analysis is considered a rarely used feature, it's not a priority. You described the solution well though, which I don't think anyone has taken the time to write down before, so that is appreciated. We might get around to it some day, but there are far more pressing issues, and the complexity that adding a cache like that would introduce is just not worth the hassle right now.
I'll start working on it.
@nicolae-stroncea Glad to hear it! Keep us posted.
From my understanding of the code and documentation, every time the localhost page reloads, the top activities in all categories are recalculated from scratch. If you have a lot of events, that can lead to a timeout (ActivityWatch/activitywatch#217). I think performance could be improved by creating a new table in the database where you store the summarized events. The columns would be:
Key (URLs, domains, app_events, title_events), Value (e.g. github.com, localhost:5600), Duration, and Day.
Thus, every time the page reloads, only the new events are summarized: new domains/URLs/etc. are added and the existing ones are updated. Then you just retrieve the results. This has two advantages:
@ErikBjare, @johan-bjareholt, what do you guys think? Are there any initiatives like this already in the works? If not, I can start working on it.
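The proposed table and incremental update could be sketched roughly as follows, using Python's built-in `sqlite3` module. The table name `summary`, the helper functions, and the `(key, value, day, duration)` event tuples are assumptions made for illustration. They do not reflect the actual ActivityWatch schema or API.

```python
import sqlite3

# One row per (key, value, day); duration is the accumulated time in seconds.
SCHEMA = """
CREATE TABLE IF NOT EXISTS summary (
    key      TEXT NOT NULL,            -- e.g. 'urls', 'domains', 'app_events', 'title_events'
    value    TEXT NOT NULL,            -- e.g. 'github.com'
    day      TEXT NOT NULL,            -- ISO date string
    duration REAL NOT NULL DEFAULT 0,
    PRIMARY KEY (key, value, day)
)
"""

def add_new_events(conn, events):
    """Fold only the not-yet-summarized events into the summary table.

    Each event is a hypothetical (key, value, day, duration_seconds) tuple.
    """
    with conn:  # commit everything in a single transaction
        for key, value, day, duration in events:
            # Create the row if this value has not been seen on this day yet...
            conn.execute(
                "INSERT OR IGNORE INTO summary (key, value, day) VALUES (?, ?, ?)",
                (key, value, day),
            )
            # ...then add the new duration to whatever is already stored.
            conn.execute(
                "UPDATE summary SET duration = duration + ? "
                "WHERE key = ? AND value = ? AND day = ?",
                (duration, key, value, day),
            )

def top(conn, key, day, limit=5):
    """Return the top (value, duration) rows for one key on one day."""
    return conn.execute(
        "SELECT value, duration FROM summary "
        "WHERE key = ? AND day = ? ORDER BY duration DESC LIMIT ?",
        (key, day, limit),
    ).fetchall()
```

On a page reload, the caller would pass only the events recorded since the previous reload to `add_new_events` and then read the results with `top`. Tracking which events are new is left out of this sketch.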