Improving performance when performing summary over a day and over a time period #49

Open
nicolae-stroncea opened this issue Aug 11, 2018 · 5 comments

Comments

@nicolae-stroncea
Member

nicolae-stroncea commented Aug 11, 2018

From my understanding of the code and documentation, every time the localhost page reloads, the top activities in all categories are recalculated from scratch. If you have a lot of events, that can lead to a timeout (ActivityWatch/activitywatch#217). I think performance could be improved by creating a new table in the database that stores the summarized events. The columns would be:

Key (URLs, domains, app_events, title_events), Value (e.g. github.com, localhost:5600), Duration, and Day.
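As a rough illustration of the table described above, here is a minimal SQLite sketch. The table name `daily_summary`, the column types, and the helpers `open_summary_db` and `add_duration` are all hypothetical; none of them come from the ActivityWatch codebase.

```python
import sqlite3

# Hypothetical schema; names and types are assumptions for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_summary (
    key      TEXT NOT NULL,   -- category: 'urls', 'domains', 'app_events', 'title_events'
    value    TEXT NOT NULL,   -- e.g. 'github.com'
    day      TEXT NOT NULL,   -- ISO date, e.g. '2018-08-11'
    duration REAL NOT NULL,   -- total seconds for this key/value on this day
    PRIMARY KEY (key, value, day)
)
"""


def open_summary_db(path=":memory:"):
    """Open (or create) the summary database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_duration(conn, key, value, day, seconds):
    """Add seconds to an existing row, or insert a new row if none exists."""
    cur = conn.execute(
        "UPDATE daily_summary SET duration = duration + ? "
        "WHERE key = ? AND value = ? AND day = ?",
        (seconds, key, value, day),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO daily_summary (key, value, day, duration) "
            "VALUES (?, ?, ?, ?)",
            (key, value, day, seconds),
        )
    conn.commit()
```

The composite primary key keeps one row per (key, value, day), so repeated updates for the same domain on the same day accumulate instead of creating duplicates.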

Thus, every time the page reloads, only the new events are summarized: new domains/URLs/etc. are added and the existing ones are updated. Then you just retrieve the results. This has two advantages:

  1. Decreases the time to calculate the summary for the day, as you perform far fewer calculations.
  2. Makes analyzing statistics for a week, month, or year a lot faster, since the table will have a summary of the events for every day. You just retrieve the right number of days and perform the analysis on them.
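The incremental idea above could look something like the following in-memory sketch. The class `IncrementalSummary` and the event tuple layout `(timestamp, key, value, seconds)` are assumptions made for the example, not ActivityWatch's actual event model. It also assumes events never change once stored, which may not hold if the newest event's duration is still being extended.

```python
from collections import defaultdict
from datetime import datetime, timezone


class IncrementalSummary:
    """Keeps per-day totals and only processes events it has not seen yet."""

    def __init__(self):
        self.totals = defaultdict(float)  # (day, key, value) -> seconds
        self.last_seen = None             # newest timestamp already summarized

    def update(self, events):
        """events: iterable of (timestamp, key, value, seconds) tuples."""
        cutoff = self.last_seen  # fixed for this batch so equal timestamps all count
        for ts, key, value, seconds in events:
            if cutoff is not None and ts <= cutoff:
                continue  # already summarized on an earlier reload
            self.totals[(ts.date().isoformat(), key, value)] += seconds
            if self.last_seen is None or ts > self.last_seen:
                self.last_seen = ts

    def top(self, day, key, n=5):
        """Return up to n (value, seconds) pairs for one day, longest first."""
        rows = [
            (v, s) for (d, k, v), s in self.totals.items() if d == day and k == key
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

On each reload you would call `update()` with the events fetched since the last run and then `top()` to read the result, so the cost is proportional to the number of new events rather than all events of the day.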

@ErikBjare, @johan-bjareholt, what do you guys think? Are there any initiatives like this already in the works? If not, I can start working on it.

@ErikBjare
Member

ErikBjare commented Aug 11, 2018

This is problematic due to the start and end of a "day" not being well defined because of timezones. One could imagine summarizing over each hour-period, but I'd really like to find another solution as it comes with a lot of complexity.
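To make the day-boundary problem concrete, here is a small sketch using only the standard library. The helpers `local_day` and `hour_bucket` are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta, timezone


def local_day(ts_utc, utc_offset_hours):
    """Return the calendar day an instant falls on at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return ts_utc.astimezone(tz).date().isoformat()


def hour_bucket(ts_utc):
    """Truncate an instant to the start of its UTC hour."""
    return ts_utc.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


event = datetime(2018, 8, 11, 23, 30, tzinfo=timezone.utc)
print(local_day(event, 0))   # 2018-08-11
print(local_day(event, 2))   # 2018-08-12
print(local_day(event, -4))  # 2018-08-11
print(hour_bucket(event).isoformat())  # 2018-08-11T23:00:00+00:00
```

The same event belongs to different "days" depending on the viewer's offset, so a per-day cache is only valid for one time zone. Hourly buckets fit any whole-hour offset, but they still would not line up with zones that use half-hour offsets, which is part of the added complexity.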

We've discussed caching like this before, but I think the proper way to improve performance would be to profile the relevant query and find the bottleneck instead of looking into caching straight away.
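For the profiling route, a minimal sketch with the standard-library `cProfile` and `pstats` modules might look like this. The `summarize` function is a hypothetical stand-in for whatever query is actually slow, and `profile_call` is a helper invented for the example.

```python
import cProfile
import io
import pstats


def summarize(events):
    """Hypothetical stand-in for the real summary query."""
    totals = {}
    for key, seconds in events:
        totals[key] = totals.get(key, 0.0) + seconds
    return totals


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return result, out.getvalue()
```

Calling `profile_call(summarize, events)` and printing the report shows where the cumulative time goes, which is the information needed before deciding whether a cache is worth it.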

@nicolae-stroncea
Member Author

What can be done is to save the summary of the events once a day at 00:00, without applying a time zone, so each day is written to the database only once. When summarizing over weeks, months, or years, all the days in between stay the same regardless of the time zone, since you're essentially summarizing the same chunk of time. The difference is in the first and last day, where the time zone does matter. So you only have to summarize the first and last day again, taking the time zone into account, add the result to the rest of the summarized days, and do the analysis from there.

This would not decrease the time to calculate the summary throughout the day, but it would have a significant impact on summarizing over long time ranges, which will become even more important as more features are added to analyze productivity.
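A rough sketch of that approach, under a few assumptions: "without the time zone" is read here as "per UTC day", events are simplified to `(timestamp, value, seconds)` tuples whose whole duration is attributed to the start timestamp, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def summarize_raw(events, start, end):
    """Sum seconds per value for events whose timestamp is in [start, end)."""
    totals = Counter()
    for ts, value, seconds in events:
        if start <= ts < end:
            totals[value] += seconds
    return totals


def build_daily_cache(events):
    """Precompute per-UTC-day totals as {date: Counter}; done once per day."""
    cache = {}
    for ts, value, seconds in events:
        cache.setdefault(ts.astimezone(UTC).date(), Counter())[value] += seconds
    return cache


def summarize_range(events, cache, start, end):
    """Summary for [start, end), where the bounds are tz-aware in any zone."""
    start, end = start.astimezone(UTC), end.astimezone(UTC)
    # First UTC midnight at or after start, last UTC midnight at or before end.
    first = datetime.combine(start.date(), datetime.min.time(), tzinfo=UTC)
    if first < start:
        first += timedelta(days=1)
    last = datetime.combine(end.date(), datetime.min.time(), tzinfo=UTC)
    if first >= last:
        return summarize_raw(events, start, end)  # no full UTC day in range
    totals = summarize_raw(events, start, first)  # partial first day
    totals += summarize_raw(events, last, end)    # partial last day
    day = first
    while day < last:
        totals += cache.get(day.date(), Counter())  # full days come from cache
        day += timedelta(days=1)
    return totals
```

Only the two partial days at the edges touch the raw events; every full day in between is a dictionary lookup, which is where the saving for long ranges would come from.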

@ErikBjare
Member

ErikBjare commented Aug 15, 2018

@nicolae-stroncea That is true, and we've thought about that before. But since long-term analysis is considered a rarely used function, it's not a priority. You described the solution well though, which I don't think anyone has taken the time to write down before, so that is appreciated.

We might get around to it some day, but there are far more pressing issues, and the complexity that adding a cache like that would introduce is just not worth the hassle right now.

@nicolae-stroncea changed the title from "Improving performance when performing summary" to "Improving performance when performing summary over a day and over a time period" on Aug 17, 2018
@nicolae-stroncea
Member Author

I'll start working on it.

@ErikBjare
Member

@nicolae-stroncea Glad to hear it! Keep us posted.
