Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Use pregenerate figures #96

Merged
merged 14 commits into from
Oct 10, 2024
Merged

[ENH] Use pregenerate figures #96

merged 14 commits into from
Oct 10, 2024

Conversation

surchs
Copy link
Contributor

@surchs surchs commented Sep 25, 2024

Changes

  • code to prerender figures
  • load figures from pickled file
  • in app: use prerendered figures in barplot callback
  • meta: updated submodules to use ssh instead of https so setup works with public key auth

NOTE: this means the app won't work without the prerendered figures (they are now in VCS).
Also Note: loading the prerendered figures takes a lot of time, about 45s. However it seems that this only happens once on app startup.

@surchs surchs requested a review from alyssadai September 25, 2024 23:38
@surchs
Copy link
Contributor Author

surchs commented Sep 27, 2024

hey @alyssadai, you said you had a comment on the submodules. I can't see the comment. Did you maybe not submit your review yet?

Copy link
Collaborator

@alyssadai alyssadai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @surchs, thanks a lot for taking this on! 🎉

I think there's one other place in the code we should use your nice new pregenerated figures, + some other suggestions from me below for staying consistent with other modules.

code/create_prerendered_figures.py Outdated Show resolved Hide resolved
code/create_prerendered_figures.py Outdated Show resolved Hide resolved
code/create_prerendered_figures.py Outdated Show resolved Hide resolved
.gitmodules Outdated Show resolved Hide resolved
climate_emotions_map/app.py Show resolved Hide resolved
surchs and others added 3 commits September 27, 2024 16:40
@surchs
Copy link
Contributor Author

surchs commented Sep 27, 2024

So what should we do with this NUM_DECIMALS thing? It's a constant, it's the same for all figures. We could make it part of the figure lookup key and then if it would ever change the app would crash and we might find out why and then regenerate the figures. That could work

@alyssadai
Copy link
Collaborator

alyssadai commented Sep 27, 2024

So what should we do with this NUM_DECIMALS thing? It's a constant, it's the same for all figures. We could make it part of the figure lookup key and then if it would ever change the app would crash and we might find out why and then regenerate the figures. That could work

I'm not sure what you're suggesting. Are you asking if there's a way we can make the code signal to us that there is a different number of decimals being used in the pregenerated figures vs. other ones generated dynamically in the app, if they all use NUM_DECIMALS but the pregenerated figure script is run beforehand?

@alyssadai
Copy link
Collaborator

Oh wait, I think I might understand: are you suggesting that we hardcode the value of NUM_DECIMALS into the pregenerated figures .pkl, when we run the fig generation script, so that if we ever change NUM_DECIMALS in the main app, it won't find any matches in the .pkl?

@surchs surchs requested a review from alyssadai September 27, 2024 21:33
@surchs
Copy link
Contributor Author

surchs commented Oct 8, 2024

@alyssadai do you have additional changes requested here or is this good to go?

@alyssadai
Copy link
Collaborator

Thanks for the ping @surchs! Sorry for the delay, I think your review request slipped my radar somehow...I'd like to try it out locally, and will then approve!

@surchs
Copy link
Contributor Author

surchs commented Oct 8, 2024

No rush, was just thinking about it today.

Copy link
Collaborator

@alyssadai alyssadai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @surchs, thanks for the changes! I've tried out the app with the pregenerated figs locally, and can confirm they speed things up substantially! It does also look like the pickle loading only happens once on startup 🥳

That said, I noticed that I previously missed that your PR seems to be using an old version of the data/ submodule - see my comment below. Once we address that, I think this is good to merge!

data Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed this in my last review but I think the version of the data/ submodule you ran your fig generation script on might be outdated - it looks like this PR is trying to point to an older version (32bb92 should actually be the latest commit).

You can confirm this by double-checking if you have Delaware, Washington DC, Maryland (Cluster C) in the state dropdown in your local branch, or the old cluster name Delaware, District of Columbia, Maryland (Cluster C)

On my end, when I have the latest version of the submodule checked out, trying to select "Delaware, Washington DC, Maryland (Cluster C)" with your PR branch gives a key error:

Traceback (most recent call last):
  File "D:\climate_opinion_dashboard\us_climate_emotions_map\climate_emotions_map\app.py", line 330, in update_stacked_bar_plots
    figures.append(PRERENDERED_BARPLOTS[figure_lookup_key][question])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: ('Delaware, Washington DC, Maryland (Cluster C)', False, None, 1)

To resolve this, could you bring your branch up to date with main, run git submodule update, and then regenerate the figures?

Also, this makes me realize that, for completeness, we should probably ensure that we regenerate prerendered_figures.pkl whenever there's an update to the data submodule (I don't expect there to be more, but just in case). I think we could do this automatically, using the wf we have to auto-pull and open PRs for submodule updates on a cronjob. Specifically, I think in the below step we could just add another line to run create_prerendered_figures.py:

- name: Update submodule
run: |
cd data
git switch main && git pull
cd ..

Wdyt? If you'd prefer to not handle that change in this PR, I could also do it in a separate one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, I'm up to date with main:

* 82a0c35 Update data
* a518cdd Use prerendered figures everywhere
* 431a8a4 More suggestions from review
* e047ab0 Revert "[ENH] Switch to ssh submodules"
* bffac74 Apply suggestions from code review
*   4ec6353 Merge remote-tracking branch 'origin/main' into pregenerate_figures
|\  
| * 6771c1a Rename "District of Columbia" to "Washington DC" (#95)
| * 80cbefd [MNT] Switch to 1 decimal place for all plots (#94)
| * 4ab4413 [ENH] Add user help text and disable all drag + zoom plot interactions (#89)

But I will update the submodule!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't see this:

Also, this makes me realize that, for completeness, we should probably ensure that we regenerate prerendered_figures.pkl whenever there's an update to the data submodule (I don't expect there to be more, but just in case). I think we could do this automatically, using the wf we have to auto-pull and open PRs for submodule updates on a cronjob. Specifically, I think in the below step we could just add another line to run create_prerendered_figures.py:

iiuc, you want to regenerate the figures every day? That seems a tad excessive, no? that's an expensive / long job to run on a hunch. Also, as you say: the data are quite unlikely to change again so not sure we want to have this repo regenerate these figures until we remember to turn that job off again to save a single PR.

@surchs surchs requested a review from alyssadai October 9, 2024 21:59
Copy link
Collaborator

@alyssadai alyssadai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @surchs! Two final suggestions below - I realized I forgot about installing Python dependencies in the workflow - and then 🧑‍🍳. Feel free to merge directly after the changes!

.github/workflows/update_submodule.yml Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just realized that we should also install the Python dependencies to ensure that your nice script doesn't error out during the wf 🤦‍♀️

can't suggest in this part of the file, but I think

    - name: Set up Python 3.11
      uses: actions/setup-python@v5
      with:
        python-version: "3.11"

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

after the checkout step should do the trick!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good point

surchs and others added 2 commits October 9, 2024 21:25
@surchs surchs merged commit e2ba7d0 into main Oct 10, 2024
1 check passed
@surchs surchs deleted the pregenerate_figures branch October 10, 2024 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate serializing all plots before/at startup to speed up app
2 participants