Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up unnecessary copies #8673

Closed
stscijgbot-jp opened this issue Jul 24, 2024 · 13 comments
Closed

Clean up unnecessary copies #8673

stscijgbot-jp opened this issue Jul 24, 2024 · 13 comments

Comments

@stscijgbot-jp
Copy link
Collaborator

Issue JP-3695 was created on JIRA by Melanie Clarke:

In working on JP-3610, we noted that there are sometimes unnecessary copies in pipeline steps, e.g. a copy of the input data is made at the top of the step, then another copy is made in the core algorithm when processing begins.  

We should review all pipeline steps to make sure that only necessary copies are made, for performance optimization.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

Working on cleanup for steps in Detector1 in #8676

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

The steps in Image3 will not be changed as part of this ticket.

@stscijgbot-jp
Copy link
Collaborator Author

stscijgbot-jp commented Aug 27, 2024

Comment by Maria Pena-Guerrero on JIRA:

I did a couple of tests with MIRI image files, since these are the most affected by the memory increase.  One file is part of our regression tests, jw00001001001_01101_00001_mirimage_uncal.fits, and is about 1 GB in size. On both master and the branch, this file took about 3 min to finish and the maximum memory used was about 11 GB. However, the branch run is slightly faster to both finish and reach the max memory usage. Plots:

branch 
!memory_det1_branch_1GB_3min_max11GB.png!

master 

!memory_det1_master_1GB_3min_max11GB.png!

The other file was jw01283001001_03101_00001_mirimage_uncal.fits, which has a size of 2.64 GB. On master, this file took 60 min to run with a maximum memory usage of about 25 GB. On the branch, the file took 25 min to run with a maximum memory usage of about 22 GB. Plots:

branch

!memory_det1_new_branch_2GB.png!

master

!memory_det1_new_master_2GB.png!

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

For completion, here is the pip freeze of my testing environment:

[^pip_freeze.txt]

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

another test with the file I am using to write a memory regression test: jw01024001001_04101_00001_mirimage_uncal.fits

Still,  the branch is slightly faster

branch !memory_det1_small_branch.png|thumbnail!

 

master !memory_det1_small_master.png|thumbnail!

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

I think the combined effect of all these PRs dramatically reduced the memory usage.

In the ASDF repo:

In the stpipe repo:

In the stcal repo:

 

The PR in the jwst repo that removes the unnecessary copies still further improves the memory usage and the running time.

 

@stscijgbot-jp
Copy link
Collaborator Author

Comment by David Law on JIRA:

Is there any way to ballpark what the combined savings after all of these PRs is for some typical data sets?  (Might be easiest after the build is tagged and all related things are merged).  Similar to https://jira.stsci.edu/browse/JP-3716, it would be useful to be able to advertise the impact of all of this work.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Tyler Pauly on JIRA:

I'm trying and failing to replicate the figures shown for branch/master and the 2GB input file, from program jw01283. I used current jwst/master (which pulls the recent stdatmodels and stcal releases) and file "jw01283001001_03101_00001_mirimage_uncal.fits", and profiled with memray. I see this:

!plot_2gb_master0918.png|thumbnail!

Can you confirm those figures are correct?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

Tyler Pauly can you please give me a pip freeze to compare to what I have?

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Tyler Pauly on JIRA:

I've attached the file here: [^jwst_0918_env.txt]

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

I have just ran my test again on master for that file. Here are my results with a) memray and b) mprof

a) memray

[^memray-flamegraph-pipe_profile.py.86815.html]

b) mprof

!memory_det1.png|thumbnail!

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Maria Pena-Guerrero on JIRA:

Seems like memray keeps track of both the virtual and real memory whereas mprof only follows the real memory usage. In my case the virtual memory rose up to the values we were seeing before, about 66 GB, but the real memory got up to about 23 GB.

@stscijgbot-jp
Copy link
Collaborator Author

Comment by Tyler Pauly on JIRA:

Fixed by #8676

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant