
DRSDTCON: behavior changes and poor performance in 2024_10 #5764

Open

lrijkels opened this issue Nov 26, 2024 · 8 comments

Comments

@lrijkels

The convective dissolution keyword for CO2STORE seems to behave differently in OPM Flow 2024.10 compared to 2024.04. In the simple example deck attached, less CO2 dissolves than in 2024.04, but in other decks the reverse is the case.
Switching on DRSDTCON also leads to quite a severe performance penalty in 2024.10 that did not occur in 2024.04, even when I ran 2024.04 with the same solver settings as the new 2024.10 defaults. (Those new defaults, by the way, led to a spectacular improvement in runtime in many models!)
I tried a few other tricks to restore the 2024.04 behavior, tricks that have worked for previous issues, but to no avail:

  • multiplying the DRSDTCON parameter by a factor 1000
  • switching on diffusion to force another solver

These tricks resulted in the same change in behavior as a normal run.

Might this be related to issue OPM/opm-common#4099, which addressed salinity? Or something else?

CO2STORE_DRSDTCON_2024_10.DATA.txt

@totto82
Member

totto82 commented Nov 27, 2024

Between 2024.04 and 2024.10 we made large changes to how DRSDTCON is implemented. The new implementation improves the accuracy of the model. We are in the process of updating the manual accordingly. A paper explaining the new method, "New sub-grid model for convective mixing in field-scale CO2 storage simulation", has been accepted for publication in TiPM, but it may take some weeks before it becomes available. The method should in theory not lead to a performance penalty, so I would very much like to investigate this more. Is the case you shared a good candidate for testing performance?

@lrijkels
Author

The new model sounds very interesting and I look forward to the paper. The simple model I attached was just an edited version of an existing OPM example, to illustrate the changes in behavior. It is too simple to illustrate the performance impact, but I'll see if I can construct or edit an example that shows it. I'll post it soon.

@lrijkels
Author

Sorry, this took a bit longer, but here is a simple case that shows the difference. The ZIP file contains four DATA files: two sets that differ only in whether DRSDTCON is used. The names within a set differ simply to make it easier to track the PRT files between OPM versions. In 2024.04 there is no great impact from switching on DRSDTCON: the runtime increases by some 20%. But in 2024.10 the runtime increased by about 250% on my machine, mostly because the assembly time increased by a factor of 20. (In another example it even increased by a factor of 60; it seems to get worse with larger decks.) The assembly time then dominates the total runtime; the linear solver barely feels it.
I ran all 2024.04 cases with the command line option --linear-solver="Parameters_2024_10.json" to make the comparison fair.

I hope this helps. The cases don't take long to run, but here is the summary of the 2024.10 case without DRSDTCON:
================ End of simulation ===============

Number of MPI processes: 4
Threads per MPI process: 2
Setup time: 5.09 s
Deck input: 0.47 s
Number of timesteps: 60
Simulation time: 97.90 s
Assembly time: 13.38 s (Wasted: 0.0 s; 0.0%)
Well assembly: 0.20 s (Wasted: 0.0 s; 0.0%)
Linear solve time: 48.41 s (Wasted: 0.0 s; 0.0%)
Linear setup: 14.42 s (Wasted: 0.0 s; 0.0%)
Props/update time: 22.71 s (Wasted: 0.0 s; 0.0%)
Pre/post step: 11.06 s (Wasted: 0.0 s; 0.0%)
Output write time: 2.20 s
Overall Linearizations: 420 (Wasted: 0; 0.0%)
Overall Newton Iterations: 360 (Wasted: 0; 0.0%)
Overall Linear Iterations: 1017 (Wasted: 0; 0.0%)

And the 2024.10 case with DRSDTCON:
================ End of simulation ===============

Number of MPI processes: 4
Threads per MPI process: 2
Setup time: 5.37 s
Deck input: 0.67 s
Number of timesteps: 62
Simulation time: 343.64 s
Assembly time: 234.57 s (Wasted: 0.0 s; 0.0%)
Well assembly: 0.32 s (Wasted: 0.0 s; 0.0%)
Linear solve time: 55.23 s (Wasted: 0.0 s; 0.0%)
Linear setup: 15.23 s (Wasted: 0.0 s; 0.0%)
Props/update time: 34.67 s (Wasted: 0.0 s; 0.0%)
Pre/post step: 16.53 s (Wasted: 0.0 s; 0.0%)
Output write time: 2.48 s
Overall Linearizations: 428 (Wasted: 0; 0.0%)
Overall Newton Iterations: 366 (Wasted: 0; 0.0%)
Overall Linear Iterations: 1051 (Wasted: 0; 0.0%)

CO2STORE_DRSDTCON.zip

@atgeirr
Member

atgeirr commented Nov 28, 2024

I can confirm that the general performance difference you report above exists with the latest master branches as well; in particular, assembly time takes 11x as long in my case.

Profiling points to BlackOilConvectiveMixingModule::addConvectiveMixingFlux() as being the bottleneck, and a few function calls deeper, the mutualSolubilitySpycherPruess2005_() function in Brine_CO2.hpp.

I have not seen any obvious bugs or inefficiencies in that code, so it may simply be that the Spycher-Pruess 2005 model is very expensive, having lots of pow, log and exp functions, which in our context is even more expensive than usual since they are applied to the AD objects to get derivatives.

However, this model is not new in the 2024.10 release! @totto82, have some defaults changed so that this model is now used where before it was not?
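
For illustration, the extra cost of elementary functions on AD objects is easy to see in a minimal forward-mode AD sketch. This is purely illustrative (it is not OPM's DenseAd::Evaluation, and the type and function names are made up): every exp/pow call has to evaluate the primal value and then loop over the derivative slots, and pow even needs a second std::pow call for its derivative.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical forward-mode AD scalar with N derivative slots.
template <std::size_t N>
struct AdValue {
    double value{};
    std::array<double, N> deriv{};
};

template <std::size_t N>
AdValue<N> exp(const AdValue<N>& x)
{
    AdValue<N> r;
    r.value = std::exp(x.value);
    // d/dxi exp(x) = exp(x) * dx/dxi: one extra multiply per derivative slot.
    for (std::size_t i = 0; i < N; ++i)
        r.deriv[i] = r.value * x.deriv[i];
    return r;
}

template <std::size_t N>
AdValue<N> pow(const AdValue<N>& x, double a)
{
    AdValue<N> r;
    r.value = std::pow(x.value, a);
    // The derivative needs a second std::pow evaluation.
    const double dfdx = a * std::pow(x.value, a - 1.0);
    for (std::size_t i = 0; i < N; ++i)
        r.deriv[i] = dfdx * x.deriv[i];
    return r;
}
```

A solubility correlation like Spycher-Pruess chains many such calls per evaluation, so this per-call overhead multiplies quickly when it is invoked once per cell per linearization or more.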

@atgeirr
Member

atgeirr commented Nov 28, 2024

I can also confirm that just activating DISGAS is not enough to make it a lot slower; it is the DRSDTCON option that makes the big difference.

@lrijkels
Author

Another interesting observation is that higher temperature seems to degrade performance. The example contains a table with temperature versus depth. When I made this 100 degrees throughout, like this
RTEMPVD
0 100
1000 100
/
the runtime was about twice as long as a case where the temperature was 50 degrees throughout, with the assembly taking most of the hit.

@totto82
Member

totto82 commented Nov 28, 2024

The main reason for the slowdown is that the new code is not optimized and therefore computes the phase partitioning many times per iteration. I already have code that addresses part of this issue, but I haven't prioritized getting it into the master branch yet. I did some testing today and it reduces the simulation time on the model you shared significantly. It will still be longer than before the release, but that is expected since the new model is more advanced. I will make a PR next week with the speedup. I can also add an option for running the 2024_04 version of the model for backward compatibility.
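
For reference, one way the repeated work could be avoided is to memoize the partitioning result per cell per linearization, so it is evaluated once per cell instead of once per face/flux that touches the cell. This is only a hedged sketch of that idea with made-up names; it is not the actual upcoming PR:

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical result of the expensive mutual-solubility evaluation.
struct PartitionResult {
    double rsSat; // dissolved CO2 in brine at saturation
    double rvSat; // vaporized H2O in gas at saturation
};

class PartitionCache {
public:
    explicit PartitionCache(std::size_t numCells) : cache_(numCells) {}

    // Call at the start of every linearization, when the cell state changes.
    void invalidate() { std::fill(cache_.begin(), cache_.end(), std::nullopt); }

    // Run the expensive computation only on the first request for a cell;
    // every later request for the same cell reuses the stored result.
    template <class ComputeFn>
    const PartitionResult& get(std::size_t cellIdx, ComputeFn&& compute)
    {
        auto& slot = cache_[cellIdx];
        if (!slot)
            slot = compute();
        return *slot;
    }

private:
    std::vector<std::optional<PartitionResult>> cache_;
};
```

With a cache like this, the Spycher-Pruess style evaluation would run once per cell per Newton iteration rather than many times, which is the kind of reduction described above.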

@lrijkels
Author

Thanks for the investigation. For now a bit of patience is a price worth paying for a more advanced model, so we'll just let the computers work a bit longer, and wait for the next version to speed it up again.
