
Prepare for multiple potentials by defining a list of potentials #332

Merged
merged 15 commits into from
Jan 6, 2023

Conversation

elindgren
Collaborator

@elindgren elindgren commented Nov 29, 2022

Adds support for multiple potentials in gpumd.

Use cases:

  • Evaluating multiple potentials over a trajectory
  • Need a combination of PESs, for instance for ground/excited states.
  • On-the-fly calculation of model uncertainty

Checklist

  • Prepare force to handle a list of Potentials
  • Let force take multiple potentials
  • Logic for which potential to use in force->compute.
  • Add check for only allowing multiple NEP potentials.
  • Logic for which potential to run MD with
    • Average of all potentials
    • One of the potentials (the first)
  • Dump potential/forces/virials for all potentials

Fixes #239

@elindgren elindgren changed the title Prepare for multiple potentials by defining a list of potentials WIP: Prepare for multiple potentials by defining a list of potentials Nov 29, 2022
@elindgren
Collaborator Author

@brucefan1983 I've started work on adding support for multiple potentials.
I modified force to take a list of potentials instead of a single potential. Then, in force->compute, one would have to evaluate each of the potentials. The issue then becomes how to store all the results from the different potentials.

I have two questions:

  • How do you think one should store the results from all the calls to potential->compute? For the average results one could just add up all the results and then divide by the number of potentials, but that doesn't work for more complicated combinations of the potentials.
  • Do you think that the approach of having a list of potentials makes sense?

@elindgren
Collaborator Author

ping @erhart1

@brucefan1983
Owner

How do you think one should store the results from all the calls to potential->compute? For the average results one could just add up all the results and then divide by the number of potentials, but that doesn't work for more complicated combinations of the potentials.

Isn't this only for NEP models? Does it make sense for other potentials?

Do you think that the approach of having a list of potentials makes sense?

There had been a list of potentials in older versions where we allowed for hybridization. So defining a list (array) of potentials is not a problem in the code, at least.

@erhart1
Collaborator

erhart1 commented Nov 29, 2022

It sounds like the case you are considering is already the specialized case of using the model ensemble to predict uncertainties.
The more basic case is to enable evolving on one PES using one model and then adding other models as "observers" that are evaluated with a certain interval (possibly at every time step) but that do not necessarily require evaluating all properties.
For example, the energy alone could be sufficient.

I suggest also thinking about how this is supposed to be specified from the user side in the input file.
This should help in clarifying which data need to be generated and stored.

@elindgren
Collaborator Author

How do you think one should store the results from all the calls to potential->compute? For the average results one could just add up all the results and then divide by the number of potentials, but that doesn't work for more complicated combinations of the potentials.

Isn't this only for NEP models? Does it make sense for other potentials?

Do you think that the approach of having a list of potentials makes sense?

There had been a list of potentials in older versions where we allowed for hybridization. So defining a list (array) of potentials is not a problem in the code, at least.

I agree that this really only makes sense for NEP potentials, but I don't see the point of restricting the list of potentials to NEPs only. From what I can tell, the code is simpler if any combination of potentials is allowed.

@elindgren
Collaborator Author

It sounds like the case you are considering is already the specialized case of using the model ensemble to predict uncertainties. The more basic case is to enable evolving on one PES using one model and then adding other models as "observers" that are evaluated with a certain interval (possibly at every time step) but that do not necessarily require evaluating all properties. For example, the energy alone could be sufficient.

I suggest also thinking about how this is supposed to be specified from the user side in the input file. This should help in clarifying which data need to be generated and stored.

Having a list of potentials also covers the basic use case that you point out; the only thing that differs between the use cases is when to run potential->compute with each of the potentials, and how to aggregate the results. Here is how I envision the three use cases I listed above:

  • Multiple potentials over trajectory: One main potential used to run the MD, the others run potential->compute with some interval dump_observer and dumped to an out-file.
  • Combined PES: All potentials are evaluated for every timestep, and then the results are averaged together.
  • Model uncertainty: All potentials are evaluated at every timestep, alternatively at some interval dump_observer, and then the average and standard deviation of the computed properties are computed. This uncertainty can then be dumped to file.

Here is my sketch for the run.in interface:

potential nep0.txt ...         # Potential0
potential nep1.txt ...         # Potential1
observer_mode observe  # one of observe, average, uncertainty, corresponding to the cases above.
dump_observer 1000       # Dump frequency with which to evaluate the potentials and dump to file. 

Here one has to take care to write the potentials in the correct order, but I think that is pretty intuitive for the user.

@brucefan1983
Owner

Sounds like a good plan!

@brucefan1983
Owner

Perhaps it's too early to discuss this, but I think observer_mode should be just an option in dump_observer, not a separate keyword.

@elindgren
Collaborator Author

@brucefan1983 I've added logic for evaluating all potentials when writing to file (for force and exyz). It looks as follows, in measure.cu:

void Measure::dump_properties_for_all_potentials(
  int step,
  const double global_time,
  std::vector<Group>& group,
  Box& box,
  Atom& atom,
  Force& force,
  GPU_Vector<double>& thermo)
{
  const int number_of_potentials = force.potentials.size();
  for (int potential_index = 0; potential_index < number_of_potentials; potential_index++) {
    std::cout << "#### Potential " << potential_index + 1 << "/" << number_of_potentials << "\n";
    force.potentials[potential_index]->compute(
      box, atom.type, atom.position_per_atom, atom.potential_per_atom, atom.force_per_atom,
      atom.virial_per_atom);
    dump_force.process(step, group, atom.force_per_atom, potential_index);
    dump_exyz.process(
      step, global_time, box, atom.cpu_atom_symbol, atom.cpu_type, atom.position_per_atom,
      atom.cpu_position_per_atom, atom.velocity_per_atom, atom.cpu_velocity_per_atom,
      atom.force_per_atom, atom.virial_per_atom, thermo, potential_index);
  }
}

Basically, each potential writes its own set of .out-files, like force0.out, force1.out for potentials potential0, potential1, etc. I have several questions on this:

  • Do you think this approach makes sense?
  • Do I edit the atoms object when evaluating the properties using compute? Would it be better to make local copies of all the arrays instead?
  • If I were to do local copies, how would I do that for the GPU vectors (which are initialized before the run)?

I also have a suspicion that modifying the atoms object directly like this affects the MD, since this is all done before correct_velocity in run.cu:

GPUMD/src/main_gpumd/run.cu

Lines 187 to 201 in 792e664

integrate.compute2(time_step, double(step) / number_of_steps, group, box, atom, thermo);
measure.process(
  number_of_steps, step, integrate.fixed_group, global_time, integrate.temperature2,
  integrate.ensemble->energy_transferred, box, group, thermo, atom, force);
velocity.correct_velocity(
  step, atom.cpu_mass, atom.position_per_atom, atom.cpu_position_per_atom,
  atom.cpu_velocity_per_atom, atom.velocity_per_atom);
int base = (10 <= number_of_steps) ? (number_of_steps / 10) : 1;
if (0 == (step + 1) % base) {
  printf(" %d steps completed.\n", step + 1);
  fflush(stdout);
}

What do you think?

@brucefan1983
Owner

I think if the extra method Measure::dump_properties_for_all_potentials is only called when you have the new keyword you proposed, then it does not matter if the force/energy/virial are updated by the potentials in the measurement stage. They only affect later measurements, not the next round of force evaluation.

@brucefan1983
Owner

Because this new feature is quite experimental, my suggestion is to make sure that existing features are not affected. You can extend the internal code as you have already done for the dump_exyz part, but the features of dump_exyz should be kept the old way, I think.

@elindgren
Collaborator Author

I think if the extra method Measure::dump_properties_for_all_potentials is only called when you have the new keyword you proposed, then it does not matter if the force/energy/virial are updated by the potentials in the measurement stage. They only affect later measurements, not the next round of force evaluation.

Alright, I'll add the keyword then and only run the function when it is present.

Because this new feature is quite experimental, my suggestion is to make sure that existing features are not affected. You can extend the internal code as you have already done for the dump_exyz part, but the features of dump_exyz should be kept the old way, I think.

I agree, that's why I only extended dump_exyz to have a list of output files instead of modifying it further.

Do you have any suggestions for how to implement the other functionality, i.e., running MD on an average PES? I guess I would need to compute the forces with all potentials and add them to an extra GPU vector?

@brucefan1983
Owner

Yes, you can make this keyword a new class, similar to the Dump_XYZ class. For the class, you can have a GPU_Vector member for the averaged quantities. You can check some classes such as HAC to see how to do preprocess, process, and postprocess.

@elindgren
Collaborator Author

Yes, you can make this keyword a new class, similar to the Dump_XYZ class. For the class, you can have a GPU_Vector member for the averaged quantities. You can check some classes such as HAC to see how to do preprocess, process, and postprocess.

Makes sense! But how would that allow me to use that computed force as the one to run MD with? I assume you mean this class should be under measure.

@brucefan1983
Owner

I see, then you can make new GPU_Vectors for quantities from individual potentials as members in the Force class. Only allocate them if the relevant keyword is present. Then you can make sure the existing global GPU_Vectors for force, energy, and virial contain the data you want (from the main potential, or the average).

@elindgren
Collaborator Author

@brucefan1983 I've implemented the changes you suggested :) There is now a new class `dump_observer` that uses `dump_exyz`, as well as being responsible for controlling which potential is used in `force`.

GPUMD/src/force/force.cu

Lines 385 to 403 in eea2865

static __global__ void gpu_add_vectors(int length, double* a, double* b)
{
  // Add vector b onto a
  int n = blockIdx.x * blockDim.x + threadIdx.x;
  if (n < length) {
    a[n] += b[n];
  }
}

static __global__ void gpu_divide_and_copy_vector(
  int length, double* source, double* target, double denominator)
{
  // Divide source vector by denominator, and copy into target
  int n = blockIdx.x * blockDim.x + threadIdx.x;
  if (n < length) {
    source[n] /= denominator;
    target[n] = source[n];
  }
}

I've created two kernels in force, one for adding vectors and one for computing the mean. Do you think this makes sense, or is it overkill to do it like this?

The use of the kernels in force.cu:

GPUMD/src/force/force.cu

Lines 437 to 473 in eea2865

if (strcmp(multiple_potentials_mode_, "observe") == 0) {
  // If observing, calculate using the main potential only
  potentials[0]->compute(
    box, type, position_per_atom, potential_per_atom, force_per_atom, virial_per_atom);
} else if (strcmp(multiple_potentials_mode_, "average") == 0) {
  // Calculate average potential, force and virial per atom.
  for (int i = 0; i < potentials.size(); i++) {
    potentials[i]->compute(
      box, type, position_per_atom, potential_per_atom, force_per_atom, virial_per_atom);
    // Accumulate properties into the average vectors
    gpu_add_vectors<<<(number_of_atoms - 1) / 128 + 1, 128>>>(
      number_of_atoms, average_potential_per_atom_.data(), potential_per_atom.data());
    CUDA_CHECK_KERNEL
    gpu_add_vectors<<<(3 * number_of_atoms - 1) / 128 + 1, 128>>>(
      3 * number_of_atoms, average_force_per_atom_.data(), force_per_atom.data());
    CUDA_CHECK_KERNEL
    gpu_add_vectors<<<(9 * number_of_atoms - 1) / 128 + 1, 128>>>(
      9 * number_of_atoms, average_virial_per_atom_.data(), virial_per_atom.data());
    CUDA_CHECK_KERNEL
  }
  // Compute averages and copy properties back into the original vectors.
  gpu_divide_and_copy_vector<<<(number_of_atoms - 1) / 128 + 1, 128>>>(
    number_of_atoms, average_potential_per_atom_.data(), potential_per_atom.data(), potentials.size());
  CUDA_CHECK_KERNEL
  gpu_divide_and_copy_vector<<<(3 * number_of_atoms - 1) / 128 + 1, 128>>>(
    3 * number_of_atoms, average_force_per_atom_.data(), force_per_atom.data(), potentials.size());
  CUDA_CHECK_KERNEL
  gpu_divide_and_copy_vector<<<(9 * number_of_atoms - 1) / 128 + 1, 128>>>(
    9 * number_of_atoms, average_virial_per_atom_.data(), virial_per_atom.data(), potentials.size());
  CUDA_CHECK_KERNEL
} else {
  PRINT_INPUT_ERROR("Invalid mode for multiple potentials.\n");
}

Two questions remain:

  1. I'm not too happy with the user interface that I've implemented; it feels a bit weird to control the behavior of force through dump_observer average 100. But at the same time, it's nice to have everything contained in one keyword.
  2. How should we go about testing this feature? Do you have any suitable tests in mind?

@brucefan1983
Owner

  1. I also have no better suggestion on the interface. Perhaps just do it like this for the time being. I can ask for opinions from users later.

  2. You should have some use cases? How about making a simple example that I can also play with? As for systems: if you don't want to make your current study public, you can choose e.g. the silicon or carbon case and train a few models.

@elindgren
Collaborator Author

  1. I also have no better suggestion on the interface. Perhaps just do it like this for the time being. I can ask for opinions from users later.

Alright 👍

2. You should have some use cases? How about making a simple example that I can also play with? As for systems: if you don't want to make your current study public, you can choose e.g. the silicon or carbon case and train a few models.

I meant: do you have any tests that you usually run, so that we can make sure I didn't break anything? As for a multi-potential example, we have some examples at the department that we can probably use.

@brucefan1983
Owner

I usually run the regression tests in tests/, but the reference data are from my Windows platform and my criterion for passing is extremely strict: the output files must match the reference files exactly. You can do it similarly:

  • Compile the base branch using -DDEBUG and run all those tests and save all the outputs with some names that will not be overwritten.
  • Do the same for your current branch.
  • Compare the two sets of files. They should match exactly, as there is no randomness when -DDEBUG is on.

@brucefan1983
Owner

If you don't want to do it, I can do it at some point. That is to ensure that most of the existing features are not broken/affected.

Zheyong

@elindgren
Collaborator Author

If you don't want to do it, I can do it at some point. That is to ensure that most of the existing features are not broken/affected.

Zheyong

I tried running the tests, but they segfault on my branch. It seems to be because of a CUDA out-of-memory error; I'm guessing the program tries to allocate too much. Checking the memory usage, it's only ~1GB of 16GB. This only seems to happen if I run multiple run commands; could this perhaps be due to me not freeing all the GPU vectors I've used?

In dump_observer I just call the postprocess function of dump_exyz. That should be sufficient cleanup, right?

void Dump_Observer::postprocess()
{
  dump_ = false;
  dump_exyz_.postprocess();
}

A more likely culprit would be the GPU vectors I allocate for the average properties in force.cuh. However, they are only allocated if run with dump_observer average. Where would I free these vectors? I can't seem to find where/if you free the vectors corresponding to the atom object.

private:
  int number_of_atoms_ = -1;
  bool is_fcp = false;
  bool has_non_nep = false;
  const char* multiple_potentials_mode_ = "observe"; // "observe" or "average"
  GPU_Vector<double> average_potential_per_atom_;
  GPU_Vector<double> average_force_per_atom_;
  GPU_Vector<double> average_virial_per_atom_;
};

@brucefan1983
Owner

There is no need to free GPU memory manually. It is handled by the GPU_Vector class. The calling of some post-processing function might have side effects. I need to study it in detail to understand it.

@elindgren
Collaborator Author

There is no need to free GPU memory manually. It is handled by the GPU_Vector class. The calling of some post-processing function might have side effects. I need to study it in detail to understand it.

Okay, I see. I think the error is raised during dos::preprocess, but I agree that it is probably a symptom of some side effect in one of the postprocess() functions. I'm rather stumped as to what might be causing it, though; I'm guessing I'm doing something wrong in dump_observer. Do you have time to look it over and see if you can find something that looks weird?

@brucefan1983
Owner

Yes, I will read your changes this weekend.

@brucefan1983
Owner

brucefan1983 commented Dec 17, 2022

As far as I understand, you wanted to reuse some code in Dump_XYZ for Dump_Observer. However, as discussed in issue #257, Dump_XYZ might be generalized in the future and I think it is better not to reuse code, and instead implement what you need directly in Dump_Observer. That is, I think it is better to decouple these two classes so that Dump_XYZ is untouched in this PR.

@elindgren
Collaborator Author

As far as I understand, you wanted to reuse some code in Dump_XYZ for Dump_Observer. However, as discussed in issue #257, Dump_XYZ might be generalized in the future and I think it is better not to reuse code, and instead implement what you need directly in Dump_Observer. That is, I think it is better to decouple these two classes so that Dump_XYZ is untouched in this PR.

Ah, I see. I'll reimplement the logic from dump_exyz in dump_observer then. Hopefully that fixes the segfaults as well 👍

@brucefan1983
Owner

Can you see my comments on the code? Perhaps they are not visible to you?

@elindgren
Collaborator Author

Can you see my comments on the code? Perhaps they are not visible to you?

No I can't see the comments.

@brucefan1983
Owner

Oh, that's strange; you could see them before, for the message-passing PR.

@elindgren
Collaborator Author

Oh, that's strange; you could see them before, for the message-passing PR.

Yeah I've been able to see review comments before. Maybe they are toggled to be private or something?

@brucefan1983 brucefan1983 self-requested a review December 19, 2022 08:23
@brucefan1983
Owner

So you still cannot see this?
[screenshot of a review comment]

@elindgren
Collaborator Author

So you still cannot see this? [screenshot]

Nope :/ Maybe it has something to do with my permissions/rights?

@elindgren elindgren changed the title WIP: Prepare for multiple potentials by defining a list of potentials Prepare for multiple potentials by defining a list of potentials Dec 19, 2022
@brucefan1983
Owner

Or a GitHub bug. You are already a collaborator. OK, I will copy those images here for your reference:

[six screenshots of the review comments]

@elindgren
Collaborator Author

@brucefan1983 I've fixed the code according to your review comments. I've also gotten the regression tests running, and I get the same results on master and this branch (multiple-potentials). The results on my machine do not match the expected ones, as you suspected. Can you check on your machine whether it's correct there as well?

If it works on your machine as well, then I believe the only remaining step is adding a specific test for multiple nep potentials. I hope to have time to implement such a test either today or tomorrow.

@brucefan1983
Owner

brucefan1983 commented Dec 19, 2022

Ok, I will run the regression tests on my computer, and you can go and prepare the example(s) for the new features.

You can create a folder in examples if you want to make it public.

@brucefan1983
Owner

I have confirmed that all the regression tests pass in this branch. So you only need to create examples using the new features.

@elindgren
Collaborator Author

I have confirmed that all the regression tests pass in this branch. So you only need to create examples using the new features.

I've started on the tests for dump_observer, but unfortunately I've gotten sick, so I won't be able to finish them before Christmas. I'll finish them as soon as I'm back in the office (after New Year's).

@brucefan1983
Owner

Sorry to hear that. Have a good rest. There is no hurry.

@elindgren
Collaborator Author

@brucefan1983 I've finished the tests. Please take a look, and if you're happy I think we are ready to merge this :)

@brucefan1983
Owner

I will have a look on Friday, thanks

@brucefan1983 brucefan1983 merged commit 18b1903 into master Jan 6, 2023
@brucefan1983 brucefan1983 deleted the multiple-potentials branch January 6, 2023 09:30
Successfully merging this pull request may close these issues.

multi-NEP models in a single simulation