Skip to content

Conversation

@ijpulidos
Copy link
Contributor

These changes aim to add support to be able to perform protein mutations in the current SetupUnit that the NonEquilibriumCyclingProtocol uses.

This mostly implies making sure that the alchemical component is now not assumed to be a SmallMoleculeComponent but also a ProteinComponent.

Utility functions for achieving such purpose were written or modified accordingly. Someof these changes were needed to be made on the openfe side of things, since we have not yet decided on migrating them to feflow (this code base).

@ijpulidos ijpulidos marked this pull request as draft January 30, 2025 18:03
@ijpulidos ijpulidos linked an issue Jan 30, 2025 that may be closed by this pull request
@ijpulidos ijpulidos marked this pull request as ready for review September 17, 2025 02:59
@ijpulidos
Copy link
Contributor Author

This should be somewhat ready to go, but we first need to figure out where are the utils from OpenFreeEnergy/openfe#1106 are going to end up, because those are needed for these changes to work.

@codecov
Copy link

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 22 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (protein-mutation-protocol@00c1f15). Learn more about missing BASE report.

Files with missing lines Patch % Lines
feflow/utils/vendored.py 85.93% 9 Missing ⚠️
feflow/utils/misc.py 84.78% 7 Missing ⚠️
feflow/protocols/nonequilibrium_cycling.py 90.16% 6 Missing ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##             protein-mutation-protocol     #106   +/-   ##
============================================================
  Coverage                             ?   83.71%           
============================================================
  Files                                ?       16           
  Lines                                ?     1597           
  Branches                             ?        0           
============================================================
  Hits                                 ?     1337           
  Misses                               ?      260           
  Partials                             ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ijpulidos ijpulidos requested a review from IAlibay September 24, 2025 23:41
@ijpulidos
Copy link
Contributor Author

@IAlibay this one should be ready for review. Please note that the vendored.py adds a lot of lines of code to the changes, but these are basically the same that we had on openfe only that I had to copy a good part of the topology_helpers.py module (see OpenFreeEnergy/openfe#1539 for more information)

@ijpulidos
Copy link
Contributor Author

ijpulidos commented Oct 24, 2025

@IAlibay I have found issues with some specific mappings. I think we need to postpone this one while I tackle these issues. I'm getting KeyError.

@ijpulidos ijpulidos changed the title Protein mutation support in Neq Cycling Setup Unit [DNM] Protein mutation support in Neq Cycling Setup Unit Oct 24, 2025
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial quick review - I mostlly am not very convinced with the idea of doing the template registration / partial charge assignment in a separate place and then using offmols with possibly different partial charges everywhere else.

solvent_comp_a = (
solvent_comps.pop() if solvent_comps else None
) # Get the first component if exists
protein_comps_a = get_typed_components(state_a, ProteinComponent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch these to gufe.ChemicalSystem.get_components_of_type?

ligand_a = ligand_mapping.componentA
ligand_b = ligand_mapping.componentB

solvent_comps = get_typed_components(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it would be better to just asset a single solvent component in the protocol validator than to have to work around these type of things.

i.e. currently there zero cases where having more than 1 SolventComponent could make sense, indeed even with mixed solvent systems, we're likely to just create a single solvencomponent that would contain a list of smiles rather than stacking them.

all_openff_mols = list(
chain(all_alchemical_mols.values(), common_small_mols.values())
)
self._assign_openff_partial_charges(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't assign partial charges at this point, you'll end up with bad things happening - i.e. variable partial charges everywhere being assigned with the wrong partial charge method.

try:
# Minimize
openmm.LocalEnergyMinimizer.minimize(context)
# Optionally store minimized topology -- Mostly for debugging purposes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not always store it?

# This can be populated however we want
return outputs

# TODO: Maybe this could be a utility function. Is this something protocol-specific?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly specific to this Protocol - also this should go in the Protocol validate method.

# print(f"Free energy = {fe_estimate} +/- {fe_error}") # DEBUG

@pytest.mark.skip(
reason="Ambertools failing to parameterize. Review when we have full nagl."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no reason why ambertools fails to generate charges for toluene.. this seems to be a bug not something that should wait for replacement.


# Generate and register FF parameters in the system generator template
all_openff_mols = [comp.to_openff() for comp in all_small_mols]
register_ff_parameters_template(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this and then using a different set of offmols elsewhere (with different partial charges) is a bad hack imho - because if either the templating behaviour changes or anything else ends up needing partial charges that doesn't go through templates you'll end up with a bunch of offmols downstream that have the wrrong thing.

I would strongly suggest generating the offmols & charge them ahead of them and keep them around.

If the component does not support the necessary conversion methods.
"""

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because technically protein components can try to go via openff, I wouldn't rely on an AttributeErrorr check here - I would isinstance check and enforce the correct route.

return topology


def get_positions_from_component(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium term, maybe we should just add to_positions methods in the Components themselves?

ValueError
If the atom index is not found in the topology.
"""
for residue in topology.residues():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than looping over residues and then over atoms, why not just loop over atoms, and get the residue? It would avoid you doing an extra for loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor setup unit to work with protein mutation protocol

2 participants