Conversation

@VGPReys
Contributor

@VGPReys VGPReys commented Oct 17, 2025

Checklist

  • Tests added for the new code
  • Documentation added for the code changes
  • Modifications / enhancements are reflected on the haddock3 user-manual
  • CHANGELOG.md is updated to incorporate new changes
  • Does not break licensing
  • Does not add any dependencies, if it does please add a thorough explanation

Summary of the Pull Request

This PR adds a new parameter, interface_combinations = [], to the CNS scoring modules (emscoring and mdscoring).

Each entry must be composed of two comma-separated chain IDs
     e.g.: 
     []             -> Consider all interfaces (default)
     ["A,B"]        -> Consider only the interface score between A and B
     ["A,H", "A,L"] -> Sum interface scores between A,H and A,L
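
The behaviour described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names `parse_interface_combinations` and `combine_interface_scores`, the dictionary layout, and the choice to raise on a missing interface are all assumptions.

```python
def parse_interface_combinations(entries):
    """Turn entries such as ["A,H", "A,L"] into [("A", "H"), ("A", "L")].

    Hypothetical helper, not part of haddock3.
    """
    pairs = []
    for entry in entries:
        chains = [c.strip() for c in entry.split(",")]
        if len(chains) != 2 or not all(chains):
            raise ValueError(
                f"Expected two comma-separated chains, got {entry!r}"
            )
        pairs.append((chains[0], chains[1]))
    return pairs


def combine_interface_scores(interface_scores, entries, total_score):
    """Sum the selected interface scores.

    interface_scores maps a sorted chain pair, e.g. ("A", "H"), to its score.
    An empty `entries` list means "consider all interfaces", so the total
    score is returned unchanged.
    """
    if not entries:
        return total_score
    selected = []
    for chain1, chain2 in parse_interface_combinations(entries):
        key = tuple(sorted((chain1, chain2)))
        if key not in interface_scores:
            # Assumption: fail loudly instead of silently skipping.
            raise KeyError(f"No interface between chains {chain1} and {chain2}")
        selected.append(interface_scores[key])
    return sum(selected)


# Illustrative values only, not taken from a real run.
scores = {("A", "H"): -10.0, ("A", "L"): -5.0, ("H", "L"): -50.0}
print(combine_interface_scores(scores, ["A,H", "A,L"], total_score=-65.0))
print(combine_interface_scores(scores, [], total_score=-65.0))
```

Sorting each pair before the lookup means "A,H" and "H,A" refer to the same interface, which seems the least surprising behaviour for users.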

Note that the header of the PDB file is not modified; only the score attribute written in io.json is affected.

As this addition is used by both the emscoring and mdscoring modules, it has been implemented in their shared class CNSScoringModule in modules/scoring/__init__.py.

The reading of the score components from PDB files has been moved from the CNSScoringModule to the HaddockModel.

Finally, a small optimization reduces IO when performing the per_interface_output: the PDB files are no longer read twice.

Unfortunately, I had to add an antibody-antigen PDB structure and PSF file for the integration tests...

Related Issue

Closes #1414

@VGPReys VGPReys self-assigned this Oct 17, 2025
@VGPReys VGPReys added enhancement Improving something in the codebase m|emscoring Related to emscoring module m|mdscoring mdscoring module labels Oct 17, 2025
@VGPReys
Contributor Author

VGPReys commented Oct 17, 2025

After discussion with Alex:
Let's change the parameter to behave differently, so that the user needs to specify the chains of interest: ["A,C", "B,C"]
This would allow more flexibility

@VGPReys VGPReys marked this pull request as draft October 17, 2025 08:01
@VGPReys VGPReys marked this pull request as ready for review October 22, 2025 13:01
@AnnaKravchenko
Contributor

It's "interface_combinationS", not "interface_combination". It would be nice to edit this PR description.

@AnnaKravchenko
Contributor

AnnaKravchenko commented Oct 27, 2025

Something is not right. If I use this new parameter, my run crashes:

[2025-10-27 12:19:07,133 __init__ INFO] [emscoring] CNS jobs have finished
[2025-10-27 12:19:07,135 libutil ERROR] local variable 'interface_score' referenced before assignment
Traceback (most recent call last):
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libutil.py", line 382, in log_error_and_exit
    yield
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/clis/cli.py", line 193, in main
    workflow.run()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libworkflow.py", line 43, in run
    step.execute()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libworkflow.py", line 173, in execute
    self.module.run()  # type: ignore
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/base_cns_module.py", line 61, in run
    self._run()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/emscoring/__init__.py", line 92, in _run
    output_haddock_models = self.update_pdb_scores(interface_combinations)
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/__init__.py", line 180, in update_pdb_scores
    haddock_score = self.compute_interfaces_score(
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/__init__.py", line 251, in compute_interfaces_score
    selected_interfaces_scores.append(interface_score)
UnboundLocalError: local variable 'interface_score' referenced before assignment
[2025-10-27 12:19:07,138 libutil ERROR] local variable 'interface_score' referenced before assignment
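
For readers unfamiliar with this error, here is a minimal sketch of one common way an UnboundLocalError like the one above can arise, together with a guarded variant. Both functions are hypothetical reconstructions of the pattern; they are not the code at line 251 of modules/scoring/__init__.py.

```python
def sum_selected_buggy(interface_scores, pairs):
    """Hypothetical pattern that fails when a requested pair is missing."""
    selected = []
    for pair in pairs:
        if pair in interface_scores:
            interface_score = interface_scores[pair]
        # Bug: if the first pair is missing, `interface_score` was never
        # assigned, so the next line raises UnboundLocalError.
        selected.append(interface_score)
    return sum(selected)


def sum_selected_guarded(interface_scores, pairs):
    """Skip missing pairs; return None when nothing could be selected."""
    selected = []
    for pair in pairs:
        interface_score = interface_scores.get(pair)
        if interface_score is None:
            continue
        selected.append(interface_score)
    return sum(selected) if selected else None


try:
    sum_selected_buggy({}, [("A", "B")])
except UnboundLocalError as err:
    print(f"buggy variant failed: {err}")

print(sum_selected_guarded({}, [("A", "B")]))
```

Note that returning None from the guarded variant only moves the problem downstream unless callers handle it explicitly.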

More details:

My TOML files:
Version1:

[topoaa]
[emscoring]
interface_combinations = ["A,B"]
[caprieval]

Version2:

[topoaa]
[emref]
[emscoring]
interface_combinations = ["A,B"]
[caprieval]

If I do not use interface_combinations, there are no errors and [emscoring] runs fine (or is it something else that fails, since "[emscoring] CNS jobs have finished" is logged before the error occurs?).

Even more details: I have 400 DNA-ligand flexref models, which I modified by splitting the DNA into chains A and C, re-merging it with the ligand (chain B), and putting all 400 models into an ensemble. The idea now is to score this ensemble with emscoring using: 1. no interface_combinations, 2. interface_combinations = ["A,B", "C,B"], and 3. just because I can, only ["A,B"].

Runs here: /trinity/login/arha/test_pr

@AnnaKravchenko
Contributor

Ah, I see a big issue with my testing. Chain B does not exist after emscoring, because I messed up my topology.
So this behavior is caused by the attempt to evaluate an interface using a non-existing chain name.

@AnnaKravchenko
Contributor

It turns out that, at the moment, it does not matter whether chain B exists or not: the behavior is the same in both cases.
Sorry for the messy comments!

@AnnaKravchenko
Contributor

AnnaKravchenko commented Oct 27, 2025

Now [emscoring] finished successfully. But libutil failed in [caprieval]:

[2025-10-27 14:59:25,248 libutil ERROR] '<' not supported between instances of 'NoneType' and 'NoneType'
Traceback (most recent call last):
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libutil.py", line 382, in log_error_and_exit
    yield
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/clis/cli.py", line 193, in main
    workflow.run()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libworkflow.py", line 43, in run
    step.execute()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libworkflow.py", line 173, in execute
    self.module.run()  # type: ignore
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/__init__.py", line 246, in run
    self._run()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/analysis/caprieval/__init__.py", line 228, in _run
    extract_data_from_capri_class(
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/analysis/caprieval/capri.py", line 924, in extract_data_from_capri_class
    ranked_data = rank_according_to_score(
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/analysis/caprieval/capri.py", line 810, in rank_according_to_score
    score_rankkey_values.sort(key=lambda x: x[1])
TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'

The same error occurs both when reference_fname is given (I made sure the reference has the correct number of chains and chain names) and when no reference is given.
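
The TypeError above is what Python raises when a sort key evaluates to None for more than one item. A minimal sketch of the failure and of one possible guard follows; `rank_by_score` is a hypothetical helper, not the `rank_according_to_score` function from capri.py, and placing unscored models last is an assumption.

```python
def rank_by_score(models):
    """Rank (name, score) tuples by ascending score.

    Models whose score is None are placed last, in their input order.
    """
    scored = [m for m in models if m[1] is not None]
    unscored = [m for m in models if m[1] is None]
    scored.sort(key=lambda x: x[1])
    return scored + unscored


# Reproduce the failure: None cannot be compared with None using '<'.
values = [("model_1.pdb", None), ("model_2.pdb", None)]
try:
    values.sort(key=lambda x: x[1])
except TypeError as err:
    print(f"sort failed: {err}")

# Illustrative scores only.
print(rank_by_score([("m1", -5.0), ("m2", None), ("m3", -12.5)]))
```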

Contributor

@AnnaKravchenko AnnaKravchenko left a comment

The same workflow with interface_combinations commented out runs without this error, so it is probably caused by the new addition to the code.

@VGPReys
Contributor Author

VGPReys commented Oct 27, 2025

Let me investigate...
This must be related to the fact that the function computing the interface combination can return None if none of the interfaces are found in the input model.

@VGPReys
Contributor Author

VGPReys commented Oct 27, 2025

After some investigation, it turned out that using the interface_combinations parameter also required another parameter to be set: per_interface_scoring = true.

I applied a small patch that temporarily sets per_interface_scoring to True when interface_combinations != []. This ensures that the interface scores are written in the PDB headers as REMARK lines, so that they can be read back to compute the desired combination(s).

The original value of per_interface_scoring set by the user is kept and used when outputting the results.
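
The idea of the patch can be sketched as below. This is only an illustration under assumptions: `resolve_scoring_flags` and the plain `params` dictionary are hypothetical and do not reflect how the module stores its parameters.

```python
def resolve_scoring_flags(params):
    """Return (effective_flag, user_flag) for per_interface_scoring.

    effective_flag is what the scoring step should use so that interface
    scores end up in the PDB headers; user_flag is the value the user
    originally set, to be used when writing the final outputs.
    """
    user_flag = bool(params.get("per_interface_scoring", False))
    needs_interfaces = params.get("interface_combinations", []) != []
    effective_flag = user_flag or needs_interfaces
    return effective_flag, user_flag


params = {"interface_combinations": ["A,C"], "per_interface_scoring": False}
effective, original = resolve_scoring_flags(params)
print(effective, original)
```

Keeping the two values separate avoids mutating the user's configuration, so the output stage still honours what was written in the workflow file.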

Contributor

@AnnaKravchenko AnnaKravchenko left a comment

A double interface like interface_combinations = ["A,C", "C,B"] is functional, with per_interface_scoring set to either true or false.
A single interface like interface_combinations = ["A,C"], with per_interface_scoring set to either true or false, creates the expected headers in emscoring.pdb:

REMARK Interface Chain1 Chain2 HADDOCKscore Evdw Eelec Edesol BSA
REMARK Interface: A B -40.0928 -30.984 -40.9368 -0.921424 573.25
REMARK Interface: A C -211.158 -196.942 -565.691 98.9224 3194.26
REMARK Interface: B C -38.1605 -26.0944 -55.2351 -1.01912 530.589
REMARK Total HADDOCK score without restraints: -289.411
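
Header lines in this layout can be read back with a few lines of Python. The sketch below assumes the whitespace-separated format shown above; `read_interface_scores` is a hypothetical helper and not the HaddockModel method introduced by this PR.

```python
def read_interface_scores(header_lines):
    """Map sorted chain pairs to the HADDOCK score of their interface.

    Only lines starting with "REMARK Interface:" are parsed; the column
    header line ("REMARK Interface Chain1 ...") has no colon and is skipped.
    """
    scores = {}
    for line in header_lines:
        if not line.startswith("REMARK Interface:"):
            continue
        fields = line.split()
        # fields: REMARK, Interface:, chain1, chain2, score, vdw, elec, desol, bsa
        chain1, chain2 = fields[2], fields[3]
        scores[tuple(sorted((chain1, chain2)))] = float(fields[4])
    return scores


header = [
    "REMARK Interface Chain1 Chain2 HADDOCKscore Evdw Eelec Edesol BSA",
    "REMARK Interface: A B -40.0928 -30.984 -40.9368 -0.921424 573.25",
    "REMARK Interface: A C -211.158 -196.942 -565.691 98.9224 3194.26",
    "REMARK Interface: B C -38.1605 -26.0944 -55.2351 -1.01912 530.589",
    "REMARK Total HADDOCK score without restraints: -289.411",
]
interface_scores = read_interface_scores(header)
print(interface_scores[("A", "C")])
```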

But then libutil (in between [emscoring] and [caprieval]) crashes:

[2025-10-28 12:01:37,899 libutil ERROR] local variable 'score' referenced before assignment
Traceback (most recent call last):
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/__init__.py", line 196, in update_pdb_scores
    score = self.compute_interfaces_score(
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/__init__.py", line 279, in compute_interfaces_score
    raise ValueError
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libutil.py", line 382, in log_error_and_exit
    yield
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/clis/cli.py", line 193, in main
    workflow.run()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libworkflow.py", line 43, in run
    step.execute()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/libs/libworkflow.py", line 173, in execute
    self.module.run()  # type: ignore
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/base_cns_module.py", line 61, in run
    self._run()
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/emscoring/__init__.py", line 92, in _run
    output_haddock_models = self.update_pdb_scores(interface_combinations)
  File "/trinity/login/arha/dev-h3/haddock3/src/haddock/modules/scoring/__init__.py", line 206, in update_pdb_scores
    haddock_score = score

@VGPReys
Contributor Author

VGPReys commented Oct 28, 2025

Yes, I was raising a ValueError but catching a TypeError, my bad.
It is fixed now.
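
A minimal sketch of this kind of mismatch is shown below. It reproduces the same chain of exceptions as the traceback (a ValueError followed by an UnboundLocalError raised during its handling), but the structure with a `finally` clause is a guess, and all function names are hypothetical stand-ins rather than the haddock3 code.

```python
def compute_interfaces_score(found):
    """Hypothetical stand-in: fails when no requested interface is found."""
    if not found:
        raise ValueError("no requested interface found")
    return -211.158


def update_score_mismatched(found, default):
    try:
        score = compute_interfaces_score(found)
    except TypeError:  # does not catch the ValueError raised above
        score = default
    finally:
        # Runs while the ValueError is still propagating: `score` is unbound.
        haddock_score = score
    return haddock_score


def update_score_fixed(found, default):
    try:
        score = compute_interfaces_score(found)
    except ValueError:  # matches the exception that is actually raised
        score = default
    return score


try:
    update_score_mismatched(found=False, default=-289.411)
except UnboundLocalError as err:
    print(type(err.__context__).__name__, "->", type(err).__name__)

print(update_score_fixed(found=False, default=-289.411))
```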

@AnnaKravchenko
Contributor

AnnaKravchenko commented Oct 28, 2025

It seems something is still not right, or I did something wrong:

In workflow: interface_combinations = ["A,C”]; per_interface_scoring = true
In emscoring_1.pdb:

REMARK Interface Chain1 Chain2 HADDOCKscore Evdw Eelec Edesol BSA
REMARK Interface: A B -40.0928 -30.984 -40.9368 -0.921424 573.25
REMARK Interface: A C -211.158 -196.942 -565.691 98.9224 3194.26
REMARK Interface: B C -38.1605 -26.0944 -55.2351 -1.01912 530.589
REMARK Total HADDOCK score without restraints: -289.411

In capri_ss:
../1_emscoring/emscoring_1.pdb - 21 -292.491

So capri_ss shows the full score over all available chains (also listed in emscoring_1.pdb as REMARK HADDOCK score: -292.491), but I expected only the score of the interface between chains A and C (-211.158).

@VGPReys
Contributor Author

VGPReys commented Oct 28, 2025

In workflow: interface_combinations = ["A,C”]; per_interface_scoring = true In emscoring_1.pdb

What if you try interface_combinations = ["A,C"] ?
maybe an issue of the character != " ?

@AnnaKravchenko
Contributor

In workflow: interface_combinations = ["A,C”]; per_interface_scoring = true In emscoring_1.pdb

What if you try interface_combinations = ["A,C"] ? maybe an issue of the character != " ?

Unfortunately no, this is just a quirk of GitHub. Both my quotes are the same, take a look at the cfg: /trinity/login/arha/test_pr/run1/data/configurations/raw_input.toml

I also tried interface_combinations = ["A,C", "A,C"], and capri_ss correctly shows the -211.158*2 value.

@VGPReys
Contributor Author

VGPReys commented Oct 28, 2025

so when chain combinations are duplicated it is functional but when only one chain is there it is not ?

@AnnaKravchenko
Contributor

AnnaKravchenko commented Oct 28, 2025

so when chain combinations are duplicated it is functional but when only one chain is there it is not ?

Yes! Only one pair of chains, i.e. only 1 interface. I think this is what you meant

@VGPReys
Contributor Author

VGPReys commented Oct 28, 2025

I cannot reproduce it so far...
Let's discuss it tomorrow in person I guess

@AnnaKravchenko AnnaKravchenko self-requested a review October 29, 2025 11:03
AnnaKravchenko previously approved these changes Oct 29, 2025
Development

Successfully merging this pull request may close these issues.

Allow the definition of chain combinations to be used for scoring

4 participants