WIP: chempy refactoring #433

ye11owSub · 2025-02-15T20:31:26Z

adding tests for chempy.cpv - Compares the results from cpv.py and Numpy and shows their interchangeability

modules/chempy/cpv.py

JarrettSJohnson · 2025-02-16T23:18:41Z

Won't comment on details at the moment, but currently I do have a couple of high-level comments:

I'm usually all in favor of refactoring code, but doing so should come with some sort of purpose so that there's an overall net positive. Usually there is a cost to doing so (it is not always free: a708606, 98c85b8, "Shortcut now missing has_key method" #425).
Similar to above I don't think the cpv module posed any sort of developer obstacle that would warrant a change. In fact, instead of investing time into making cpv nicer to use, I think it would also be worth to consider if cpv is even needed in the first place (No code to maintain is better than maintaining the nicest code). Most of this module was written over two decades ago and there now exists libraries (especially numpy) that have much higher usage from a higher number of experts that have thought about linear algebra more than we have. IMO, one step forward would be to consider if we can replace cpv altogether (perhaps keeping the functions not present in numpy/numpy.linalg). This would of course mean that scripts that use cpv would need to be changed to use numpy (which IMO should rely on numpy and not PyMOL to do linear algebra).
I'd rather keep the verbose setting on for CI so that I can see each step of the C++ compilation process.

ye11owSub · 2025-02-17T00:08:19Z

hey @JarrettSJohnson !

I'm usually all in favor of refactoring code, but doing so should come with some sort of purpose so that there's an overall net positive. Usually there is a cost of doing so ( it is not always free: a708606 98c85b8 #425 ) .

This is the cost of poor code quality and lack of test coverage. Refactoring is a method to find and fix these issues.

Similar to above I don't think the cpv module posed any sort of developer obstacle that would warrant a change. In fact, instead of investing time into making cpv nicer to use, I think it would also be worth to consider if cpv is even needed in the first place (No code to maintain is better than maintaining the nicest code). Most of this module was written over two decades ago and there now exists libraries (especially numpy) that have much higher usage from a higher number of experts that have thought about linear algebra more than we have. IMO, one step forward would be to consider if we can replace cpv altogether (perhaps keeping the functions not present in numpy/numpy.linalg). This would of course mean that scripts that use cpv would need to be changed to use numpy (which IMO should rely on numpy and not PyMOL to do linear algebra).

I'm not sure I understand what you mean when you say "there now exists libraries (especially numpy) that have much higher usage". The entire cpv.py module was completely rewritten using numpy in this pull request. This new implementation is compatible with scripts in the pymol-scripts repo and other pymol modules.
In any case, I completely agree with your idea of replacing cpv.py with numpy. However, I assumed that changing the established API to use numpy would be unacceptable. If you believe that replacing all calls of cpv.py in the pymol and pymol-scripts repositories with numpy is a good idea, then I would be happy to do it.

I'd rather keep the verbose setting on for CI so that I can see each step of the C++ compilation process.

done

JarrettSJohnson · 2025-02-17T00:57:53Z

This is the cost of poor code quality and lack of test coverage. Refactoring is a method to find and fix these issues.

Even if the original code quality was poor and without sufficient test coverage, these specific issues were manifested from the refactoring process due to a couple of properties missed from the original code (which were also easily identifiable by using common functionality in PyMOL--and that's on me too for not testing the PR before merging it). I think we should emphasize code coverage a little bit more than class-level refactoring.

If you believe that replacing all calls of cpv.py in the pymol and pymol-scripts repositories with numpy is a good idea, then I would be happy to do it.

Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers, but I'm generally in favor of removing the basic linear algebra functions from cpv.

ye11owSub · 2025-02-17T11:24:28Z

Even if the original code quality was poor and without sufficient test coverage, these specific issues were manifested from the refactoring process due to a couple of properties missed from the original code (which were also easily identifiable by using common functionality in PyMOL--and that's on me too for not testing the PR before merging it).

I'm truly sorry that my previous PR caused issues that you had to fix. However, in my opinion, this is a common part of the software development process.
Due to my lack of experience using pymol, it is difficult for me to test any scenarios. I have never had to work with pymol as a user, so I rely heavily on tests and grep. Actually, these small PRs help me understand the project and contribute something useful in the process.

I think we should emphasize code coverage a little bit more than class-level refactoring.

Adding commas to docstrings is worthless stuff, but I am trying to make the code more readable. Therefore, I don't see a problem with refactoring at the class level.
You are right that some of the scripts in the project are more than 20 years old and their readability is poor. Taking small steps to improve them is better than doing nothing at all.

speleo3 · 2025-02-17T14:32:35Z

I support Jarrett's assessment here.

Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers

Fully agree.

I like the added tests and type hints from the first commit, but the numpyfy refactoring is too much IMHO.

In my own scripts I always used either only numpy -- taking advantage of all its features and keeping data in numpy arrays -- or chempy.cpv for its simplicity and no numpy dependency. Making chempy.cpv a numpy wrapper feels like combining the disadvantages from both worlds.

ye11owSub · 2025-02-17T16:13:24Z

Might make sense to first open up an issue there to get insights/opinions from other developers/maintainers

No one argued with that

@speleo3 as you wish, there are now only type annotations and tests

TstewDev · 2025-02-18T19:07:29Z

Hello @ye11owSub,

I'm Thomas Stewart, a PyMOL developer and the current Product Manager for PyMOL at Schrödinger. I just wanted to add my thoughts to a few of your comments:

I'm truly sorry that my previous PR caused issues that you had to fix. However, in my opinion, this is a common part of the software development process. Due to my lack of experience using pymol, it is difficult for me to test any scenarios. I have never had to work with pymol as a user, so I rely heavily on tests and grep. Actually, these small PRs help me understand the project and contribute something useful in the process.

There's no need to apologize but I do hope it helps explain our general reluctance. I agree that fixing issues introduced by PR's is definitely part of the software development process and I don't want to reject PR's simply on that basis. However, I would point out that finding and fixing these issues does take developer time and resources that could be spent on more productive tasks. This means that these PR's do come with a cost (reviewing, testing, maintaining, etc.), regardless of how simple they may appear. They need to be impactful enough to justify merging them into the codebase.

You also mention your lack of experience using PyMOL as a user. Not to say that only heavy PyMOL users can contribute to the project, but I'm curious what your motivations are if you're not trying to address an issue with how the app currently functions. I certainly understand the benefits of clean and well-documented code, but I don't view this as being a beneficial use of your time and effort if these files should be replaced completely.

If you (or anyone else reading this) are really interested in making a significant contribution to the project, I would encourage you to play around with PyMOL and try to identify some functionality/features that would benefit from your effort.

Taking small steps to improve them is better than doing nothing at all.

I certainly understand what you're saying here, but I think it's an oversimplification for the reasons I stated above. In addition to the review/maintenance costs, small changes impact git blame, git history, and consistency across files. Refactoring to make the code more readable can be a noble goal when done with a clear objective, however it can also come with a real cost when just making changes for the sake of making changes.

All that being said, I do believe there is real value being added in this PR and the tests for PyMOL certainly should be improved. I just want to explain my thought process when evaluating PR's in general if you plan on submitting more in the future.

ye11owSub · 2025-02-18T21:20:12Z

Hey @TstewDev!
This PR has attracted more attention than it deserves.

There's no need to apologize but I do hope it helps explain our general reluctance. I agree that fixing issues introduced by PR's is definitely part of the software development process and I don't want to reject PR's simply on that basis. However, I would point out that finding and fixing these issues does take developer time and resources that could be spent on more productive tasks. This means that these PR's do come with a cost (reviewing, testing, maintaining, etc.), regardless of how simple they may appear. They need to be impactful enough to justify merging them into the codebase.

I understand, each PR has a cost (so let's reduce this cost through testing).

I'm curious what your motivations are if you're not trying to address an issue with how the app currently functions. I certainly understand the benefits of clean and well-documented code, but I don't view this as being a beneficial use of your time and effort if these files should be replaced completely.

The shortest and at the same time the most complete answer is because I can. It's sad to see that, in 6 years, the project has had 80 PRs closed from the open-source community. PyMOL is a popular tool for a specific group of people. I'm not one of them, but I have a CS degree and some free time.
I didn't find any specific plans for the future development of the project, so I decided to focus on something that was clearly in need of an update.
You say that these files will be completely replaced, but this is only true for the end of the process. There are a lot of things that need to be done before and testing of old code is one of these things. I think it will take a significant amount of time to replace the cpv.py, and even then, it will be replaced with code from these tests.

Refactoring to make the code more readable can be a noble goal when done with a clear objective, however it can also come with a real cost when just making changes for the sake of making changes.

In general I agree, but in this case, I don't think that's the case. If you have a different opinion, that's fine. Let's fix/add/delete what you think is necessary or close this PR and move on. That's OK for me.

I am also currently refactoring the chempy {models.py, __init__.py, io.py}. I wanted to split this into separate PRs, but if you prefer to have more changes per PR, we can set this one on pause.

ye11owSub · 2025-02-18T21:21:25Z

modules/chempy/cpv.py

+               m1[1][0]*m2[0][2] + m1[1][1]*m2[1][2] + m1[1][2]*m2[2][2]],
+             [m1[2][0]*m2[0][0] + m1[2][1]*m2[1][0] + m1[2][2]*m2[2][0],
+               m1[2][0]*m2[0][1] + m1[2][1]*m2[1][1] + m1[2][2]*m2[2][1],
+               m1[2][0]*m2[0][2] + m1[2][1]*m2[1][2] + m1[2][2]*m2[2][2]]]


Matrix multiplication was not implemented correctly. This and the code duplication have also been fixed in this PR.
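For readers following the diff, the row-by-column rule that the added lines spell out term by term can be written once as a loop. This is only a sketch under assumptions: multiply_3x3 is a hypothetical name and not the function in cpv.py, and the matrices are made-up test data.

```python
def multiply_3x3(m1, m2):
    """Return the product m1 * m2 of two 3x3 matrices given as nested lists.

    Hypothetical helper for illustration; not the cpv.py implementation.
    """
    # Element [i][j] is the dot product of row i of m1 and column j of m2.
    return [
        [sum(m1[i][k] * m2[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # permutation of the first two axes

# Multiplying on the right by b swaps the first two columns of a;
# multiplying on the left by b swaps the first two rows of a.
right_product = multiply_3x3(a, b)
left_product = multiply_3x3(b, a)
```

Because the two products differ, a test with a non-symmetric pair like this one would catch an implementation that accidentally swaps the operands or transposes the result.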

TstewDev · 2025-02-21T21:02:00Z

Hello @ye11owSub!

The shortest and at the same time the most complete answer is because I can. It's sad to see that, in 6 years, the project has had 80 PRs closed from the open-source community.

Say no more, welcome to the project! The effort you have already put into these PR's is really appreciated and it sounds like you really are serious about making a contribution.

Please forgive my original tone of skepticism, I just know that open-source projects like this can fall victim to developers creating PR's when they have little intention of actually seeing these changes through. It definitely doesn't sound like that's the case here and we welcome all the help we can get.

I understand, each PR has a cost (so let's reduce this cost through testing).

I'm a big fan of adding tests like this and I think it's one of the obvious areas for improvement.

In general I agree, but in this case, I don't think that's the case. If you have a different opinion, that's fine. Let's fix/add/delete what you think is necessary or close this PR and move on. That's OK for me.

I don't actually think I have any issue with this review now that it has this refined scope. I will take another closer look and add any additional comments if necessary.

I am also currently refactoring the chempy {models.py, __init__.py, io.py}. I wanted to split this into separate PRs, but if you prefer to have more changes per PR, we can set this one on pause.

Happy to hear it! I'm normally in favor of splitting these into multiple smaller reviews but it sounds like these might be quite intertwined? I'll leave it up to your judgement but if you feel like there's relevant context that these other changes would provide, feel free to combine them.

ye11owSub · 2025-02-28T22:01:36Z

Hi @TstewDev !
Happy to hear that. Thank you!
For this PR, it is important to demonstrate that the new tests pass before and after the changes.
Therefore, I was focused on fixing the issues in the CI pipeline. I hope someone could also review this PR

JarrettSJohnson · 2025-04-12T14:50:57Z

modules/chempy/models.py

        return sm

-#------------------------------------------------------------------------------
-    def get_nuclear_charges(self):


Why is this and a bunch of other methods removed?

Hi @JarrettSJohnson!
I wanted to discuss this after I finish, but it's probably better to do it as early as possible.
I couldn't find the use of these functions anywhere, so maybe it's a good idea to remove this code?
If you think this code might still be useful, then I'll restore it

It's a public API, so if any users out there are using them, their scripts would break. If this is intended, we should at least have a deprecation period and some alternative path to obtain a similar result.

Copy that.
I'll restore everything that's working properly
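As an illustration of what a deprecation period could look like in Python, here is a minimal sketch using the standard warnings module. The deprecated decorator, the Model class, and get_total_charge are all hypothetical names chosen for the example; none of them is taken from chempy.

```python
import functools
import warnings


def deprecated(reason):
    """Decorator that emits a DeprecationWarning each time the function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


class Model:
    """Hypothetical stand-in; not the real chempy model class."""

    def __init__(self, charges):
        self.charges = charges

    @deprecated("scheduled for removal; sum the charges directly instead")
    def get_total_charge(self):
        # The old behaviour keeps working during the deprecation period.
        return sum(self.charges)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    total = Model([1, 6, 8]).get_total_charge()
```

Existing scripts keep running and get a warning that names the replacement, which gives users time to migrate before the method is actually removed.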

ye11owSub · 2025-04-16T10:28:52Z

modules/chempy/models.py

                if b.index[0] == c:
                    indexed.bond.append(b)
            c = c + 1
        self.reset()


@JarrettSJohnson also, some functions that are used in other places do not seem to be working correctly. For example:

    for a in self.bond:
        for b in a:
            ...

for b in a: fails here, because Bond is not an iterable class.

Do you have examples of this not working properly? Connected::bond is a nested list, which is a different form than Model::bond.

What does seem to be broken, though for a different reason, is Connected::insert_atom:

Traceback (most recent call last):
  File "D:\PyMOL\pymol\bond_test.py", line 6, in <module>
    c.insert_atom(0, 0)
  File "D:\mambaforge\envs\devenv\Lib\site-packages\chempy\models.py", line 612, in insert_atom
    for a in self.bonds:
             ^^^^^^^^^^
AttributeError: 'Connected' object has no attribute 'bonds'. Did you mean: 'bond'?

I took the part mentioned above from the convert_to_indexed function.

    def convert_to_indexed(self):
        if chempy.feedback['verbose']:
            print(" "+str(self.__class__)+": converting to indexed model...")
        indexed = Indexed()
        indexed.atom = self.atom
        indexed.molecule = self.molecule
        c = 0
        for a in self.bond:
            for b in a:  # <-- the loop in question
                if b.index[0] == c:
                    indexed.bond.append(b)
            c = c + 1
        self.reset()

Can you show me the traceback?

    cmd.fetch('1obyA')
    model = cmd.get_model()
    c = model.convert_to_connected()
    i = c.convert_to_indexed()

doesn't give me a traceback

This looks correct for the reasons I mentioned. Connected::bond is not a list of Bond. It's a list of list of Bonds.

pymol-open-source/modules/chempy/models.py

Lines 340 to 349 in c208c62

    model = Connected()
    model.molecule = self.molecule
    model.atom = self.atom
    model.bond = []
    model.index = None
    for a in model.atom:
        model.bond.append([])
    for b in self.bond:
        model.bond[b.index[0]].append(b) # note two refs to same object
        model.bond[b.index[1]].append(b) # note two refs to same object
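To make the difference between the two bond layouts concrete, here is a small self-contained sketch. The Bond class is a minimal hypothetical stand-in (only an index pair), and the atom list is placeholder data; the real chempy classes carry more state.

```python
class Bond:
    """Minimal hypothetical stand-in for a bond: a pair of atom indices."""

    def __init__(self, i, j):
        self.index = [i, j]


atoms = ["C", "O", "H"]                 # three placeholder atoms
flat_bonds = [Bond(0, 1), Bond(0, 2)]   # flat layout: one entry per bond

# Nested layout: one list per atom, holding every bond that touches that atom.
nested_bonds = [[] for _ in atoms]
for b in flat_bonds:
    nested_bonds[b.index[0]].append(b)  # the same object is referenced twice
    nested_bonds[b.index[1]].append(b)

# Walking the nested layout needs two loops. Keeping only the bonds whose
# first index equals the current atom recovers each bond exactly once.
recovered = []
for atom_number, bonds_of_atom in enumerate(nested_bonds):
    for b in bonds_of_atom:
        if b.index[0] == atom_number:
            recovered.append(b)
```

With the nested layout the inner loop iterates over a list of bonds, not over a single bond, which is why the double loop is valid there.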

line 348/349

Still, it doesn't work correctly. The Connected class inherits from the Base class, which stores bond as a list[Bond]. This means that some of the methods of the base class will not work for the Connected class.

two more

self = <chempy.models.Indexed object at 0x105cb8590>, index = 0

    def remove_bond(self,index):
        if chempy.feedback['bonds']:
            print(" "+str(self.__class__)+": removing bond %d." % index)
>       nBond=len(self.Bond)
E       AttributeError: 'Indexed' object has no attribute 'Bond'

self = <chempy.models.Connected object at 0x1063bff90>, index = 2

    def delete_atom(self,index):
        if chempy.feedback['atoms']:
            print(" "+str(self.__class__)+": deleting atom %d." % index)
        nAtom=self.nAtom
        # update index if it exists
        if self.index:
            idx = self.index
            for k in list(idx.keys()):
                if idx[k] > index:
                    idx[k] = idx[k] - 1
            del idx[id(self.atom[index])]
        # delete atom
        del self.atom[index]
        # delete bonds associated with this atom
        nBond = len(self.bond)
        for a in self.bond:
            i = 0
            templist = []
            for b in a:
                if index in b.index:
                    templist.append(i)
                i = i + 1
            for i in range(len(templist)):
                j = templist[i] - i
                del a[j]
        # re-index bond table
        for b in self.bond:
>           if b.index[0] > index:
E           TypeError: 'builtin_function_or_method' object is not subscriptable
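The wording of the TypeError above is consistent with the nested layout: when each entry of bond is itself a list, the attribute lookup .index finds the built-in list.index method instead of a bond's index pair. A minimal sketch of that effect, using a hypothetical stand-in Bond class and made-up data:

```python
class Bond:
    """Minimal hypothetical stand-in: `index` holds the two atom indices."""

    def __init__(self, i, j):
        self.index = [i, j]


shared = Bond(0, 1)
nested = [[shared], [shared]]   # list of lists, as in the nested layout

entry = nested[0]               # this is a plain list, not a Bond
looked_up = entry.index         # resolves to the built-in list.index method

try:
    looked_up[0]                # subscripting a method object is not allowed
    message = None
except TypeError as exc:
    message = str(exc)
```

A loop written for the flat layout therefore fails on the first iteration when it is handed the nested layout, without ever reaching a real Bond object.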

For now, perhaps just rename the Connected::bond to another name to avoid clashing.

The nBond=len(self.Bond) line can probably just be removed since the result is unused.

You'll likely run into several issues here. This module was written decades ago, has been largely untouched since, and is rarely used by folks (which is why I think effort is probably better spent elsewhere outside this module).

IMO, just fix what you can, and anything that's not fixable, rather than deleting, you can leave them be and we'll have a proper deprecation period for them.

JarrettSJohnson · 2025-04-17T22:30:12Z

I probably won't be able to deeply get into this for a while, but in any case, let's keep the classes in their original files for now to keep the diff more succinct and so that it's easier to compare the pertinent changes.

JarrettSJohnson

Generally looks good. Review is a bit hard to follow when there are massive relocations/formatting changes mixed in with logic changes. I would propose keeping those separate in future reviews so that it's easier to tell what's reorganization and what I need to focus on for correctness. Otherwise, thanks for the time you spent on this!

Also whenever this is ready for review & merge, please remove the "WIP" from the title.

JarrettSJohnson · 2025-05-04T16:45:41Z

modules/chempy/cpv.py

+   # HAVEN'T YET VERIFIED THAT THIS CONFORMS TO STANDARD DEFT
+   # upd: no, it's not(fixed)


This can be removed.

JarrettSJohnson · 2025-05-04T17:02:46Z

modules/chempy/models.py

-        if chempy.feedback['atoms']:
-            print(" "+str(self.__class__)+": deleting atom %d." % index)
+    def _handle_new_atom(self) -> None:
+        """In case of Connected class we need to add an empty list to slef.bond"""


ye11owSub force-pushed the chempy_refactoring branch 2 times, most recently from 491cd50 to 9233afc Compare February 15, 2025 21:27

speleo3 reviewed Feb 16, 2025

ye11owSub force-pushed the chempy_refactoring branch 4 times, most recently from d8efebe to 5614259 Compare February 16, 2025 12:53

ye11owSub changed the title ~~WIP: chempy.cpv refactoring~~ WIP: Switching from Python lists to NumPy arrays for linear algebra operations Feb 16, 2025

ye11owSub force-pushed the chempy_refactoring branch from 5b1e43c to fa549d2 Compare February 16, 2025 21:50

ye11owSub changed the title ~~WIP: Switching from Python lists to NumPy arrays for linear algebra operations~~ Switching the computation in cpv.py from python lists to numpy arrays for linear algebra operations Feb 16, 2025

ye11owSub requested a review from speleo3 February 16, 2025 22:05

ye11owSub force-pushed the chempy_refactoring branch from fa549d2 to 137f140 Compare February 16, 2025 22:18

ye11owSub changed the title ~~Switching the computation in cpv.py from python lists to numpy arrays for linear algebra operations~~ Switching the computation in cpv.py from python lists to numpy arrays Feb 16, 2025

ye11owSub force-pushed the chempy_refactoring branch from 137f140 to be6eae8 Compare February 17, 2025 00:08

ye11owSub force-pushed the chempy_refactoring branch from be6eae8 to 3431f50 Compare February 17, 2025 16:11

ye11owSub force-pushed the chempy_refactoring branch 2 times, most recently from 308bae1 to 1373979 Compare February 17, 2025 16:24

ye11owSub changed the title ~~Switching the computation in cpv.py from python lists to numpy arrays~~ tests for cpv.py Feb 17, 2025

ye11owSub commented Feb 18, 2025

ye11owSub changed the title ~~tests for cpv.py~~ WIP: chempy refactoring Feb 20, 2025

adding tests for chempy.cpv

f92a13f

ye11owSub force-pushed the chempy_refactoring branch from d45d258 to 7932692 Compare April 12, 2025 11:29

JarrettSJohnson reviewed Apr 12, 2025

ye11owSub commented Apr 16, 2025

adding tests for chempy models

1a02f19

ye11owSub force-pushed the chempy_refactoring branch 3 times, most recently from 54c88e7 to 6762ce7 Compare April 17, 2025 22:24

ye11owSub force-pushed the chempy_refactoring branch 7 times, most recently from 869f19d to 4f0a7dd Compare April 20, 2025 20:45

refactoring chempy/models.py

6ec0804

ye11owSub force-pushed the chempy_refactoring branch from 4f0a7dd to 6ec0804 Compare April 20, 2025 20:47

JarrettSJohnson requested changes May 12, 2025
