rpy2/r-bio3d integration for testing/R interop; memory efficient implementation for DCC computation #14

JHKru · 2024-01-26T13:44:52Z

rpy2/r-bio3d integration:
Currently, tests with bio3d and BiophysConnectoR as references consist of .csv files generated separately by R.
This offers limited flexibility when updating tests:
R scripts have to be adjusted and .csv files regenerated manually for each change.
This would quickly get unsustainable and confusing, in case tests are expanded, for example to a larger range of proteins.
With rpy2, bio3d can directly be called from within Python scripts with (rudimentary) interconversion between bio3d-PDB objects and AtomArrays.
Both packages can be readily installed via conda.
Additional dependencies for springcraft-dev are the only downsides that come to mind.
The obsolete .csv files were deleted for this pull request.
Memory efficient DCC computation variant
The previous approach to compute DCCs needs large amounts of memory for medium/large proteins.
The memory efficient variant was reasonably fast in tests and could therefore replace the older variant.
Both variants are still present for now (the older one can be toggled by mem_eff=False).

padix-key · 2024-01-29T17:01:00Z

Hi, feel free to ask, if you would like to have a review. Since you propose the mem_eff switch: What is the advantage of setting mem_eff=False? Is it significantly slower?

JHKru · 2024-01-31T21:48:09Z

Hi, sure; that would be great, if you find the time.
Regarding DCC mem_eff:
I haven't tested it systematically yet to be honest and mainly left it as fallback option or (possibly) for future tests.
The memory efficient variant should be slower for mode subsets (written in a for-loop), but it is not noticeable for the small protein we use in our tests.

padix-key · 2024-02-04T12:30:40Z

Could you run a quick and dirty timing test? If it is not significantly slower and it passes our tests, but also provides a big memory improvement, I would say we enforce this new way:
This avoids maintencance from our side and does not require additional attention by the user.

JHKru · 2024-02-16T12:34:28Z

-> Averages over five repetitions.
The new mem_eff method is faster for larger proteins (compare mem_eff with no_mem_eff).
Surprisingly, the loop-based variant for mem_eff considering only the first 100 modes is also faster than its no_mem_eff counterpart (mem_eff_subset_loop vs. no_mem_eff_subset).
Lastly, the loop-based, mem_eff variant (considering all modes) is also faster than no_mem_eff.

In summary, we can and should remove the mem_eff option and the older methods.

JHKru · 2024-02-16T14:05:53Z

One additional annoyance to consider:
I tested the development environment on Windows today and rpy2 did not work properly.
There are some known problems with Windows specifically, both with conda and pip
(for example, www.github.com/rpy2/rpy2/issues/848#issuecomment-1059785984).
With WSL, everything works as expected.

padix-key · 2024-02-18T16:43:50Z

One additional annoyance to consider: I tested the development environment on Windows today and rpy2 did not work properly. There are some known problems with Windows specifically, both with conda and pip (for example, www.github.com/rpy2/rpy2/issues/848#issuecomment-1059785984). With WSL, everything works as expected.

From my perspective this is not a large problem, as this happens only for the tests, which most users wouldn't run anyway.
If it works in the CI this is fine for me.

Does this problem appear in during the rpy2 installation or during test execution? In the latter case, I would suggest to use pytest.mark.skip() or pytest.skip(), so if anyone happens to run the tests, it is directly clear that there is not problem with the code.

padix-key

Functionally the code looks good to me, I would only suggest some small style changes. I really like the new tests, it is much cleaner now!

padix-key · 2024-02-18T16:50:16Z

tests/test_anm.py

+            reference_fluc_subset = np.array(
+                                        bio3d.fluct_nma(enm_nma_bio3d, 
+                                                        mode_inds=r_seq(12,33)
+                                        ))
+            reference_dcc = np.array(bio3d.dccm(enm_nma_bio3d))
+            reference_dcc_subset = np.array(
+                                        bio3d.dccm(enm_nma_bio3d, nmodes=30)
+                                    )


To much indentation

padix-key · 2024-02-18T16:50:38Z

tests/test_anm.py

+            reference_fluc = np.genfromtxt(
+                    join(data_dir(), ref_fluc),
+                    skip_header=1, delimiter=","
+                )


Too much indentation

padix-key · 2024-02-18T16:53:39Z

src/springcraft/nma.py

+        reshaped = cov.reshape(
+                            cov.shape[0]//3, 3, -1, 3
+                        ).swapaxes(1,2)


Too much indentation. Furtermore I would use cov.shape[0]//3, 3, cov.shape[0]//3, 3 just to reflect that both values are the same N.

padix-key · 2024-02-18T16:55:25Z

src/springcraft/nma.py

+        modes_reshaped = np.reshape(
+                            eig_vectors, (len(mode_subset), -1, num_dim)
+                            )


Too much indentation

padix-key · 2024-02-18T16:57:03Z

src/springcraft/nma.py

+    from springcraft import GNM
+    from springcraft import ANM


Do you have a reason not to use a relative import here?

Not really; changed that.

padix-key · 2024-02-18T17:14:48Z

I would propose to do a squash and merge as soon as you are ready to keep the git history a bit cleaner

Refinements; DCC Updated environment.yml -> bio3d as Dev-Dependency Updated environment.yml -> bio3d as Dev-Dependency Revert "Updated environment.yml -> bio3d as Dev-Dependency" This reverts commit a4bb8de. Revert "Updated environment.yml -> bio3d as Dev-Dependency" This reverts commit 8cc2017. Add r2py/bio3d as dev dependencies Adjusted tests; incorporated rpy2

Add rpy2 and r-bio3d to pyproject.toml Added correct versions in .toml Revert "Integrated rpy2 into tests -> fix" This reverts commit 8133498. Added rpy2/r-bio3d -> fix Revert "Added correct versions in .toml" This reverts commit 02562ee. Revert "Add rpy2 and r-bio3d to pyproject.toml" This reverts commit ecc420f. Update .gitignore Added rpy2/r-bio3d to GitHub-Workflow Removed mem_eff argument for DCC

JHKru · 2024-02-22T18:39:03Z

Not sure, if all indentations are correct now, but I think I'm ready.
Is the commit history fine this way?

padix-key · 2024-02-23T20:23:07Z

tests/test_anm.py

+            reference_fluc = np.genfromtxt(join(data_dir(), ref_fluc),
+                                            skip_header=1, delimiter=","
+                                        )


In other places in the code base, and as far as I have seen, most commonly in Python projects, the indentation of a line with a closing bracket bracket is the same as the corresponding opening bracket.

Suggested change

reference_fluc = np.genfromtxt(join(data_dir(), ref_fluc),

skip_header=1, delimiter=","

)

reference_fluc = np.genfromtxt(

join(data_dir(), ref_fluc),

skip_header=1, delimiter=","

)

padix-key · 2024-02-23T20:25:17Z

Hi, sorry for the late response. A added an example change to the review to show my preferred indentation style (see above). If you still prefer yours, that would be also fine for me, then I would merge.

JHKru · 2024-02-23T22:09:49Z

No worries, indentations should be adjusted now.

JHKru changed the title ~~rpy2/r-bio3d integration for testing/ R interop; Memory efficient implementation for DCC computation~~ rpy2/r-bio3d integration for testing/ R interop; memory efficient implementation for DCC computation Jan 26, 2024

JHKru requested a review from padix-key January 31, 2024 21:48

JHKru marked this pull request as ready for review January 31, 2024 21:49

padix-key reviewed Feb 18, 2024

View reviewed changes

JHKru added 3 commits February 22, 2024 19:36

Style changes

eb2f0f1

JHKru force-pushed the master branch from 725dce8 to eb2f0f1 Compare February 22, 2024 18:37

JHKru changed the title ~~rpy2/r-bio3d integration for testing/ R interop; memory efficient implementation for DCC computation~~ rpy2/r-bio3d integration for testing/R interop; memory efficient implementation for DCC computation Feb 22, 2024

padix-key reviewed Feb 23, 2024

View reviewed changes

Style changes: Fix indentations

9072241

padix-key merged commit 914b997 into biotite-dev:master Feb 25, 2024
1 check passed

JHKru mentioned this pull request Sep 14, 2024

Preparation for Biotite 1.0/Numpy 2.0 #15

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rpy2/r-bio3d integration for testing/R interop; memory efficient implementation for DCC computation #14

rpy2/r-bio3d integration for testing/R interop; memory efficient implementation for DCC computation #14

JHKru commented Jan 26, 2024 •

edited

Loading

padix-key commented Jan 29, 2024

JHKru commented Jan 31, 2024

padix-key commented Feb 4, 2024

JHKru commented Feb 16, 2024 •

edited

Loading

JHKru commented Feb 16, 2024 •

edited

Loading

padix-key commented Feb 18, 2024

padix-key left a comment

padix-key Feb 18, 2024

padix-key Feb 18, 2024

padix-key Feb 18, 2024

padix-key Feb 18, 2024

padix-key Feb 18, 2024

JHKru Feb 19, 2024

padix-key commented Feb 18, 2024

JHKru commented Feb 22, 2024

padix-key Feb 23, 2024

padix-key commented Feb 23, 2024

JHKru commented Feb 23, 2024

rpy2/r-bio3d integration for testing/R interop; memory efficient implementation for DCC computation #14

rpy2/r-bio3d integration for testing/R interop; memory efficient implementation for DCC computation #14

Conversation

JHKru commented Jan 26, 2024 • edited Loading

padix-key commented Jan 29, 2024

JHKru commented Jan 31, 2024

padix-key commented Feb 4, 2024

JHKru commented Feb 16, 2024 • edited Loading

JHKru commented Feb 16, 2024 • edited Loading

padix-key commented Feb 18, 2024

padix-key left a comment

Choose a reason for hiding this comment

padix-key Feb 18, 2024

Choose a reason for hiding this comment

padix-key Feb 18, 2024

Choose a reason for hiding this comment

padix-key Feb 18, 2024

Choose a reason for hiding this comment

padix-key Feb 18, 2024

Choose a reason for hiding this comment

padix-key Feb 18, 2024

Choose a reason for hiding this comment

JHKru Feb 19, 2024

Choose a reason for hiding this comment

padix-key commented Feb 18, 2024

JHKru commented Feb 22, 2024

padix-key Feb 23, 2024

Choose a reason for hiding this comment

padix-key commented Feb 23, 2024

JHKru commented Feb 23, 2024

JHKru commented Jan 26, 2024 •

edited

Loading

JHKru commented Feb 16, 2024 •

edited

Loading

JHKru commented Feb 16, 2024 •

edited

Loading