'MDAnalysis.analysis.diffusionmap' parallelization #4745
Attempt to fix #4679
Changes made in this Pull Request:
- `DistanceMatrix` in `analysis.diffusionmap`
- `client_DistanceMatrix` in `conftest.py`
- `client_DistanceMatrix` in `run()` in `test_diffusionmap.py`
Here is the problem:
From what I can see, the current
`self.results.dist_matrix = np.zeros((self.n_frames, self.n_frames))`
has the problem that, when parallelization is used, the `self.n_frames` value gets divided among the workers, which with `ndarray_sum` and `ndarray_mean` leads to:
E AssertionError:
E Arrays are not almost equal to 4 decimals
E (shapes (2,), (4,) mismatch)
E ACTUAL: array([1., 1.])
E DESIRED: array([1, 1, 1, 1])
Using `ndarray_vstack` or `ndarray_hstack` instead leads to the following error:
`numpy.linalg.LinAlgError: Last 2 dimensions of the array must be square`
So I am not sure whether this can be adjusted somehow. I also encountered a similar case while trying to parallelize `analysis.msd`.
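The shape problem described above can be reproduced with NumPy alone. The following is a minimal sketch, assuming four frames split evenly between two workers; it does not use the MDAnalysis API, and the names `frame_chunks`, `partial`, and `row_blocks` are hypothetical. The final part shows one possible layout (not something this PR implements) in which each worker would fill a block of rows against all frames, so that stacking yields a square matrix again.

```python
import numpy as np

n_frames = 4   # total number of frames in the (hypothetical) trajectory
n_workers = 2  # number of (hypothetical) parallel workers

# Each worker only sees its own slice of the frame indices.
frame_chunks = np.array_split(np.arange(n_frames), n_workers)

# Mimics np.zeros((self.n_frames, self.n_frames)) evaluated per worker,
# where n_frames is the worker-local count (2) instead of the total (4).
partial = [np.zeros((len(chunk), len(chunk))) for chunk in frame_chunks]

# Summing the partial matrices gives a 2x2 result instead of 4x4.
summed = np.sum(partial, axis=0)

# Stacking the partial matrices gives a non-square 4x2 result.
stacked = np.vstack(partial)

try:
    np.linalg.eig(stacked)
    eig_error = None
except np.linalg.LinAlgError as exc:
    eig_error = str(exc)

# Assumed alternative layout: each worker allocates its rows against ALL
# frames, so vertical stacking restores the full square shape.
row_blocks = [np.zeros((len(chunk), n_frames)) for chunk in frame_chunks]
full = np.vstack(row_blocks)

print(summed.shape, stacked.shape, full.shape)
print(eig_error)
```

Note that the row-block layout only fixes the shapes: every worker would still need access to the coordinates of all frames to fill its rows with pairwise distances, which is the harder part of the problem.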
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4745.org.readthedocs.build/en/4745/