Parallel insty #1945
If the tests pass without broken pipes using fewer frames and processes, then we can update the reference results to have fewer frames too.
OK, still a broken pipe, so there is definitely something to figure out.
@jamesmkrieger are you planning to try this again?
It would be good to try at some point, yes. There was an issue asking about it too. I don't know when I'll get around to figuring out what's wrong, though.
Hi James,
I tested parallel insty and was able to compute interactions successfully for two trajectories, one with 20 frames and one with 100 frames. Later I can try a higher number of frames.
When I tried a multi-model PDB ensemble such as an NMR structure, I got the error below, and that was with only 5 frames (the ensemble has over 100 conformations):
BrokenPipeError Traceback (most recent call last)
Cell In[22], line 1
----> 1 data_all = interactionsTrajectoryNMR.calcProteinInteractionsTrajectory(atoms2, frame_stop=5)
File ~/anaconda3/envs/py310_tests/lib/python3.10/site-packages/ProDy-2.5.0-py3.10-linux-x86_64.egg/prody/proteins/interactions.py:5280, in InteractionsTrajectory.calcProteinInteractionsTrajectory(self, atoms, trajectory, filename, **kwargs)
5277 for p in processes:
5278 p.join()
-> 5280 interactions_all = [entry[:] for entry in interactions_all]
5281 interactions_nb = [entry[:] for entry in interactions_nb]
5282 else:
File ~/anaconda3/envs/py310_tests/lib/python3.10/site-packages/ProDy-2.5.0-py3.10-linux-x86_64.egg/prody/proteins/interactions.py:5280, in <listcomp>(.0)
5277 for p in processes:
5278 p.join()
-> 5280 interactions_all = [entry[:] for entry in interactions_all]
5281 interactions_nb = [entry[:] for entry in interactions_nb]
5282 else:
File <string>:2, in __getitem__(self, *args, **kwds)
File ~/anaconda3/envs/py310_tests/lib/python3.10/multiprocessing/managers.py:817, in BaseProxy._callmethod(self, methodname, args, kwds)
814 self._connect()
815 conn = self._tls.connection
--> 817 conn.send((self._id, methodname, args, kwds))
818 kind, result = conn.recv()
820 if kind == '#RETURN':
File ~/anaconda3/envs/py310_tests/lib/python3.10/multiprocessing/connection.py:206, in _ConnectionBase.send(self, obj)
204 self._check_closed()
205 self._check_writable()
--> 206 self._send_bytes(_ForkingPickler.dumps(obj))
File ~/anaconda3/envs/py310_tests/lib/python3.10/multiprocessing/connection.py:411, in Connection._send_bytes(self, buf)
405 self._send(buf)
406 else:
407 # Issue #20540: concatenate before sending, to avoid delays due
408 # to Nagle's algorithm on a TCP socket.
409 # Also note we want to avoid sending a 0-length buffer separately,
410 # to avoid "broken pipe" errors if the other end closed the pipe.
--> 411 self._send(header + buf)
File ~/anaconda3/envs/py310_tests/lib/python3.10/multiprocessing/connection.py:368, in Connection._send(self, buf, write)
366 remaining = len(buf)
367 while True:
--> 368 n = write(self._handle, buf)
369 remaining -= n
370 if remaining == 0:
BrokenPipeError: [Errno 32] Broken pipe
I hope this helps in figuring out where the problem is. The good news is that the trajectory analysis seems to work really nicely (and fast!).
Thanks. That should help, yes. Which NMR ensemble?
PDB code: 2k39
I just analyzed more than 1300 frames of a trajectory. That code works fine for DCD analysis.
Does it still work correctly with stop_frame for trajectories? It looks like that isn't being used properly.
Actually, ignore that. I see how that's handled.
How about we just use the trajectory option for multi-model PDBs instead?
I could then remove the other block of code.
OK, I have now simplified the code so that it no longer uses the other block and simply uses atoms as the trajectory when it can.
If atoms doesn't have multiple coordinate sets, we still get the warning and empty output.
OK, now the tests are failing, I think because the saved interactions we compare against have the wrong number of frames.
Parallel calculations now work fine.
I tested both a trajectory and a multi-model PDB.
But I have started wondering whether stop_frame is well defined. I implemented it that way, but now I am having second thoughts.
When we have:
interactionsTrajectory = InteractionsTrajectory('trajectory')
interactionsTrajectory.calcProteinInteractionsTrajectory(atoms2, stop_frame=10)
It gives us an analysis of the first 11 frames, because it computes from frame 0 up to and including frame 10. Is that OK?
The outcome for 2k39:
[[43, 49, 53, 52, 50, 50, 44, 50, 47, 48, 47],
[3, 5, 2, 6, 4, 6, 5, 5, 2, 4, 4],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[16, 20, 16, 19, 18, 17, 18, 19, 22, 20, 19],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
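The output above has one row per interaction type and one column per frame, so its 11 columns match frames 0 to 10. A small sketch that checks this and summarizes each row. The row labels assume the order hydrogen bonds, salt bridges, repulsive ionic bonding, pi-stacking, pi-cation, hydrophobic, disulfide bonds; that order is an assumption, not something stated in this thread:

```python
# Counts for 2k39 with stop_frame=10, copied from the comment above.
# Rows = interaction types (assumed order below), columns = frames 0..10.
TYPES = ["HBs", "SBs", "RIB", "PiStack", "PiCat", "HPh", "DiBs"]

counts = [
    [43, 49, 53, 52, 50, 50, 44, 50, 47, 48, 47],
    [3, 5, 2, 6, 4, 6, 5, 5, 2, 4, 4],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [16, 20, 16, 19, 18, 17, 18, 19, 22, 20, 19],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def summarize(matrix, labels):
    """Return {label: (total, mean per frame)} for a types x frames count matrix."""
    summary = {}
    for label, row in zip(labels, matrix):
        total = sum(row)
        summary[label] = (total, total / len(row))
    return summary


summary = summarize(counts, TYPES)
for label, (total, mean) in summary.items():
    print(f"{label:8s} total={total:4d} mean/frame={mean:.2f}")
```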
Yes, I think that makes sense. I just tried stop_frame=12 and got results for 13 frames. With start_frame=1 and stop_frame=12, I get 12.
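The inclusive convention described here can be written down as a hypothetical helper (frames_analyzed is not a ProDy function). Note that it differs from a Python slice such as frames[0:10], which would give 10 frames, which is one reason it may feel unintuitive:

```python
def frames_analyzed(start_frame=0, stop_frame=None, n_frames=None):
    """Hypothetical helper mirroring the semantics discussed in this thread.

    stop_frame is inclusive, so frames start_frame..stop_frame are analyzed.
    If stop_frame is omitted, this sketch assumes the run goes to the last
    frame, which requires n_frames.
    """
    if stop_frame is None:
        stop_frame = n_frames - 1
    # +1 because range() excludes its end point, while stop_frame is included.
    return list(range(start_frame, stop_frame + 1))
```

For example, frames_analyzed(stop_frame=10) covers 11 frames and frames_analyzed(1, 12) covers 12, matching the runs reported above.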
We can keep stop_frame as it is, but if we want to change it because it is not intuitive, now is the best time. That is why I am mentioning it.
I think as long as we document it, it should be OK.
OK
I think the docs explain it fine.
What is calcInteractionsMultipleFrames? Does it work with the parallel options?
calcInteractionsMultipleFrames() is used by each function that computes a particular type of interaction, such as calcHydrogenBondsTrajectory(), calcSaltBridgesTrajectory(), etc.
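The pattern described here (one shared multi-frame driver, with thin wrappers for each interaction type) can be sketched generically. Everything below is hypothetical: calc_multiple_frames, count_close_pairs and calc_close_pairs_trajectory only mimic the shape of the ProDy functions named above, and the per-frame calculation is a toy distance count rather than a real interaction definition:

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = Sequence[Point]


def calc_multiple_frames(frames: Sequence[Frame],
                         per_frame: Callable[[Frame], int]) -> List[int]:
    """Hypothetical shared driver: run one per-frame calculation over every frame, in order."""
    return [per_frame(frame) for frame in frames]


def count_close_pairs(frame: Frame, cutoff: float = 3.5) -> int:
    """Toy per-frame calculation: number of point pairs closer than `cutoff`."""
    count = 0
    for i in range(len(frame)):
        for j in range(i + 1, len(frame)):
            if math.dist(frame[i], frame[j]) < cutoff:
                count += 1
    return count


def calc_close_pairs_trajectory(frames: Sequence[Frame], cutoff: float = 3.5) -> List[int]:
    """Thin type-specific wrapper, analogous in shape to calcHydrogenBondsTrajectory()."""
    return calc_multiple_frames(frames, lambda frame: count_close_pairs(frame, cutoff))
```

With this layout, any parallel option only has to be added once, in the shared driver, for every wrapper to benefit.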
I tested parallel insty by running the same calculation three times. As you can see, each time the results come out in a different order. I think that is the problem with the tests.
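One common way to make parallel results deterministic is to tag each result with its frame index and sort afterwards, or to use a pool method that already preserves input order. A sketch under those assumptions, not necessarily how this PR fixed it: analyse_frame is a stand-in for a real per-frame calculation, and ThreadPool is used only so the example runs without pickling concerns (multiprocessing.Pool offers the same map and imap_unordered methods):

```python
import random
import time
from multiprocessing.pool import ThreadPool


def analyse_frame(frame_index):
    """Stand-in for a per-frame interaction count; returns (frame_index, result)."""
    # A random delay makes workers finish in an unpredictable order.
    time.sleep(random.uniform(0, 0.01))
    return frame_index, frame_index * frame_index


def run_unordered_then_sort(n_frames, max_proc=4):
    with ThreadPool(max_proc) as pool:
        # imap_unordered yields each result as soon as its worker finishes.
        tagged = list(pool.imap_unordered(analyse_frame, range(n_frames)))
    # Sorting on the frame index restores a deterministic frame order.
    tagged.sort(key=lambda pair: pair[0])
    return [result for _, result in tagged]


def run_ordered(n_frames, max_proc=4):
    with ThreadPool(max_proc) as pool:
        # map() returns results in input order regardless of finishing order.
        return [result for _, result in pool.map(analyse_frame, range(n_frames))]
```

Either approach gives the same output on every run, so saved reference results can be compared directly.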
Thanks for pinning that down
I think everything should work now
The test data files have been restored from the main branch.
Checked.
In [1]: from prody import *
In [2]: atoms = parsePDB('2k39_insty.pdb')
@> 1231 atoms and 15 coordinate set(s) were parsed in 0.20s.
In [3]: confProDy(verbosity='none')
@> ProDy is configured: verbosity='none'
In [4]: interactions_traj = InteractionsTrajectory()
...:
In [5]: interactions_nb_1 = interactions_traj.calcProteinInteractionsTrajectory(atoms, max_proc=1)
...:
In [6]: interactions_nb_1
Out[6]:
[[56, 48, 43, 43, 45, 49, 46, 52, 43, 44, 50, 49, 48, 50, 39],
[7, 5, 5, 4, 3, 6, 3, 4, 2, 7, 5, 3, 4, 5, 5],
[1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[15, 15, 17, 16, 18, 15, 16, 16, 18, 17, 14, 18, 16, 15, 16],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
In [7]: interactions_traj_para = InteractionsTrajectory()
In [8]: interactions_nb_para = interactions_traj_para.calcProteinInteractionsTrajectory(atoms)
In [9]: interactions_nb_para
Out[9]:
[[56, 48, 43, 43, 45, 49, 46, 52, 43, 44, 50, 49, 48, 50, 39],
[7, 5, 5, 4, 3, 6, 3, 4, 2, 7, 5, 3, 4, 5, 5],
[1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[15, 15, 17, 16, 18, 15, 16, 16, 18, 17, 14, 18, 16, 15, 16],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
In [10]: interactions_nb_para2 = interactions_traj_para.calcProteinInteractionsTrajectory(atoms, max_proc=5)
In [11]: interactions_nb_para2
Out[11]:
[[56, 48, 43, 43, 45, 49, 46, 52, 43, 44, 50, 49, 48, 50, 39],
[7, 5, 5, 4, 3, 6, 3, 4, 2, 7, 5, 3, 4, 5, 5],
[1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[15, 15, 17, 16, 18, 15, 16, 16, 18, 17, 14, 18, 16, 15, 16],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
This still has some problems, which I think may be related to memory when using too many processors and/or frames, but it is essentially there now.
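If memory is the limiting factor, one option is to never start more workers than there are frames and to give each worker one contiguous block of frames. A hypothetical sketch of that split, assuming frames are addressed by index (split_frames is not part of ProDy):

```python
def split_frames(n_frames, max_proc):
    """Hypothetical helper: divide frame indices 0..n_frames-1 into contiguous chunks.

    The number of chunks (workers) is capped at n_frames, so no worker is idle.
    Chunk sizes differ by at most one frame.
    """
    n_proc = max(1, min(max_proc, n_frames))
    base, extra = divmod(n_frames, n_proc)
    chunks = []
    start = 0
    for i in range(n_proc):
        # The first `extra` chunks take one additional frame each.
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

For example, 15 frames with max_proc=5 gives five chunks of three frames, like the 2k39 run with max_proc=5 above.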