Cython xdrlib #441

kain88-de · 2015-09-19T21:15:51Z

This is the progress I have made on the Cython port of xdrlib. Most tests pass; some fail because XTCFile/TRRFile currently have fewer features than the old version. I'll add them later (see the list below). The reimplemented Readers/Writers are now just minimal subclasses of base.Reader. I'd like to think this is easier to understand than the previous code.

Things that have happened

A short list of what I did:

add updated xdrlib source
wrap unmodified xdrlib with python-like file objects in cython
add tests for the XTCFile and TRRFile classes
use XTCFile in XDR-Reader/Writer
use TRRFile in XDR-Reader/Writer

TODO

kain88-de · 2015-09-19T21:21:59Z

testsuite/MDAnalysisTests/test_timestep_api.py

-        assert_equal(self.ts.has_forces, True)
-        assert_array_almost_equal(self.ts.forces, self.refpos + 101)
-
+# class TestXTCTimestep(_TestTimestep, _XTCTimestep):


Was there anything really special about the old XTCTimestep? So far I have been able to do everything with base.Timestep. If there is nothing special, I'll remove these tests, as the class doesn't exist anymore.

https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/coordinates/xdrfile/core.py#L97

Looks like some stuff for keeping the status of the Reader, plus _frame, which is the trajectory's opinion of the frame number rather than MDA's.

Oh, and the unitcell isn't standard either. Timestep._unitcell should be the native format representation of the box; Gromacs uses 3 vectors.

Well, I already use the native format, with the conversion functions defined in 'lib.mdamath'. If this is all, then we don't have a reason to keep that special Timestep class.

The problem with that is that if you write out your unit cell again you might end up with rounding errors (angles of 89.999), in which case the box isn't a cuboid any more.

According to this code that is actually what MDAnalysis is doing currently.

mdanalysis/package/MDAnalysis/coordinates/xdrfile/core.py

Line 245 in 96e3ce9

unitcell = self.convert_dimensions_to_unitcell(ts).astype(np.float32) # must be float32 (!)

In a convoluted way, we are currently doing the following on every save:

box = triclinic_vectors(triclinic_box(box_vectors))

But I'll definitely check how often that can be done before the errors add up.
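A rough way to check how quickly such round trips accumulate error is sketched below. This is only an illustration under stated assumptions: `box_to_vectors`, `vectors_to_box` and `max_angle_drift` are hypothetical helpers written for this example (they are not the functions in `lib.mdamath`), and the vector layout follows the usual Gromacs convention with the first vector along x.

```python
import numpy as np


def box_to_vectors(box):
    """Convert [lx, ly, lz, alpha, beta, gamma] (degrees) to 3x3 box vectors.

    Hypothetical helper: first vector along x, second in the xy plane.
    """
    lx, ly, lz = box[:3]
    alpha, beta, gamma = np.deg2rad(box[3:])
    a = [lx, 0.0, 0.0]
    b = [ly * np.cos(gamma), ly * np.sin(gamma), 0.0]
    cx = lz * np.cos(beta)
    cy = lz * (np.cos(alpha) - np.cos(beta) * np.cos(gamma)) / np.sin(gamma)
    cz = np.sqrt(lz ** 2 - cx ** 2 - cy ** 2)
    return np.array([a, b, [cx, cy, cz]])


def vectors_to_box(vectors):
    """Convert 3x3 box vectors back to [lx, ly, lz, alpha, beta, gamma]."""
    a, b, c = np.asarray(vectors, dtype=np.float64)
    lx, ly, lz = (np.linalg.norm(v) for v in (a, b, c))
    alpha = np.rad2deg(np.arccos(np.dot(b, c) / (ly * lz)))
    beta = np.rad2deg(np.arccos(np.dot(a, c) / (lx * lz)))
    gamma = np.rad2deg(np.arccos(np.dot(a, b) / (lx * ly)))
    return np.array([lx, ly, lz, alpha, beta, gamma])


def max_angle_drift(box, n_cycles):
    """Largest change in any angle (degrees) after n_cycles round trips.

    The vectors are cast to float32 on every cycle to mimic the on-disk
    precision; that cast is an assumption of this sketch.
    """
    start = np.asarray(box, dtype=np.float64)
    current = start.copy()
    for _ in range(n_cycles):
        vectors = box_to_vectors(current).astype(np.float32)
        current = vectors_to_box(vectors)
    return np.abs(current[3:] - start[3:]).max()


cubic = [10.0, 10.0, 10.0, 90.0, 90.0, 90.0]
skewed = [10.0, 12.0, 14.0, 80.0, 70.0, 60.0]
for name, box in (("cubic", cubic), ("skewed", skewed)):
    print(name, max_angle_drift(box, n_cycles=100))
```

Comparing the printed drift against the tolerance used to decide whether a box is orthorhombic would show whether repeated saves are a practical concern.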

richardjgowers · 2015-09-19T21:42:15Z

I think another thing that's going to have to be checked is performance. Probably just something simple like iterating through a huge trajectory and seeing how fast it goes

kain88-de · 2015-09-20T08:14:11Z

I'll do a performance check once I have implemented the old seek code again. For proper performance tests we should have at least two files: one with a large number of frames and another with a large number of atoms. I think that memory allocations could be a problem with a large number of atoms per frame, but I'll have to test that first.

richardjgowers · 2015-09-21T22:59:25Z

package/MDAnalysis/coordinates/XTC.py

+            self.n_atoms = len(sub)
+        else:
+            self.n_atoms = self.xtc.n_atoms
+        self.xtc.seek(0)


Is this going to put the trajectory before frame 0? The Reader should read the first frame, so that calling next reads the second frame

u = mda.Universe(etc)
u.trajectory.next() # should be second frame

Ah OK, I didn't know that. I added a TODO for myself and I will also add tests for all readers.
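The expected behaviour can be sketched with a toy reader. `ToyReader` is a hypothetical stand-in written for this example (it is not base.Reader and its frames are just list items); the only point is that the constructor already loads frame 0.

```python
class ToyReader:
    """Hypothetical reader: loads the first frame on construction."""

    def __init__(self, frames):
        self._frames = list(frames)
        self._index = -1      # "before frame 0"
        self.ts = None
        self._read_next()     # position the reader ON frame 0, not before it

    def _read_next(self):
        if self._index + 1 >= len(self._frames):
            raise StopIteration("end of trajectory")
        self._index += 1
        self.ts = self._frames[self._index]
        return self.ts

    def next(self):
        """Advance by one frame and return it."""
        return self._read_next()

    @property
    def frame(self):
        """Index of the frame currently held in self.ts."""
        return self._index


reader = ToyReader(["frame 0", "frame 1", "frame 2"])
print(reader.frame, reader.ts)   # the first frame is available immediately
print(reader.next())             # advances to the second frame
```

A test for a real reader could assert the same two things: the frame index is 0 right after construction, and the first call to next() yields frame 1.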

kain88-de · 2015-10-29T22:00:35Z

#474 definitely was a good idea. The new tests are already catching stuff that I didn't notice before.

richardjgowers · 2015-10-29T22:12:36Z

Yeah I was going to move a few Readers over into the new tests and I'm half expecting to find a few nice little bugs.

Don't worry about the unitcell thing too much, I'm just wary of boxes having an angle of 89.99. There's a huge difference between that and 90.0.

kain88-de · 2015-11-24T21:15:54Z

OK finally all the tests are passing, old and new! This means that just the optimized seeking is left to do + some cleaning up.

As far as performance goes, this isn't doing too badly. Version 0.12.1 needs 24 s to run the xdr tests and this needs 28 s. My hope is that the difference will go away once I have optimized the seeking behavior.

kain88-de · 2015-11-24T21:18:28Z

BTW is there something super special about the single frame readers? What is the expected behavior of the Readers when the trajectory only contains 1 frame?

richardjgowers · 2015-11-24T22:30:38Z

I think with the SingleFrameReader, what was happening previously was each Reader had an implementation of __getitem__ etc so I wrote the class to unify that. It might be possible to use Reader for them too now that's also been cleaned up

kain88-de · 2015-12-06T20:34:06Z

package/MDAnalysis/lib/formats/xtc.pyx

+        cdef np.ndarray dims = np.array([est_nframes], dtype=np.int64)
+        cdef np.ndarray _offsets = ptr_to_ndarray(<void*> offsets, dims, np.NPY_INT64)
+        print("read {} frames, estimated {}".format(n_frames, est_nframes));
+        print("first offset = {}".format(offsets[0]))


So I'm finally trying to read the offsets from the file in the same way we used to. But I can't quite seem to make it work. This code compiles but doesn't interface correctly with the C xdrlib: offsets is still a NULL pointer after calling read_xtc_n_frames. This shouldn't happen. Does anyone have an idea why?

I checked: the memory is allocated in the xdrlib, but the pointer at the Cython level is never updated.

This [SO](http://stackoverflow.com/questions/1398307/how-can-i-allocate-memory-and-return-it-via-a-pointer-parameter-to-the-calling#1398321) post answers it. If I want a function to change the address a pointer points to, I have to pass a pointer to that pointer. OK, so offset reading works; I just need to include it in the API (YEAH).
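The same effect can be reproduced from plain Python with `ctypes`, as a rough analogy to the C signatures involved. The two functions below are hypothetical and only mimic `void f(int64_t *offsets)` versus `void f(int64_t **offsets)`; they are not the xdrlib functions.

```python
import ctypes

Int64Ptr = ctypes.POINTER(ctypes.c_int64)

# Keep the allocated arrays alive for the duration of the demo.
_buffers = []


def alloc_by_value(offsets, n):
    """Mimics `void f(int64_t *offsets)`: only the local copy is changed."""
    buf = (ctypes.c_int64 * n)(*range(n))
    _buffers.append(buf)
    offsets = ctypes.cast(buf, Int64Ptr)  # rebinds the local name only


def alloc_by_reference(offsets_pp, n):
    """Mimics `void f(int64_t **offsets)`: writes the new address through."""
    buf = (ctypes.c_int64 * n)(*range(n))
    _buffers.append(buf)
    offsets_pp[0] = ctypes.cast(buf, Int64Ptr)  # updates the caller's pointer


offsets = Int64Ptr()                 # a NULL pointer
alloc_by_value(offsets, 3)
still_null = not bool(offsets)       # the caller's pointer was never updated

alloc_by_reference(ctypes.pointer(offsets), 3)
first_three = [offsets[i] for i in range(3)]

print(still_null, first_three)
```

In Cython the equivalent is to declare the C function with an `int64_t **` parameter and pass `&offsets`.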

kain88-de · 2015-12-08T08:50:58Z

package/MDAnalysis/lib/formats/xtc.pyx

+            raise RuntimeError('Trying to seek over max number of frames')
+        print(frame, self.offsets[frame])
+        # print(xdrlib.xdr_seek(self.xfp, self.offsets[frame], xdrlib.SEEK_SET))
+        print(xdrlib.xdr_seek(self.xfp, 0, xdrlib.SEEK_SET))


I'm hitting a wall again here. Every time I call this function from Python I get SystemError: error return without exception set. I have no clue where that comes from. The xdrlib functions seem to be called correctly if I add printf statements there. As far as I can make out, no errors are occurring. @mnmelo did you see similar things when implementing the seeking in SWIG?

Hmm... didn't bump into that one. My first reaction was to blame the frame type, as it must always be 64-bit (large file access and such).

Another thing I noticed is that, in xdrfile.c/xdrfile.h the exdr_message is being built with exdrNR-1 members. This might be related, since exdrNR is out-of-bounds, and that's what seek returns on error. Maybe you can change xdrstdio_setpos or xdr_seek to return another error (though none of the available ones really fit it).

I had overlooked exdr_message so far, but it will be useful for getting better error messages. I'm sure that is not the problem, though.

Yeah... I couldn't tell whether it would be the cause either, but I do flag it as problematic, because exdrNR shouldn't be used as an error code and it will break when we attempt to retrieve the error message for it.
Anyway, a Google search led me to http://pythonextensionpatterns.readthedocs.org/en/latest/exceptions.html, which points out that somewhere a function is returning NULL without setting an exception. None of the seek functions can return NULL...

kain88-de · 2015-12-10T21:35:45Z

So I think I'm done with the largest part of the implementation. The offsets are now also calculated directly in the xdrfile-library.

I've started to do some benchmarks. The test-suite runs in the same time. But when I try to benchmark bigger files I don't get conclusive results. I have a gut feeling that this version is faster but I can't really prove it right now. It would be nice if others could also benchmark and test this code. When I have time this week I'll run some more tests on different systems.

1.7 GB XTC

MDA v0.12.1

In [1]: from MDAnalysis.coordinates.XTC import XTCReader

In [2]: %timeit -n1 -r1 XTCReader('all_bb.xtc')
1 loops, best of 1: 1.91 s per loop

MDA this PR

In [1]: from MDAnalysis.coordinates.XTC import XTCReader

In [2]: %timeit -n1 -r1 XTCReader('all_bb.xtc')
1 loops, best of 1: 597 ms per loop

150 MB XTC

MDA v0.12.1


In [1]: from MDAnalysis.coordinates.XTC import XTCReader

In [2]: %timeit  XTCReader('test.xtc')
1 loops, best of 3: 206 ms per loop

MDA this PR

In [1]: from MDAnalysis.coordinates.XTC import XTCReader

In [2]: %timeit  XTCReader('test.xtc')
The slowest run took 4.08 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 50.2 ms per loop
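For a quick comparison, the ratios implied by the timings above can be computed as follows. This is just arithmetic on the numbers quoted in this comment; since the large-file timings come from a single run each and IPython warned about possible caching for the small file, the results should be read as rough indications only.

```python
# Reader construction times in seconds, copied from the %timeit output above:
# (MDA v0.12.1, this PR)
timings = {
    "1.7 GB XTC": (1.91, 0.597),
    "150 MB XTC": (0.206, 0.0502),
}


def speedup(old, new):
    """How many times faster the new timing is than the old one."""
    return old / new


speedups = {name: speedup(old, new) for name, (old, new) in timings.items()}

for name, factor in speedups.items():
    print("{}: about {:.1f}x faster".format(name, factor))
```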

kain88-de · 2015-12-10T22:42:59Z

Also I'm not sure why the travis tests are failing. The files exist in the repository and when I do a fresh checkout everything works on my laptop.

This also includes information on how to read the content of the files in ASCII without using MDAnalysis.

Base test classes in test files apparently need to start with an underscore; otherwise nose picks them up and tries to run their tests.
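The naming rule can be illustrated with a small stand-in for the collection logic. `would_collect` and its pattern are simplified assumptions made for this sketch, not nose's actual implementation; the class names are also made up.

```python
import re

# Simplified assumption of a test-name pattern: "Test"/"test" at the start
# of the name or right after a separator character.
TEST_MATCH = re.compile(r"(?:^|[_.-])[Tt]est")


def would_collect(class_name):
    """Approximate whether a test runner would collect this class by name."""
    if class_name.startswith("_"):
        return False  # a leading underscore marks the class as a private base
    return TEST_MATCH.search(class_name) is not None


for name in ("TestReaderBase", "_TestReaderBase", "TestXTCReader"):
    print(name, would_collect(name))
```

Under these assumptions a shared base class named `TestReaderBase` would be run on its own (and fail, since it has no concrete file to read), while `_TestReaderBase` is skipped and only its concrete subclasses are collected.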

kain88-de · 2016-01-17T14:01:28Z

@richardjgowers I added the changelog information. You can merge this if you like.

richardjgowers · 2016-01-17T14:15:09Z

package/CHANGELOG


 Enhancement
+  * Offsets reading for xtc/trr files has been speed up.


Sped up (exception in English)

richardjgowers · 2016-01-17T14:22:31Z

Awesome. So afaik it's just DCD and lots of six things that need doing now?

kain88-de · 2016-01-17T14:37:04Z

@richardjgowers hopefully yes. But the dcd will be a big one again.

kain88-de reviewed Sep 19, 2015

kain88-de force-pushed the cython-xdrlib branch from ef4dddb to c327d35 Compare September 19, 2015 21:28

richardjgowers reviewed Sep 21, 2015

kain88-de mentioned this pull request Sep 22, 2015

style guide #404

Closed

11 tasks

kain88-de mentioned this pull request Oct 1, 2015

organization of test files #466

Closed

3 tasks

kain88-de force-pushed the cython-xdrlib branch 2 times, most recently from 94eb23e to 1a98b6d Compare October 29, 2015 21:45

kain88-de force-pushed the cython-xdrlib branch from 3bd46ae to 793d85d Compare November 2, 2015 08:36

kain88-de mentioned this pull request Nov 3, 2015

Port Coordinates test to new BaseReader/Writer Test classes #516

Open

16 tasks

kain88-de force-pushed the cython-xdrlib branch 4 times, most recently from 8df826c to 925d192 Compare November 16, 2015 22:00

kain88-de force-pushed the cython-xdrlib branch from 6ecd7cf to 09058db Compare November 25, 2015 10:51

kain88-de force-pushed the cython-xdrlib branch from 4797572 to de978f5 Compare December 6, 2015 20:28

kain88-de reviewed Dec 6, 2015

kain88-de force-pushed the cython-xdrlib branch 2 times, most recently from ba87760 to 54f3b3c Compare December 8, 2015 08:05

kain88-de reviewed Dec 8, 2015

kain88-de added 15 commits January 17, 2016 14:40

Fix type in doc-string

ddc8caf

Add information how test.xtc/trr was created

b3f48d8

This also includes information how to read the content of the files in ascii without using MDAnalysis

Add test to write non default xtc precision

7fde350

use float type for precision

3128f23

test precision setting in xtcwriter.

4f2a208

deactivate unused tests

405eff1

Add XTC sub selection test

03cb2ae

XDR fix spelling error

7b7242e

really test offset mismatch

568067d

Various QC fixes in XDR.py

7149ed1

Fix test Error

c42c528

Base test classes in test files apparently need to start with an underscore; otherwise nose picks them up and tries to run their tests.

Fix Doc strings

a2508e3

Fix up doc strings

9a3c3fc

Fix up doc strings

eb80ca9

ignore *.npz files created during testing

afcd9da

kain88-de force-pushed the cython-xdrlib branch from 09ac232 to afcd9da Compare January 17, 2016 13:40

richardjgowers reviewed Jan 17, 2016

Add changelog info

6002e9f

kain88-de force-pushed the cython-xdrlib branch from 2dc85bd to 6002e9f Compare January 17, 2016 14:20

richardjgowers added this to the 0.13 bugfixes milestone Jan 17, 2016

richardjgowers added a commit that referenced this pull request Jan 17, 2016

Merge pull request #441 from kain88-de/cython-xdrlib

b91e0ab

Cython xdrlib

richardjgowers merged commit b91e0ab into MDAnalysis:develop Jan 17, 2016

kain88-de deleted the cython-xdrlib branch January 17, 2016 16:32

orbeckst mentioned this pull request Jan 17, 2016

new 0.13.0 release #600

Closed

mpharrigan mentioned this pull request Jan 21, 2016

[xtc|trr] added xdrlib2 (seek support). mdtraj/mdtraj#1023

Closed

This was referenced Jan 28, 2016

XDR file seeks and tells working again on large files (closes #677) #678

Merged

New xdrlib information in AUTHORS and LICENSE is missing/outdated #679

Closed
