-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH, WIP] Multivariate connectivity methods (imaginary coherence and Granger causality) #125
Conversation
Initialize epochs_multivar
merge from upstream
add multivar seeds and targets
tagging @larsoner @agramfort @adam2392. This may need significant discussion to hammer out the details. |
Hi @tsbinns Thanks for making the PR and offering such a substantial contribution! In the meantime tho perhaps for my own education and to guide the discussion, I think these are a few things we'll need to discuss at a high level among your team and our team. Note these are just questions to guide discussion. Excited to move this forward! Feel free to chime in on each topic here:
|
Hi @adam2392, thank you for the reply! Here are my thoughts on your questions: 1. Testing for "correctness" of each method If you mean something a bit more advanced such as checking that the connectivity estimates match some simulated data with a known connectivity profile, then I do not know of anything off the top of my head. However, I am sure this is something we could figure out. So far, we have been making sure to compare our connectivity estimates with those produced from the original MATLAB implementations of these methods as a sanity check. I realise this is not a test of "correctness" as such, but at least it confirms that these implementations match the authors' intended method. Unfortunately this sanity check is not possible in all circumstances due to the addition of new features (for example, we have added options for performing dimensionality reduction prior to estimating connectivity to reduce overfitting and enable working with non-full rank data, as well as the ability to extract spatial topographies for connectivity). However, we are confident that these additional features are working as intended. 2. Storage of the connectivity Something a little less trivial is the storage of any spatial topographies as these can end up being ragged object arrays (have dimensions [connections x channels x freqs (x times)], and so depending on the number of channels used in each connection, can end up ragged). For instance, when saving with HDF5, we handled this by adding the functionality to pad the object arrays such that they can be stored as regular numpy arrays, and then upadded when being loaded. Again though, this would not interfere with any existing behaviour, just add something new. However, whilst it is currently the case that the shape of the connectivity results is identical to that in the existing classes ([connections x freqs (x times)]), this may change. For maximised imaginary coherence, the current implementation involves taking only a single (largest) connectivity component. However, it can also be of interest to inspect other components of the connectivity (as many components as the rank of the data is possible, similar to the situation with the results of 3. Maintenance longer-term 4. Interpretations Including a short paragraph or two for each of the connectivity methods in the documentation is also something we could easily add, and there are plenty of publications we could cite for those interested in further reading. I would be interested to hear your thoughts on this. |
Hi @tsbinns thanks for your timely response! We have a dev meeting on Friday, so will get back to you after that. In the meantime, here's somethings to think about:
Yes that would be great! The existing tests you have regarding checking the correct range of values is also great.
Something to think about: Is it possible to get a few hard-coded short-examples ran through MATLAB that we can just compare? E.g. some simulation data generated that we can load in mne-connectivity during
Here, we can leverage the existing
So it seems the main potential differences are the chance of a ragged array, and the possibility of an extra "arbitrary" dimension (e.g. rank). I don't think we want to support arbitrary lists of connectivities in a container. It sort of breaks the xarray design we have. Re ragged array: Do you mind elaborating what is the exact scenario we might get a ragged connectivity array? I don't think I follow. If we have a connectivity over spatial topographies where each spatial point value is computed using say 2-5 channels, the end connectivity should still be over the spatial points though right In terms of storing ragged points: Perhaps we can encode the missing "connectivity" parts with np.nan? Or zeros? I guess zeros are not ideal cuz perhaps they could be 0? Then, someone can just remove the Re arbitrary extra dimension(s): Thinking out loud here... I think this could just be a generic "ExtendedConnectivity" container that allows arbitrary extension of the dimensions stored. E.g. the biggest one currently is
class ExtendedConnectivity(BaseConn):
def __init__(self, data, ..., **kwargs):
...
conn = ExtendedConnectivity(data, ..., rank=[1,2,3,4], subjects=[0,1,5,10])
print(conn.dims)
> `(n_epochs, n_nodes, n_nodes, n_freqs, n_times, rank, subjects)` |
Hi @adam2392,
It's definitely possible. One thing to consider is that I am having to compute the CSD in MNE and then load this into MATLAB, since we did not have the MATLAB code to replicate MNE's CSD computation procedure. This would mean no changes can be made to the way the data is simulated or the CSD is computed in MNE, otherwise someone would have to run the MATLAB computations again. Still, as long as this is kept in mind, it would not be a major problem.
So the spatial topographies have dimensions [connections x channels x freqs (x times)]. If the seeds/targets for all connections have the same number of channels, then the array will not be ragged. However, say we have 2 connections, the first connection involving 3 seed channels, and the second connection involving 5 seed channels. The spatial topographies for our first seed has shape [3 x freqs (x times)], and those for our second seed is [5 x freqs (x times)]. Combining these topographies across connections then results in a ragged array. The same situation applies if we switched to returning more than just the largest connectivity component for the maximised imaginary part of coherency, as the connectivity results would also have dimensions [connections x channels x freqs (x times)], and so depending on the number of channels used for the seeds and targets of a given connection, the shape of the array for each connection could vary. I also forgot to mention that the same issue applies for
Our approach so far has been to store the seed and target topographies as ragged arrays under the
This padding ensures
Admittedly, handling ragged arrays is quite cumbersome. If you believe padding
Yes, that would be a nice, flexible solution. If this does end up requiring some refactoring of the existing connectivity classes, I would be happy to help with a related PR :) |
@tsbinns Sorry the dev meeting is actually next Friday. However, I am going to be out at that time... I may make it to the meeting, but if not the next one is in 3 weeks. I think regarding the ragged and extended dimensions requires a bit more discussion, so my apologies for the possible delay on that. I think overall, I am optimistic we can figure out a way to make this work. Albeit not including all possible features that you may want, but perhaps having some abstractions that make it possible for you to implement on your end that depends on mne-connectivity. E.g. if you want ragged arrays, maybe we just end up supporting default padding. However, I think regarding the rest of the code, it might be possible to begin moving forward w/ a PR if you think so? Out of the list here mne-connectivity/mne_connectivity/spectral/epochs_multivariate.py Lines 1310 to 1328 in 024bc7d
|
@adam2392 Regarding the dev meeting, no worries! As for a baseline PR, yes, I think that would be possible. Only maximised imaginary coherence (MIC) would require an extension of the connectivity results dimensions (and also is the only method involving potentially ragged topographies). For all other methods, the dimensionality of the results is not changed, and so could rely on the existing connectiviy classes. Multivariate interaction measure (MIM) is the simplest in terms of the amount of code required, so that could be a good starting point. However, all of these methods would require support for ragged indices. If this were implemented using the existing connectivity classes, everything except for saving the class should work. Having a call to iron out some details for this example PR would be useful for me; thanks for offering! My schedule is generally quite flexible, so please let me know a date that would work for you. Also, I am based in Europe, so would you have time to meet in the morning? (otherwise it might be a bit late for me) |
Would it make sense to implement them w/o the assumption that
Sure, can you send me an email at adam.li at Columbia dot edu and we can sort out a time? |
Actually yes, this is a valid use of MIM, so I could get started with a PR using the existing indices format.
Yes, I'll write you soon! |
@adam2392 @tsbinns did you two work out the details here? If not, do you still need feedback on all four points from #125 (comment) ? |
Currently @tsbinns is working on a PR to add the new connectivity metrics without ragged indices and without a new dimension in connectivity. Any feedback on the ragged indices and new dimensions is appreciated! |
Hi @larsoner, as @adam2392 mentioned, I am currently working on a PR which could be used as a base before the full functionality of these methods is extended to. I have the implementation working and a fairly thorough example written, however I still had a few unit tests I wanted to add. If you would want me to already push this PR so that you can get an idea of things already, I am happy to do so, otherwise I would try to finish the unit tests first and then submit something in the next few days. Let me know what works best for you. Cheers! |
Feel free to push once you have unit tests and the example, then we can do a proper review. Then once that's merged (since it's uncontroversial), how about you open a follow-up PR with the ragged indices and extra dimensions as a proposal-by-PR? Assuming that splitting path isn't a lot of / excessive work at your end, it'll make reviewing and thinking about the other stuff easier |
Hi @tsbinns just checking in to see how things are going? Feel free to open the PR for early feedback if there's some issues you wanna chat thru. |
Hi @adam2392 , sorry for the delay. I got caught up with some other work, then I lost a week being off sick with the flu, and now I'm on holiday until next week :/ |
We have merged in #138 and now it would be good to perhaps summarize what remains to be done (if any). I will try to make a release sometime in the next few weeks when I have time. |
Thanks very much @adam2392! The only thing which remains is adding support for connectivity computations where the number of seeds and targets differ within and across connections. I suppose the two options are: 1) add support for ragged arrays; or 2) we pad the arrays with some known value (e.g. NaN) to ensure they are not ragged. I would be very happy to discuss this further. With this change we would have the full functionality of these multivariate methods! |
With option 1), I would say if there's a way to add the optional dependency https://github.com/scikit-hep/awkward and easily transform between awk, numpy and xarray, then we could do that. Otw, option 2) seems like a good option w/ documentation and a short example of how to do this. |
I am not familiar with awkward, but from a first glance it looks like this could work. In any case, I imagine avoiding additional dependencies woule be preferable? Having played around with the two approaches, I personally lean towards padding the arrays for an end-user perspective. Handling the arrays is much simpler, and it's not at all difficult to extract the relevant information. With documentation and an example like you mention, I do not see it being a problem for users. It would also be trivial to implement. If you like, I could prepare a draft PR with an implementation of option 2 to get a better idea of how it would practically work. Depending on this we could decide to stick with it or switch to option 1. How does that sound to you? |
just a note to say that I know a couple of the awkward devs/maintainers. My take is that it would be okay to add this as a dependency --- i.e., I think it's likely to be around and supported for a while --- if option 2 (padding regular arrays with |
Myself and some colleagues have been working on incorporating some multivariate connectivity methods into the MNE-connectivity package. The methods are two forms of imaginary coherence (maximised imaginary coherence, and the multivariate interaction measure; Ewald et al., 2012) and a form of multivariate Granger causality based on state-space models (Barnett & Seth, 2014, 2015).
These implementations in MNE are based on the original implementations of the methods in MATLAB (multivariate coherence code provided by the authors of the method, multivariate Granger causality code partially available here), with some additional features added such as options for dimensionality reduction to handle overfitting/non-full rank data, as well as the recovery of spatial topographies accompanying connectivity.
Whilst these implementations work in MNE, have an accompanying (albeit incomplete) suite of unit tests (
test_epochs_multivariate.py
, additions totest_utils.py
), a new utility for creating multivariate indices, as well as new classes for storing the multivariate connectivity measures (useful for e.g. allowing the storage of spatial topographies), we are not entirely satisfied with the 'cleanliness' of the code (there are also a number of code style discrepancies, incomplete documentation). Given the different considerations for handling multivariate connectivity in comparison to the existing bivariate connectivity measures, a lot of new code has needed to be added, and we have found it quite difficult to refactor the existing code used for the other connectivity measures (e.g. computation of cross-spectral densities).Some internal changes we imagined included moving the classes where connectivity is computed from
epochs.py
to a new fileepochs_classes.py
, as well as creating a new file for handling the functions where the connectivity methods are called (epochs_multivariate.py
, with a new function for calling these methodsmultivariate_spectral_connectivity_epochs()
). Without this,epochs.py
was simply becoming too large a file with the added code for the multivariate connectivity measures. Of course, we are happy to hear your thoughts on this matter.We have been working on these implementations for some time now, and we are at a stage where we are confident that the methods are working correctly (certainly from a mathematical standpoint, and also from the perspective of an end user more or less), it is now just a case of finding the correct internal implementation style, for which we would appreciate your insight. I noticed an issue was posted about the possibility of adding other multivariate methods (Issue #102); perhaps there are some common considerations we could think about to make any implementation of multivariate methods as flexible/expandable as possible.
We look forward to hearing your thoughts!