[GSOC] implement example of state-space model for connectivity #100
Conversation
Left some minor suggestions
Lmk when you want us to activate the CI pipelines + do a more in-depth look at the code!
Looks like a good start, let's keep iterating here until we're happy with how the interface looks in the example!
Hello! Just wanted to say hi to include myself in the loop 😄. Could someone quickly explain what the aim of this PR is? Is it to add VAR-model-based connectivity estimates? State-space sounds like it's implemented as a Kalman filter. At the risk of sounding repetitive, could we look at https://github.com/scot-dev/scot to see if anything could be re-used here? I've implemented least squares VAR estimation (optionally with regularization) to compute several popular (directed) connectivity measures (for a list see https://scot-dev.github.io/scot-doc/api/scot/scot.html#module-scot.connectivity).
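For context on the SCoT-style approach mentioned here, a plain least-squares VAR fit can be sketched in a few lines (a generic illustration with made-up variable names, not SCoT's actual implementation):

```python
import numpy as np

def fit_var_least_squares(x, p):
    """Estimate VAR(p) coefficient matrices from data x of shape
    (n_channels, n_times) with ordinary least squares."""
    n_ch, n_t = x.shape
    # Predict each sample from the p preceding samples of all channels
    Y = x[:, p:]                                                   # (n_ch, n_t - p)
    Z = np.vstack([x[:, p - k:n_t - k] for k in range(1, p + 1)])  # (n_ch*p, n_t - p)
    A = np.linalg.lstsq(Z.T, Y.T, rcond=None)[0].T                 # solve Y ~= A Z
    return A.reshape(n_ch, p, n_ch).transpose(1, 0, 2)             # (p, n_ch, n_ch)

# Sanity check: recover the coefficients of a known, stable VAR(1) process
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.3],
                   [0.0, 0.4]])  # off-diagonal terms = directed coupling
x = np.zeros((2, 20000))
for t in range(1, x.shape[1]):
    x[:, t] = A_true @ x[:, t - 1] + 0.1 * rng.standard_normal(2)
A_est = fit_var_least_squares(x, p=1)[0]
print(np.round(A_est, 2))  # close to A_true
```

The estimated coefficient matrices are what measures like PDC/DTF are then computed from; regularized variants replace the `lstsq` call.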
Hi @cbrnr, you are correct that this PR aims to implement a Kalman filter using an AR model to measure connectivity. Reviewing SCoT is still on my to-do list, thanks for the reminder.
From a quick chat with Jordan, here is what we fleshed out a bit based on my suggestion for the public API: Internal implementation sketch and public API
EDIT: I resolved the conversations above where I talk about this, since I think it's general enough to discuss in this main thread rather than inline.
[Figure: time-varying connectivity strength (autoregressive coefficients) plotted over time in seconds]
I am at a research conference for the next 7 days. When I return I have the following to-do:
Looking forward to your feedback!
examples/megssm/models.py (outdated diff excerpt)

    self._roi_cov_0 = roi_cov_0

    @property
    def log_sigsq_lst(self):
All of this stuff looks like it should be private to me. That also means we don't need a @property or a @whatever.setter for each of them; we just use private attributes directly.
So I should be able to remove all of these @property and @some.setter lines, right? Does that mean there need to be additional checks on variable format somewhere? I don't think any of these have any checks as they are written here. I'd also like to talk about which vars and functions to make private. In my mind, all of the vars and functions in models.py should be private.
examples/megssm/mne_util.py (outdated diff excerpt)

    # retrieve forward and sensor covariance
    fwd_src_snsr = fwd['sol']['data'].copy()
    snsr_cov = cov.data.copy()
... probably here you do:

    epochs = epochs.copy().pick('data', exclude='bads')
    snsr_cov = pick_channels_cov(cov, epochs.ch_names, ordered=True).data
    fwd = convert_forward_solution(fwd, force_fixed=True)
    fwd_src_snsr = pick_channels_forward(fwd, epochs.ch_names, ordered=True)['sol']['data']
    data = epochs.get_data()
    info = epochs.info
    del cov, fwd, epochs

Now you are guaranteed that data, fwd_src_snsr, and snsr_cov all have the same set of channels in the same order. Then to rescale them all you can just do:

    rescale_cov = make_ad_hoc_cov(info)  # epochs was deleted above, so use info
    scaler = compute_whitener(info, rescale_cov)
    del rescale_cov
    fwd_src_snsr = scaler @ fwd_src_snsr
    snsr_cov = scaler @ snsr_cov @ scaler.T
    data = scaler @ data  # @ nicely knows to operate over the last two dims of (epochs, chs, time)

or something similar. Basically you let the MNE pick_channels_* functions make sure all channels are present and ordered properly, then make use of MNE functions to do the rescaling.
Previously you said to play around with these functions to see if they would produce the same outputs as the scale_sensor_data function. The original function was using a scale of 1 for each of the channel types, so nothing was changed. This scaling does alter the data, primarily in a significant increase in the number of principal components (PCA run immediately after the scaling function). Also, the output of the model is much noisier using the scaler, I think as expected given the significant increase in PCs. Let's talk to make sure this is working as expected.
> The original function was using a scale of 1 for each of the channel types so nothing was changed.

This is the equivalent of using mne.make_ad_hoc_cov(..., std=1.). So if you want this mode you can get it easily with a very small change. Can you verify this looks the same as the old code?

> This scaling does alter the data, primarily in a significant increase in the number of principal components (PCA run immediately after scaling function).

This is to be expected. If you do no scaling and have MEG and EEG data, your EEG data will be ~6 orders of magnitude larger or so, so you will end up only looking at EEG data.

> Also, the output of the model is much noisier using the scaler, I think as expected with the significant increase in PCs. Let's talk to make sure this is working as expected.

What's not clear to me: is it also noisier (and noisier in the same way) when using the old code path but enabling scaling? In other words: is this a problem with the new way of scaling, or a problem that has always existed with scaling the data, regardless of scaling method? We can chat about this today.
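The scale-mismatch point above can be demonstrated with a toy NumPy example (the magnitudes are illustrative, and the crude per-channel-type rescaling is only a stand-in for what compute_whitener does with a noise covariance):

```python
import numpy as np

rng = np.random.default_rng(0)
n_times = 1000
# Two "channel types" carrying comparable information but in wildly different
# units (illustrative magnitudes, e.g. EEG ~1e-5 V vs MEG ~1e-11 T/m)
eeg = 1e-5 * rng.standard_normal((10, n_times))
meg = 1e-11 * rng.standard_normal((10, n_times))
data = np.vstack([eeg, meg])

def n_components(x, rel_tol=1e-3):
    # number of principal components whose singular value is within
    # rel_tol of the largest one
    s = np.linalg.svd(x - x.mean(axis=1, keepdims=True), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

print(n_components(data))  # only the large-amplitude channels register -> 10

# Rescaling each channel type to comparable variance recovers all 20 components
scale = np.concatenate([np.full(10, 1e5), np.full(10, 1e11)])
print(n_components(scale[:, None] * data))  # -> 20
```

So an increase in the number of meaningful principal components after scaling is the intended behavior, not a bug.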
I am currently working to incorporate an old dataset to see if these scripts produce the expected results.
Hey @adam2392, one of the CI errors is due to autograd:
Yeah sure! You can include it as an optional dev dependency. Do you know where to add it? Please also add a comment, if you can, noting that we'll replace it later on. Perhaps we can start a separate issue to track migration to jax later on?
@adam2392 Cool! I do not know where to add autograd as an optional dev dependency. Should it be installed in mnedev? Where should I leave the comment that it can be replaced with jax later on? Just start a new issue on GitHub with the suggestion?
You can add it here: https://github.com/mne-tools/mne-connectivity/blob/main/requirements_testing.txt
Then when you import autograd, you should maybe import it within the functions that use it and not at the top of the Python file. That way there is no error when someone tries to run code that doesn't need it.
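A minimal sketch of the lazy-import pattern being suggested (the function names are illustrative, not the PR's actual API):

```python
def _require_autograd():
    # Imported lazily so importing the package works without autograd
    # installed; the error only surfaces when a function that needs it runs.
    try:
        import autograd.numpy as anp
        return anp
    except ImportError as err:
        raise ImportError(
            "autograd is required for this estimator; "
            "install it with `pip install autograd`"
        ) from err

def fit(data):  # illustrative function name
    anp = _require_autograd()
    return anp.sum(data)
```

Importing the module containing `fit` never fails; only calling `fit` without autograd installed raises an informative error.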
Yeah, I'll create the GH issue.
Ok I think that's complete. Thanks!
Update: I got the code to work for an old dataset, however the results are not what I expected. I am currently working to get the sample data (much smaller than the old dataset) to work on my original model. This output will be my new ground truth, and I can work iteratively to make sure each edit I make to the original model to conform it to the MNE-Python API standards produces the same output. Lesson learned: have a simple ground truth to work with from the beginning :)
Excellent! A good next step is to make sure all random seeds can be set such that if you run this again you get the exact same output (to numerical precision at least). Then you can save the At result to disk and compare to it each time you replace some piece of code.
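The suggested workflow can be sketched as follows (file and variable names are illustrative, and a small random matrix stands in for the actual EM fit):

```python
import os
import tempfile

import numpy as np

def run_model(seed=42):
    # Stand-in for the model fit; with a fixed seed the output is reproducible
    rng = np.random.default_rng(seed)
    return rng.standard_normal((4, 4))  # e.g. estimated AR coefficients

path = os.path.join(tempfile.mkdtemp(), "ground_truth_At.npy")

# First run: save the trusted output to disk
np.save(path, run_model())

# Later runs (after each refactoring step): recompute and compare
expected = np.load(path)
np.testing.assert_allclose(run_model(), expected, rtol=1e-7, atol=1e-12)
```

Any code replacement that changes the numerics will then fail the `assert_allclose` immediately instead of silently drifting.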
Nice! It would be good to know what the differences are that make them not identical, but really if this version of the API works on our UW data as well (maybe even just for one subject?) then I'd say you could use this as the "ground truth" for correctness of additional changes!
@larsoner Can you take a look? I'll get to the CI checks first thing Monday!
examples/megssm/mne_util.py (outdated diff excerpt)

    def fwd_src_roi(self, val):
        self._fwd_src_roi = val

    @property
@jadrew43 this is line 114 and it's not related to scaling. There is some stuff below that is. Can you confirm by commenting on the correct line/lines in the "Files" tab of this PR? This not being the correct line makes me wonder if you forgot a push, and I don't want to look prematurely.

The best way to help me check would be to make it very easy to run and check the results. For example, add a commented-out (to make CIs happy) # np.testing.assert_allclose(data, data_mne, rtol=..., atol=...) statement that fails (when you uncomment it) where you scale data the old way and data_mne the new way. Then I need a minimal script (hopefully one that runs in just a few seconds, on the sample data for example) that I can run on this branch and that will fail when I uncomment the assert_allclose line.
The updated files are in state_space/; I haven't done anything in examples/ in weeks, so I went ahead and deleted the files I had in there. Running state_space_connectivity.py takes about 90 seconds, and adding _mne_scale_sensor_data() adds about 60 seconds. Happy to chat about building tests that take much less time. The commented testing line is L228 in mne_util.py.
Hmm, I don't think this is quite a minimal reproducible bit of code.

When I tried to run the code, it failed because I needed autograd. I installed that, then it complained that I needed autograd_linalg. I can't find this on pip or conda, so I tried swapping in from scipy.linalg import solve_triangular.
Next I tried to run the script and got:
FileNotFoundError: [Errno 2] No such file or directory: '/home/larsoner/mne_data/MNE-sample-data/MEG/sample/labels/AUD-lh.label'
This makes sense; usually labels are in the SUBJECTS_DIR. But even then there aren't auditory labels in sample. So let's change it to something that should work, I think:
    regexp = '^(G_temp_sup-Plan_tempo.*|Pole_occipital)'
    labels = mne.read_labels_from_annot(
        'sample', 'aparc.a2009s', regexp=regexp, subjects_dir=subjects_dir)
    label_names = [label.name for label in labels]
    assert len(label_names) == 4
Then I got a bit farther, until I hit:
AttributeError: python: undefined symbol: mkl_get_max_threads
So I then disabled all the thread-setting stuff, and then get to:
File "/home/larsoner/python/scipy/scipy/linalg/_basic.py", line 336, in solve_triangular
raise ValueError('expected square matrix')
ValueError: expected square matrix
So my swap of scipy.linalg is not correct.

I don't think I can proceed until I can install autograd_linalg somehow; how do you recommend doing this?
Try this:
pip install --src deps -e git+ssh://git@github.com/nfoti/autograd_linalg.git@master#egg=autograd_linalg
I pushed a commit for a solve_triangular that allows the code to run on my machine. I can look into the issue now!
IIRC the code is still WIP / needs to be systematically converted to MNE conventions, but some bugs have been found along the way, which is good!
PR Description
Google Summer of Code (2022) project
Closes #99
WIP: Linear Dynamic System (state-space model using EM algorithm to find autoregressive coefficients) to infer functional connectivity by interpreting autoregressive coefficients as connectivity strength. The model uses M/EEG data as input, and outputs time-varying autoregressive coefficients for source space labels.
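For reference, a linear dynamical system of the kind described above is conventionally written as follows (a generic textbook form; the PR's exact parameterization may differ):

```latex
\begin{aligned}
x_t &= A_t\, x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) && \text{(latent ROI dynamics)}\\
y_t &= C\, x_t + v_t,       & v_t &\sim \mathcal{N}(0, R) && \text{(sensor measurements)}
\end{aligned}
```

Here the time-varying autoregressive coefficients $A_t$ are interpreted as connectivity strength between ROIs, $C$ maps ROI activity to the sensors, and EM alternates Kalman smoothing of $x_t$ (E-step) with updates of $A_t$, $Q$, and $R$ (M-step).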
Completed during GSoC
- model.fit using the Expectation Maximization algorithm to fit the autoregressive coefficients of the state-space model, mapping the sensor data to each ROI, and computing the connectivity strength between ROI pairs
- Example works on the sample dataset

Check out this link to see my weekly progress. All of the code in this PR is new to MNE-Python's repositories.
Todo after GSoC
- Replace scale_data with compute_whitener and dot products
- The code uses autograd as a dependency, which is no longer actively developed (but still maintained). The code should be updated to incorporate JAX, which includes the features of autograd and is being actively developed (autograd_linalg should be replaced by scipy.linalg).

Merge checklist
Maintainer, please confirm the following before merging: