[GSOC] implement example of state-space model for connectivity #100
Conversation
Left some minor suggestions
Lmk when you want us to activate the CI pipelines + do a more in-depth look at the code!
Looks like a good start, let's keep iterating here until we're happy with how the interface looks in the example!
Hello! Just wanted to say hi to include myself in the loop 😄. Could someone quickly explain what the aim of this PR is? Is it to add VAR-model-based connectivity estimates? State-space sounds like it's implemented as a Kalman filter. At the risk of sounding repetitive, could we look at https://github.com/scot-dev/scot to see if anything could be re-used here? I've implemented least squares VAR estimation (optionally with regularization) to compute several popular (directed) connectivity measures (for a list see https://scot-dev.github.io/scot-doc/api/scot/scot.html#module-scot.connectivity).
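For context on the SCoT-style approach mentioned here, a plain least-squares VAR fit can be sketched in a few lines (a generic illustration with made-up variable names, not SCoT's actual implementation):

```python
import numpy as np

def fit_var_least_squares(x, p):
    """Estimate VAR(p) coefficient matrices from data x of shape
    (n_channels, n_times) with ordinary least squares."""
    n_ch, n_t = x.shape
    # Predict each sample from the p preceding samples of all channels
    Y = x[:, p:]                                                   # (n_ch, n_t - p)
    Z = np.vstack([x[:, p - k:n_t - k] for k in range(1, p + 1)])  # (n_ch*p, n_t - p)
    A = np.linalg.lstsq(Z.T, Y.T, rcond=None)[0].T                 # solve Y ~= A Z
    return A.reshape(n_ch, p, n_ch).transpose(1, 0, 2)             # (p, n_ch, n_ch)

# Sanity check: recover the coefficients of a known, stable VAR(1) process
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.3],
                   [0.0, 0.4]])  # off-diagonal terms = directed coupling
x = np.zeros((2, 20000))
for t in range(1, x.shape[1]):
    x[:, t] = A_true @ x[:, t - 1] + 0.1 * rng.standard_normal(2)
A_est = fit_var_least_squares(x, p=1)[0]
print(np.round(A_est, 2))  # close to A_true
```

The estimated coefficient matrices are what measures like PDC/DTF are then computed from; regularized variants replace the `lstsq` call.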
Hi @cbrnr, you are correct that this PR aims to implement a Kalman filter using an AR model to measure connectivity. Reviewing SCoT is still on my to-do list, thanks for the reminder.
From a quick chat with Jordan, here is what we fleshed out a bit based on my suggestion for the public API: Internal implementation sketch and public API
EDIT: I resolved the conversations above where I talk about this, since I think it's general enough to discuss in this main thread rather than inline.
[Figure: time-varying connectivity strength (autoregressive coefficients) plotted over time in seconds]
I am at a research conference for the next 7 days. When I return I have the following to-do:
Looking forward to your feedback!
examples/megssm/models.py (outdated diff excerpt)

    self._roi_cov_0 = roi_cov_0

    @property
    def log_sigsq_lst(self):
All of this stuff looks like it should be private to me. That also means we don't need a @property or a @whatever.setter for each of them; we just use private attributes directly.
So I should be able to remove all of these @property and @some.setter lines, right? Does that mean there need to be additional checks on variable format somewhere? I don't think any of these have any checks as they are written here. I'd also like to talk about which vars and functions to make private. In my mind, all of the vars and functions in models.py should be private.
examples/megssm/mne_util.py (outdated diff excerpt)

    # retrieve forward and sensor covariance
    fwd_src_snsr = fwd['sol']['data'].copy()
    snsr_cov = cov.data.copy()
... probably here you do:

    epochs = epochs.copy().pick('data', exclude='bads')
    snsr_cov = pick_channels_cov(cov, epochs.ch_names, ordered=True).data
    fwd = convert_forward_solution(fwd, force_fixed=True)
    fwd_src_snsr = pick_channels_forward(fwd, epochs.ch_names, ordered=True)['sol']['data']
    data = epochs.get_data()
    info = epochs.info
    del cov, fwd, epochs

Now you are guaranteed that data, fwd_src_snsr, and snsr_cov all have the same set of channels in the same order. Then to rescale them all you can just do:

    rescale_cov = make_ad_hoc_cov(info)  # epochs was deleted above, so use info
    scaler = compute_whitener(info, rescale_cov)
    del rescale_cov
    fwd_src_snsr = scaler @ fwd_src_snsr
    snsr_cov = scaler @ snsr_cov @ scaler.T
    data = scaler @ data  # @ nicely knows to operate over the last two dims of (epochs, chs, time)

or something similar. Basically you let the MNE pick_channels_* functions make sure all channels are present and ordered properly, then make use of MNE functions to do the rescaling.
Previously you said to play around with these functions to see if they would produce the same outputs as the scale_sensor_data function. The original function was using a scale of 1 for each of the channel types, so nothing was changed. This scaling does alter the data, primarily in a significant increase in the number of principal components (PCA run immediately after the scaling function). Also, the output of the model is much noisier using the scaler, I think as expected given the significant increase in PCs. Let's talk to make sure this is working as expected.
> The original function was using a scale of 1 for each of the channel types so nothing was changed.

This is the equivalent of using mne.make_ad_hoc_cov(..., std=1.). So if you want this mode you can get it easily with a very small change. Can you verify this looks the same as the old code?

> This scaling does alter the data, primarily in a significant increase in the number of principal components (PCA run immediately after scaling function).

This is to be expected. If you do no scaling and have MEG and EEG data, your EEG data will be ~6 orders of magnitude larger or so, so you will end up only looking at EEG data.

> Also, the output of the model is much noisier using the scaler, I think as expected with the significant increase in PCs. Let's talk to make sure this is working as expected.

What's not clear to me: is it also noisier (and noisier in the same way) when using the old code path but enabling scaling? In other words: is this a problem with the new way of scaling, or a problem that has always existed with scaling the data, regardless of scaling method? We can chat about this today.
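The scale-mismatch point above can be demonstrated with a toy NumPy example (the magnitudes are illustrative, and the crude per-channel-type rescaling is only a stand-in for what compute_whitener does with a noise covariance):

```python
import numpy as np

rng = np.random.default_rng(0)
n_times = 1000
# Two "channel types" carrying comparable information but in wildly different
# units (illustrative magnitudes, e.g. EEG ~1e-5 V vs MEG ~1e-11 T/m)
eeg = 1e-5 * rng.standard_normal((10, n_times))
meg = 1e-11 * rng.standard_normal((10, n_times))
data = np.vstack([eeg, meg])

def n_components(x, rel_tol=1e-3):
    # number of principal components whose singular value is within
    # rel_tol of the largest one
    s = np.linalg.svd(x - x.mean(axis=1, keepdims=True), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

print(n_components(data))  # only the large-amplitude channels register -> 10

# Rescaling each channel type to comparable variance recovers all 20 components
scale = np.concatenate([np.full(10, 1e5), np.full(10, 1e11)])
print(n_components(scale[:, None] * data))  # -> 20
```

So an increase in the number of meaningful principal components after scaling is the intended behavior, not a bug.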
I am currently working to incorporate an old dataset to see if these scripts produce the expected results.
Hey @adam2392, one of the CI errors is due to autograd:
Yeah sure! You can include it as an optional dev dependency. Do you know where to add it? Please also add a comment, if you can, noting that we'll replace it later on. Perhaps we can start a separate issue to track migration to jax later on?
@adam2392 Cool! I do not know where to add autograd as an optional dev dependency. Should it be installed in mnedev? Where should I leave the comment that it can be replaced with jax later on? Just start a new issue on GitHub with the suggestion?
You can add it here: https://github.com/mne-tools/mne-connectivity/blob/main/requirements_testing.txt
Then when you import autograd, you should maybe import it within the functions that use it and not at the top of the Python file. That way there is no error when someone tries to run code that doesn't need it.
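A minimal sketch of the lazy-import pattern being suggested (the function names are illustrative, not the PR's actual API):

```python
def _require_autograd():
    # Imported lazily so importing the package works without autograd
    # installed; the error only surfaces when a function that needs it runs.
    try:
        import autograd.numpy as anp
        return anp
    except ImportError as err:
        raise ImportError(
            "autograd is required for this estimator; "
            "install it with `pip install autograd`"
        ) from err

def fit(data):  # illustrative function name
    anp = _require_autograd()
    return anp.sum(data)
```

Importing the module containing `fit` never fails; only calling `fit` without autograd installed raises an informative error.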
Yeah, I'll create the GH issue.
Ok I think that's complete. Thanks!
Update: I got the code to work for an old dataset, however the results are not what I expected. I am currently working to get the sample data (much smaller than the old dataset) to work on my original model. This output will be my new ground truth, and I can work iteratively to make sure each edit I make to the original model to conform it to the MNE-Python API standards produces the same output. Lesson learned: have a simple ground truth to work with from the beginning :)
Excellent! A good next step is to make sure all random seeds can be set such that if you run this again you get the exact same output (to numerical precision at least). Then you can save the At result to disk and compare to it each time you replace some piece of code.
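The suggested workflow can be sketched as follows (file and variable names are illustrative, and a small random matrix stands in for the actual EM fit):

```python
import os
import tempfile

import numpy as np

def run_model(seed=42):
    # Stand-in for the model fit; with a fixed seed the output is reproducible
    rng = np.random.default_rng(seed)
    return rng.standard_normal((4, 4))  # e.g. estimated AR coefficients

path = os.path.join(tempfile.mkdtemp(), "ground_truth_At.npy")

# First run: save the trusted output to disk
np.save(path, run_model())

# Later runs (after each refactoring step): recompute and compare
expected = np.load(path)
np.testing.assert_allclose(run_model(), expected, rtol=1e-7, atol=1e-12)
```

Any code replacement that changes the numerics will then fail the `assert_allclose` immediately instead of silently drifting.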
Nice! It would be good to know what the differences are that make them not identical, but really if this version of the API works on our UW data as well (maybe even just for one subject?) then I'd say you could use this as the "ground truth" for correctness of additional changes!
@larsoner Can you take a look? I'll get to the CI checks first thing Monday!
examples/megssm/mne_util.py (outdated diff excerpt)

    def fwd_src_roi(self, val):
        self._fwd_src_roi = val

    @property
@jadrew43 this is line 114 and it's not related to scaling. There is some stuff below that is. Can you confirm by commenting on the correct line/lines in the "Files" tab of this PR? This not being the correct line makes me wonder if you forgot a push, and I don't want to look prematurely.

The best way to help me check would be to make it very easy to run and check the results. For example, add a commented-out (to make CIs happy) # np.testing.assert_allclose(data, data_mne, rtol=..., atol=...) statement that fails (when you uncomment it) where you scale data the old way and data_mne the new way. Then I need a minimal script (hopefully one that runs in just a few seconds, on the sample data for example) that I can run on this branch and that will fail when I uncomment the assert_allclose line.
The updated files are in state_space/; I haven't done anything in examples/ in weeks, so I went ahead and deleted the files I had in there. Running state_space_connectivity.py takes about 90 seconds, and adding _mne_scale_sensor_data() adds about 60 seconds. Happy to chat about building tests that take much less time. The commented testing line is L228 in mne_util.py.
Hmm, I don't think this is quite a minimal reproducible bit of code.

When I tried to run the code, it failed because I needed autograd. I installed that, then it complained that I needed autograd_linalg. I can't find this on pip or conda, so I tried swapping in from scipy.linalg import solve_triangular.
Next I tried to run the script and got:
FileNotFoundError: [Errno 2] No such file or directory: '/home/larsoner/mne_data/MNE-sample-data/MEG/sample/labels/AUD-lh.label'
This makes sense; usually labels are in the SUBJECTS_DIR. But even then there aren't auditory labels in sample. So let's change it to something that should work, I think:
    regexp = '^(G_temp_sup-Plan_tempo.*|Pole_occipital)'
    labels = mne.read_labels_from_annot(
        'sample', 'aparc.a2009s', regexp=regexp, subjects_dir=subjects_dir)
    label_names = [label.name for label in labels]
    assert len(label_names) == 4
Then I got a bit farther, until I hit:
AttributeError: python: undefined symbol: mkl_get_max_threads
So I then disabled all the thread-setting stuff, and then get to:
File "/home/larsoner/python/scipy/scipy/linalg/_basic.py", line 336, in solve_triangular
raise ValueError('expected square matrix')
ValueError: expected square matrix
So my swap of scipy.linalg is not correct.

I don't think I can proceed until I can install autograd_linalg somehow; how do you recommend doing this?
Try this:
pip install --src deps -e git+ssh://git@github.com/nfoti/autograd_linalg.git@master#egg=autograd_linalg
I pushed a commit for a solve_triangular that allows the code to run on my machine. I can look into the issue now!
IIRC the code is still WIP / needs to be systematically converted to MNE conventions, but some bugs have been found along the way, which is good!
PR Description
Google Summer of Code (2022) project
Closes #99
WIP: Linear Dynamic System (state-space model using EM algorithm to find autoregressive coefficients) to infer functional connectivity by interpreting autoregressive coefficients as connectivity strength. The model uses M/EEG data as input, and outputs time-varying autoregressive coefficients for source space labels.
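For reference, a linear dynamical system of the kind described above is conventionally written as follows (a generic textbook form; the PR's exact parameterization may differ):

```latex
\begin{aligned}
x_t &= A_t\, x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) && \text{(latent ROI dynamics)}\\
y_t &= C\, x_t + v_t,       & v_t &\sim \mathcal{N}(0, R) && \text{(sensor measurements)}
\end{aligned}
```

Here the time-varying autoregressive coefficients $A_t$ are interpreted as connectivity strength between ROIs, $C$ maps ROI activity to the sensors, and EM alternates Kalman smoothing of $x_t$ (E-step) with updates of $A_t$, $Q$, and $R$ (M-step).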
Completed during GSoC
- model.fit using the Expectation Maximization algorithm to fit the autoregressive coefficients of the state-space model, mapping the sensor data to each ROI, and computing the connectivity strength between ROI pairs
- Example works on the sample dataset

Check out this link to see my weekly progress. All of the code in this PR is new to MNE-Python's repositories.
Todo after GSoC
- Replace scale_data with compute_whitener and dot products
- The code uses autograd as a dependency, which is no longer actively developed (but still maintained). The code should be updated to incorporate JAX, which includes the features of autograd and is being actively developed (autograd_linalg should be replaced by scipy.linalg).

Merge checklist
Maintainer, please confirm the following before merging: