Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSOC] Add surrogate data generation for significant connectivity estimation #223

Open
wants to merge 30 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
6debece
Add EpochsSpectrum support
tsbinns Aug 1, 2024
b517e79
Fix CircleCI
tsbinns Aug 1, 2024
dccecdc
Add reminder to change version
tsbinns Aug 1, 2024
45bf496
Update Spectrum skips
tsbinns Aug 1, 2024
0b8790b
Fix broken tests and version checking
tsbinns Aug 1, 2024
dac4153
Be explicit with intersphinx roles
tsbinns Aug 1, 2024
121b40a
Change empty weights contruction
tsbinns Aug 1, 2024
00c381a
Merge branch 'main' into spec_conn_spectrum_support
tsbinns Aug 1, 2024
ffb256d
Add surrogate data generation
tsbinns Aug 2, 2024
30dc4e7
Adjust n_jobs
tsbinns Aug 6, 2024
2a9c1e2
Adjust n_jobs
tsbinns Aug 6, 2024
50ea4cd
Try CPU count // 3
tsbinns Aug 6, 2024
9e4a11d
Try CPU count // 4
tsbinns Aug 6, 2024
de52286
Use n_jobs=1
tsbinns Aug 6, 2024
cece6a4
Reset tests to upstream-main
tsbinns Aug 7, 2024
17f30cd
Expand test coverage
tsbinns Aug 5, 2024
cabe170
Merge branch 'main' into add_surrogate_conn
tsbinns Aug 7, 2024
a0ebe2a
Update example from review
tsbinns Aug 7, 2024
ca6013d
Update surrogate example
tsbinns Aug 19, 2024
6114bd5
Merge branch 'main' into add_surrogate_conn
tsbinns Aug 20, 2024
7197a17
Merge branch 'add_surrogate_conn' of https://github.com/tsbinns/mne-c…
tsbinns Aug 20, 2024
bd180f0
Merge branch 'main' into add_surrogate_conn
larsoner Aug 20, 2024
88e081b
Merge branch 'main' into add_surrogate_conn
tsbinns Aug 20, 2024
6d1c9bb
Merge branch 'main' into add_surrogate_conn
tsbinns Sep 3, 2024
abfc9ab
Merge branch 'main' into add_surrogate_conn
tsbinns Sep 10, 2024
a710c43
Merge branch 'main' into add_surrogate_conn
tsbinns Oct 7, 2024
f90a58a
Merge branch 'main' into add_surrogate_conn
tsbinns Oct 31, 2024
9dbb10d
Merge branch 'main' into add_surrogate_conn
tsbinns Nov 11, 2024
c7df2da
Merge branch 'main' into add_surrogate_conn
tsbinns Nov 12, 2024
64a3224
Update example from review
tsbinns Nov 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -111,4 +111,5 @@ Dataset functions
.. autosummary::
:toctree: generated/

make_signals_in_freq_bands
make_signals_in_freq_bands
make_surrogate_data
21 changes: 21 additions & 0 deletions doc/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,16 @@ @article{Dawson_2016
year = {2016}
}

@article{DowdingHaufe2018,
title={Powerful statistical inference for nested data using sufficient summary statistics},
author={Dowding, Irene and Haufe, Stefan},
doi={10.3389/fnhum.2018.00103},
journal={Frontiers in Human Neuroscience},
volume={12},
pages={103},
year={2018}
}

@article{EwaldEtAl2012,
author = {Ewald, Arne and Marzetti, Laura and Zappasodi, Filippo and Meinecke, Frank C. and Nolte, Guido},
doi = {10.1016/j.neuroimage.2011.11.084},
Expand Down Expand Up @@ -185,6 +195,17 @@ @book{OppenheimEtAl1999
year = {1999}
}

@article{PellegriniEtAl2023,
title={Identifying good practices for detecting inter-regional linear functional connectivity from {EEG}},
author={Pellegrini, Franziska and Delorme, Arnaud and Nikulin, Vadim and Haufe, Stefan},
doi={10.1016/j.neuroimage.2023.120218},
journal={NeuroImage},
volume={277},
pages={120218},
year={2023},
publisher={Elsevier}
}

@book{SekiharaNagarajan2008,
author = {Sekihara, Kensuke and Nagarajan, Srikantan S.},
doi = {10.1007/978-3-540-79370-0},
Expand Down
353 changes: 353 additions & 0 deletions examples/surrogate_connectivity.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,353 @@
"""
==================================================================================
Determine the significance of connectivity estimates against baseline connectivity
==================================================================================

This example demonstrates how surrogate data can be generated to assess whether
connectivity estimates are significantly greater than baseline.
"""

# Author: Thomas S. Binns <t.s.binns@outlook.com>
# License: BSD (3-clause)
# sphinx_gallery_thumbnail_number = 3

# %%

import matplotlib.pyplot as plt
import mne
import numpy as np
from mne.datasets import somato

from mne_connectivity import make_surrogate_data, spectral_connectivity_epochs

########################################################################################
# Background
# ----------
#
# When performing connectivity analyses, we often want to know whether the results we
# observe reflect genuine interactions between signals. We can assess this by performing
# statistical tests between our connectivity estimates and a 'baseline' level of
# connectivity. However, due to factors such as background noise and sample
# size-dependent biases (see e.g. :footcite:`VinckEtAl2010`), it is often not
# appropriate to treat 0 as this baseline. Therefore, we need a way to estimate the
# baseline level of connectivity.
#
# One approach is to manipulate the original data in such a way that the covariance
# structure is destroyed, creating surrogate data. Connectivity estimates from the
# original and surrogate data can then be compared to determine whether the original
# data contains significant interactions.
#
# Such surrogate data can be easily generated in MNE using the
# :func:`~mne_connectivity.make_surrogate_data` function, which shuffles epoched data
# independently across channels :footcite:`PellegriniEtAl2023` (see the Notes section of
# the function for more information). In this example, we will demonstrate how surrogate
# data can be created, and how you can use this to assess the statistical significance
# of your connectivity estimates.

########################################################################################
# Loading the data
# ----------------
#
# We start by loading from the :ref:`somato-dataset` dataset, MEG data showing
# event-related activity in response to somatosensory stimuli. We construct epochs
# around these events in the time window [-1.5, 1.0] seconds.

# %%

# Load data
data_path = somato.data_path()
raw_fname = data_path / "sub-01" / "meg" / "sub-01_task-somato_meg.fif"
raw = mne.io.read_raw_fif(raw_fname)
events = mne.find_events(raw, stim_channel="STI 014")

# Pre-processing
raw.pick("grad").load_data() # focus on gradiometers
raw.filter(1, 35)
raw, events = raw.resample(sfreq=100, events=events) # reduce compute time

# Construct epochs around events
epochs = mne.Epochs(
raw, events, event_id=1, tmin=-1.5, tmax=1.0, baseline=(-0.5, 0), preload=True
)
epochs = epochs[:30] # select a subset of epochs to speed up computation

########################################################################################
# Assessing connectivity in non-evoked data
# -----------------------------------------
#
# We will first demonstrate how connectivity can be assessed from non-evoked data. In
# this example, we use data from the pre-trial period of [-1.5, -0.5] seconds. We
# compute Fourier coefficients of the data using the :meth:`~mne.Epochs.compute_psd`
# method with ``output="complex"`` (note that this requires ``mne >= 1.8``).
#
# Next, we pass these coefficients to
# :func:`~mne_connectivity.spectral_connectivity_epochs` to compute connectivity using
# the imaginary part of coherency (``imcoh``). Our indices specify that connectivity
# should be computed between all pairs of channels.

# %%

# Compute Fourier coefficients for pre-trial data
fmin, fmax = 3, 23
pretrial_coeffs = epochs.compute_psd(
fmin=fmin, fmax=fmax, tmin=None, tmax=-0.5, output="complex"
)
freqs = pretrial_coeffs.freqs

# Compute connectivity for pre-trial data
indices = np.tril_indices(epochs.info["nchan"], k=-1) # all-to-all connectivity
pretrial_con = spectral_connectivity_epochs(
pretrial_coeffs, method="imcoh", indices=indices
)

########################################################################################
# Next, we generate the surrogate data by passing the Fourier coefficients into the
# :func:`~mne_connectivity.make_surrogate_data` function. To get a reliable estimate of
# the baseline connectivity, we perform this shuffling procedure
# :math:`\text{n}_{\text{shuffle}}` times, producing :math:`\text{n}_{\text{shuffle}}`
# surrogate datasets. We can then iterate over these shuffles and compute the
# connectivity for each one.

# %%

# Generate surrogate data
n_shuffles = 100 # recommended is >= 1,000; limited here to reduce compute time
pretrial_surrogates = make_surrogate_data(
pretrial_coeffs, n_shuffles=n_shuffles, rng_seed=44
)

# Compute connectivity for surrogate data
surrogate_con = []
for shuffle_i, surrogate in enumerate(pretrial_surrogates, 1):
print(f"Computing connectivity for shuffle {shuffle_i} of {n_shuffles}")
surrogate_con.append(
spectral_connectivity_epochs(
surrogate, method="imcoh", indices=indices, verbose=False
)
)

########################################################################################
# We can plot the all-to-all connectivity of the pre-trial data against the surrogate
# data, averaged over all shuffles. This shows a strong degree of coupling in the alpha
# band (~8-12 Hz), with weaker coupling in the lower range of the beta band (~13-20 Hz).
# A simple visual inspection shows that connectivity in the alpha and beta bands are
# above the baseline level of connectivity estimated from the surrogate data. However,
# we need to confirm this statistically.

# %%

# Plot pre-trial vs. surrogate connectivity
fig, ax = plt.subplots(1, 1)
ax.plot(
freqs,
np.abs([surrogate.get_data() for surrogate in surrogate_con]).mean(axis=(0, 1)),
linestyle="--",
label="Surrogate",
)
ax.plot(freqs, np.abs(pretrial_con.get_data()).mean(axis=0), label="Original")
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Connectivity (A.U.)")
ax.set_title("All-to-all connectivity | Pre-trial ")
ax.legend()

########################################################################################
# Assessing the statistical significance of our connectivity estimates can be done with
# the following simple procedure :footcite:`PellegriniEtAl2023`
#
# :math:`p=\LARGE{\frac{\Sigma_{s=1}^Sc_s}{S}}` ,
#
# :math:`c_s=\{1\text{ if }\text{Con}\leq\text{Con}_{\text{s}}\text{ },\text{ }0
# \text{ if otherwise }` ,
#
# where: :math:`p` is our p-value; :math:`s` is a given shuffle iteration of :math:`S`
# total shuffles; and :math:`c` is a binary indicator of whether the true connectivity,
# :math:`\text{Con}`, is greater than the surrogate connectivity,
# :math:`\text{Con}_{\text{s}}`, for a given shuffle.
#
# Note that for connectivity methods which produce negative scores (e.g., imaginary part
# of coherency, time-reversed Granger causality, etc...), you should take the absolute
# values before testing. Similar adjustments should be made for methods that produce
# scores centred around non-zero values (e.g., 0.5 for directed phase lag index).
#
# Below, we determine the statistical significance of connectivity in the lower beta
# band. We simplify this by averaging over all connections and corresponding frequency
# bins. We could of course also test the significance of each connection, each frequency
# bin, or other frequency bands such as the alpha band. Naturally, any tests involving
# multiple connections, frequencies, and/or times should be corrected for multiple
# comparisons.
#
# The test confirms our visual inspection, showing that connectivity in the lower beta
# band is significantly above the baseline level of connectivity at an alpha of 0.05,
# which we can take as evidence of genuine interactions in this frequency band.

# %%

# Find indices of lower beta frequencies
beta_freqs = np.where((freqs >= 13) & (freqs <= 20))[0]

# Compute lower beta connectivity for pre-trial data (average connections and freqs)
beta_con_pretrial = np.abs(pretrial_con.get_data()[:, beta_freqs]).mean(axis=(0, 1))

# Compute lower beta connectivity for surrogate data (average connections and freqs)
beta_con_surrogate = np.abs(
[surrogate.get_data()[:, beta_freqs] for surrogate in surrogate_con]
).mean(axis=(1, 2))

# Compute p-value for pre-trial lower beta coupling
p_val = np.sum(beta_con_pretrial <= beta_con_surrogate) / n_shuffles
print(f"P = {p_val:.2f}")

########################################################################################
# Assessing connectivity in evoked data
# -------------------------------------
#
# When generating surrogate data, it is important to distinguish non-evoked data (e.g.,
# resting-state, pre/inter-trial data) from evoked data (where a stimulus is presented
# or an action performed at a set time during each epoch). Critically, evoked data
# contains a temporal structure that is consistent across epochs, and thus shuffling
# epochs across channels will fail to adequately disrupt the covariance structure.
#
# Any connectivity estimates will therefore overestimate the baseline connectivity in
# your data, increasing the likelihood of type II errors (see the Notes section of
# :func:`~mne_connectivity.make_surrogate_data` for more information, and see the
# section :ref:`inappropriate-surrogate-data` for a demonstration).
#
# **In cases where you want to assess connectivity in evoked data, you can use
# surrogates generated from non-evoked data (of the same subject).** Here we do just
# that, comparing connectivity estimates from the pre-trial surrogates to the evoked,
# post-stimulus response ([0, 1] second).
#
# Again, there is pronounced alpha coupling (stronger than in the pre-trial data) and
# weaker beta coupling, both of which appear to be above the baseline level of
# connectivity.

# %%

# Compute Fourier coefficients for post-stimulus data
poststim_coeffs = epochs.compute_psd(
fmin=fmin, fmax=fmax, tmin=0, tmax=None, output="complex"
)

# Compute connectivity for post-stimulus data
poststim_con = spectral_connectivity_epochs(
poststim_coeffs, method="imcoh", indices=indices
)

# Plot post-stimulus vs. (pre-trial) surrogate connectivity
fig, ax = plt.subplots(1, 1)
ax.plot(
freqs,
np.abs([surrogate.get_data() for surrogate in surrogate_con]).mean(axis=(0, 1)),
linestyle="--",
label="Surrogate",
)
ax.plot(freqs, np.abs(poststim_con.get_data()).mean(axis=0), label="Original")
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Connectivity (A.U.)")
ax.set_title("All-to-all connectivity | Post-stimulus")
ax.legend()

########################################################################################
# This is also confirmed by statistical testing, with connectivity in the lower beta
# band being significantly above the baseline level of connectivity. Thus, using
# surrogate connectivity estimates from non-evoked data provides a reliable baseline for
# assessing connectivity in evoked data.

# %%

# Compute lower beta connectivity for post-stimulus data (average connections and freqs)
beta_con_poststim = np.abs(poststim_con.get_data()[:, beta_freqs]).mean(axis=(0, 1))

# Compute p-value for post-stimulus lower beta coupling
p_val = np.sum(beta_con_poststim <= beta_con_surrogate) / n_shuffles
print(f"P = {p_val:.2f}")

########################################################################################
# .. _inappropriate-surrogate-data:
#
# Generating surrogate connectivity from inappropriate data
# ---------------------------------------------------------
# We discussed above how surrogates generated from evoked data risk overestimating the
# degree of baseline connectivity. We demonstrate this below by generating surrogates
# from the post-stimulus data.

# %%

# Generate surrogates from evoked data
poststim_surrogates = make_surrogate_data(
poststim_coeffs, n_shuffles=n_shuffles, rng_seed=44
)

# Compute connectivity for evoked surrogate data
bad_surrogate_con = []
for shuffle_i, surrogate in enumerate(poststim_surrogates, 1):
print(f"Computing connectivity for shuffle {shuffle_i} of {n_shuffles}")
bad_surrogate_con.append(
spectral_connectivity_epochs(
surrogate, method="imcoh", indices=indices, verbose=False
)
)

########################################################################################
# Plotting the post-stimulus connectivity against the estimates from the non-evoked and
# evoked surrogate data, we see that the evoked surrogate data greatly overestimates the
# baseline connectivity in the alpha band.
#
# Although in this case the alpha connectivity was still far above the baseline from the
# evoked surrogates, this will not always be the case, and you can see how this risks
# false negative assessments that connectivity is not significantly different from
# baseline.

# %%

# Plot post-stimulus vs. evoked and non-evoked surrogate connectivity
fig, ax = plt.subplots(1, 1)
ax.plot(
freqs,
np.abs([surrogate.get_data() for surrogate in surrogate_con]).mean(axis=(0, 1)),
linestyle="--",
label="Surrogate (pre-stimulus)",
)
ax.plot(
freqs,
np.abs([surrogate.get_data() for surrogate in bad_surrogate_con]).mean(axis=(0, 1)),
color="C3",
linestyle="--",
label="Surrogate (post-stimulus)",
)
ax.plot(
freqs, np.abs(poststim_con.get_data()).mean(axis=0), color="C1", label="Original"
)
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Connectivity (A.U.)")
ax.set_title("All-to-all connectivity | Post-stimulus")
ax.legend()

########################################################################################
# Assessing connectivity on a group-level
# ---------------------------------------
#
# While our focus here has been on assessing the significance of connectivity on a
# single recording-level, we may also want to determine whether group-level connectivity
# estimates are significantly different from baseline. For this, we can generate
# surrogates and estimate connectivity alongside the original signals for each piece of
# data.
#
# There are multiple ways to assess the statistical significance. For example, we can
# compute p-values for each piece of data using the approach above and combine them for
# the nested data (e.g., across recordings, subjects, etc...) using Stouffer's method
# :footcite:`DowdingHaufe2018`.
#
# Alternatively, we could take the average of the surrogate connectivity estimates
# across all shuffles for each piece of data and compare them to the original
# connectivity estimates in a paired test. The :mod:`scipy.stats` and :mod:`mne.stats`
# modules have many such tools for testing this, e.g., :func:`scipy.stats.ttest_1samp`,
# :func:`mne.stats.permutation_t_test`, etc...
#
# Altogether, surrogate connectivity estimates are a powerful tool for assessing the
# significance of connectivity estimates, both on a single recording- and group-level.

########################################################################################
# References
# ----------
# .. footbibliography::
Loading