---
title: "Analyzing single trial EEG data using the `hu-neuro-pipeline` package"
author: "Alexander Enge & Kirsten Stark"
institute: "Neuro Lab @ Humboldt-Universität zu Berlin"
date: 2023-12-13
classoption: "t"
bibliography: "misc/references.bib"
csl: "misc/template/apa.csl"
output:
  beamer_presentation:
    includes:
      in_header: "misc/template/hu_template.tex"
---

## The @fromer2018 pipeline

```{r, echo=FALSE, message=FALSE}
scale_colour_discrete <- function(...) scale_color_brewer(..., palette = "Set1")
scale_fill_discrete <- function(...) scale_fill_brewer(..., palette = "Set1")

figure <- function(path, ...) {
  require(here)
  if (file.exists(here(path))) knitr::include_graphics(here(path), ...)
}
```

```{r, echo=FALSE, message=FALSE, fig.align="center", out.width="100%"}
figure("figures/fromer.png")
```

## The @fromer2018 pipeline

- Allows single trial analysis of ERP amplitudes

  - Random effects for items [@burki2018]

  - Trial and item level covariates [@volpert-esmond2021]

  - Continuous predictor variables

  - Unbalanced designs

  - Brain--behavior associations

## Python implementation

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="100%"}
figure("figures/love.png")
```

## Python, I choose you!

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="50%"}
figure("figures/pokemon.jpg")
```

\tiny

**Blog post**: https://dominiquemakowski.github.io/post/2020-05-22-r_or_python \
**Online course**: https://swcarpentry.github.io/python-novice-inflammation

## MNE-Python

- Versatile

  - EEG, MEG, ECoG, fNIRS

  - Preprocessing, statistics, time-frequency analysis, visualization, machine learning, connectivity, source localization, ...

- Open source

  - \> 350 contributors on GitHub (December 2023)

  - Funded by NIH, NSF, ERC, Google, Amazon, ...

  - Code review, automated tests, user forum, office hours, ...

\bigskip

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="20%"}
figure("figures/mne.png")
```

## Python implementation

- No MATLAB required

- No Python skills required -- can be called from R

- New features:

  - Time-frequency analysis

  - Fully automatic ocular correction (ICA)

  - Automatic bad channel detection

  - Automatic missing trial detection

  - Example datasets

- Code standards, documentation, version control \tiny (https://github.com/alexenge/hu-neuro-pipeline/)

\bigskip

```{r, echo=FALSE, fig.align="right", message=FALSE, out.width="30%"}
figure("figures/github_pypi.png")
```

## Python implementation

\vspace{-0.7cm}

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="40%"}
figure("figures/flowchart.png")
```

## Python implementation

\vspace{-1.4cm}

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="100%"}
figure("figures/docs.png")
```

\vspace{-0.6cm}

\tiny

Documentation at <https://hu-neuro-pipeline.readthedocs.io>

## Installation

For Python users:

```{bash, eval=FALSE}
# Install via the command line from the Python Package Index (PyPI)
pip install hu-neuro-pipeline
```

For R users:

```{r, eval=FALSE}
# Install reticulate for interfacing with Python from R
install.packages("reticulate")

# Install Python (Miniconda distribution)
reticulate::install_miniconda()

# Install the actual package from PyPI
reticulate::py_install("hu-neuro-pipeline", pip = TRUE, python_version = "3.8")
```

\bigskip

```{r, echo=FALSE, fig.align="right", out.width="15%"}
figure("figures/reticulate.png")
```

## General usage

```{r, eval=FALSE}
# Import the Python package
pipeline <- reticulate::import("pipeline")

# Run the pipeline
res <- pipeline$group_pipeline(...)
```

## Minimal example

```{r, eval=FALSE}
ucap_paths <- pipeline$datasets$get_ucap(participants = 2L, path = "data")
```

\smallskip

```{r, echo=FALSE, fig.align="center", out.width="60%"}
figure("figures/ucap.png")
```

\tiny

For details, see @fromer2018

## Minimal example

```{r, eval=FALSE, results="hide"}
# Run the pipeline
res <- pipeline$group_pipeline(
  # Input/output paths
  raw_files = "data/ucap/raw",
  log_files = "data/ucap/log",
  output_dir = "output",
  # Preprocessing options
  besa_files = "data/ucap/cali",
  # Epoching options
  triggers = c(201:208, 211:218),
  components = list(
    "name" = list("N2", "P3b"),
    "tmin" = list(0.25, 0.4),
    "tmax" = list(0.35, 0.55),
    "roi" = list(
      c("FC1", "FC2", "C1", "C2", "Cz"),
      c("CP3", "CP1", "CPz", "CP2", "CP4", "P3", "Pz", "P4", "PO3", "POz", "PO4")
    )
  ),
  # Averaging options
  average_by = list(
    blurr_left = "n_b == 'blurr' and DeviantPosRL == 'li' and RT > 200",
    blurr_right = "n_b == 'blurr' and DeviantPosRL == 're' and RT > 200",
    normal_right = "n_b == 'normal' and DeviantPosRL == 're' and RT > 200",
    normal_left = "n_b == 'normal' and DeviantPosRL == 'li' and RT > 200"
  )
)
```

## Pipeline inputs

```{r, eval=FALSE}
# Input/output paths
raw_files = "data/raw",
log_files = "data/log",
output_dir = "output",
```

- Directory or list of raw EEG files (`.vhdr`)

- Directory or list of behavioral log files (`.txt`/`.tsv`/`.csv`)

- Output directory

## Pipeline inputs

```{r, eval=FALSE}
# Preprocessing options
besa_files = "data/cali",
```

- Directory or list of BESA files (`.matrix`) for ocular correction (MSEC)

- Default bandpass filter (0.1--40 Hz)

- Default re-referencing (common average)
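
## Sketch: common average reference

\scriptsize

Re-referencing to the common average subtracts the mean across all channels from every channel at each time point. A minimal base R sketch, assuming continuous data stored as a channels $\times$ time points matrix; `rereference_average()` is a hypothetical helper for illustration, not part of the package.

```{r, eval=FALSE}
# Hypothetical helper (not part of hu-neuro-pipeline):
# re-reference a channels x time points matrix to the common average
rereference_average <- function(eeg) {
  avg_ref <- colMeans(eeg)  # Mean across channels at each time point
  sweep(eeg, 2, avg_ref)    # Subtract it from every channel
}

# Toy example: 3 channels, 4 time points (columns)
eeg <- matrix(c(1, 2, 3,
                4, 5, 6,
                0, 0, 0,
                2, 4, 6), nrow = 3)
reref <- rereference_average(eeg)
colMeans(reref)  # 0 at every time point
```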

## Pipeline inputs

```{r, eval=FALSE}
# Epoching options
triggers = c(201:208, 211:218),
components = list(
  "name" = list("N2", "P3b"),
  "tmin" = list(0.25, 0.4),
  "tmax" = list(0.35, 0.55),
  "roi" = list(
    c("FC1", "FC2", "C1", "C2", "Cz"),
    c("CP3", "CP1", "CPz", "CP2", "CP4", "P3", "Pz", "P4", "PO3", "POz", "PO4")
  )
),
```

- List of numerical EEG triggers

- List of ERP component definitions:

    - `name`: Custom label for each component

    - `tmin` + `tmax`: Onset and offset times (in s)

    - `roi`: List of channel names for each component
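
## Sketch: single trial component amplitudes

\scriptsize

Conceptually, each component definition turns every epoch into one number: the voltage averaged over the ROI channels and the `tmin` to `tmax` window (the single trial `N2` mean amplitudes shown later). A base R sketch, assuming epochs stored as a trials $\times$ channels $\times$ time points array with channel names; `component_amplitudes()` is hypothetical, not the package's implementation.

```{r, eval=FALSE}
# Hypothetical helper: single trial mean amplitude for one component
# epochs: array [trials, channels, times], channel names as dimnames
# times:  time points (in s) matching the third dimension
component_amplitudes <- function(epochs, times, roi, tmin, tmax) {
  in_window <- times >= tmin & times <= tmax
  # Keep only ROI channels and time points inside the window
  sub <- epochs[, roi, in_window, drop = FALSE]
  # Average over channels and time points, separately for each trial
  apply(sub, 1, mean)
}

# Toy example: 2 trials, 3 channels, 5 time points
times <- c(0.2, 0.25, 0.3, 0.35, 0.4)
epochs <- array(0, dim = c(2, 3, 5),
                dimnames = list(NULL, c("FC1", "Cz", "Pz"), NULL))
epochs[1, c("FC1", "Cz"), 2:4] <- -2  # N2-like negativity in trial 1
epochs[2, c("FC1", "Cz"), 2:4] <- -4  # Stronger negativity in trial 2

component_amplitudes(epochs, times, roi = c("FC1", "Cz"), tmin = 0.25, tmax = 0.35)
# [1] -2 -4
```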

## Pipeline inputs

```{r, eval=FALSE}
# Epoching options
triggers = c(201:208, 211:218),
components = list(
  "name" = list("N2", "P3b"),
  "tmin" = list(0.25, 0.4),
  "tmax" = list(0.35, 0.55),
  "roi" = list(
    c("FC1", "FC2", "C1", "C2", "Cz"),
    c("CP3", "CP1", "CPz", "CP2", "CP4", "P3", "Pz", "P4", "PO3", "POz", "PO4")
  )
),
```

- Default baseline correction (-0.2 to 0.0 s)

- Default rejection of bad epochs (peak-to-peak ampl. > 200 µV)
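
## Sketch: baseline correction

\scriptsize

Baseline correction subtracts, for every trial and channel, the mean voltage in the baseline window from the whole epoch. A minimal sketch for one channel's trials $\times$ time points matrix, using the default window; `baseline_correct()` is a hypothetical helper.

```{r, eval=FALSE}
# Hypothetical helper: subtract the mean of the baseline window
# x:     trials x time points matrix for one channel
# times: time points in s, one per column of x
baseline_correct <- function(x, times, baseline = c(-0.2, 0.0)) {
  in_baseline <- times >= baseline[1] & times <= baseline[2]
  # Mean baseline voltage of each trial (row)
  base_means <- rowMeans(x[, in_baseline, drop = FALSE])
  # Subtract each trial's baseline mean from all of its time points
  x - base_means
}

times <- c(-0.2, -0.1, 0.0, 0.1, 0.2)
x <- rbind(
  c(5, 5, 5, 8, 9),  # Trial with a constant offset of +5
  c(-1, 1, 0, 3, 2)  # Trial with a zero-mean baseline
)
baseline_correct(x, times)  # Row 1 becomes 0 0 0 3 4, row 2 is unchanged
```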


## Pipeline inputs

```{r, eval=FALSE}
# Averaging options
average_by = list(
  blurr_left = "n_b == 'blurr' and DeviantPosRL == 'li' and RT > 200",
  blurr_right = "n_b == 'blurr' and DeviantPosRL == 're' and RT > 200",
  normal_right = "n_b == 'normal' and DeviantPosRL == 're' and RT > 200",
  normal_left = "n_b == 'normal' and DeviantPosRL == 'li' and RT > 200"
)
```

- Named list of conditions (or combinations of conditions) for which to compute by-participant averages:

    - List names are custom labels for each average

    - List values are query strings to select log file rows (trials/epochs)
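
## Sketch: selecting and averaging trials

\scriptsize

The query strings above use Python (pandas-style) syntax, e.g. `and` instead of `&`; this is our reading of the examples, not a documented guarantee. The R sketch below performs the same selection with `subset()` on made-up data and then averages by participant. For brevity it averages the single trial `N2` values, whereas the pipeline averages whole epochs to create the evokeds.

```{r, eval=FALSE}
# Made-up single trial data (hypothetical values)
toy_trials <- data.frame(
  participant_id = c("01", "01", "01", "02", "02", "02"),
  n_b = c("blurr", "blurr", "normal", "blurr", "normal", "normal"),
  DeviantPosRL = c("li", "li", "li", "li", "re", "li"),
  RT = c(450, 150, 500, 600, 480, 520),
  N2 = c(-2, -10, -1, -3, -1.5, 0)
)

# R version of "n_b == 'blurr' and DeviantPosRL == 'li' and RT > 200"
blurr_left <- subset(toy_trials, n_b == "blurr" & DeviantPosRL == "li" & RT > 200)

# By-participant average of the selected trials
aggregate(N2 ~ participant_id, data = blurr_left, FUN = mean)
```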

## More pipeline inputs

- Downsampling (`downsample_sfreq`)

- Interpolate bad channels (`bad_channels`)

- Frequency filter (`highpass_freq`, `lowpass_freq`)

- Epoch duration (`epochs_tmin`, `epochs_tmax`)

- Baseline duration (`baseline`)

- Skip log file rows (`skip_log_rows`, `skip_log_conditions`)

- Threshold for artifact rejection (`reject_peak_to_peak`)

- ...

\tiny

See <https://hu-neuro-pipeline.readthedocs.io/en/latest/usage_inputs.html>
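
## More pipeline inputs

\scriptsize

An illustrative combination of the options listed on the previous slide, in the same fragment style as the inputs above. All values are placeholders chosen for illustration, not recommendations, and the argument types (e.g., a two-element vector for `baseline`) are our assumption; the documentation linked on the previous slide is authoritative.

```{r, eval=FALSE}
# More preprocessing/epoching options (illustrative placeholder values)
downsample_sfreq = 250.0,                   # New sampling rate (Hz)
bad_channels = list("05" = c("C3", "P7")),  # Bad channels per participant
highpass_freq = 0.1,                        # High-pass cutoff (Hz)
lowpass_freq = 30.0,                        # Low-pass cutoff (Hz)
epochs_tmin = -0.5,                         # Epoch start (s)
epochs_tmax = 1.5,                          # Epoch end (s)
baseline = c(-0.2, 0.0),                    # Baseline window (s)
reject_peak_to_peak = 150.0,                # Rejection threshold (microvolts)
```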

## Pipeline outputs

Extract directly from the pipeline run:

```{r, eval=FALSE}
trials <- res[[1]]   # Single trial data frame
evokeds <- res[[2]]  # Evokeds data frame
config <- res[[3]]   # List of pipeline options
```

Or read from the output directory:

```{r, message=FALSE, warning=FALSE}
library(tidyverse)
trials <- read_csv("output/trials.csv")
evokeds <- read_csv("output/ave.csv")
config <- jsonlite::read_json("output/config.json")
```

\tiny

See <https://hu-neuro-pipeline.readthedocs.io/en/latest/usage_outputs.html>

## Pipeline outputs

```{r, message=FALSE, warning=FALSE}
# Single trial data frame
print(trials)
```

## Pipeline outputs

```{r, results="hold", out.width="50%"}
# Single trial N2 mean amplitudes
ggplot(trials, aes(x = N2)) +
  geom_density() +
  theme_classic(base_size = 30)
```

## Pipeline outputs

```{r}
# Linear mixed-effects model
form <- N2 ~ n_b * DeviantPosRL + (1 | participant_id)
mod <- lme4::lmer(form, trials)
summary(mod)
```

## Pipeline outputs

```{r, eval=TRUE, out.width="50%", message=FALSE, warning=FALSE}
# Single trial N2 mean amplitudes by condition
ggplot(trials, aes(x = DeviantPosRL, y = N2, color = n_b, group = n_b)) +
  geom_point(position = position_jitterdodge(0.3), alpha = 0.1) +
  stat_summary(
    geom = "line",
    linewidth = 2.0,
    position = position_dodge(0.75)
  ) +
  theme_classic(base_size = 30)
```

## Pipeline outputs

```{r}
# Evokeds by participant and condition
print(evokeds)
```

## Pipeline outputs

```{r, eval=FALSE, out.width="50%", message=FALSE, warning=FALSE}
# Time course plot with within-participant standard errors
evokeds |>
  separate_wider_delim(label, delim = "_", names = c("n_b", "DeviantPosRL")) |>
  Rmisc::summarySEwithin(
    measurevar = "N2",
    withinvars = c("time", "n_b", "DeviantPosRL"),
    idvar = "participant_id"
  ) |>
  mutate(time = as.numeric(levels(time))[time]) |>
  ggplot(aes(
    x = time,
    y = N2,
    ymin = N2 - se,
    ymax = N2 + se,
    color = n_b,
    fill = n_b
  )) +
  facet_wrap(~DeviantPosRL) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_line(linewidth = 1) +
  geom_ribbon(color = NA, alpha = 0.2) +
  coord_cartesian(xlim = c(-0.2, 0.8)) +
  theme_classic(base_size = 20)
```

## Pipeline outputs

```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.width=10, fig.height=5}
# Same code as on the previous slide, evaluated here to create the plot
evokeds |>
  separate_wider_delim(label, delim = "_", names = c("n_b", "DeviantPosRL")) |>
  Rmisc::summarySEwithin(
    measurevar = "N2",
    withinvars = c("time", "n_b", "DeviantPosRL"),
    idvar = "participant_id"
  ) |>
  mutate(time = as.numeric(levels(time))[time]) |>
  ggplot(aes(
    x = time,
    y = N2,
    ymin = N2 - se,
    ymax = N2 + se,
    color = n_b,
    fill = n_b
  )) +
  facet_wrap(~DeviantPosRL) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_line(linewidth = 1) +
  geom_ribbon(color = NA, alpha = 0.2) +
  coord_cartesian(xlim = c(-0.2, 0.8)) +
  theme_classic(base_size = 20)
```
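
## Sketch: within-participant standard errors

\scriptsize

`Rmisc::summarySEwithin()` removes between-participant variability before computing standard errors. A base R sketch of the same idea for a single within-participant factor, assuming one value per participant and condition: Cousineau normalization (subtract each participant's mean, add the grand mean) followed by the Morey correction $\sqrt{M/(M-1)}$ for $M$ conditions. `within_se()` is hypothetical.

```{r, eval=FALSE}
# Hypothetical helper (not part of Rmisc or the pipeline):
# within-participant SEs for one within-participant factor
# dat: long data frame with columns id, cond, value (one row per id x cond)
within_se <- function(dat) {
  # Cousineau normalization: remove each participant's mean, add grand mean
  norm <- dat$value - ave(dat$value, dat$id) + mean(dat$value)
  # Morey correction for the number of within-participant conditions
  n_cond <- length(unique(dat$cond))
  correction <- sqrt(n_cond / (n_cond - 1))
  # Standard error of the normalized values in each condition
  se <- tapply(norm, dat$cond, function(v) sd(v) / sqrt(length(v)))
  se * correction
}

# Toy data: large participant offsets, identical condition effect (+2)
toy <- data.frame(
  id = rep(c("a", "b", "c"), each = 2),
  cond = rep(c("x", "y"), times = 3),
  value = c(1, 3, 11, 13, 21, 23)
)
within_se(toy)  # 0 for both conditions: offsets no longer inflate the SE
```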

## Pipeline outputs

```{r}
# List of pipeline options
names(config)
```

```{r, results="hold"}
# Number of rejected epochs per participant
lengths(config$auto_rejected_epochs)
```

## More pipeline outputs

- Cleaned continuous data (`clean_dir`)

- Epoched data (`epochs_dir`)

- Automated QC reports (`reports_dir`)

## QC reports

\vspace{-0.7cm}

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="90%"}
figure("figures/report.png")
```

## Cluster-based permutation tests

```{r, eval=FALSE}
# Permutation test input
perm_contrasts = list(
  c("blurr_left", "normal_left"),
  c("blurr_right", "normal_right")
)
```

```{r, message=FALSE, warning=FALSE}
# Permutation test output
clusters <- read_csv("output/clusters.csv") # or clusters <- res[[4]]
print(na.omit(clusters))
```
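
## Sketch: cluster-based permutation test

\scriptsize

Conceptually, for each contrast: compute a paired $t$ value at every time point, group adjacent time points beyond a threshold into clusters, and compare each cluster's summed $t$ with the largest cluster sums obtained after randomly flipping the sign of each participant's difference. A stripped-down base R sketch for one channel (time only), assuming a participants $\times$ time points matrix of condition differences; `cluster_perm_test()` and its defaults are hypothetical, and the pipeline's implementation (across channels and time) may differ.

```{r, eval=FALSE}
# Hypothetical helper: 1D cluster-based permutation test (paired design)
# diffs: participants x time points matrix of condition differences
cluster_perm_test <- function(diffs, t_thresh = 2.0, n_perm = 1000) {
  # One-sample t value at each time point
  t_values <- function(d) colMeans(d) / (apply(d, 2, sd) / sqrt(nrow(d)))
  # Runs of supra-threshold time points with the same sign form clusters
  cluster_sums <- function(tv) {
    sgn <- ifelse(tv > t_thresh, 1, ifelse(tv < -t_thresh, -1, 0))
    runs <- rle(sgn)
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1
    keep <- runs$values != 0
    sums <- mapply(function(s, e) sum(tv[s:e]), starts[keep], ends[keep])
    list(start = starts[keep], end = ends[keep], sum = as.numeric(sums))
  }
  obs <- cluster_sums(t_values(diffs))
  # Null distribution: largest |cluster sum| after random sign flips
  max_null <- replicate(n_perm, {
    flips <- sample(c(-1, 1), nrow(diffs), replace = TRUE)
    perm_sums <- cluster_sums(t_values(diffs * flips))$sum
    if (length(perm_sums) == 0) 0 else max(abs(perm_sums))
  })
  # Cluster p value: share of permutations with an equally large cluster
  p <- vapply(obs$sum, function(s) (sum(max_null >= abs(s)) + 1) / (n_perm + 1),
              numeric(1))
  data.frame(start = obs$start, end = obs$end, sum = obs$sum, p = p)
}

set.seed(1)
diffs <- matrix(rnorm(20 * 50), nrow = 20)  # 20 participants, 50 time points
diffs[, 21:30] <- diffs[, 21:30] + 3        # True effect at time points 21-30
clust_res <- cluster_perm_test(diffs)
clust_res
```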

## Artifact correction

- **Multiple source eye correction (MSEC)**

  - Requires `.matrix` files from BESA

## Artifact correction

- **Independent component analysis (ICA)**

  - Different algorithms available \
  (e.g., `ica_method = "fastica"`)
  
  - Number of principal components entering the ICA can be set via `ica_n_components`
  
  - Automatic detection + exclusion of eye movement components based on correlation with HEOG and VEOG \tiny (see <https://mne.tools/stable/generated/mne.preprocessing.ICA.html#mne.preprocessing.ICA.find_bads_eog>) \normalsize

  - Verify in QC reports

  - Other selection methods (manual selection, [ICLabel](<https://labeling.ucsd.edu/tutorial/about>)) not yet implemented
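
## Sketch: EOG-based component selection

\scriptsize

The EOG-based selection can be pictured as correlating each independent component's time course with an EOG channel and flagging components that correlate strongly. A simplified sketch with a fixed, arbitrary correlation cutoff on simulated blinks; MNE's `find_bads_eog()` (linked above) uses a z-score-based criterion instead, and `find_eog_components()` and `r_thresh` are hypothetical.

```{r, eval=FALSE}
# Hypothetical helper: flag ICA components that correlate with an EOG channel
# sources: components x time points matrix of ICA source time courses
# eog:     EOG time course with the same number of time points
find_eog_components <- function(sources, eog, r_thresh = 0.7) {
  r <- apply(sources, 1, function(s) cor(s, eog))
  which(abs(r) > r_thresh)
}

set.seed(42)
n_times <- 1000
blinks <- rep(0, n_times)
blinks[c(100:120, 500:520, 800:820)] <- 100   # Toy blink artifacts
veog <- blinks + rnorm(n_times, sd = 5)       # VEOG picks up the blinks
sources <- rbind(
  rnorm(n_times),                             # Brain-like noise
  -0.01 * blinks + rnorm(n_times, sd = 0.1),  # Blink component (sign flipped)
  rnorm(n_times)                              # Brain-like noise
)
find_eog_components(sources, veog)  # Flags component 2
```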

## Artifact correction

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="80%"}
figure("figures/oc_comparison_sd.png")
```

## Artifact correction

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="80%"}
figure("figures/oc_comparison_sme.png")
```

## Artifact correction

- Coming soon: **Residue iteration decomposition (RIDE)**

  - For correcting speech artifacts

  - Based on iterative separation of stimulus- and response-related ERP components [@ouyang2011; @ouyang2015; @ouyang2016]

  - Subtracts the response-related component from each single trial, time-locked to that trial's voice onset

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="80%"}
figure("figures/ride.png")
```

\tiny

See <https://github.com/kirstenstark/eeg-ride>
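
## Sketch: subtracting a response-locked component

\scriptsize

The last step listed above, subtracting a response-locked component at each trial's voice onset, can be sketched as follows. This simplification assumes the response-related waveform (`r_comp`) has already been estimated and that onsets are given as column indices; the iterative estimation that RIDE performs is not shown, and `subtract_response()` is hypothetical.

```{r, eval=FALSE}
# Hypothetical helper: remove a response-locked component from single trials
# x:         trials x time points matrix (stimulus-locked)
# r_comp:    estimated response-related waveform (response-locked)
# latencies: column index of each trial's voice onset
subtract_response <- function(x, r_comp, latencies) {
  n_times <- ncol(x)
  for (i in seq_len(nrow(x))) {
    # Place the response-locked waveform at this trial's onset
    idx <- latencies[i] + seq_along(r_comp) - 1
    ok <- idx >= 1 & idx <= n_times  # Ignore samples beyond the epoch
    x[i, idx[ok]] <- x[i, idx[ok]] - r_comp[ok]
  }
  x
}

# Toy example: fixed stimulus ERP plus a response artifact at varying onsets
stim_erp <- c(0, 1, 2, 1, 0, 0, 0, 0, 0, 0)
r_comp <- c(5, 10, 5)
latencies <- c(4, 6, 8)
sim <- t(sapply(latencies, function(lat) {
  y <- stim_erp
  y[lat:(lat + 2)] <- y[lat:(lat + 2)] + r_comp
  y
}))
cleaned <- subtract_response(sim, r_comp, latencies)
all(cleaned == matrix(stim_erp, 3, 10, byrow = TRUE))  # TRUE
```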

## Artifact rejection

- Per-channel peak-to-peak amplitude threshold via `reject_peak_to_peak` (default: `200.0` µV)

- Can be used in addition to or instead of ocular correction (BESA/MSEC or ICA)
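
## Sketch: peak-to-peak rejection

\scriptsize

An epoch is rejected if, on any channel, the difference between its maximum and minimum voltage exceeds the threshold. A minimal sketch assuming a trials $\times$ channels $\times$ time points array in µV; `find_bad_epochs()` is hypothetical and the pipeline's own implementation may differ.

```{r, eval=FALSE}
# Hypothetical helper: indices of epochs exceeding the peak-to-peak threshold
# epochs: trials x channels x time points array (microvolts)
find_bad_epochs <- function(epochs, threshold = 200.0) {
  # Peak-to-peak amplitude for every trial x channel combination
  ptp <- apply(epochs, c(1, 2), function(x) max(x) - min(x))
  # An epoch is bad if any of its channels exceeds the threshold
  which(apply(ptp > threshold, 1, any))
}

set.seed(1)
epochs <- array(rnorm(5 * 4 * 100, sd = 10), dim = c(5, 4, 100))
epochs[3, 2, 50] <- 300  # Single large artifact on channel 2 in trial 3
find_bad_epochs(epochs)  # Returns 3
```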

## Repairing bad channels

- Pass participant-specific vectors of bad channel labels

  - E.g., `bad_channels = list("05" = c("C3", "P7"), ...)`

- Uses spherical spline interpolation

- Experimental: Automatic bad channel detection (`bad_channels = "auto"`)

  - Based on channel $SD$s across epochs

\vspace{0.4cm}

```{r, echo=FALSE, fig.align="right", out.width="30%"}
figure("figures/automate.jpg")
```
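
## Sketch: SD-based bad channel detection

\scriptsize

A minimal sketch of SD-based detection: compute each channel's standard deviation across all epochs and flag channels whose SD is far above (noisy) or far below (flat) the median SD across channels. The ratio criterion and its cutoff are assumptions for illustration; the pipeline's experimental `"auto"` mode may use a different rule, and `find_bad_channels()` is hypothetical.

```{r, eval=FALSE}
# Hypothetical helper: flag noisy or flat channels by their SD
# epochs: trials x channels x time points array with channel names
find_bad_channels <- function(epochs, max_ratio = 3.0) {
  sds <- apply(epochs, 2, sd)  # SD of each channel across all epochs
  ratio <- sds / median(sds)   # Relative to the typical channel
  names(sds)[ratio > max_ratio | ratio < 1 / max_ratio]
}

set.seed(1)
chans <- c("Fz", "Cz", "Pz", "C3", "C4", "P7", "P8", "Oz")
epochs <- array(rnorm(20 * 8 * 50, sd = 10), dim = c(20, 8, 50),
                dimnames = list(NULL, chans, NULL))
epochs[, "P7", ] <- epochs[, "P7", ] * 5  # Noisy channel
epochs[, "Oz", ] <- 0                     # Flat channel
find_bad_channels(epochs)  # "P7" "Oz"
```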

## Detecting missing epochs

- Requires a log file column (`triggers_column`) containing the EEG trigger of every trial

- The pipeline magically detects and deletes log file trials without corresponding EEG data

\bigskip

```{r, echo=FALSE, fig.align="right", out.width="30%"}
figure("figures/magic.jpg")
```
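
## Sketch: matching log trials to EEG triggers

\scriptsize

One way such matching can work: walk through the log and EEG trigger sequences in parallel and keep a log trial only if its trigger equals the next unmatched EEG trigger. A greedy sketch, assuming the EEG triggers are the log triggers with some trials missing (no extra or reordered triggers; runs of identical triggers make the match ambiguous). The pipeline's actual procedure may differ, and `match_log_to_eeg()` is hypothetical.

```{r, eval=FALSE}
# Hypothetical helper: which log trials have a matching EEG trigger?
# log_triggers: trigger of every trial in the log file (triggers_column)
# eeg_triggers: triggers actually found in the EEG recording, in order
match_log_to_eeg <- function(log_triggers, eeg_triggers) {
  keep <- logical(length(log_triggers))
  j <- 1  # Position of the next unmatched EEG trigger
  for (i in seq_along(log_triggers)) {
    if (j <= length(eeg_triggers) && log_triggers[i] == eeg_triggers[j]) {
      keep[i] <- TRUE
      j <- j + 1
    }
  }
  keep
}

log_triggers <- c(201, 202, 211, 203, 212, 204)
eeg_triggers <- c(201, 211, 203, 204)  # 202 and 212 missing in the EEG
keep <- match_log_to_eeg(log_triggers, eeg_triggers)
which(!keep)  # Log trials 2 and 5 would be deleted
```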

## Time-frequency analysis

```{r, echo=FALSE, fig.align="center", out.width="80%"}
figure("figures/morlet.png")
```

\tiny

See <https://github.com/alexenge/tfr-workshop>

## Time-frequency analysis

```{r, eval=FALSE}
# Time-frequency analysis options
perform_tfr = TRUE,
tfr_components = list(
  "name" = list("alpha"),
  "tmin" = list(0.0), "tmax" = list(0.2),
  "fmin" = list(8.0), "fmax" = list(14.0),
  "roi" = list(c("PO9", "PO7", "PO3", "POz", "PO4", "PO8", "PO10", "O1", "Oz", "O2"))
)
```

- `tfr_components` defines the time windows, frequency bands, and ROIs for extracting single trial power values

- Additional options:

  - Morlet frequencies (`tfr_freqs`, default 4, 5, 6, ..., 40 Hz)
  
  - Morlet no. of cycles (`tfr_cycles`, default 2, 2.5, 3, ..., 20)
  
  - Baseline window (`tfr_baseline`, default -450 ms to -50 ms)
  
  - Baseline method (`tfr_method`, default percent signal change)
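
## Sketch: Morlet defaults and percent signal change

\scriptsize

The listed defaults are consistent with a simple rule, cycles = frequency / 2, which (with the usual Morlet definition $\sigma_t = n_{cycles} / (2 \pi f)$, e.g. in MNE-Python) gives every wavelet the same temporal width. The sketch reproduces these vectors and computes percent signal change relative to the baseline mean on made-up power values; the rule for `tfr_cycles` is inferred from the listed values, and `percent_change()` is hypothetical.

```{r, eval=FALSE}
# Default Morlet settings as listed on the slide
tfr_freqs <- seq(4, 40, by = 1)  # 4, 5, 6, ..., 40 Hz
tfr_cycles <- tfr_freqs / 2      # 2, 2.5, 3, ..., 20 cycles

# Temporal SD of each wavelet (s): sigma_t = n_cycles / (2 * pi * f)
sigma_t <- tfr_cycles / (2 * pi * tfr_freqs)  # 1 / (4 * pi), about 0.08 s, for all f

# Hypothetical helper: percent signal change relative to the baseline mean
percent_change <- function(power, times, baseline = c(-0.45, -0.05)) {
  in_baseline <- times >= baseline[1] & times <= baseline[2]
  base_mean <- mean(power[in_baseline])
  100 * (power - base_mean) / base_mean
}

times <- c(-0.4, -0.2, -0.1, 0.1, 0.2)
power <- c(2, 2, 2, 3, 1)     # Made-up alpha power values
percent_change(power, times)  # 0 0 0 50 -50
```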

## Time-frequency analysis

```{r, echo=FALSE, fig.align="center", out.width="95%"}
figure("figures/tfr.png")
```

\tiny

See @enge2023a

## Example datasets

:::::::::::::: {.columns}

::: {.column width="62%"}
```{r, echo=FALSE, fig.align="center", out.width="95%"}
figure("figures/erp_core.png")
```

\bigskip

\tiny

See @kappenman2021
:::

::: {.column width="38%"}
```{r, eval=FALSE}
# Download example data from UCAP
pipeline$datasets$get_ucap(
  participants = 10L,
  path = "data"
)

# Download example data from ERP CORE
pipeline$datasets$get_erpcore("N170")
pipeline$datasets$get_erpcore("MMN")
pipeline$datasets$get_erpcore("N2pc")
pipeline$datasets$get_erpcore("N400")
pipeline$datasets$get_erpcore("P3")
pipeline$datasets$get_erpcore("ERN")
```
:::

::::::::::::::

## Plans

- Enhance documentation (examples, boilerplate, preprint)

- Unit tests

- Mixed models with `pymer4` or `bambi`

- Better permutation tests [@frossard2021; @frossard2022]

- BIDS interface

- Your ideas + contributions?

\tiny

See <https://github.com/alexenge/hu-neuro-pipeline/issues>

## Learning/teaching EEG analysis

\vspace{-0.6cm}

```{r, echo=FALSE, fig.align="center", message=FALSE, out.width="100%"}
figure("figures/intro_to_eeg.png")
```

\vspace{-0.6cm}

\tiny

See <https://alexenge.github.io/intro-to-eeg>

## Thanks

## References

\tiny