- GLM to get beta for each voxel x condition
- e.g., OLS fitting
- vector of values for each voxel, for each condition
- each condition pattern is a point in n-dim voxel space (its squared distance from the origin is the sum of squares of its components)
- Difference = sum of squared differences between points in voxel space (squared Euclidean distance)
- Distance of a point from the origin = univariate effect, difference from baseline
- Could just subtract the mean from each vector, and move the voxels to plane
- Some voxels might be relatively less active compared to others
- z transform --> divisive normalization of voxel variance
- project to circle?
- 1 - cos(angle between A and B) = distance
- baseline shift can change angle between 2 points, but not Euclidean distance
- try subtracting out mean pattern (but makes patterns anticorrelated!)
- change the pattern scaling (make B slightly stronger)
- changes Euclidean distance, but not correlation
- split-half reliability analysis
- similar reliability between 2 metrics
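The metric properties above (a baseline shift changes the angle between two patterns but not their Euclidean distance, while rescaling one pattern changes the Euclidean distance but not the angle/correlation) can be checked numerically. A minimal numpy sketch; the notes themselves use MATLAB, and all names here are illustrative:

```python
import numpy as np

def euclid(a, b):
    # Euclidean distance between two voxel patterns
    return np.linalg.norm(a - b)

def cos_dist(a, b):
    # 1 - cos(angle between a and b)
    return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
a, b = rng.normal(size=50), rng.normal(size=50)

# Baseline shift: leaves Euclidean distance unchanged, but changes the angle
shift = 10.0
assert np.isclose(euclid(a + shift, b + shift), euclid(a, b))
assert not np.isclose(cos_dist(a + shift, b + shift), cos_dist(a, b))

# Rescaling one pattern: changes Euclidean distance, not the angle/correlation
assert not np.isclose(euclid(a, 1.5 * b), euclid(a, b))
assert np.isclose(cos_dist(a, 1.5 * b), cos_dist(a, b))
```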
- Some voxels are noisier than others
- Could weight voxels by their noise level (e.g., using residuals from the error covariance matrix) ~ t-statistic
- Voxel-by-voxel covariance matrix: take the variance from the matrix and weight: $\beta = \beta / \sigma_{voxel}$
- Could normalize by the whole covariance matrix $\Sigma$: $u = \beta \Sigma^{-1/2}$ --> huge increase in reliability
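A sketch of the two normalization schemes above (univariate: divide each voxel's beta by its noise SD; multivariate: post-multiply by $\Sigma^{-1/2}$), assuming a hypothetical residual matrix from which to estimate the noise covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cond, n_vox = 4, 30
beta = rng.normal(size=(n_cond, n_vox))

# Hypothetical residuals used to estimate the noise covariance
res = rng.normal(size=(200, n_vox))
Sigma = np.cov(res, rowvar=False)

# Univariate normalization: divide each voxel by its noise SD (~ t-statistic)
beta_uni = beta / np.sqrt(np.diag(Sigma))

# Multivariate normalization: post-multiply by Sigma^(-1/2)
evals, evecs = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
beta_multi = beta @ Sigma_inv_sqrt

# After multivariate normalization the noise covariance is whitened:
# Sigma^(-1/2) Sigma Sigma^(-1/2) = I
whitened = Sigma_inv_sqrt @ Sigma @ Sigma_inv_sqrt
assert np.allclose(whitened, np.eye(n_vox))
```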
- Noise drives distances further apart (although the ranks of distances are preserved)
- Multiple runs in the experiment
- Each estimated voxel pattern = true pattern + noise
- run-specific noise that is independent between runs
- estimated $(u_a - u_b)$ from run 1 $\times$ $(u_a - u_b)^T$ from run 2
- since noise is independent between runs, the CV distance is an unbiased estimate of the true distance
- average across folds if you can have more folds (e.g., leave-one-out)
- e.g., with 4 runs: $\frac{1}{4}\sum_{i=1}^{4} d_i \left(\tfrac{1}{3}\sum_{j \neq i} d_j\right)^T$
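The bias argument above can be simulated: for two conditions with identical true patterns, the naive squared distance is inflated by the noise variance, while the cross-validated (fold-wise) estimate sits near zero. An illustrative numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n_runs, n_vox = 8, 100
true_delta = np.zeros(n_vox)           # conditions A and B truly identical
# Estimated pattern difference per run = true difference + independent noise
delta_hat = true_delta + rng.normal(size=(n_runs, n_vox))

# Naive distance: squared norm of each run's estimate (positively biased)
d_naive = np.mean(np.sum(delta_hat ** 2, axis=1)) / n_vox

# Cross-validated distance: inner product of each fold with the average
# of the remaining folds (noise is independent -> bias cancels)
d_cv = 0.0
for i in range(n_runs):
    others = np.delete(delta_hat, i, axis=0).mean(axis=0)
    d_cv += delta_hat[i] @ others
d_cv /= n_runs * n_vox

# With a true distance of zero, the CV estimate is near 0,
# while the naive estimate is inflated by the noise variance (~1)
assert abs(d_cv) < 0.2
assert d_naive > 0.5
```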
- CV Mahalanobis distance has better split-half reliability than multivariate noise normalization when SNR is low
- Multivariate noise normalization > CV Mahalanobis distance > plain distance!
- For each cell in the RDM: random-effects t-test (vs. 0) across subjects, then FDR-correct and threshold
- Fixed effects within subject: can get a linear discriminant t-value (LD-t) by normalizing the distance by its standard error
- ratio between values is no longer preserved, because SD might differ between diff contrasts
- Instead of RDM, do pairwise classification accuracies, and insert into matrix
- But pattern classifiers are quantized (broken into bins)
- If doing binary classification, then fine, but not for investigating representation
- 3 main steps:
- compute/visualize RDMs from different brain regions
- compare brain and model RDMs
- statistical inference
- a vectorized RDM is a way to convert a symmetric RDM to a vector of length N(N-1)/2
- squareRDMs() to convert back to full RDM
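A minimal numpy sketch of the vectorize/square round trip (same convention as the toolbox; the function names here are illustrative, not the toolbox's):

```python
import numpy as np

def vectorize_rdm(rdm):
    # Upper-triangle entries of a symmetric RDM -> vector of length N(N-1)/2
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def square_rdm(vec, n):
    # Inverse: rebuild the full symmetric N x N RDM (zero diagonal)
    rdm = np.zeros((n, n))
    i, j = np.triu_indices(n, k=1)
    rdm[i, j] = vec
    rdm[j, i] = vec
    return rdm

n = 5
rng = np.random.default_rng(3)
d = rng.random(n * (n - 1) // 2)      # 10 unique dissimilarities for n = 5
assert np.allclose(vectorize_rdm(square_rdm(d, n)), d)
```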
- Naming convention for RDMs:
- structure = RDM
- field: RDM = dissimilarity matrix
- field: name = 'hIT | BE | Session: 1' (separate info w/vertical bars)
- field: color = [0 0 1]
- averageRDMs_subjectSession(RDMs, 'session') can average over the session id in the name field
- Multidimensional Scaling (MDS)
- show each condition with plot, and specify color of the plot
- project data into 2D space, and visualize on 2D circle
- Shepard plot: shows similarity vs. disparity
- Pearson & Spearman rank correlations
- MDSConditions(RDMs), where each RDM struct has name, color, and RDM fields
- Dendrogram: average RDMs across subjects, and visualize the categorical structure
- hierarchical clustering (exploratory!)
- plots small colored dots that match up with the MDS plot
- dendrogramConditions()
- Comparing RDMs (brain vs. model)
- Create RDM that is dissimilarity between separate RDMs
- How to quantify correlation
- Between values: Pearson, rank correlation (e.g., Spearman, Kendall's $\tau_a$)
- Between correlation matrices: Kendall's $\tau_a$ is the correlation measure for the 2nd-order correlation matrix
- Pearson assumes linearity, but should use rank correlation, e.g., how similar are the ranks
- Spearman correlation = correlation of the ranks (rank dissimilarities, and then compute pearson corr)
- doesn't penalize predictions of tied dissimilarities
- Kendall's $\tau_a$ rewards more specific predictions
- within one RDM, take 2 dissimilarities and see if one is larger than the other; repeat for the same pair in the other matrix
- If the 2 RDMs agree on the ordering, give a rank of 1, otherwise 0. Then take the proportion of 1s over all pairs of dissimilarities
- allows for detailed predictions (e.g., .7 in one condition, .8 in the other)
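The pairwise-ordering procedure above is closely related to Kendall's $\tau_a$. A small illustrative implementation (this sign-counting version scores ties as 0, matching the $\tau_a$ definition):

```python
import numpy as np
from itertools import combinations

def kendall_tau_a(x, y):
    # Kendall's tau_a: (concordant - discordant) / total pairs.
    # Unlike tau_b, ties are not corrected for, so a model that
    # predicts ties is penalized relative to one that makes a call.
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)

# Monotonically related vectors -> tau_a = 1
assert kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40]) == 1.0
# Ties in the model predictions reduce tau_a below 1
assert kendall_tau_a([1, 1, 2, 2], [1, 2, 3, 4]) < 1.0
```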
- Correlation between RDMs:
pairwiseCorrelateRDMs(cell array {models})
- MDS for RDM-RDM similarities:
MDSRDMs()
- 'subjectRFXsignedRank' (random-effects signed-rank test across subjects)
- plotpValues = '=' or '*' (write the p-value, or * for significance)
- Height of bars is the average correlation across subjects
- Threshold for multiple comparisons
- FDR, or Bonferroni (over the $\binom{N}{2}$ comparisons)
- When comparing 2 RDMs, and not enough evidence for base RDM, could do bootstrapping
userOptions.candRDMdifferencesTest = 'conditionRFXbootstrap'
- Noise ceiling: Compute threshold (gray) for maximum correlations for this dataset
- Upper bound
- best you can do is at the mean of all the subjects (group mean of average RDMs is the best)
- for RDMs, rarely use Euclidean distance, but rather Pearson correlation (or Spearman, etc.)
- but $d = \sqrt{2}\sqrt{1-r}$ for normalized patterns, so correlation distance is monotonically related to Euclidean distance
- for Spearman, do the same thing as Pearson, but rank-transform first
- Kendall $\tau_a$ --> gradient descent (to maximize the correlation)
- Lower bound
- Leave 1-subject out, average RDMs for n-1, and repeat for all folds, then average
- Underfit (so lower bound on ceiling), but still pretty good
- If model is between lower and upper bound, doing pretty well.
- But, what if data noisy?
- Might be useful to report the value of the ceiling, since can't really do much better, given the noise.
- Upper bound statistic: stats_p_r.ceiling(2)
- Demo script: DEMO1_RSA_ROI_sim.m
- Edit betaCorrespondence.m
- Might want to use t statistics, rather than betas
- Example:
- finger representations in M1
- thumb, index... pinky
- patterns are unique(ish) for each finger, but within subject
- pattern is different in subject 2 (r between subjects is low)
- but the relationship in peaks is somewhat consistent across subjects
- cross-validated distance: $(u_A^{(1)} - u_B^{(1)})(u_A^{(2)} - u_B^{(2)})^T$ (run 1 x run 2)
- 1 ROI, 2 hemispheres, 2 hands, 6 subjects, 10 conditions, 8 runs
- Estimate betas, residuals, and multivariate noise normalization
- rsa_noiseNormalizeBeta(Y,SPM), where Y is the time series after preprocessing
- Estimate beta coefficients and residuals from the preprocessed time series
- ordinary least squares estimate of beta_hat = inv(X'*X)*X'*Y
- residuals: res = Y - X*beta
- Run-wise multivariate noise normalization
- For each run, estimate the residual variance for each voxel (Sw_hat)
covdiag(res(indices_run_i,:))
- ~100 time points per 1050 voxels; if just cov, then rank deficient, and need to be able to invert
- this applies shrinkage estimation, so covariance matrix of resids (voxel to voxel), and optimal estimation of shrinkage
- Multivariate noise normalization
beta_hat(indices_run_i,:)*Sw_hat(:,:,run_i)^(-1/2)
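covdiag() performs shrinkage toward the diagonal with an optimally chosen shrinkage amount; as a sketch of why this helps, here is a simplified version with a fixed, hypothetical shrinkage parameter:

```python
import numpy as np

def shrink_cov_diag(res, lam=0.4):
    # Regularized covariance: shrink the sample covariance toward its
    # diagonal. covdiag() in the toolbox chooses lam optimally
    # (Ledoit-Wolf style); here lam is a fixed, hypothetical value.
    S = np.cov(res, rowvar=False)
    return (1 - lam) * S + lam * np.diag(np.diag(S))

# Fewer time points than voxels: the raw covariance is rank-deficient,
# but the shrunk estimate is positive definite and invertible
rng = np.random.default_rng(4)
res = rng.normal(size=(100, 150))          # 100 time points, 150 voxels
S_raw = np.cov(res, rowvar=False)
S_shrunk = shrink_cov_diag(res)
assert np.linalg.matrix_rank(S_raw) < 150
assert np.all(np.linalg.eigvalsh(S_shrunk) > 0)
```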
- Calculate the CV squared Euclidean distances
- Mahalanobis distance (LDC)
- Compute pairwise distances
- Leave 1-run out, and cross validate
- Take the pseudoinverse of X, and apply it to each partition (10 x number_voxels)
A(:,:,i) = pinv(Xa)*Ya; B = pinv(Xb)*Yb;
- Compute the difference between these in each partition
- d(i,:) = sum((C*A(:,:,i)).*(C*B),2)';
- (1 x 45 vector --> 10 x 10 matrix)
- Take average across CV partitions
d = sum(d)./numPart;
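The fold loop above (pairwise contrast matrix C, per-partition pattern estimates, train-test inner products averaged over folds) can be sketched in numpy as follows; the pattern estimates are simulated here rather than computed from a design matrix:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n_cond, n_vox, n_part = 10, 60, 8

# Pairwise contrast matrix C: one row per condition pair (45 x 10)
pairs = list(combinations(range(n_cond), 2))
C = np.zeros((len(pairs), n_cond))
for r, (i, j) in enumerate(pairs):
    C[r, i], C[r, j] = 1.0, -1.0

# Hypothetical per-partition pattern estimates (in the demo these come
# from pinv(X) applied to each run's data)
true_U = rng.normal(size=(n_cond, n_vox))
U = true_U + 0.5 * rng.normal(size=(n_part, n_cond, n_vox))

# Leave-one-run-out cross-validated squared distances, averaged over folds
d = np.zeros(len(pairs))
for i in range(n_part):
    A = U[i]                                   # training partition
    B = np.delete(U, i, axis=0).mean(axis=0)   # independent test data
    d += np.sum((C @ A) * (C @ B), axis=1)
d /= n_part * n_vox

assert d.shape == (45,)                        # 10 choose 2 distances
```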
- Show dissimilarity matrix, rank-transformed, scaled (0,1)
- if rank transformed, then highlights small differences; analyses should be done on raw values
- t-value that tells us how likely patterns are truly dissimilar
- Normalize the contrast by $SE_{LDC}$
- Take the diagonal of the noise covariance matrix (variances), and multiply by the training weights $w = C A$ (i.e., $u_A - u_B$)
- A(:,:,i) = pinv(Xa)*Ya; B = pinv(Xb)*Yb;
w = (C*A(:,:,i));
d(i,:) = sum(w.*(C*B),2)';
BCovb = SPM.xX.Bcov(partition==part(i),partition==part(i));
sigma = sum(w*Sw(:,:,i).*w,2);
se = sqrt(sigma.*diag(C*BCovb*C'))';
d(i,:) = d(i,:)./se;
- Can easily threshold matrix w/p-value for visualization purposes
(Jorn Diedrichsen, UCL)
- How are different features of a stimulus (e.g., color, shape) observed in neural activity?
- How many voxels are activated by a given feature?
- Approach:
- weighting of different features
- for instance, red might not be coded for, but blue a lot
- $\beta$ matrix: k conditions X p voxels; each row, $u$, is a pattern of voxels
- Compute distances (d) between patterns: $d_{ij} = (u_i - u_j)(u_i - u_j)^T = u_i u_i^T + u_j u_j^T - 2 u_i u_j^T$
- Pattern "covariance": $G_{ij} = u_i u_j^T$, related to pattern distance: $d_{ij} = G_{ii} + G_{jj} - 2 G_{ij}$
- Can go from covariance/inner product matrix to the Mahalanobis-distance matrix, just need the origin
- Make sure sum of rows/columns = 0
- If you can prove something for covariance matrix, applies to distance matrix too
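The G-to-D relationship and the role of the origin can be verified numerically: with patterns centered so that G's rows and columns sum to 0, double-centering recovers G from D. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
U = rng.normal(size=(5, 40))                 # 5 condition patterns
U -= U.mean(axis=0)                          # fix the origin at the mean pattern
G = U @ U.T                                  # second-moment ("covariance") matrix

# D_ij = G_ii + G_jj - 2 G_ij
g = np.diag(G)
D = g[:, None] + g[None, :] - 2 * G

# Going back requires an origin: double-centering (H = I - 1/n) recovers G
n = G.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
G_back = -0.5 * H @ D @ H
assert np.allclose(G_back, G)
```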
- Each pattern ($u$) is a sum of different features ($f$) times a feature pattern ($w_h$): $u = \sum_h f_h w_h$
- As the feature weighting changes, how does the RDM change?
- $G = UU^T = \sum_h f_h w_h \sum_h w_h^T f_h^T$
- if feature patterns are independent, then $w_i w_j^T = 0$
- $G = \sum_h f_h w_h w_h^T f_h^T$, since the cross sums cancel if the weight inner products = 0
- $w_h w_h^T$ = strength with which a feature is represented
- $f_h f_h^T$ = how stimuli differ from each other
- $G = \sum_h \omega_h G_h$ (where $\omega_h = w_h w_h^T$ and $G_h = f_h f_h^T$)
- Distance matrix: $D = \sum_h \omega_h D_h$
- How well does a neural area represent part of the model, rather than estimating weight for each feature
- could group features into representational components
- sometimes we want to think of the features as dependent, loading onto a "component"
- beta matrix = sum of F_h (k conditions x q features) * w_h (q features x p voxels)
- Example:
- 5 different stimuli (conditions 1-5, linearly increasing brightness) = F_h
- Feature correlation: $V_h = 1$
- Covariance: $G_h = F_h V_h F_h^T$
- Distance component: $d_{ij} = G_{ii} + G_{jj} - 2 G_{ij}$
- Example 2:
- Faces, places, different features
- Correlation matrix = [1 0; 0 1]
- Covariance: faces related to each other, not places, vice versa
- Distance matrix = reverse of covariance matrix
- Features don't really matter; it's the component matrices (covariance & distance) that you get out
- The exact values of the features don't matter too much; many different feature weightings give rise to the same component matrices
- **Estimation**
- $D = \sum_h \omega_h D_h$ --> linear regression!
- Vectorize the distances, then build the component design matrix $X = [d_1\ d_2]$
- $\omega = (X^T X)^{-1} X^T d$
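A sketch of this regression with two hypothetical component matrices, checking that the true weights are recovered in the noiseless case:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6

def random_component(rng, n):
    # A hypothetical component distance matrix (symmetric, zero diagonal)
    M = rng.random((n, n))
    M = (M + M.T) / 2
    np.fill_diagonal(M, 0)
    return M

D1, D2 = random_component(rng, n), random_component(rng, n)
true_omega = np.array([2.0, 0.5])
D = true_omega[0] * D1 + true_omega[1] * D2   # observed RDM (noiseless)

# Vectorize upper triangles and regress: omega = (X'X)^-1 X'd
iu = np.triu_indices(n, k=1)
X = np.column_stack([D1[iu], D2[iu]])
d = D[iu]
omega = np.linalg.solve(X.T @ X, X.T @ d)
assert np.allclose(omega, true_omega)
```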
- Integrated vs. independent encoding of features?
- How to test?
- Vary both factorially in design, and test for interaction
- Where is factor A encoded? where is B encoded?
- How to test?
- Example:
- A: grasp, pinch, grip; B: see, do
- Start with features:
- Factor A: 6 x 3 ([1 0 0; 0 1 0; 0 0 1; 1 0 0; 0 1 0; 0 0 1])
- Factor B: 6 x 2 ([1 0; 1 0; 1 0 ...])
- Interaction: A * B
- Representational components
- component matrix for each feature
- Determine weights for each component
- string representational components together into component design matrix (X)
- Get data
- Y --> betas --> pattern estimates --> D (RDM)
- Bring representational components and D together w/linear regression
$\omega = (X^T X)^{-1} X^T vec(\hat{D})$
- $(X^T X)^{-1} X^T$ yields interesting matrices
- Similar to cross-"decoding": train a classifier on see-grasp, test on do-pinch/grasp
- Get component weights for each factor (simulate over many iterations)
- if weights around 0, then doesn't code for that factor.
- Models for BOLD signals adding
- patterns engage independent neural subpopulations
- combine linearly to determine firing rate
- relationship between neural activity and BOLD is approximately linear
- NOT strictly true! (the relationship is closer to logarithmic)
- BUT, if within the range of a normal task (small % signal change), it's a small variation around the signal, and a linear approximation of the nonlinear function is fine
- Coding of action sequences (Yokoi et al., in prep)
- Set up
- sequences consisting of chunks
- where are sequences encoded in motor areas?
- what is the nature of chunks?
- finger presses, chunks, 2nd level chunks, entire sequence?
- From features, create representational component matrices
- feature 1: finger transitions for each sequence --> covariances --> distances
- feature 2: chunks (8), sequence # (11) x 8 chunks --> covariances --> distances
- feature 3: 2nd order chunks ...
- feature 4: sequences
- Note: component matrices are not orthogonal, so need to worry about that, but try to maximize orthogonality in the design
- OLS
- assume observations are independent (I)
- equal variance (I)
- normally distributed (D)
- If this is violated, then have an unbiased estimator, but not the best linear unbiased estimator (BLUE)
- not the one with lowest variance
- If distances are non-zero, variance/covariance go up. If add noise, variance/covariance goes up
- variance($\hat{d}$) = true variance of $dd^T$ + distance-dependent variance ($\propto \delta\delta^T$) + a constant that depends on the noise
- related to how you do cross-validation
- usually maximize & exhaust the CV folds
- If SNR is low, the independent noise term dominates; otherwise the distance-dependent term dominates
- Can do iteratively reweighted least squares (IRLS; since the variances depend on the true distances)
- start with some guess of the component weights ($\omega$)
- predict distances: $\hat{d} = X\omega$
- calculate the variance-covariance of d
- use this in the estimation (weighted least squares), get the best estimate
- repeat until convergence
- moderate improvements in SD, but how much it improves estimates over OLS depends on your model
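A toy IRLS loop under an assumed distance-dependent noise variance (all quantities here are simulated and the variance model is hypothetical; the real procedure uses the analytically computed variance-covariance of the distances):

```python
import numpy as np

rng = np.random.default_rng(8)
n_obs = 45
X = rng.random((n_obs, 2))                 # hypothetical component design matrix
true_omega = np.array([1.5, 0.8])
mu = X @ true_omega
# Simulated noise whose variance grows with the true distance
d = mu + rng.normal(size=n_obs) * np.sqrt(0.005 + 0.02 * mu)

# Step 0: plain OLS guess of the component weights
omega = np.linalg.lstsq(X, d, rcond=None)[0]
for _ in range(20):
    d_hat = X @ omega                                     # predict distances
    w = 1.0 / (0.005 + 0.02 * np.clip(d_hat, 0, None))   # inverse-variance weights
    Xw = X * w[:, None]
    omega_new = np.linalg.solve(Xw.T @ X, Xw.T @ d)      # weighted LS update
    if np.allclose(omega_new, omega):                    # stop at convergence
        break
    omega = omega_new
```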
- Sometimes model components are non-linear functions of parameters
- Example:
- area might have narrow tuning curve, or wider (tuning widths change)
- can express as feature vectors
- the mapping between feature matrices and distances is nonlinear
- How to solve:
- could have linear approximation (model 1st derivatives), like AR-estimation in 1st level SPMs
- estimate nonlinear parameters to optimize log-likelihood, e.g., log p(d|$\theta$)
- when to search for better model (not at noise ceiling yet), or better data (in noise ceiling)
- Correlation distance proportional to squared Euclidean distance (for normalized patterns)
- Comparing RDMs
- use correlations, not distances, mostly since want a bigger bar for better model
- Normalize the pattern vectors, so length = 1, with angle $\alpha$ between them
- d is the Euclidean distance between them; $\cos(\alpha) = r$; correlation distance is $1 - r$
- $d^2 = 2(1-r)$
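The identity above is easy to verify numerically for unit-length, mean-centered patterns:

```python
import numpy as np

rng = np.random.default_rng(9)
a, b = rng.normal(size=80), rng.normal(size=80)

def norm_pattern(x):
    # Normalize: subtract the mean, scale to unit length
    x = x - x.mean()
    return x / np.linalg.norm(x)

a_n, b_n = norm_pattern(a), norm_pattern(b)
r = a_n @ b_n                       # cos(alpha) = Pearson r for normalized patterns
d = np.linalg.norm(a_n - b_n)       # Euclidean distance between them
assert np.isclose(d ** 2, 2 * (1 - r))
```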
- Max (upper bound) = group-mean RDM
- Min (lower bound) = leave-one-out RDMs
- Important when you have categorical models that predict tied dissimilarities (e.g., if predict 1 for within and 0 for between categories)
- Have values for x and y
- compute 2 difference matrices: a (differences between all points on x), b (diff between all points on y)
- coeff = normalized inner product of a and b (just Pearson)
- Kendall -> signed differences (concordant or discordant pairs: 1 if both go up/down, 0 if one goes up and one goes down)
- Spearman --> ranked differences
- Using Pearson or Spearman, the true model can lose to a simplified step model, but using Kendall $\tau_a$, the true model can't lose
- Might not know how brain-activity measurements mix and weight representational dimensions
- Local averaging w/voxels in fMRI
- sparse, biased neuronal sample in cell recordings
- Toolboxes:
- rsatoolbox.v.2a
- DataFrame
- toolboxes from Jorn's lab for graphing
- MANOVA design
- Define how the conditions relate to the features
- Make representational model matrices
rsa_modelRDMsMANOVA(F_A, F_B)
- Visualize the pseudo-inverse matrices (to see what's going into the regression)
- Get prewhitened data from ROIs
- get $\beta$s: rsa_noiseNormalizeBetas()
- get CV distances: rsa_ldc()
- NOTE! Distances should be LDC. This might not hold for correlations, LD-t, etc.
- Plot for each subject for visualization
- don't want to do rank transform
- Can apply nonlinear function to data before plotting:
rsa_figureRDMs('transformFcn', 'fnc_name')
- e.g., 'ssqrt' -> sign preserving sqrt
- Plot average RDMs across subjects
rsa_flattenRDMs(RDMs)
- R-like function to average over subjects:
T=tapply(Data,{'regType','regSide'},{'RDM'})
- Put into RDM format:
avrgRDMs=rsa_foldRDMs(T);
- Plot:
rsa_figureRDMs(avrgRDMs,'rankTransform',0,'transformFcn','ssqrt');
- Estimate component weights w/normal OLS regression
- Basically multiply each RDM by the 3 model RDMs (M)
- rsa_fitRepModelRegress(M,RDMs,'method','OLS')
- Arguments: Model, and RDMs (subject x condition x ROI),
- Returns: omegas, 1 per Model (e.g., 448 x 3)
- Flatten Model if necessary (# models X 36)
- Y = (448 X 36)
- OLS
omega = Y*pinv(X);
- Hierarchical design/Linear model
- Making models from features
- Features:
- Rows = sequences, and columns = features from sequences
- Each sequence might be represented
- e.g., identity matrix (size 8, the # of conditions)
- Each chunk might be represented
- sequence #1 has chunks 2, 3, and shares chunk with 5
rsa_features2ModelRDMs()
- Features:
- Load in data in RDMs
- Calculate the variance-covariance of the distances
- given by rsa_LDCsigma; just save Sig into the data structure
- Use the variance-covariance in the estimation of $\omega$
- Repeat the update of $\omega$ until convergence
- Where in the brain is something represented?
- for each voxel in the brain, take a sphere of voxels around it, and calculate the RDM
- Issues
- If interested in one region, sphere around center voxel might go into other areas of non-interest
- Solutions
- For cortical regions, you could use an anatomically guided surface searchlight (surface-based)
- Could draw a mask for anatomical ROI and constrain searchlight to ROI
- Have spheres that cut off at boundaries (e.g., just within cerebellum)
- could specify a minimum number of voxels to include
- If want to look at cortex and other areas, can force it so that only one region can claim a voxel
- Example:
- Location of representations of skilled finger movements
- Generate searchlight indices
rsa_defineSearchlight({M},[],'sphere',[40 160]);
- M = functional mask, [] means just the functional mask, look everywhere
- at center node, give all voxels within 40 mm, but at least 160 voxels
- to get fixed radius, just [40]
- Returns: structure with fields LI, voxels, voxmin, etc.
- every row is one searchlight (linear indices to center voxels of searchlight)
- 80613 total in tutorial
- L.LI{1} --> 160 voxels
- L.structure --> all 1s, since only looking at 1 region
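A simplified volumetric searchlight definition, as a sketch of what rsa_defineSearchlight computes (an assumption for brevity: the toolbox grows the sphere until it reaches the voxel minimum, whereas this sketch just drops undersized centers):

```python
import numpy as np

def define_searchlights(mask, radius, min_vox):
    # For each voxel in a boolean 3-D mask, collect linear indices of
    # in-mask voxels within `radius` (in voxel units); centers with fewer
    # than `min_vox` neighbours are dropped rather than grown.
    coords = np.argwhere(mask)
    lin = np.ravel_multi_index(coords.T, mask.shape)
    searchlights = []
    for c in coords:
        dist = np.linalg.norm(coords - c, axis=1)
        neigh = lin[dist <= radius]
        if len(neigh) >= min_vox:
            searchlights.append(neigh)
    return searchlights

mask = np.ones((8, 8, 8), dtype=bool)
L = define_searchlights(mask, radius=2.0, min_vox=10)
# Every searchlight meets the minimum size and indexes in-mask voxels
assert all(len(s) >= 10 for s in L)
```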
- Run searchlight
- Define contrast vector for RDM:
D = load(fullfile(Opt.rootPath,Opt.spmDir,'SPM_info.mat'));
Opt.conditionVec = (D.hand == 1 & D.stimType==0) .* D.digit;
- Which trials belong to the right hand, active movements, coded by digit (5 conditions)
- Actually run the searchlight:
rsa_runSearchlightLDC(L,Opt)
- Returns:
- EPI size x 10 (5 x 5 RDM (10 unique distances))
- Take mean of distances at each voxel (collapse over all 10 values, so just mean distance)
- Define contrast vector for RDM:
- Generate searchlight indices from multiple regions (functional and anat)
L = rsa_defineSearchlight(M(2:3),M{1},'sphere',[40 160]);
- Smaller # of indices, just for 2 masks, within functional mask
- Surface-based searchlight
- define searchlight on the surface (e.g., lh.pial, lh.white)
- Gifti is like nifti, but for surface
- Convert FS files to .gii
rsa_readSurf({white},{pial});
rsa_defineSearchlight(S,mask,'sphere',[20 80]);