ENH: Enable models for sparsely sampled fMRI series #376
@@ -2,6 +2,7 @@

```python
Transformations that primarily involve numerical computation on variables.
'''

from math import gcd
import numpy as np
import pandas as pd
from bids.utils import listify
```
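Note on the one new import: `math.gcd` has been in the standard library since Python 3.5; on older interpreters the equivalent function lived in the `fractions` module.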
@@ -35,6 +36,25 @@ def _transform(self, var, model='spm', derivative=False, dispersion=False,

```python
if isinstance(var, SparseRunVariable):
    sr = self.collection.sampling_rate
    # Resolve differences in TR and TA to milliseconds
    trs = {ri.tr for ri in var.run_info}
    tas = {ri.ta for ri in var.run_info}
    assert len(trs) == 1
    assert len(tas) == 1
    TR = int(np.round(1000. * trs.pop()))
    TA = int(np.round(1000. * tas.pop()))
    SR = int(np.round(1000. / sr))
    if TA is None or TA < TR:
        # Use a unit that fits a whole number of times into both
        # the interscan interval (TR) and the integration window (TA)
        dt = gcd(TR, TA)
        if dt > SR:
            # Find the nearest whole factor of dt >= SR.
            # This compromises between the target sampling rate and
            # one that neatly bins TR and TA.
            dt = dt // (dt // SR)
        sr = 1000. / dt

    var = var.to_dense(sr)

df = var.to_df(entities=False)
```

Review discussion on the `dt` compromise:

- "I think I'm probably missing something, but if we have to give up on a value that neatly bins […]"
- "25 also permits neat binning..."
- "Oh, duh. The inversion tripped me up. That is indeed the thing I was missing. :)"
- "That said, I'm still not sure we shouldn't just use the […]. I think maybe we need to come up with a general policy about this, because it crops up in other places too (e.g., the issue of what to do if event files contain very short durations). The general question is: in a situation where the user appears to be picking sampling rates that are objectively inefficient or that lose information, should we be paternalistic and fix it for them, or just let them know they can probably do better?"
- "One thing we could also do, though it's a bit annoying, is internally maintain a flag that indicates whether or not the user has explicitly overridden the default sampling rate. If […]"
- "It feels like we're trying to thread a needle where we make smart default choices based on our expertise on the one hand and, on the other, load a gun, cock it, and point it at the foot of anyone who looks like they might know what they're doing... Currently we reserve the footgun for people who use […]. But perhaps there's a better general approach here? I admit that I chose this simply because building a boxcar function and taking a mean is easier than learning how to do this with […]"
- "The general issue potentially affects anyone who sets the […]. I think making decisions for the user internally is fine if there's an explicit argument in the transformation that implies as much, i.e., if […]"
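To make the bin-size arithmetic concrete, here is a small standalone sketch; the TR/TA values and the 10 Hz target are hypothetical, chosen to trigger the compromise branch:

```python
from math import gcd

# Hypothetical sparse acquisition: TR = 2.25 s, TA = 0.25 s,
# requested sampling rate 10 Hz. All quantities in milliseconds.
TR, TA = 2250, 250
SR = 100  # 1000 ms / 10 Hz

dt = gcd(TR, TA)           # 250 ms bins fit a whole number of times into TR and TA
if dt > SR:
    # Shrink dt toward the requested interval while keeping TR and TA
    # whole multiples of the bin size
    dt = dt // (dt // SR)  # 250 // (250 // 100) = 125

print(dt, 1000. / dt)      # 125 ms bins, i.e. an effective rate of 8 Hz
```

Both 2250 / 125 and 250 / 125 are whole numbers, so acquisition windows align exactly with sample bins; the cost is an effective rate of 8 Hz rather than the requested 10 Hz.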
```diff
@@ -253,8 +253,8 @@ def split(self, grouper):
         subsets = []
         for i, (name, g) in enumerate(data.groupby(grouper)):
             name = '%s.%s' % (self.name, name)
-            args = [name, g, getattr(self, 'run_info', None), self.source]
-            col = self.__class__(*args)
+            col = self.__class__(name=name, data=g, source=self.source,
+                                 run_info=getattr(self, 'run_info', None))
             subsets.append(col)
         return subsets
```
```diff
@@ -419,7 +419,7 @@ def _build_entity_index(self, run_info, sampling_rate):
         self.timestamps = pd.concat(_timestamps, axis=0, sort=True)
         return pd.concat(index, axis=0, sort=True).reset_index(drop=True)
 
-    def resample(self, sampling_rate, inplace=False, kind='linear'):
+    def resample(self, sampling_rate, integration_window=None, inplace=False, kind='linear'):
         '''Resample the Variable to the specified sampling rate.
 
         Parameters
```

```diff
@@ -436,7 +436,7 @@ def resample(self, sampling_rate, inplace=False, kind='linear'):
         '''
         if not inplace:
             var = self.clone()
-            var.resample(sampling_rate, True, kind)
+            var.resample(sampling_rate, integration_window, True, kind)
             return var
 
         if sampling_rate == self.sampling_rate:
```
```diff
@@ -447,18 +447,36 @@ def resample(self, sampling_rate, inplace=False, kind='linear'):
 
         self.index = self._build_entity_index(self.run_info, sampling_rate)
 
-        x = np.arange(n)
         num = len(self.index)
 
-        from scipy.interpolate import interp1d
-        f = interp1d(x, self.values.values.ravel(), kind=kind)
-        x_new = np.linspace(0, n - 1, num=num)
-        self.values = pd.DataFrame(f(x_new))
+        if integration_window is not None:
+            from scipy.sparse import lil_matrix
+            old_times = np.arange(n) / old_sr
+            new_times = np.arange(num) / sampling_rate
+            integrator = lil_matrix((num, n), dtype=np.uint8)
+            count = None
+            for i, new_time in enumerate(new_times):
+                cols = (old_times >= new_time) & (old_times < new_time + integration_window)
+                # This could be determined analytically, but dodging off-by-one errors
+                if count is None:
+                    count = np.sum(cols)
+                integrator[i, cols] = 1
+
+            old_vals = self.values.values
+            self.values = pd.DataFrame(integrator.tocsr().dot(old_vals) / count)
+        else:
+            from scipy.interpolate import interp1d
+            x = np.arange(n)
+            f = interp1d(x, self.values.values.ravel(), kind=kind)
+            x_new = np.linspace(0, n - 1, num=num)
+            self.values = pd.DataFrame(f(x_new))
 
         assert len(self.values) == len(self.index)
 
         self.sampling_rate = sampling_rate
 
-    def to_df(self, condition=True, entities=True, timing=True, sampling_rate=None):
+    def to_df(self, condition=True, entities=True, timing=True, sampling_rate=None,
+              integration_window=None):
         '''Convert to a DataFrame, with columns for name and entities.
 
         Parameters
```

Review discussion on the `interp1d` branch:

- "This is OK for upsampling, but not fine for downsampling: depending on frequency content, this will introduce aliasing."
- "I'm not sure avoiding aliasing is in scope... or at least, I'm not sure we want to implicitly start low-pass filtering the input signal without the user's explicit consent. I'd be okay adding an explicit […]"
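The integration-window path amounts to multiplying the dense signal by a sparse boxcar-averaging matrix. Here is a minimal self-contained sketch of the same idea with made-up sizes and rates (10 Hz input, 0.5 Hz output, 2 s window), useful as a sanity check:

```python
import numpy as np
from scipy.sparse import lil_matrix

old_sr, new_sr, window = 10.0, 0.5, 2.0   # Hz, Hz, seconds (hypothetical)
n = 100                                    # 10 s of samples at old_sr
values = np.arange(n, dtype=float)         # stand-in dense signal

num = int(n * new_sr / old_sr)             # 5 output samples
old_times = np.arange(n) / old_sr
new_times = np.arange(num) / new_sr

# One row per output sample; 1s mark the input samples inside its window
integrator = lil_matrix((num, n), dtype=np.uint8)
count = None
for i, new_time in enumerate(new_times):
    cols = (old_times >= new_time) & (old_times < new_time + window)
    if count is None:
        count = np.sum(cols)               # samples per window (20 here)
    integrator[i, cols] = 1

resampled = integrator.tocsr().dot(values) / count
print(resampled)                           # [ 9.5 29.5 49.5 69.5 89.5]
```

Each output value is the mean of the input samples in its window, which is exactly the boxcar integration the diff applies when `integration_window` is given.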
```diff
@@ -474,7 +492,8 @@ def to_df(self, condition=True, entities=True, timing=True, sampling_rate=None):
             sampled uniformly). If False, omits them.
         '''
         if sampling_rate not in (None, self.sampling_rate):
-            return self.resample(sampling_rate).to_df(condition, entities)
+            return self.resample(sampling_rate,
+                integration_window=integration_window).to_df(condition, entities)
 
         df = super(DenseRunVariable, self).to_df(condition, entities)
```
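Putting the pieces together, caller-side usage would presumably look something like this; the variable `var` and the specific rates are hypothetical, while the `resample` and `to_df` signatures are as in the diff above:

```python
# var is assumed to be a DenseRunVariable sampled at 10 Hz.
# Downsample to 0.5 Hz by averaging over a 2 s integration window
# instead of interpolating:
downsampled = var.resample(0.5, integration_window=2.0, inplace=False)

# to_df forwards integration_window when an implicit resample is triggered:
df = var.to_df(sampling_rate=0.5, integration_window=2.0)
```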