For datasets with over 200 features, df-analyze's current feature-selection methods (in particular the stepwise methods) are extremely inefficient. It would be nice to pre-filter many-featured datasets down to a more manageable size before continuing with further tuning and selection.
This pre-filter could be based on associations (e.g. thresholding mutual information or correlations), univariate predictions (though these may be too costly to compute in the first place), or MultiSURF.
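A minimal sketch of what the association- and Relief-based variants could look like, assuming scikit-learn's mutual_info_classif and skrebate's MultiSURF as the underlying implementations; the function and parameter names (prefilter_features, n_keep, method) are made up for illustration and are not existing df-analyze code:

```python
# Hypothetical pre-filter sketch; prefilter_features, n_keep, and method are
# illustrative names, not the df-analyze API.
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from skrebate import MultiSURF  # assumed Relief implementation


def prefilter_features(X: pd.DataFrame, y: pd.Series, n_keep: int = 200,
                       method: str = "assoc") -> pd.DataFrame:
    """Keep only the n_keep features scoring highest under the chosen filter."""
    if method == "assoc":
        # Univariate association filter: mutual information with the target.
        scores = mutual_info_classif(X.values, y.values)
    elif method == "relief":
        # Relief-based filter: MultiSURF feature importances.
        scores = MultiSURF(n_jobs=-1).fit(X.values, y.values).feature_importances_
    else:
        raise ValueError(f"Unknown pre-filter method: {method}")
    keep = np.argsort(scores)[::-1][:n_keep]  # indices of the top-scoring features
    return X.iloc[:, keep]
```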
This could be implemented with a few arguments (a rough argparse sketch follows the list):
--pre-filter
    activate two-step feature selection
--pre-filter-method <relief|assoc|pred>
    specify the filter method, i.e. Relief-based, univariate associations, or univariate predictions
--pre-filter-option <option>
    e.g. MultiSURF / TuRF / SURF for Relief;
    AUROC / F1 / accuracy for univariate predictions, if affordable;
    mutual information / Pearson correlation / Cramér's V for univariate associations
--n-pre-filter <n>
    number of features to pre-filter down to
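A rough argparse sketch of how these flags might be wired up; the flag names come from the proposal above, but the choices and defaults are guesses rather than a final interface:

```python
# Sketch of the proposed CLI flags; option values and defaults are guesses,
# not the final df-analyze interface.
import argparse

parser = argparse.ArgumentParser(prog="df-analyze")
parser.add_argument("--pre-filter", action="store_true",
                    help="activate two-step feature selection")
parser.add_argument("--pre-filter-method", choices=["relief", "assoc", "pred"],
                    default="assoc",
                    help="filter method: Relief, univariate associations, "
                         "or univariate predictions")
parser.add_argument("--pre-filter-option", type=str, default=None,
                    help="e.g. multisurf/turf/surf for relief, auroc/f1/acc "
                         "for pred, mut_info/pearson/cramer_v for assoc")
parser.add_argument("--n-pre-filter", type=int, default=200,
                    help="number of features to pre-filter down to")
args = parser.parse_args()
```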