Feature: Can we trivially speed up things for categorical X by aggregating Y's within categories? #174

@adw96

In the course of pursuing faster null fitting for categorical covariates, it occurred to me that we may be able to speed up fitting under the alternative using a trivial technique:

In any discrete X setting with p categories, I think we can immediately collapse Y from n x J to p x J by summing the rows of Y that share the same X. This may reduce computation, especially for large n. This should work for any g().
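A rough sketch of why the collapse should be harmless, under an assumption not stated above: that the fit is based on a Poisson log-linear model, log E[Y_ij] = z_i + x_i' β^j, with a free intercept z_i for each row. Profiling out z_i leaves a multinomial-type kernel in β, and rows with a common design row x_(c) then enter only through their sum. The symbols p_j, x_(c) and the constants are notation introduced here for illustration.

$$
\ell(\beta)
= \sum_{i=1}^{n}\sum_{j=1}^{J} Y_{ij}\,\log p_j(x_i;\beta) + \text{const}
= \sum_{c=1}^{p}\sum_{j=1}^{J}\Big(\sum_{i:\,x_i = x_{(c)}} Y_{ij}\Big)\log p_j(x_{(c)};\beta) + \text{const},
\qquad
p_j(x;\beta) = \frac{\exp(x^\top \beta^{j})}{\sum_{j'=1}^{J}\exp(x^\top \beta^{j'})}.
$$

The β-free constant differs between the n x J and p x J versions, which should not affect the maximizer. This says nothing about the variance estimate, which is a separate question.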

I can't remember whether radEmu scales poorly in n, but this could be worth a try. There's no need to keep rows distinct, since any time they appear in the likelihood they are aggregated over common X's.

@svteichman would you have the bandwidth to investigate whether this is promising? It applies to fitting under both the null and the alternative (potentially best tested separately), and for any g(). I think we could first confirm that the results of fitting are the same when Y is n x J as when Y is p x J (aggregated over the p categories).
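As a starting point for that check, here is a minimal sketch in plain Python, not radEmu code. It collapses the rows of Y by category and compares a multinomial-style log-likelihood kernel on the full and collapsed data at a fixed parameter value. The function names, the toy counts, and the per-category parameterization `beta[label]` (standing in for the linear predictor of each category) are all hypothetical, and the kernel is an assumed form of the likelihood rather than the package's actual objective.

```python
import math

def collapse_counts(x_labels, Y):
    """Sum the rows of Y (n x J) that share a category label, giving p x J."""
    sums = {}
    for label, row in zip(x_labels, Y):
        if label not in sums:
            sums[label] = [0] * len(row)
        sums[label] = [a + b for a, b in zip(sums[label], row)]
    labels = sorted(sums)
    return labels, [sums[label] for label in labels]

def multinomial_kernel(x_labels, Y, beta):
    """Compute sum_i sum_j Y_ij * log p_j(x_i), with p a softmax of beta[x_i]."""
    total = 0.0
    for label, row in zip(x_labels, Y):
        scores = beta[label]  # hypothetical: one score per taxon for this category
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += sum(y * (s - log_norm) for y, s in zip(row, scores))
    return total

# Toy data: n = 5 samples, J = 3 taxa, p = 2 categories (made up for illustration).
x = ["a", "b", "a", "b", "a"]
Y = [[3, 0, 5], [1, 2, 2], [0, 4, 1], [6, 1, 0], [2, 2, 2]]
beta = {"a": [0.0, 0.5, -1.0], "b": [0.0, -0.3, 0.8]}

labels, Y_collapsed = collapse_counts(x, Y)
full = multinomial_kernel(x, Y, beta)
collapsed = multinomial_kernel(labels, Y_collapsed, beta)
print(labels, Y_collapsed)
print(abs(full - collapsed) < 1e-9)
```

Agreement of the kernel at a fixed beta only suggests the point estimates should match; the real test would be to compare fitted estimates from the package itself on both versions of Y, and it does not address the standard errors.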

There would need to be a standard error adjustment (for I? Dy?) to ensure the Wald test statistics are right. Presumably the same is true for the score test.

This is lower priority than #173.
