
Provide note to users for too many missing values in imputation#145

Draft
vbrennsteiner wants to merge 10 commits into main from warn_too_high_missing


Conversation

@vbrennsteiner
Collaborator

@vbrennsteiner vbrennsteiner commented Jan 12, 2026

Add a warning for users when they are about to impute features above a predetermined missingness threshold. This is meant to avoid situations where users run imputation by default without considering that features which are mostly missing become increasingly unreliable when imputed.

--> As pointed out in the meeting on 2026.01.19, "warning" may not be the right term here, and I would prefer "notification". We don't want to suggest to users that something is going wrong; the hope is that every once in a while the note raises the right eyebrows and prompts someone to reconsider a downstream step that would otherwise be performed on only imputed values.
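A minimal sketch of the check described above, assuming a NumPy matrix with samples as rows, features as columns and NaN marking missing values. The function name `notify_high_missingness`, the logger name and the 0.5 threshold are hypothetical and not taken from this PR's diff. Following the preference for a notification over a warning, the sketch logs at INFO level instead of calling `warnings.warn`.

```python
import logging

import numpy as np

# Hypothetical logger name; a real package would typically use logging.getLogger(__name__).
logger = logging.getLogger("imputation_sketch")


def notify_high_missingness(X: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Log a note for features whose fraction of NaNs exceeds `threshold`.

    Illustrative helper (name and default are assumptions). Returns the column
    indices of the flagged features so callers can inspect them.
    """
    missing_fraction = np.isnan(X).mean(axis=0)  # per-feature fraction of NaNs
    flagged = np.flatnonzero(missing_fraction > threshold)
    if flagged.size > 0:
        # INFO instead of a warning: this is a note, not a sign that something failed.
        logger.info(
            "%d of %d features have more than %.0f%% missing values; "
            "most of their values will be imputed.",
            flagged.size,
            X.shape[1],
            100 * threshold,
        )
    return flagged


# Usage: INFO records are only shown once logging is configured.
logging.basicConfig(level=logging.INFO)
X = np.array(
    [
        [1.0, np.nan, 3.0],
        [np.nan, np.nan, 2.0],
        [1.5, np.nan, np.nan],
        [2.0, 4.0, 1.0],
    ]
)
flagged = notify_high_missingness(X)  # flags feature 1 (3 of 4 values missing)
```

One trade-off of this choice: Python displays a `UserWarning` by default, whereas INFO records stay hidden unless logging is configured, so a notification is gentler but also easier to miss.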



@vbrennsteiner vbrennsteiner self-assigned this Jan 12, 2026
@vbrennsteiner vbrennsteiner marked this pull request as ready for review January 14, 2026 12:29
@lucas-diedrich
Collaborator

I would say that it is difficult to set a specific threshold.

Example: Imagine that you profile a heterogeneous cell population with single-cell proteomics. A certain protein is a very specific marker for a cell population that makes up only 10% of the total cell count. Wouldn't the missingness of this protein always be higher than any reasonable threshold?
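To make the example concrete, here is a small sketch with made-up numbers: 1,000 cells, 10% of which belong to the rare population. It assumes, for simplicity, that the marker is detected in every cell of that population and in no other cell. Under that assumption the marker is 90% missing overall, which would exceed a global cutoff of 50% or even 80%, although it is fully observed within the population it marks.

```python
import numpy as np

# Made-up setup: 1,000 cells, the first 100 (10%) form the rare population.
n_cells = 1000
in_subpopulation = np.zeros(n_cells, dtype=bool)
in_subpopulation[:100] = True

# Assumption: the marker is seen only in the rare population (no stochastic
# dropout), while a housekeeping protein is observed in every cell.
marker = np.where(in_subpopulation, 5.0, np.nan)
housekeeping = np.full(n_cells, 3.0)
X = np.column_stack([marker, housekeeping])  # cells as rows, proteins as columns

# Fraction of missing values per protein, across all cells and within the population.
overall_missing = np.isnan(X).mean(axis=0)
within_subpopulation_missing = np.isnan(X[in_subpopulation]).mean(axis=0)

# overall_missing: 0.9 for the marker, 0.0 for the housekeeping protein.
# within_subpopulation_missing: 0.0 for both proteins.
```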

@vbrennsteiner
Collaborator Author


Agreed; the point is not necessarily to warn or dissuade, just to make users aware that from here on the majority of values will be imputed. To my knowledge, Perseus has some kind of imputation cutoff where it won't impute unless a minimal number or proportion of values is non-missing. This would just be a convenience note that might prompt some reflection when imputation is used without really thinking about the implications.
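A rough sketch of the kind of cutoff described for Perseus: a feature is imputed only if at least a given fraction of its values is observed, and left untouched otherwise. This is not Perseus's implementation (its exact rules are not checked here); the helper name `impute_with_min_valid`, the 0.5 default and the column-median fill are placeholder assumptions standing in for whatever imputation method is actually used.

```python
import numpy as np


def impute_with_min_valid(X, min_valid_fraction=0.5):
    """Median-impute only features with enough observed values (hypothetical helper).

    X has samples as rows and features as columns; NaN marks a missing value.
    Features below the cutoff are returned unchanged, still containing NaN.
    Returns the imputed copy and a boolean mask of the features that were eligible.
    """
    X = np.array(X, dtype=float, copy=True)  # never modify the caller's data
    valid_fraction = 1.0 - np.isnan(X).mean(axis=0)
    # Also require at least one observed value, so an all-NaN column is never "imputed".
    eligible = (valid_fraction >= min_valid_fraction) & (valid_fraction > 0)
    for j in np.flatnonzero(eligible):
        column = X[:, j]  # a view, so assignments below write into X
        column[np.isnan(column)] = np.nanmedian(column)
    return X, eligible


X = [
    [1.0, np.nan, 3.0],
    [np.nan, np.nan, 2.0],
    [1.5, np.nan, np.nan],
    [2.0, 4.0, 1.0],
]
imputed, eligible = impute_with_min_valid(X)
# Features 0 and 2 (75% observed) are imputed; feature 1 (25% observed) is skipped.
```

Skipping, rather than refusing outright, keeps the rest of the matrix usable while making it explicit which features would have been mostly imputed.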

@vbrennsteiner vbrennsteiner changed the title Warn too high missing Provide note to users for too many missing values in imputation Jan 19, 2026
@vbrennsteiner vbrennsteiner marked this pull request as draft March 15, 2026 16:39