Do we need to correct the batch effects of given datasets #43

HelloWorldLTY · 2024-08-30T23:46:17Z

Hi, thanks for your great work. I wonder if we need to correct the batch effects of these spatial transcriptomic data or not. Thanks a lot!

guillaumejaume · 2024-08-30T23:54:21Z

Hi, it depends on what you want to do with HEST data. What's your use case?

HelloWorldLTY · 2024-08-31T01:48:35Z

I am interested in the Visium data only. Thanks.

guillaumejaume · 2024-09-01T16:05:18Z

The Visium data integrated into HEST-1k are very diverse: two species (mouse and human) and multiple diseases and organs. Batch effect correction should be applied whenever there is some guarantee that it won't significantly alter the biological signal.

To give a better answer, I need a better understanding of your problem statement, e.g., multimodal representation learning, ST prediction from H&E, characterization of morphological correlates of expression changes, etc.

If you want to explore batch effects, we implemented two core functions:

Batch effect visualization, here, which produces a UMAP visualization of the expression of housekeeping genes (i.e., stable genes) in the stromal region. The function takes as input the series of Visium samples that you want to compare.
Batch effect correction, here, which can correct batch effects using MNN, Harmony, or ComBat. The output of each method is different, e.g., Harmony creates a new latent space, so its output can no longer be interpreted as gene counts (this may or may not be an issue for your problem statement).
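To make the idea of checking and correcting a batch effect concrete, here is a minimal NumPy sketch. It does not use the HEST functions mentioned above, and it is not MNN, Harmony, or ComBat: it measures, per gene, the fraction of variance explained by sample of origin, then applies plain per-batch mean centering as the simplest possible correction. The function names, the toy data, and the simulated additive shift are all assumptions made for illustration.

```python
import numpy as np

def batch_variance_fraction(x, batches):
    """Per-gene fraction of total variance explained by batch (between / total)."""
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    grand_mean = x.mean(axis=0)
    total = ((x - grand_mean) ** 2).sum(axis=0)
    between = np.zeros(x.shape[1])
    for b in np.unique(batches):
        rows = x[batches == b]
        between += len(rows) * (rows.mean(axis=0) - grand_mean) ** 2
    return between / np.maximum(total, 1e-12)

def center_per_batch(x, batches):
    """Subtract each batch's per-gene mean, then add back the global mean."""
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    out = x.copy()
    grand_mean = x.mean(axis=0)
    for b in np.unique(batches):
        mask = batches == b
        out[mask] = x[mask] - x[mask].mean(axis=0) + grand_mean
    return out

rng = np.random.default_rng(0)
# Toy data (hypothetical): 2 samples x 200 spots x 5 "housekeeping" genes,
# on a log-normalized scale.
batches = np.repeat(["sample_A", "sample_B"], 200)
expr = rng.normal(loc=2.0, scale=0.3, size=(400, 5))
expr[batches == "sample_B"] += 1.0  # simulated additive batch shift

before = batch_variance_fraction(expr, batches)
after = batch_variance_fraction(center_per_batch(expr, batches), batches)
print("variance explained by batch, before:", before.round(2))
print("variance explained by batch, after: ", after.round(2))
```

Mean centering keeps the output in gene space, unlike a latent-space method, but it also erases any real biological difference in mean expression between samples, which is exactly the kind of signal loss the caveat above warns about.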

HelloWorldLTY · 2024-09-02T15:53:04Z

Thanks! I will take a look at it!

guillaumejaume · 2024-09-04T14:35:44Z

@HelloWorldLTY, feel free to document any findings on this GitHub issue.

skambha6 · 2024-10-01T18:28:58Z

Related to this, I am noticing fairly strong batch effects by sample of origin in the H&E patch embeddings from Visium data, even for samples from the same tissue and disease. Is this to be expected, or am I missing a key pre-processing step? I am loading the patches using an H5HESTDataset object and applying only the model-specific eval_transforms (which generally appear to be resizing and ImageNet normalization).
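As a small illustration of why this preprocessing alone would not be expected to remove stain differences, the sketch below applies the standard ImageNet per-channel normalization to two toy patches that differ only by a constant color tint. The patches and the tint are made up for the example, and the actual eval_transforms of a given encoder may differ.

```python
import numpy as np

# Standard ImageNet channel statistics (for RGB values scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def imagenet_normalize(patch_uint8):
    """Scale an (H, W, 3) uint8 patch to [0, 1], then standardize per channel."""
    x = patch_uint8.astype(np.float64) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD

rng = np.random.default_rng(0)
# Two toy patches (hypothetical): same pattern, different constant tint.
base = rng.integers(100, 200, size=(224, 224, 3))
patch_a = base.astype(np.uint8)
patch_b = np.clip(base + np.array([20, -10, 15]), 0, 255).astype(np.uint8)

# Mean per-channel gap between the patches, before and after normalization.
gap_before = (patch_b.astype(float) - patch_a.astype(float)).mean(axis=(0, 1)) / 255.0
gap_after = (imagenet_normalize(patch_b) - imagenet_normalize(patch_a)).mean(axis=(0, 1))
print("gap before:", gap_before.round(3))
print("gap after: ", gap_after.round(3))
```

Because the same fixed mean and standard deviation are applied to every image, the tint gap is only rescaled, not removed, so the encoder still sees the sample-specific color shift.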

guillaumejaume · 2024-10-02T12:00:14Z

Batch effects in the H&E images exist. Which patch encoder are you using?

skambha6 · 2024-10-02T13:37:13Z

I see this with both the Gigapath and UNI encoders.

guillaumejaume · 2024-10-02T13:48:15Z

In my experience, CONCH is less sensitive to staining variations. Also, keep in mind that the image latent space can express staining variations while still encoding all the relevant biological signal. Depending on the downstream task, this may not be critical.

skambha6 · 2024-10-02T13:50:08Z

I see. Are there any ways to correct for the staining variations with preprocessing/normalization? It seems that Harmony can remove some of the image batch effects from the embeddings, but not all.

guillaumejaume · 2024-10-02T13:53:19Z

Many approaches exist for stain normalization in computational pathology, e.g., Macenko or Vahadane normalization. However, these can also alter the biological signal from the image. I'd need to better understand your problem statement to provide a more informed answer.
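For readers unfamiliar with Macenko normalization, the following is a minimal NumPy sketch of the method. It is not part of HEST. The reference stain matrix and maximum concentrations are values widely copied across public implementations and are treated here as assumed defaults; the parameter names (`io`, `alpha`, `beta`) and the synthetic test image are also assumptions for illustration.

```python
import numpy as np

# Reference H&E stain matrix and max concentrations commonly found in public
# Macenko implementations; assumed defaults, not HEST settings.
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])
MAX_C_REF = np.array([1.9705, 1.0308])

def macenko_normalize(img, io=240, alpha=1, beta=0.15):
    """Normalize an (H, W, 3) uint8 H&E image toward the reference stains."""
    h, w, _ = img.shape
    pixels = img.reshape(-1, 3).astype(np.float64)
    od = -np.log((pixels + 1.0) / io)            # optical density
    od_tissue = od[~np.any(od < beta, axis=1)]   # drop near-transparent pixels

    # Plane spanned by the two leading eigenvectors of the OD covariance
    # (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(np.cov(od_tissue.T))
    plane = eigvecs[:, 1:3]
    proj = od_tissue @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])

    # Robust extreme angles in that plane give the two stain directions.
    lo, hi = np.percentile(phi, alpha), np.percentile(phi, 100 - alpha)
    v_min = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_max = plane @ np.array([np.cos(hi), np.sin(hi)])
    # Heuristic ordering: hematoxylin has the larger red-channel OD.
    cols = [v_min, v_max] if v_min[0] > v_max[0] else [v_max, v_min]
    he = np.column_stack(cols)

    # Per-pixel stain concentrations, rescaled to the reference maxima.
    conc = np.linalg.lstsq(he, od.T, rcond=None)[0]
    max_c = np.percentile(conc, 99, axis=1)
    conc = conc * (MAX_C_REF / max_c)[:, None]

    out = io * np.exp(-HE_REF @ conc)
    return np.clip(out, 0, 255).T.reshape(h, w, 3).astype(np.uint8)

# Synthetic two-stain image (hypothetical stain vectors and concentrations).
rng = np.random.default_rng(0)
stains = np.array([[0.65, 0.07], [0.70, 0.99], [0.29, 0.11]])
toy_conc = rng.uniform(0.2, 1.5, size=(2, 64 * 64))
toy_img = np.clip(240 * np.exp(-stains @ toy_conc), 0, 255).T.reshape(64, 64, 3).astype(np.uint8)
normalized = macenko_normalize(toy_img)
print(normalized.shape, normalized.dtype)
```

Note that the method rescales stain concentrations per image, so a slide that is genuinely more cellular or more eosinophilic is pulled toward the same reference, which is one way such normalization can alter biological signal.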

skambha6 · 2024-10-02T13:55:12Z

Got it! We were interested in predicting gene expression from the patch embeddings, but it seems from what you're saying that batch effect correction can hurt more than help for this task.

guillaumejaume · 2024-10-02T14:04:03Z

In HEST-Benchmark, we didn't apply any additional corrections. I'm sure that performance can be improved, but the big unknown is how to ensure good generalization.
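One way to probe the generalization question is to evaluate on samples that were never seen during training. The sketch below is a hedged illustration, not the HEST-Benchmark protocol: it assumes a closed-form ridge regression from patch embeddings to gene expression and a leave-one-sample-out split, scored by mean Pearson correlation across genes. The function names, the toy embeddings, and the simulated per-sample offset are all hypothetical.

```python
import numpy as np

def ridge_fit(x, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ yc)
    return w, y_mean - x_mean @ w

def leave_one_sample_out(emb, expr, sample_ids, lam=1.0):
    """Mean Pearson r across genes for each held-out sample."""
    scores = {}
    for s in np.unique(sample_ids):
        test = sample_ids == s
        w, b = ridge_fit(emb[~test], expr[~test], lam)
        pred = emb[test] @ w + b
        truth = expr[test]
        r = [np.corrcoef(pred[:, g], truth[:, g])[0, 1] for g in range(truth.shape[1])]
        scores[str(s)] = float(np.mean(r))
    return scores

rng = np.random.default_rng(0)
# Toy data (hypothetical): 3 samples x 100 patches, 16-dim embeddings, 4 genes.
sample_ids = np.repeat(["s1", "s2", "s3"], 100)
latent = rng.normal(size=(300, 16))
w_true = rng.normal(size=(16, 4))
expr = latent @ w_true + 0.1 * rng.normal(size=(300, 4))

# Embeddings = biological signal + a constant per-sample offset (batch effect).
offsets = {s: rng.normal(scale=0.5, size=16) for s in ["s1", "s2", "s3"]}
emb = latent + np.stack([offsets[s] for s in sample_ids])

scores = leave_one_sample_out(emb, expr, sample_ids)
print(scores)
```

Splitting by sample rather than by patch keeps patches from the same slide out of both train and test, so the score reflects how well the model transfers across staining and acquisition differences rather than how well it memorizes them.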

skambha6 · 2024-10-02T14:10:25Z

Okay got it, thank you for the information!

pauldoucet added the scientific-discussion label Sep 1, 2024
