I have been utilizing your scGPT model to perform perturbation prediction tasks by fine-tuning it with perturb-seq data. Your model is exceptionally well-designed and has significantly advanced my research.
In my current workflow, I compare the model’s predictions with the expression values of control cells to identify differentially expressed genes (DEGs). Specifically, I take the per-cell predictions before the model averages them, so that the number of predicted profiles equals the number of control cells, and then test these values against the control expression values to identify DEGs.
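To make the comparison concrete, here is a minimal sketch of a per-gene Wilcoxon rank-sum test between predicted and control profiles, followed by Benjamini-Hochberg correction. It assumes both inputs are cells-by-genes NumPy arrays of log-normalized expression; the function names `wilcoxon_deg` and `benjamini_hochberg` are hypothetical helpers written for illustration and are not part of scGPT or of the exact pipeline described above.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def wilcoxon_deg(pred, ctrl):
    """Per-gene rank-sum test of predicted vs. control cells.

    pred, ctrl: (cells x genes) arrays, assumed log-normalized.
    Returns (mean_diff, pvals, padj), one entry per gene.
    """
    pred = np.asarray(pred, dtype=float)
    ctrl = np.asarray(ctrl, dtype=float)
    n_genes = pred.shape[1]
    pvals = np.ones(n_genes)
    for g in range(n_genes):
        x, y = pred[:, g], ctrl[:, g]
        # Skip genes with no variation at all; the test is undefined there.
        if np.ptp(np.concatenate([x, y])) == 0:
            continue
        pvals[g] = mannwhitneyu(x, y, alternative="two-sided").pvalue
    # Difference of means on the log scale, used as a rough effect size.
    mean_diff = pred.mean(axis=0) - ctrl.mean(axis=0)
    return mean_diff, pvals, benjamini_hochberg(pvals)
```

Plotting `mean_diff` against `-log10(padj)` gives a volcano plot of the kind discussed here.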
However, I have encountered an issue where the gene expression distribution of the model’s output differs from that of the control cells, which results in a distorted volcano plot, as shown in the figure below.
![PastedGraphic-1](https://private-user-images.githubusercontent.com/54621392/399479900-1e191d88-586f-458f-9220-1b6cebe6dba4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNTM1MTEsIm5iZiI6MTczOTI1MzIxMSwicGF0aCI6Ii81NDYyMTM5Mi8zOTk0Nzk5MDAtMWUxOTFkODgtNTg2Zi00NThmLTkyMjAtMWI2Y2ViZTZkYmE0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDA1NTMzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU0NjUzZmIzYmE3MTEyOGYxMDJlMGE1NjBkM2M4NDY1NGNhNGUwYTAyNTNiYjU2YWQ0YzQ0NjJiM2JlOWFlY2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.T5alI58K_f48XQpoq1PoNJ7bIaixjUYDO-8_SCtJnGc)
![PastedGraphic-2](https://private-user-images.githubusercontent.com/54621392/399479940-3e80c5d6-0384-4d7f-81ae-1289691ecbb8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNTM1MTEsIm5iZiI6MTczOTI1MzIxMSwicGF0aCI6Ii81NDYyMTM5Mi8zOTk0Nzk5NDAtM2U4MGM1ZDYtMDM4NC00ZDdmLTgxYWUtMTI4OTY5MWVjYmI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDA1NTMzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBmNTk1NjAzYTkwMTVlZDNlMjA1NjJhMDVmYTU0MzQxYWJmODk2ZTZiNGJmMzIzZDM0OTc1MjEzNzFjZTkzZTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.3f57wHEUEdJzH5a51nXcqvTSc7qmsJUcVP6dsNLJ88o)
It appears that the model’s output does not reflect the sparsity typically observed in single-cell data. To address this, I imputed the control cells so that their distribution better matches the model’s output. Despite this adjustment, DEG analyses with both the Wilcoxon test and MAST still showed p-value inflation and far more downregulated than upregulated DEGs, as shown below.
![PastedGraphic-5](https://private-user-images.githubusercontent.com/54621392/399480037-0bca727e-46f4-49b9-8cf3-6be32e9b2fd2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNTM1MTEsIm5iZiI6MTczOTI1MzIxMSwicGF0aCI6Ii81NDYyMTM5Mi8zOTk0ODAwMzctMGJjYTcyN2UtNDZmNC00OWI5LThjZjMtNmJlMzJlOWIyZmQyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDA1NTMzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgzMzQ5ZmE3MjRiZDlhNzM3YjU1ODUzYWVmYjMwN2VmZTkzYjZlMTZkZjUxYzE5YWYxMjUzZmUyNjE0YzhkZDEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.BnTlRi4GucO9u0Qb4IhCgx-uAxgn93-wEtRg6ktU7mU)
![PastedGraphic-4](https://private-user-images.githubusercontent.com/54621392/399480059-d32e4cbe-8237-49eb-a7c3-558f9f47778c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNTM1MTEsIm5iZiI6MTczOTI1MzIxMSwicGF0aCI6Ii81NDYyMTM5Mi8zOTk0ODAwNTktZDMyZTRjYmUtODIzNy00OWViLWE3YzMtNTU4ZjlmNDc3NzhjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjExVDA1NTMzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThkMDFkMWM5ODE5NjRiMzcxMTkxNjA2YWJmYWFmNTVmM2VkZTQ5M2U5OWI2YjY0Y2EzMDU5YjUxODJmOTVlOGUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.7F7jSn6L0M3XYZy4XY8nsWyRdHgjNwcv8T6_EYEkLak)
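The sparsity mismatch described above can be quantified before running any test. The sketch below compares the fraction of near-zero entries and the mean per-gene variance of two matrices, assuming both are cells-by-genes NumPy arrays on the same scale. The name `compare_sparsity` and the tolerance `tol` are hypothetical choices for illustration. A much lower per-gene variance in the predictions than in the controls would be one possible contributor to inflated p-values, although this sketch alone cannot confirm the cause.

```python
import numpy as np


def compare_sparsity(pred, ctrl, tol=1e-6):
    """Summarize zero fraction and per-gene variance for two matrices.

    pred, ctrl: (cells x genes) arrays on the same expression scale.
    tol: values with absolute magnitude <= tol are counted as zero
         (an assumed threshold; model outputs are rarely exactly 0).
    """
    pred = np.asarray(pred, dtype=float)
    ctrl = np.asarray(ctrl, dtype=float)
    return {
        # Fraction of all entries that are (near) zero.
        "pred_zero_frac": float((np.abs(pred) <= tol).mean()),
        "ctrl_zero_frac": float((np.abs(ctrl) <= tol).mean()),
        # Mean across genes of the cell-to-cell variance.
        "pred_mean_gene_var": float(pred.var(axis=0).mean()),
        "ctrl_mean_gene_var": float(ctrl.var(axis=0).mean()),
    }
```

Running this on the raw controls, the imputed controls, and the predictions would show how far imputation actually moves the control distribution toward the model output.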
Given that DEG analysis is crucial for deriving meaningful biological insights from the model, I am seeking your guidance on best practices for conducting DEG analysis with scGPT model outputs. Your expertise would be invaluable in guiding my research forward. I would greatly appreciate any recommendations or insights you could share.
Thank you very much in advance!