Mapping gene names to Ensembl IDs #70
Comments
As of today, we only support de-aliasing HUGO gene names: https://github.com/mahmoodlab/HEST/blob/main/src/hest/HESTData.py#L1178. @konst-int-i, do you have a code snippet to provide? @JosephDiPalma, PR welcome! Handling gene names is always complex.
I'll look into it further. If I figure out a good solution, I'll send a PR.
Hi @JosephDiPalma, I also encountered this issue in the past and here is a working version of a fix that I have not PR'ed yet. It's handled relatively easily using `sc.queries.biomart_annotations`:

    import scanpy as sc

    def _ensembleID_to_gene(adata: sc.AnnData, species: str):
        """
        Converts Ensembl gene IDs to gene names using BioMart annotations
        """
        org = "hsapiens" if species == "Homo sapiens" else "mmusculus"
        annotations = sc.queries.biomart_annotations(
            org=org, attrs=['ensembl_gene_id', 'external_gene_name'], use_cache=True
        )
        ensembl_to_gene_name = dict(zip(annotations['ensembl_gene_id'], annotations['external_gene_name']))
        adata.var['gene_name'] = adata.var_names.map(ensembl_to_gene_name)
        # Filter out genes where the conversion returned NaN
        valid_genes = adata.var['gene_name'].notna().to_numpy()
        # Copy so the result is a standalone AnnData rather than a view
        adata = adata[:, valid_genes].copy()
        # Rename only after filtering, so no missing names end up in var_names
        adata.var_names = adata.var['gene_name']
        return adata

Let me know if you have any trouble getting this to work. I will also create a draft PR to continue this discussion.
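The snippet above converts Ensembl IDs to gene names, while the issue asks for the opposite direction. One possible way to go from gene names to Ensembl IDs is to invert the same BioMart table. The following is only a sketch under stated assumptions: the helper name `gene_names_to_ensembl` is hypothetical (it is not part of HEST or scanpy), the annotation table is assumed to have the `ensembl_gene_id` and `external_gene_name` columns requested above, and the three-row table is a small hand-written stand-in for real BioMart output so the example runs offline. Names that are missing, or that point at more than one Ensembl ID, are left as NaN rather than guessed.

```python
import pandas as pd


def gene_names_to_ensembl(gene_names, annotations: pd.DataFrame) -> pd.Series:
    """Map gene names to Ensembl IDs (hypothetical helper, not a HEST API).

    `annotations` is assumed to contain the columns 'ensembl_gene_id' and
    'external_gene_name'. Unknown or ambiguous names map to NaN.
    """
    cols = ["ensembl_gene_id", "external_gene_name"]
    table = annotations.dropna(subset=cols).drop_duplicates(subset=cols)
    # A name that maps to several Ensembl IDs is ambiguous, so drop it entirely
    table = table[~table["external_gene_name"].duplicated(keep=False)]
    name_to_id = table.set_index("external_gene_name")["ensembl_gene_id"]
    # Look up each input name, keeping the input order and labels
    names = pd.Series(list(gene_names), index=list(gene_names))
    return names.map(name_to_id).rename("ensembl_gene_id")


# Hand-written stand-in for a BioMart annotation table (illustration only)
annotations = pd.DataFrame(
    {
        "ensembl_gene_id": ["ENSG00000141510", "ENSG00000012048", "ENSG00000111640"],
        "external_gene_name": ["TP53", "BRCA1", "GAPDH"],
    }
)

ids = gene_names_to_ensembl(["TP53", "GAPDH", "NOT_A_GENE"], annotations)
print(ids)
```

With a real dataset, `annotations` would come from the same `sc.queries.biomart_annotations` call used above, and something like `adata.var["ensembl_id"] = gene_names_to_ensembl(adata.var_names, annotations).to_numpy()` would attach the result. Whether dropping ambiguous names is the right policy depends on the downstream analysis.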
Did you have a chance to explore mapping the gene names to Ensembl IDs?
I'm looking into that now and don't know if there's a recommended way.