Mapping gene names to Ensembl IDs #70

JosephDiPalma · 2024-11-05T19:41:57Z

Did you have a chance to explore mapping the gene names to Ensembl IDs?
I'm looking into that now and don't know if there's a recommended way.

guillaumejaume · 2024-11-05T20:08:27Z

As of today, we only support de-aliasing HUGO gene names: https://github.com/mahmoodlab/HEST/blob/main/src/hest/HESTData.py#L1178.

@konst-int-i, do you have a code snippet to provide?

@JosephDiPalma, PR welcome! Handling gene names is always complex.

JosephDiPalma · 2024-11-05T21:27:10Z

I'll look into it further.
Ultimately, I'd like to use it in something like Geneformer which requires the Ensembl IDs.
Any further ideas would be appreciated too.

If I figure out a good solution, I'll send a PR.

konst-int-i · 2024-11-09T00:26:05Z

Hi @JosephDiPalma,

I also ran into this in the past. Here is a working version of a fix that I have not PR'ed yet.

It can be handled fairly easily with sc.queries.biomart_annotations. I strongly recommend caching, as the initial queries can be slow.

    import scanpy as sc
    from anndata import AnnData

    def _ensembleID_to_gene(adata: AnnData, species: str) -> AnnData:
        """
        Converts Ensembl gene IDs in adata.var_names to gene names using BioMart annotations
        """
        # Any species other than "Homo sapiens" falls back to mouse
        org = "hsapiens" if species == "Homo sapiens" else "mmusculus"

        # use_cache=True avoids repeating the slow BioMart query
        annotations = sc.queries.biomart_annotations(
            org=org, attrs=['ensembl_gene_id', 'external_gene_name'], use_cache=True
        )
        ensembl_to_gene_name = dict(zip(annotations['ensembl_gene_id'], annotations['external_gene_name']))
        adata.var['gene_name'] = adata.var_names.map(ensembl_to_gene_name)

        # Filter out genes where the conversion returned NaN, then rename
        valid_genes = adata.var['gene_name'].notna().to_numpy()
        adata = adata[:, valid_genes].copy()
        adata.var_names = adata.var['gene_name'].astype(str).to_numpy()

        return adata
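
For the direction asked in the original question (gene names to Ensembl IDs), the same annotation table can be inverted. Below is a minimal sketch, not part of HEST: the helper names `build_name_to_ensembl` and `map_names` are hypothetical, and the `DUP` rows are made-up data to show what happens when one name maps to several IDs. It assumes the (ID, name) pairs have already been fetched.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_name_to_ensembl(
    pairs: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Invert (ensembl_gene_id, gene_name) pairs into name -> ID lookups.

    Returns (unique, ambiguous): names with exactly one Ensembl ID, and
    names that map to several IDs, which the caller has to resolve.
    """
    ids_by_name: Dict[str, List[str]] = defaultdict(list)
    for ensembl_id, name in pairs:
        # Rows can lack a gene name (None, NaN, or ''); skip those.
        if not name or not isinstance(name, str):
            continue
        if ensembl_id not in ids_by_name[name]:
            ids_by_name[name].append(ensembl_id)

    unique = {n: ids[0] for n, ids in ids_by_name.items() if len(ids) == 1}
    ambiguous = {n: ids for n, ids in ids_by_name.items() if len(ids) > 1}
    return unique, ambiguous


def map_names(names: Iterable[str], unique: Dict[str, str]) -> List[Optional[str]]:
    """Return the Ensembl ID for each gene name, or None when unmapped."""
    return [unique.get(name) for name in names]


# Toy rows standing in for a real annotation table. The TP53 and BRCA1
# IDs are real; the "DUP" and unnamed rows are invented for illustration.
rows = [
    ("ENSG00000141510", "TP53"),
    ("ENSG00000012048", "BRCA1"),
    ("ENSG_FAKE_1", "DUP"),
    ("ENSG_FAKE_2", "DUP"),
    ("ENSG_FAKE_3", None),
]
unique, ambiguous = build_name_to_ensembl(rows)
mapped = map_names(["TP53", "DUP", "NOT_A_GENE"], unique)
```

With the `annotations` table from the snippet above, the pairs could be built as `zip(annotations['ensembl_gene_id'], annotations['external_gene_name'])`. Ambiguous names are kept separate rather than resolved silently, since picking one ID at random would be hard to notice downstream.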

Let me know if you have any trouble getting this to work. I will also create a draft PR to continue this discussion.

konst-int-i mentioned this issue Nov 9, 2024

HESTData: provide util to map ensemble ID to gene name #71 (Draft)

pauldoucet added the enhancement New feature or request label Nov 11, 2024
