Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider to include Human Nephrogenesis Atlas expression values in manual curation of high-evidence genes #8

Open
ninarank opened this issue Jun 27, 2023 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@ninarank
Copy link

ninarank commented Jun 27, 2023

Instead of or in addition to using Descartes fetal gene expression, consider using "Human Nephrogenesis Atlas".

The "Human Nephrogenesis Atlas" provides ssRNA seq data of two 14-weeks human kidneys (after QC: 8,316 cells (6,667 nephrogenic)).
They identified 18 cell clusters in the developing kidney. For each gene, you can get expression values of each cell cluster or of the total "Developmental Kidney". I suppose the "Developmental Kidney" expression value could be used if you only want 1 value per gene for the fetal kidney. This values is accessible via an API (script in progress).

So far, I did not find out, how exactly they calculate this expression value (which normalization technique etc.). I screened the original paper, but they didn't specify it there. I have not read the paper in detail, yet.

https://sckidney.flatironinstitute.org/
To get an example expression value: => Tab 'Gene Network' => select 'Developmental Kidney' => select gene of interest

@ninarank
Copy link
Author

ninarank commented Jul 6, 2023

# This is a pre-version of a script using the Human Nephrogenesis Atlas - API 
# When querying one specific gene, the API queries some other genes of the gene functional network as well. Maybe this can be somehow avoided to make everything faster, but I didn't find out how. 
# TODO: Replace paths

library(httr)
library(jsonlite)
library(tidyverse)
library(progress)


# download HGNC gene table from github repository "kidney-genetics"
hgnc_gt_version <- "2023-06-21"
gene_table_url <- paste0("https://github.com/halbritter-lab/kidney-genetics/raw/main/analyses/A_AnnotationHGNC/results/non_alt_loci_set_coordinates.", hgnc_gt_version, ".csv.gz")
download.file(url = gene_table_url, destfile = paste0("gene_score/raw/HGNC_", hgnc_gt_version, ".csv.gz")) # TODO: Replace paths
gunzip(filename = paste0("gene_score/raw/HGNC_", hgnc_gt_version, ".csv.gz"), 
       destname = paste0("gene_score/raw/HGNC_", hgnc_gt_version, ".csv")) # TODO: Replace paths

# load HGNC gene table and filter for protein-coding genes 
HGNC_table <- read.csv("gene_score/raw/HGNC_2023-06-21.csv") %>% 
  filter(locus_group == "protein-coding gene") # TODO: Replace paths


# function for getting expression values via the Human nephrogenesis atlas API
get_nga_expression <- function(cell_cluster, entrez_id_vec){
  # set query url
  entrez_id_collapse <- paste(paste0("entrez=", entrez_id_vec), collapse = "&")
  url <- "https://sckidney.flatironinstitute.org/integrations/query/"
  query_url <- paste0(url, cell_cluster, "?", entrez_id_collapse)
  
  # set the header
  headers <- c("Content-Type" = "application/json")
  
  # get the response
  response <- GET(query_url, add_headers(headers))
  
  # get the response content
  content <- content(response, as="text", encoding = "UTF-8")
  content_list <- fromJSON(content)
  
  # select queried genes
  res <- data.frame(content_list$genes) %>% 
    filter(query == TRUE) %>% 
    dplyr::select(standard_name, systematic_name, entrez_id = entrez, expression_value) %>% 
    mutate(cell_cluster = cell_cluster)
  
  return(res)
}

# get expression values chunk wise (all genes at once not possible)
entrez_table <- HGNC_table %>% dplyr::select(entrez_id) %>% unique()
entrez_split_list <- split(entrez_table$entrez_id, ceiling(seq_along(entrez_table$entrez_id) / 250)) 
expr_vals_df <- data.frame()
cell_clusters <- c(57) # 57 = Developmental Kidney
pb <- progress_bar$new(total = length(entrez_split_list)*length(cell_clusters))


for (cell_cluster in cell_clusters){
  for (i in entrez_split_list){
    pb$tick()  # Increment progress bar
    cat("\n")
    sub_df <- get_nga_expression(cell_cluster, i)
    
    expr_vals_df <- dplyr::bind_rows(expr_vals_df, sub_df)
  }
}


View(expr_vals_df)

# write csv ...

@berntpopp berntpopp self-assigned this Nov 20, 2023
@berntpopp berntpopp added the enhancement New feature or request label Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants