A robust homologous recombination deficiency predictor based on copy number alteration features

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
XSLiuLab/HRDCNA


HRDCNA: Homologous recombination deficiency prediction by copy number alteration features

Introduction

HRDCNA is a robust predictor of homologous recombination deficiency (HRD) based on copy number alteration (CNA) features. CNA information can be obtained from diverse data types, such as shallow WGS, WES, SNP arrays, and panel sequencing, and could serve as a cost-effective biomarker for cancer diagnosis and clinical response prediction. HRDCNA can precisely predict HR status across cancer types using CNA features derived from different platforms. It provides a robust tool for cost-effective HRD prediction and demonstrates the applicability of CNA features in cancer precision medicine. HRDCNA is available as an R package of the same name.

The package provides functions and example data to automate extracting CNA features and calculating HRDCNA scores.

Getting started

Requirements

  • Software: R
  • Operating system: Linux, OS X, Windows
  • R version: 4.1.0
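
Before installing, it can help to confirm which R version is running. The check below is a minimal sketch that assumes 4.1.0 is meant as a minimum version; the requirement above does not say whether newer or older releases are supported, and the variable names are illustrative only.

```r
# Assumption: 4.1.0 is treated as a minimum version (not stated in this README).
required_r <- "4.1.0"

# getRversion() returns a version object that compares correctly
# against a version string.
r_ok <- getRversion() >= required_r

if (!r_ok) {
  warning("R ", format(getRversion()), " is older than ", required_r)
}
```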

Installation

You can install the development version of HRDCNA from GitHub with:

# From GitHub
install.packages("remotes")
remotes::install_github("XSLiuLab/HRDCNA")
library(HRDCNA)

Extracting CNA features

Input file

HRDCNA takes an absolute copy number profile as input, with the following information:

  • Segment chromosome.
  • Segment start.
  • Segment end.
  • Absolute copy number value for this segment.
  • Sample ID.
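
A quick sanity check on a segment table can catch layout problems before feature extraction. The sketch below assumes the column names used by the package's example data (`sample`, `chromosome`, `start`, `end`, `segVal`). `check_cn_input()` is a hypothetical helper written for illustration and is not part of HRDCNA, and `toy_cn` is made-up data.

```r
# Hypothetical helper (not part of HRDCNA): verify that a segment table has
# the five expected columns and plausible values.
check_cn_input <- function(cn) {
  required <- c("sample", "chromosome", "start", "end", "segVal")
  missing_cols <- setdiff(required, colnames(cn))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(cn$start) || !is.numeric(cn$end) || !is.numeric(cn$segVal)) {
    stop("start, end and segVal must be numeric")
  }
  if (any(cn$end < cn$start)) {
    stop("Found segment(s) whose end precedes the start")
  }
  if (any(cn$segVal < 0)) {
    stop("Absolute copy number cannot be negative")
  }
  invisible(TRUE)
}

# Made-up segments for one made-up sample
toy_cn <- data.frame(
  sample     = "S1",
  chromosome = c("1", "1", "2"),
  start      = c(1, 5000001, 1),
  end        = c(5000000, 9000000, 7000000),
  segVal     = c(2, 3, 1)
)

check_cn_input(toy_cn)
```

The function stops with an error on the first problem it finds and returns `TRUE` invisibly otherwise, so it can be placed in front of any later step.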

The input can be produced by any software that reports the information above. The example data bundled with the package has this layout:

data(testdata)
head(cn_wgs)
#      sample chromosome    start      end segVal
# 1 FBC020030          1    13116  1598432      2
# 2 FBC020030          1  1599547  1661844      1
# 3 FBC020030          1  1662895 11043464      2
# 4 FBC020030          1 11044593 28695187      3
# 5 FBC020030          1 28696392 28747637      2
# 6 FBC020030          1 28749014 29497250      3

Extracting CNA features

nmfcn_wgs <- sigminercopy(data = cn_wgs, "hg19")
# ℹ [2022-11-09 05:06:01]: Started.
# ℹ [2022-11-09 05:06:01]: Genome build  : hg19.
# ℹ [2022-11-09 05:06:01]: Genome measure: called.
# ✔ [2022-11-09 05:06:01]: Chromosome size database for build obtained.
# ℹ [2022-11-09 05:06:01]: Reading input.
# ✔ [2022-11-09 05:06:01]: A data frame as input detected.
# ✔ [2022-11-09 05:06:01]: Column names checked.
# ✔ [2022-11-09 05:06:01]: Column order set.
# ✔ [2022-11-09 05:06:02]: Chromosomes unified.
# ✔ [2022-11-09 05:06:02]: Data imported.
# ℹ [2022-11-09 05:06:02]: Segments info:
# ℹ [2022-11-09 05:06:02]:     Keep - 25915
# ℹ [2022-11-09 05:06:02]:     Drop - 0
# ✔ [2022-11-09 05:06:02]: Segments sorted.
# ℹ [2022-11-09 05:06:02]: Joining adjacent segments with same copy number value. Be patient...
# ✔ [2022-11-09 05:06:06]: 24809 segments left after joining.
# ✔ [2022-11-09 05:06:06]: Segmental table cleaned.
# ℹ [2022-11-09 05:06:06]: Annotating.
# ✔ [2022-11-09 05:06:07]: Annotation done.
# ℹ [2022-11-09 05:06:07]: Summarizing per sample.
# ✔ [2022-11-09 05:06:07]: Summarized.
# ℹ [2022-11-09 05:06:07]: Generating CopyNumber object.
# ✔ [2022-11-09 05:06:07]: Generated.
# ℹ [2022-11-09 05:06:07]: Validating object.
# ✔ [2022-11-09 05:06:07]: Done.
# ℹ [2022-11-09 05:06:07]: 6.438 secs elapsed.
# ℹ [2022-11-09 05:06:07]: Started.
# ℹ [2022-11-09 05:06:08]: Step: getting copy number features.
# ℹ [2022-11-09 05:06:08]: Getting breakpoint count per 10 Mb...
# ℹ [2022-11-09 05:06:14]: Getting breakpoint count per chromosome arm...
# ℹ [2022-11-09 05:06:20]: Getting copy number...
# ℹ [2022-11-09 05:06:20]: Getting change-point copy number change...
# ℹ [2022-11-09 05:06:25]: Getting length of chains of oscillating copy number...
# ℹ [2022-11-09 05:06:29]: Getting (log10 based) segment size...
# ℹ [2022-11-09 05:06:29]: Getting the minimal number of chromosome with 50% CNV...
# ℹ [2022-11-09 05:06:32]: Getting burden of chromosome...
# ✔ [2022-11-09 05:06:33]: Gotten.
# ℹ [2022-11-09 05:06:33]: Step: generating copy number components.
# ✔ [2022-11-09 05:06:33]: `feature_setting` checked.
# ℹ [2022-11-09 05:06:33]: Step: counting components.
# ✔ [2022-11-09 05:06:35]: Counted.
# ℹ [2022-11-09 05:06:35]: Step: generating components by sample matrix.
# ✔ [2022-11-09 05:06:35]: Matrix generated.
# ℹ [2022-11-09 05:06:35]: 27.582 secs elapsed.

This step returns an NMF matrix with one row per sample and one column per CNA feature class, holding the count for each class. The first nine columns look like this:

head(nmfcn_wgs)[1:9]
#   BP10MB[0] BP10MB[1] BP10MB[2] BP10MB[3] BP10MB[4] BP10MB[5] BP10MB[>5] BPArm[0] BPArm[1]
# 1       246        29        31         4         3         1          2       11        6
# 2       262        30        19         4         0         1          0       19        5
# 3       147        54        50        23        21        12          9        4        0
# 4       222        59        26         7         2         0          0        8        8
# 5       175        61        46        18         9         4          3        5        2
# 6       127        72        49        32        14         9         13        4        1
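
To make the `BP10MB[...]` columns more concrete, the sketch below tallies breakpoints per 10 Mb window for a single chromosome and then counts how many windows fall into each class. It is a toy illustration under the assumption that each column counts windows with that many breakpoints. It is not the implementation used by `sigminercopy()`, and `count_bp10mb()` and the breakpoint positions are made up.

```r
# Toy illustration only; NOT the implementation behind sigminercopy().
# Count breakpoints in consecutive 10 Mb windows of one chromosome, then
# tally how many windows fall into each class 0, 1, ..., 5, >5.
count_bp10mb <- function(breakpoints, chrom_length, window = 1e7) {
  n_windows <- ceiling(chrom_length / window)
  # 1-based window index of every breakpoint, capped at the last window
  idx <- pmin(floor(breakpoints / window) + 1, n_windows)
  per_window <- tabulate(idx, nbins = n_windows)
  classes <- c(as.character(0:5), ">5")
  labels <- ifelse(per_window > 5, ">5", as.character(per_window))
  counts <- table(factor(labels, levels = classes))
  setNames(as.integer(counts), paste0("BP10MB[", classes, "]"))
}

# Made-up breakpoints on a made-up 50 Mb chromosome
bp <- c(2e6, 3e6, 12e6, 41e6, 42e6, 43e6, 44e6, 45e6, 46e6, 47e6)
count_bp10mb(bp, chrom_length = 5e7)
```

Here the five windows hold 2, 1, 0, 0 and 7 breakpoints, so two windows land in class `[0]` and one each in `[1]`, `[2]` and `[>5]`.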

Calculating HRDCNA score

Once we have the NMF matrix of CNA feature counts, we can calculate the HRDCNA score and use it to predict HRD.

score_wgs <- HRDprediction(data = nmfcn_wgs)
head(score_wgs)
#  HRDCNAScore    sample
# 1 0.06710088 FBC013587
# 2 0.05277765 FBC016006
# 3 0.97387076 FBC016026
# 4 0.98887292 FBC016050
# 5 0.97191659 FBC020021
# 6 0.96064644 FBC020030

The higher the HRDCNA score, the greater the probability that the sample is HRD.
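
Turning scores into labels requires a cutoff, which this README does not specify. The sketch below shows one way to do it using the six example scores printed above. `call_hrd()` is a hypothetical helper, the `"HRP"` label (HR-proficient) is an illustrative choice, and the cutoff of 0.5 is an arbitrary placeholder rather than a value recommended by HRDCNA.

```r
# Scores copied from the example output above
score_example <- data.frame(
  HRDCNAScore = c(0.06710088, 0.05277765, 0.97387076,
                  0.98887292, 0.97191659, 0.96064644),
  sample = c("FBC013587", "FBC016006", "FBC016026",
             "FBC016050", "FBC020021", "FBC020030")
)

# Hypothetical helper (not part of HRDCNA): label each sample at a
# user-chosen cutoff and sort from highest to lowest score.
call_hrd <- function(scores, cutoff) {
  scores$status <- ifelse(scores$HRDCNAScore >= cutoff, "HRD", "HRP")
  scores[order(scores$HRDCNAScore, decreasing = TRUE), ]
}

# 0.5 is an arbitrary placeholder, NOT a cutoff recommended by HRDCNA
called <- call_hrd(score_example, cutoff = 0.5)
print(called)
```

The cutoff is a required argument with no default, so any real analysis has to supply a value chosen and validated for its own data.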

The development of the HRDCNA model, its biological applications, and the generated data and figures are documented online at InterpretationAnalysisHRDCNA.
