
HaploCI

Gian M. Franceschini edited this page Jan 6, 2025 · 2 revisions

Introduction

HaploCI calculates the difference in insulation at each genomic bin between the two phased Hi-C maps. To compute the insulation score of a genomic bin $j$, we use the distance-normalized Hi-C matrix $A$ (i.e., the matrix storing observed/expected contact values). The insulation score of bin $j$ is defined as $\text{ins} = 0.5-(\text{inter}/\text{total})$, where $\text{total}=\text{sum}(A_{(j-20:j+20,\,j-20:j+20)})$ is the sum of all contacts within a window extending 20 bins on either side of $j$, and $\text{inter}=2\times\text{sum}(A_{(j-20:j,\,j:j+20)})$ is the sum of contacts between the upstream and downstream halves of that window, doubled because $A$ is symmetric and $\text{total}$ includes both off-diagonal blocks. The differential insulation at a genomic position between the two haplotypes is then defined as $\Delta\text{ins}=\text{ins}_{\text{Hap2}}-\text{ins}_{\text{Hap1}}$.
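The definition above can be illustrated with a minimal Python sketch. This is not HaploCI's own code: the function name `insulation_score`, the toy matrices, and the handling of bins near the matrix edge are assumptions made for illustration. In particular, the index ranges are read as half-open Python slices (upstream bins `j-20 .. j-1`, downstream bins `j .. j+19`); the text does not say whether the original ranges are inclusive.

```python
import math

def insulation_score(A, j, w=20):
    """ins = 0.5 - inter/total for bin j of a symmetric obs/exp matrix A.

    Assumption: ranges are half-open, so the window covers bins [j-w, j+w).
    Bins whose window would leave the matrix return NaN (also an assumption).
    """
    lo, hi = j - w, j + w
    if lo < 0 or hi > len(A):
        return math.nan
    # All contacts inside the window around bin j.
    total = sum(A[r][c] for r in range(lo, hi) for c in range(lo, hi))
    # Contacts between the upstream and downstream halves, counted twice
    # because the symmetric block below the diagonal is part of `total`.
    inter = 2.0 * sum(A[r][c] for r in range(lo, j) for c in range(j, hi))
    if total == 0:
        return math.nan
    return 0.5 - inter / total

# Toy maps (hypothetical data): a uniform map, and a map with a perfect
# boundary at bin 30, i.e. no contacts across it.
n = 60
uniform = [[1.0] * n for _ in range(n)]
blocks = [[1.0 if (r < 30) == (c < 30) else 0.0 for c in range(n)]
          for r in range(n)]

ins_hap1 = insulation_score(uniform, 30)  # no insulation
ins_hap2 = insulation_score(blocks, 30)   # complete insulation
delta_ins = ins_hap2 - ins_hap1           # differential insulation
```

Under these assumptions the score is 0 when contacts across bin $j$ are as frequent as contacts within each side, and reaches its maximum of 0.5 when no contacts cross the bin.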

Usage

HaploC-tools/bin/downstreams.sh --help

Examples

conda run -n nHapCUT2 HaploC-tools/bin/downstreams.sh -d demo_data -k diffIns -s 25000

Parameters:

| Name | Description |
|------|-------------|
| -d | The working directory for phasing |
| -k | The module to run: one of diffIns, diffComp or HaploCNV |
| -s | Bin size used for the analysis. Default value: 25000 for HaploCI analysis, 100000 for HaploCNV analysis |

Output Structure

The output of the workflow is stored in the diffIns sub-directory and will look like this:

diffIns/
|-- ins.mat.bedgraph
|-- ins.pat.bedgraph
|-- diffIns.bedgraph
`-- log2diffIns.bedgraph

File description:

| Name | Description |
|------|-------------|
| ins.mat.bedgraph | a .bedgraph file containing the insulation score at each bin for the phased Hi-C map 'mat' |
| ins.pat.bedgraph | a .bedgraph file containing the insulation score at each bin for the phased Hi-C map 'pat' |
| diffIns.bedgraph | a .bedgraph file containing the difference in insulation score at each bin between the 'pat' and 'mat' Hi-C maps |
| log2diffIns.bedgraph | a .bedgraph file containing the log2 ratio of insulation scores (log2(ins_pat/ins_mat)) at each bin between the 'pat' and 'mat' Hi-C maps |

All .bedgraph files can be viewed directly in IGV (Integrative Genomics Viewer).
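For readers who want to inspect the tracks programmatically, here is a small Python sketch of how the two per-haplotype tracks relate to the two differential tracks. It assumes the standard four-column bedGraph layout (chrom, start, end, value) and that the difference is taken as pat minus mat, mirroring the log2(ins_pat/ins_mat) ratio described above; the sign convention of diffIns.bedgraph is not stated explicitly. The helper names and the toy values are hypothetical, and returning NaN for non-positive scores in the log2 ratio is a choice made here, not necessarily what HaploCI does.

```python
import math

def parse_bedgraph(text):
    """Map (chrom, start, end) -> value, skipping header and comment lines."""
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        scores[(chrom, int(start), int(end))] = float(value)
    return scores

def compare_tracks(mat, pat):
    """Return (chrom, start, end, diff, log2_ratio) for bins in both tracks."""
    rows = []
    for key in sorted(mat.keys() & pat.keys()):
        m, p = mat[key], pat[key]
        diff = p - m  # assumed sign convention: pat minus mat
        # log2 is undefined for non-positive values; use NaN there (assumption).
        log2_ratio = math.log2(p / m) if m > 0 and p > 0 else math.nan
        rows.append((*key, diff, log2_ratio))
    return rows

# Toy tracks at 25 kb resolution (hypothetical values).
mat_text = "chr1\t0\t25000\t0.10\nchr1\t25000\t50000\t0.20\n"
pat_text = "chr1\t0\t25000\t0.20\nchr1\t25000\t50000\t0.20\n"

rows = compare_tracks(parse_bedgraph(mat_text), parse_bedgraph(pat_text))
for chrom, start, end, diff, log2_ratio in rows:
    print(f"{chrom}\t{start}\t{end}\t{diff:.3f}\t{log2_ratio:.3f}")
```

To use it on real output, read ins.mat.bedgraph and ins.pat.bedgraph from the diffIns sub-directory and pass their contents to `parse_bedgraph`.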

Run time:

Computational requirements: running HaploCI on the xx Hi-C dataset at a bin size of 25 kb took xx minutes (server information: 40 cores, 64 GB RAM, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz). The evaluation was done using a single core, although HaploCI can be run in parallel.