Module HiCTools

A collection of functions to facilitate analysis of HiC data based on the cooler and cooltools interfaces.

Functions

assignRegions(window, binsize, chroms, positions, arms) : Constructs a 2d region around each of a series of chromosomal locations. window specifies the window size for the constructed regions: each region extends from pos-window to pos+window. binsize specifies the size of the HiC bins. The positions that represent the centers of the regions are given by the chroms series and the positions series.
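The window arithmetic described above can be illustrated with a minimal sketch. The helper name `region_bounds`, the clamping at 0 and the floor division used to obtain bin indices are assumptions made for illustration; the sketch does not reproduce how assignRegions uses the arms dataframe.

```python
def region_bounds(position, window, binsize):
    """Return (start, end, lo_bin, hi_bin) for a window centred on position.

    Hypothetical helper: start is clamped at 0 (an assumption), and the
    bin indices are obtained by integer division by the bin size.
    """
    start = max(position - window, 0)  # pos - window, not below 0
    end = position + window            # pos + window
    lo_bin = start // binsize          # first bin touched by the region
    hi_bin = end // binsize            # bin that contains the end coordinate
    return start, end, lo_bin, hi_bin


# A 200 kb window around position 1 Mb with 10 kb bins.
print(region_bounds(1_000_000, 200_000, 10_000))
```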

assignRegions2d(window, binsize, chroms1, positions1, chroms2, positions2, arms) : Constructs a 2d region around each of a series of chromosomal location pairs. window specifies the window size for the constructed regions: each region extends from pos-window to pos+window. binsize specifies the size of the HiC bins. The positions that represent the centers of the regions are given by the chroms1 and chroms2 series as well as the positions1 and positions2 series.

doPileupICCF(clr, snipping_windows, proc=5, collapse=True) : Takes a cooler file handle and snipping windows constructed by assignRegions and performs a pileup on all these regions based on the corrected HiC counts. Returns a numpy array that contains averages of all selected regions. The collapse parameter specifies whether to return the average window over all piles (collapse=True), or the individual windows (collapse=False).

doPileupObsExp(clr, expected_df, snipping_windows, proc=5, collapse=True) : Takes a cooler file handle, an expected dataframe constructed by getExpected and snipping windows constructed by assignRegions, and performs a pileup on all these regions based on the obs/exp value. Returns a numpy array that contains averages of all selected regions. The collapse parameter specifies whether to return the average window over all piles (collapse=True), or the individual windows (collapse=False).
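The effect of the collapse parameter can be sketched on a toy stack of windows. This assumes the stack is laid out as [windowSize, windowSize, windowNumber] and that missing pixels are ignored when averaging (the use of `np.nanmean` is an assumption); `collapse_pile` is a hypothetical helper, not part of HiCTools.

```python
import numpy as np


def collapse_pile(pile, collapse=True):
    """Average a stack of windows over the window axis, or return it as is."""
    if collapse:
        # Average over the last axis (one slice per window), skipping NaNs.
        return np.nanmean(pile, axis=2)
    return pile


# Two 2x2 windows stacked along the third axis; one pixel is missing.
first = np.array([[1.0, 2.0], [3.0, 4.0]])
second = np.array([[3.0, 4.0], [5.0, np.nan]])
pile = np.stack([first, second], axis=2)

average_window = collapse_pile(pile, collapse=True)    # shape (2, 2)
single_windows = collapse_pile(pile, collapse=False)   # shape (2, 2, 2)
print(average_window)
```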

downSamplePairs(sampleDict, Distance=10000) : Downsamples the cis and trans reads of every sample in sampleDict so that each sample contains as many combined cis and trans reads as the sample with the lowest read number at the specified distance.

getArmsHg19() : Downloads the coordinates for chromosomal arms of the genome assembly hg19 and returns it as a dataframe.

getDiagIndices(arr) : Helper function that returns the indices of the diagonal of a given array into a flattened representation of the array. For example, the 3 by 3 array [[0, 1, 2], [3, 4, 5], [6, 7, 8]] would have diagonal indices [0, 4, 8].
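One way to obtain such indices is sketched below, assuming a 2d array flattened in row-major (C) order. `diag_indices_flat` is a hypothetical name and the module's own implementation may differ.

```python
import numpy as np


def diag_indices_flat(arr):
    """Indices of the main diagonal within the row-major flattened array."""
    n_rows, n_cols = arr.shape
    n_diag = min(n_rows, n_cols)
    # Element (i, i) sits at position i * n_cols + i in the flattened array.
    return np.arange(n_diag) * (n_cols + 1)


example = np.arange(9).reshape(3, 3)
indices = diag_indices_flat(example)
print(indices)
print(example.flatten()[indices])
```

Indexing the flattened array with these indices gives the same values as `np.diag(example)`.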

getExpected(clr, arms, proc=20, ignoreDiagonals=2) : Takes a clr file handle and a pandas dataframe with chromosomal arms (generated by getArmsHg19()) and calculates the expected read number at a certain genomic distance. The proc parameter defines how many processes should be used to do the calculations. ignoreDiagonals specifies how many diagonals to ignore (0 means the main diagonal, 1 means the main diagonal and the two flanking diagonals, and so on).

getPairingScore(clr, windowsize=40000, func=<function mean>, regions=Empty DataFrame Columns: [] Index: [], norm=True, blankDiag=True) : Takes a cooler file (clr), a window size (windowsize), a summary function (func) and a set of genomic regions to calculate the pairing score as follows: a square with side length windowsize is created for each of the entries in the supplied genomic regions, and the summary function is applied to the Hi-C pixels at that location in the supplied cooler file. The results are returned as a dataframe. If no regions are supplied, regions are constructed for each bin in the cooler file to construct a genome-wide pairing score. norm specifies whether the median of the calculated pairing score should be subtracted from the supplied values, and blankDiag specifies whether the diagonal should be blanked before calculating the pairing score.
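The calculation can be sketched on a small dense matrix. This is an illustration under several assumptions, not the module's implementation: the square is taken to be centred on the diagonal at each bin, the window is given as a half width in bins rather than in base pairs, and `pairing_score_dense` is a hypothetical helper that returns an array instead of a dataframe.

```python
import numpy as np


def pairing_score_dense(matrix, half_width, func=np.nanmean,
                        norm=True, blank_diag=True):
    """Apply func to a square around the diagonal at every bin."""
    mat = matrix.astype(float).copy()
    if blank_diag:
        # Blank the main diagonal so that it does not enter the summary.
        np.fill_diagonal(mat, np.nan)
    n_bins = mat.shape[0]
    scores = np.full(n_bins, np.nan)
    # Bins too close to the matrix edge keep NaN (no full square fits).
    for i in range(half_width, n_bins - half_width):
        lo, hi = i - half_width, i + half_width + 1
        scores[i] = func(mat[lo:hi, lo:hi])
    if norm:
        # Subtract the median of all defined scores.
        scores = scores - np.nanmedian(scores)
    return scores


toy = np.ones((5, 5))
raw = pairing_score_dense(toy, half_width=1, norm=False)
normed = pairing_score_dense(toy, half_width=1, norm=True)
print(raw, normed)
```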

getPairingScoreObsExp(clr, expected, windowsize=40000, func=<function mean>, regions=Empty DataFrame Columns: [] Index: [], norm=True) : Takes a cooler file (clr), an expected dataframe (expected; may be generated by getExpected), a window size (windowsize), a summary function (func) and a set of genomic regions to calculate the pairing score as follows: a square with side length windowsize is created for each of the entries in the supplied genomic regions, and the summary function is applied to the Hi-C pixels (obs/exp values) at that location in the supplied cooler file. The results are returned as a dataframe. If no regions are supplied, regions are constructed for each bin in the cooler file to construct a genome-wide pairing score.

loadPairs(path) : Function to load a .pairs or .pairsam file into a pandas dataframe. This only works for relatively small files!

pileToFrame(pile) : Takes a pile of pileup windows produced by doPileupObsExp/doPileupICCF (with collapse set to False; this is a numpy ndarray with the following dimensions: pile.shape = [windowSize, windowSize, windowNumber]) and arranges them as a dataframe with the pixels of the pile flattened into columns and each individual window being a row.
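The reshaping described above can be sketched as follows. Row-major flattening of each window and the default integer column labels are assumptions; `pile_to_frame_sketch` is a hypothetical stand-in for pileToFrame.

```python
import numpy as np
import pandas as pd


def pile_to_frame_sketch(pile):
    """Arrange a [size, size, n_windows] stack as one row per window."""
    n_rows, n_cols, n_windows = pile.shape
    # Bring the window axis to the front, then flatten every window.
    flat = np.moveaxis(pile, 2, 0).reshape(n_windows, n_rows * n_cols)
    return pd.DataFrame(flat)


toy_pile = np.arange(8).reshape(2, 2, 2)
frame = pile_to_frame_sketch(toy_pile)
print(frame)
```

Each row of the resulting frame holds the pixels of one window, so row 0 equals `toy_pile[:, :, 0].flatten()`.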

slidingDiamond(array, sideLen=6, centerX=True) : Slides a diamond of side length 'sideLen' down the diagonal of the passed array and returns the average value for each position together with the relative position of each value with respect to the center of the array (in bin units).
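A minimal sketch of this idea is shown below. It assumes that the diamond corresponds to a square block of side `side_len` placed on the main diagonal (which appears as a diamond when the matrix is rotated by 45 degrees), that values are averaged with `np.nanmean`, and that positions are reported as the block midpoint relative to the array midpoint. The function name and return order are hypothetical.

```python
import numpy as np


def sliding_diamond_sketch(array, side_len=6, center_x=True):
    """Average square blocks along the main diagonal of a square array."""
    n_bins = array.shape[0]
    values, positions = [], []
    for i in range(n_bins - side_len + 1):
        block = array[i:i + side_len, i:i + side_len]
        values.append(np.nanmean(block))
        # Midpoint of the block in bin units.
        positions.append(i + (side_len - 1) / 2)
    positions = np.array(positions)
    if center_x:
        # Express positions relative to the midpoint of the array.
        positions = positions - (n_bins - 1) / 2
    return positions, np.array(values)


x_pos, averages = sliding_diamond_sketch(np.ones((4, 4)), side_len=2)
print(x_pos, averages)
```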