Comparing changes

base repository: open2c/cooltools
base: v0.7.1
head repository: open2c/cooltools
compare: master
  • 10 commits
  • 7 files changed
  • 3 contributors

Commits on Jul 4, 2024

  1. Test external datasets: updated test external mcool datasets for mESC Micro-c for multiple resolutions

    agalitsyna committed Jul 4, 2024
    2a4ee69

Commits on Aug 9, 2024

  1. reduce the chunksize in rearrange to a more reasonable 2e7

    golobor committed Aug 9, 2024
    ed70cef
  2. Merge pull request #531 from open2c/bugfix-rearrange

    reduce the chunksize in rearrange to a more reasonable 2e7
    golobor authored Aug 9, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    e0dc6ce
  3. fix a major bug in cross score that resulted in miscalculation of the first distance bin

    golobor committed Aug 9, 2024
    461f906
  4. Merge pull request #532 from open2c/bugfix_cross_score

    fix a major bug in cross score
    golobor authored Aug 9, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    5655cf2
  5. an attempt to fix register_cmap import

    golobor committed Aug 9, 2024
    566dd14
  6. Merge pull request #533 from open2c/fix_register_cmap_import

    an attempt to fix register_cmap import in plotting
    golobor authored Aug 9, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    f608f25
  7. make register_cmap backward compatible wrt to argument order

    golobor committed Aug 9, 2024
    dcdf337
  8. Merge pull request #534 from open2c/fix_register_cmap_import

    make register_cmap backward compatible wrt to argument order
    golobor authored Aug 9, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    ab97a04

Commits on Oct 21, 2024

  1. Update pool decorator, and multiprocessing (#539)

    closes #535 #536
    
    * 1. update pool_decorator 
    * 2. convert lambda func to def func to avoid conflicts with pickle
    Yaoyx authored Oct 21, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    aedd531
Showing with 74 additions and 37 deletions.
  1. +5 −2 cooltools/api/dotfinder.py
  2. +6 −3 cooltools/api/expected.py
  3. +1 −1 cooltools/api/rearrange.py
  4. +2 −2 cooltools/lib/common.py
  5. +4 −3 cooltools/lib/plotting.py
  6. +52 −22 cooltools/sandbox/cross_score.py
  7. +4 −4 datasets/external_test_files.tsv
7 changes: 5 additions & 2 deletions cooltools/api/dotfinder.py
@@ -1262,6 +1262,9 @@ def cluster_filtering_hiccups(
 # large helper functions wrapping smaller step-specific ones
 ####################################################################

+def _compose_score_hist(tile, to_score, to_hist):
+    return to_hist(to_score(tile))
+
 @pool_decorator
 def scoring_and_histogramming_step(
     clr,
@@ -1300,7 +1303,7 @@ def scoring_and_histogramming_step(
     to_hist = partial(histogram_scored_pixels, kernels=kernels, ledges=ledges)

     # compose scoring and histogramming together :
-    job = lambda tile: to_hist(to_score(tile))
+    job = partial(_compose_score_hist, to_score=to_score, to_hist=to_hist)

     # standard multiprocessing implementation
     if nproc > 1:
@@ -1388,7 +1391,7 @@ def scoring_and_extraction_step(
     )

     # compose scoring and histogramming together
-    job = lambda tile: to_extract(to_score(tile))
+    job = partial(_compose_score_hist, to_score=to_score, to_hist=to_extract)

     # standard multiprocessing implementation
     if nproc > 1:
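The commit message for #539 says the lambdas were converted to regular functions "to avoid conflicts with pickle". The standard-library multiprocessing pool pickles the callables it sends to workers, and plain pickle cannot serialize a lambda, while a functools.partial over a module-level function can be serialized. The sketch below illustrates that difference in isolation; `score`, `hist`, `_compose` and `can_pickle` are hypothetical stand-ins and not cooltools code.

```python
import pickle
from functools import partial


def _compose(tile, to_score, to_hist):
    # Module-level function: pickle can reference it by its qualified name.
    return to_hist(to_score(tile))


def score(tile):
    # Hypothetical stand-in for the scoring step.
    return tile * 2


def hist(scored):
    # Hypothetical stand-in for the histogramming step.
    return scored + 1


def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


lambda_job = lambda tile: hist(score(tile))
partial_job = partial(_compose, to_score=score, to_hist=hist)

# Both jobs compute the same value for the same input.
same_result = lambda_job(3) == partial_job(3)

# Only the partial survives a pickle round trip.
restored = pickle.loads(pickle.dumps(partial_job))
```

Here `can_pickle(lambda_job)` is False and `can_pickle(partial_job)` is True, which is presumably why the same diff can switch `cooltools/lib/common.py` from `multiprocess` (which serializes with dill) to the standard `multiprocessing`.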
9 changes: 6 additions & 3 deletions cooltools/api/expected.py
@@ -1035,6 +1035,10 @@ def per_region_smooth_cvd(
     )

     return cvd
+
+def _balance_transform(p, weight1, weight2):
+    return p["count"] * p[weight1] * p[weight2]
+
 # user-friendly wrapper for diagsum_symm and diagsum_pairwise - part of new "public" API
 @pool_decorator
 def expected_cis(
@@ -1179,7 +1183,7 @@ def expected_cis(
         # define balanced data transform:
         weight1 = clr_weight_name + "1"
         weight2 = clr_weight_name + "2"
-        transforms = {"balanced": lambda p: p["count"] * p[weight1] * p[weight2]}
+        transforms = {"balanced": partial(_balance_transform, weight1=weight1, weight2=weight2)}
     else:
         raise ValueError(
             "cooler is not balanced, or"
@@ -1317,8 +1321,7 @@ def expected_trans(
         # define balanced data transform:
         weight1 = clr_weight_name + "1"
         weight2 = clr_weight_name + "2"
-        transforms = {"balanced": lambda p: p["count"] * p[weight1] * p[weight2]}
-
+        transforms = {"balanced": partial(_balance_transform, weight1=weight1, weight2=weight2)}
     else:
         raise ValueError(
             "cooler is not balanced, or"
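For reference, the arithmetic performed by the new `_balance_transform` helper can be checked on a single made-up pixel. In cooltools `p` is presumably a table of pixels annotated with per-bin weights; a plain dict is used here only to keep the sketch dependency-free, and the weight column name `"weight"` and all numbers are assumptions.

```python
from functools import partial


def _balance_transform(p, weight1, weight2):
    # Same arithmetic as the helper added in the diff.
    return p["count"] * p[weight1] * p[weight2]


clr_weight_name = "weight"  # assumed balancing-weight column name
weight1 = clr_weight_name + "1"
weight2 = clr_weight_name + "2"
transform = partial(_balance_transform, weight1=weight1, weight2=weight2)

# One hypothetical pixel: raw count 8, bin weights 0.5 and 0.25.
pixel = {"count": 8, "weight1": 0.5, "weight2": 0.25}
balanced = transform(pixel)  # 8 * 0.5 * 0.25
```

The partial binds the two column names once, so the transform still takes a single argument `p`, exactly like the lambda it replaces.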
2 changes: 1 addition & 1 deletion cooltools/api/rearrange.py
@@ -212,6 +212,6 @@ def rearrange_cooler(
         ),
         assembly=assembly,
         mode=mode,
-        mergebuf=int(1e9),
+        mergebuf=int(2e7),
     )
     logging.info(f"Created a new cooler at {out_cooler}")
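To get a feel for why 2e7 is described as "more reasonable" than 1e9, here is a rough back-of-the-envelope estimate. It assumes that `mergebuf` counts pixel records held in memory at once and that each record takes about 20 bytes (two int64 bin ids plus one int32 count); both are assumptions for illustration, not measurements of cooler's actual memory use.

```python
BYTES_PER_RECORD = 20  # assumption: two int64 bin ids + one int32 count


def buffer_gib(n_records, bytes_per_record=BYTES_PER_RECORD):
    """Approximate buffer size in GiB for a given number of records."""
    return n_records * bytes_per_record / 2**30


old_gib = buffer_gib(int(1e9))  # previous setting
new_gib = buffer_gib(int(2e7))  # setting after this change
```

Under these assumptions the old setting corresponds to roughly 18.6 GiB of buffered records and the new one to roughly 0.37 GiB, a 50-fold reduction.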
4 changes: 2 additions & 2 deletions cooltools/lib/common.py
@@ -2,7 +2,7 @@
 import numpy as np
 import pandas as pd
 import bioframe
-from multiprocess import Pool
+from multiprocessing import Pool
 from functools import wraps
 import logging

@@ -526,7 +526,7 @@ def wrapper(*args, **kwargs):
         # If alternative or third party map functors are provided
         if "map_functor" in kwargs.keys():
             logging.info(f"using an alternative map functor: {kwargs['map_functor']}")
-            return func(*args, **kwargs, map_functor=kwargs["map_functor"])
+            return func(*args, **kwargs)

         pool = None
         if "nproc" in kwargs.keys():
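The one-line change in `wrapper` matters because `kwargs` already contains `map_functor` in that branch, so passing it again by name makes Python raise a `TypeError`. The toy decorators below reproduce only that branch; `old_style`, `new_style` and `double_all` are hypothetical names, and the fallback branch is simplified compared with the real `pool_decorator`.

```python
from functools import wraps


def old_style(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if "map_functor" in kwargs:
            # kwargs already holds map_functor, so naming it again collides.
            return func(*args, **kwargs, map_functor=kwargs["map_functor"])
        return func(*args, **kwargs, map_functor=map)
    return wrapper


def new_style(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if "map_functor" in kwargs:
            # Forward kwargs unchanged; map_functor is passed exactly once.
            return func(*args, **kwargs)
        return func(*args, **kwargs, map_functor=map)
    return wrapper


def double_all(values, map_functor=map):
    return list(map_functor(lambda v: v * 2, values))


old_double = old_style(double_all)
new_double = new_style(double_all)

try:
    old_double([1, 2], map_functor=map)
    old_error = None
except TypeError as exc:
    old_error = str(exc)

new_result = new_double([1, 2], map_functor=map)
```

With the old call pattern `old_error` holds a "got multiple values for keyword argument" message, while the new pattern returns the doubled list.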
7 changes: 4 additions & 3 deletions cooltools/lib/plotting.py
@@ -5,7 +5,8 @@
 try:
     from matplotlib.cm import register_cmap
 except ImportError:
-    from matplotlib.colormaps import register
+    from matplotlib import colormaps
+    register_cmap = colormaps.register

 import matplotlib as mpl
 import matplotlib.pyplot as plt
@@ -99,8 +100,8 @@ def get_cmap(name):

 def _register_cmaps():
     for name, pal in PALETTES.items():
-        register_cmap(name, list_to_colormap(pal))
-        register_cmap(name + "_r", list_to_colormap(pal[::-1]))
+        register_cmap(cmap=list_to_colormap(pal), name=name)
+        register_cmap(cmap=list_to_colormap(pal[::-1]), name=name + "_r")


 _register_cmaps()
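The switch to keyword arguments is what makes the call work with either import. As far as I know, the older `matplotlib.cm.register_cmap` takes `name` first, while `matplotlib.colormaps.register` takes `cmap` first with `name` as keyword-only. The sketch below uses two stand-in functions with those argument orders rather than matplotlib itself, so the function names and the `"fall"` palette name are illustrative assumptions.

```python
registry = {}


def legacy_register_cmap(name=None, cmap=None):
    # Stand-in with the older register_cmap argument order (name first).
    registry[name] = cmap


def modern_register(cmap, *, name=None):
    # Stand-in with the newer colormaps.register order (cmap first,
    # name keyword-only).
    registry[name] = cmap


# Keyword arguments work regardless of which function was imported.
for register in (legacy_register_cmap, modern_register):
    register(cmap="palette-object", name="fall")

# The old positional call style only works with the legacy order.
try:
    modern_register("fall", "palette-object")
    positional_ok = True
except TypeError:
    positional_ok = False
```

Here `positional_ok` ends up False, mirroring why the positional calls removed in this hunk would fail once the fallback import is used.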
74 changes: 52 additions & 22 deletions cooltools/sandbox/cross_score.py
@@ -1,16 +1,19 @@
 from operator import index
 import pathlib
+import itertools
 import multiprocessing as mp

 import numpy as np
-import cooler
 import bioframe
+import cooler

 import argparse
 import logging

 import pandas as pd

+import pybigtools # we will need it for writing bigwig files
+
 parser = argparse.ArgumentParser(
     description="""Calculate distance-dependent contact marginals of a Hi-C map.
@@ -21,7 +24,11 @@
 """
 )

-parser.add_argument("COOL_URI", metavar="COOL_URI", type=str, help="input cooler URI")
+parser.add_argument(
+    "COOL_URI",
+    metavar="COOL_URI",
+    type=str,
+    help="input cooler URI")

 parser.add_argument(
     "--dist-bins",
@@ -37,13 +44,23 @@
 )

 parser.add_argument(
-    "--ignore-diags", type=int, default=2, help="How many diagonals to ignore"
+    "--ignore-diags",
+    type=int,
+    default=2,
+    help="How many diagonals to ignore"
 )

-parser.add_argument("--outfolder", type=str, default="./", help="The output folder")
+parser.add_argument(
+    "--outfolder",
+    type=str,
+    default="./",
+    help="The output folder")

 parser.add_argument(
-    "--prefix", type=str, default=None, help="The prefix for output files"
+    "--prefix",
+    type=str,
+    default=None,
+    help="The prefix for output files"
 )

 parser.add_argument(
@@ -118,11 +135,16 @@ def drop_resolution(clrname):
 logging.basicConfig(level=logging.INFO)
 if __name__ == "__main__":
     mp.freeze_support()
+
     chunksize = int(float(args.chunksize))
     clr = cooler.Cooler(args.COOL_URI)
     bins = clr.bins()[:]
     n_pixels = clr.pixels().shape[0]
-    dist_bins = np.array([int(float(i)) for i in args.dist_bins.split(",")])
+
+    # dist_bins contain *right* bins edges; 0 is implied as the left edge of the first bin.
+    dist_bins = np.array([int(float(i)) for i in args.dist_bins.split(",")]).astype(np.int64)
+    dist_bins = np.r_[dist_bins, np.iinfo(dist_bins.dtype).max]
+
     weight_name = args.clr_weight_name
     ignore_diags = args.ignore_diags
     formats = args.format.split(",")
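The added comment states that `dist_bins` holds right bin edges with 0 implied as the first left edge, and the new line appends the largest int64 as a final edge. One way to read this is that distances beyond the last user-supplied edge now fall into their own catch-all bin instead of being mixed into another one. The sketch below shows such an assignment with the standard-library `bisect`; the edge values and the half-open bin convention are assumptions, and this is not necessarily how `get_dist_margs` bins distances.

```python
import bisect

INT64_MAX = 2**63 - 1  # same value as np.iinfo(np.int64).max

right_edges = [1_000, 10_000]      # hypothetical --dist-bins "1000,10000"
edges = right_edges + [INT64_MAX]  # catch-all edge, as appended in the diff


def dist_bin(distance):
    # Assumed convention: bin i holds edges[i-1] <= distance < edges[i],
    # with 0 as the implied left edge of bin 0.
    return bisect.bisect_right(edges, distance)


assignments = [dist_bin(d) for d in (0, 999, 1_000, 9_999, 50_000)]
```

With these edges, 0 and 999 land in bin 0, 1000 and 9999 in bin 1, and 50000 in the catch-all bin 2. If `n_dist_bins` counts the appended edge, this would also explain why the output loop later in the diff iterates over `range(n_dist_bins-1)`, skipping the catch-all bin.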
@@ -133,15 +155,20 @@ def drop_resolution(clrname):

     nproc = mp.cpu_count() if args.nproc is None else args.nproc

-    logging.info(f"Calculating marginals for {args.COOL_URI} using {nproc} processes")
-    with mp.Pool(nproc) as pool:
-        out = pool.starmap(
-            get_dist_margs,
-            [
-                (args.COOL_URI, lo, hi, dist_bins, weight_name, ignore_diags)
-                for lo, hi in zip(chunk_spans[:-1], chunk_spans[1:])
-            ],
-        )
+    if nproc == 1:
+        mapfunc = itertools.starmap
+    else:
+        pool = mp.Pool(nproc)
+        mapfunc = pool.starmap
+    logging.info(f"Calculating marginals for {args.COOL_URI}, weight name {weight_name}, ignore diags {ignore_diags}; using {nproc} processes")
+
+    out = mapfunc(
+        get_dist_margs,
+        [
+            (args.COOL_URI, lo, hi, dist_bins, weight_name, ignore_diags)
+            for lo, hi in zip(chunk_spans[:-1], chunk_spans[1:])
+        ],
+    )

     margs_up = np.zeros(len(bins) * n_dist_bins + 1)
     margs_down = np.zeros(len(bins) * n_dist_bins + 1)
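The new code picks a serial `itertools.starmap` when `nproc == 1` and a pool's `starmap` otherwise. Two details are worth keeping in mind: `itertools.starmap` returns a lazy iterator while `Pool.starmap` returns a list, and the hunk itself does not show the pool being closed. A minimal sketch of the same switch that returns a list in both cases and closes the pool with a context manager is given below; `run_jobs` and `add` are hypothetical names.

```python
import itertools
import multiprocessing as mp


def add(a, b):
    # Hypothetical stand-in for get_dist_margs.
    return a + b


def run_jobs(func, arg_tuples, nproc):
    """Apply func to each argument tuple, serially when nproc == 1."""
    if nproc == 1:
        # itertools.starmap is lazy; materialize it to match Pool.starmap.
        return list(itertools.starmap(func, arg_tuples))
    with mp.Pool(nproc) as pool:
        return pool.starmap(func, arg_tuples)


if __name__ == "__main__":
    print(run_jobs(add, [(1, 2), (3, 4)], nproc=1))  # prints [3, 7]
```

The serial branch avoids starting worker processes at all, which makes single-process runs easier to debug and profile.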
@@ -165,7 +192,7 @@ def drop_resolution(clrname):
     prefix = clr_name if args.prefix is None else args.prefix
     res = clr.binsize

-    for dist_bin_id in range(n_dist_bins):
+    for dist_bin_id in range(n_dist_bins-1):
         lo = np.r_[0, dist_bins][dist_bin_id]
         hi = np.r_[0, dist_bins][dist_bin_id + 1]

@@ -180,18 +207,21 @@ def drop_resolution(clrname):

             if "bigwig" in formats:
                 file_name = f"{prefix}.{res}.cross.{dir_str}.{lo}-{hi}.bw"
-                logging.info(f"Write output into {file_name}")
-                bioframe.io.to_bigwig(
+                out_path = (out_folder / file_name).resolve().as_posix()
+                logging.info(f"Write output into {out_path}")
+                bioframe.to_bigwig(
                     out_df,
-                    chromsizes=clr.chromsizes,
-                    outpath=str(out_folder / file_name),
+                    chromsizes=clr.chromsizes.astype(int).to_dict(),
+                    outpath=out_path,
+                    engine='pybigtools'
                 )

             if "bedgraph" in formats:
                 file_name = f"{prefix}.{res}.cross.{dir_str}.{lo}-{hi}.bg.gz"
-                logging.info(f"Write output into {file_name}")
+                out_path = (out_folder / file_name).resolve().as_posix()
+                logging.info(f"Write output into {out_path}")
                 out_df.to_csv(
-                    str(out_folder / file_name),
+                    out_path,
                     sep="\t",
                     index=False,
                     header=False,
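Both output branches now build an absolute, forward-slash path once and reuse it for logging and writing. The sketch below shows what `resolve().as_posix()` produces compared with the previous `str(out_folder / file_name)`; the prefix, resolution, direction and distance values are made up for illustration.

```python
import pathlib

out_folder = pathlib.Path("./")  # default value of --outfolder in the script
# Hypothetical values for the pieces of the file name.
prefix, res, dir_str, lo, hi = "sample", 10000, "up", 0, 100000
file_name = f"{prefix}.{res}.cross.{dir_str}.{lo}-{hi}.bw"

# Old style: relative path, platform-specific separators.
relative = str(out_folder / file_name)

# New style: absolute path with forward slashes on every platform.
absolute = (out_folder / file_name).resolve().as_posix()
```

Logging the resolved path makes it unambiguous where the file ends up, even when the script is launched from a different working directory.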
8 changes: 4 additions & 4 deletions datasets/external_test_files.tsv
@@ -3,7 +3,7 @@ HFF_MicroC test.mcool e4a0fc25c8dc3d38e9065fd74c565dd1 https://osf.io/3h9js/down
 hESC_MicroC test_hESC.mcool ac0e636605505fb76fac25fa08784d5b https://osf.io/3kdyj/download Micro-C data from human ES cells for two chromosomes (hg38) in a multi-resolution mcool format. Krietenstein et al. 2021 data.
 HFF_CTCF_fc test_CTCF.bigWig 62429de974b5b4a379578cc85adc65a3 https://osf.io/w92u3/download ChIP-Seq fold change over input with CTCF antibodies in HFF cells (hg38). Downloaded from ENCODE ENCSR000DWQ, ENCFF761RHS.bigWig file
 HFF_CTCF_binding test_CTCF.bed.gz 61ecfdfa821571a8e0ea362e8fd48f63 https://osf.io/c9pwe/download Binding sites called from CTCF ChIP-Seq peaks for HFF cells (hg38). Peaks are from ENCODE ENCSR000DWQ, ENCFF498QCT.bed file. The motifs are called with gimmemotifs (options --nreport 1 --cutoff 0), with JASPAR pwm MA0139.
-mESC_dRad21_IAA dRAD21_IAA.mm10.mapq_30.mcool a68e4c57f15d2f0ba3b0bb9d7c4066a2 https://osf.io/wy2mh/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dRad21 IAA treatment, degraded Rad21.
-mESC_dRad21_UT dRAD21_UT.mm10.mapq_30.mcool 1028b3f9fce7d77c9dc73d5e6b77d078 https://osf.io/z83wv/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dRad21 untreated (UT), control for Rad21 degradation.
-mESC_dCTCF_IAA dCTCF_IAA.mm10.mapq_30.mcool 84fea965b341e0110df63bc06b7f1d77 https://osf.io/79qje/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dCTCF IAA treatment, degraded CTCF.
-mESC_dWapl_IAA dWAPL_IAA.mm10.mapq_30.mcool 1514422b821c0f96757e76e98a6507b5 https://osf.io/etdgf/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dWapl IAA treatment, degraded Wapl.
+mESC_dRAD21_IAA dRAD21_IAA.mm10.mapq_30.mcool 40087388c443aae19110fdf099738c06 https://osf.io/5xaut/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dRad21 IAA treatment, degraded Rad21.
+mESC_dRAD21_UT dRAD21_UT.mm10.mapq_30.mcool 2ff91a7def1a9dd3e1f9b62d89d579a7 https://osf.io/u75pd/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dRad21 untreated (UT), control for Rad21 degradation.
+mESC_dCTCF_IAA dCTCF_IAA.mm10.mapq_30.mcool 33ec02cafa9f1f31d2cbba227cf38cc6 https://osf.io/xwy9j/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dCTCF IAA treatment, degraded CTCF.
+mESC_dWAPL_IAA dWAPL_IAA.mm10.mapq_30.mcool 11088c9a6d10826a23a69807fc296005 https://osf.io/fk74t/download Micro-C data from mESC for three chromosomes (mm10) in a multi-resolution mcool format (Hsieh et al. 2022). dWapl IAA treatment, degraded Wapl.
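Each row of the table appears to consist of a dataset key, a file name, a 32-character hexadecimal string, a download link and a description. The third column looks like an MD5 checksum, although the header row is not part of this hunk, so that reading is an assumption. If it is one, a downloaded file could be verified along the following lines; `md5_of` and `matches` are hypothetical helpers, not part of cooltools, and the demo file is a stand-in rather than a real cooler.

```python
import hashlib
import pathlib
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of(path) == expected_md5.lower()


with tempfile.TemporaryDirectory() as tmp:
    demo = pathlib.Path(tmp) / "demo.mcool"  # hypothetical stand-in file
    demo.write_bytes(b"not a real cooler")
    ok = matches(demo, hashlib.md5(b"not a real cooler").hexdigest())
    bad = matches(demo, "0" * 32)
```

Because the commit replaces both the links and the checksums for the mESC datasets, a check of this kind would flag a stale local copy of the older files.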