- Remove the soon-to-be-deprecated `BRGenomics` dependency.
  - Port `tidyChromosomes` function to `EpiCompare`.
- Update maintainer details.
- Example report:
  - Delete report/ folder and upload to Releases instead: https://github.com/neurogenomics/EpiCompare/releases
  - Add Rscript to replicate example report in inst/examples.
- `EpiCompare`:
  - New arg `add_download_button`.
  - Always keep download button for post-processed peak files.
- `save_output`:
  - Change .txt suffixes to more informative .csv suffixes for saved files.
- Create `EpiArchives` and offload report to there:
  - Updated vignettes/example_report.Rmd so that it just renders the markdown landing page for `EpiArchives`.
- README.Rmd:
  - Fix broken link to example report.
- test-EpiCompare.R:
  - Fix issue with PNG save path.
  - Make separate subfolders for each set of tests.
- test-output_files.R:
  - Make separate subfolders for each set of tests.
- `plot_enrichment`:
  - Conditionally generate plots only when enrichment results aren't `NULL`.
  - Return KEGG/GO enrichment results as well as the plots.
- Add `workers <- check_workers(workers = workers)` to all functions that take `workers` to handle `workers=NULL` properly.
- `download_button`:
  - Saves and downloads files.
- `prepare_blacklist`:
  - Auto-selects appropriate blacklist, or returns user-specified option.
- `EpiCompare(blacklist=NULL)` is now the default.
- `prepare_genome_builds`:
  - Update to handle supplying builds for "peakfiles" and "reference" but not "blacklist" (so long as the `blacklist` arg is not a user-supplied `GRanges` object).
- Added `mm9_blacklist`.
- Made more plots interactive:
  - `width_boxplot`
  - `plot_enrichment`
  - `plot_ChIPseeker_annotation`
  - `overlap_stat_plot`
- Name elements in output list.
- Change `annotation` arg to more informative `txdb` arg, and set default to `NULL`, which `ChIPseeker` functions will automatically handle.
- New function `as_interactive`:
  - Help standardise this.
- New `EpiCompare::EpiCompare` arguments:
  - `error`: keep knitting even on errors.
  - `tss_distance`: upstream/downstream of TSS.
  - `quiet`: knit quietly.
- Rename 'test-EpiCompare_combinations.R' --> 'test-EpiCompare.R'
- Separate test-generalMetrics_functions.R into function-specific test files.
- Separate test-peakOverlap_functions.R into function-specific test files.
- Make fancy header with new func: `report_header()`
- Create `EpiCompare` command code as text: `report_command()`
- `width_boxplot`:
  - Make more efficient with `data.table` and `lapply`.
- Update hex sticker to match custom.css palette.
- README.Rmd:
  - Collapse more detailed sections.
- `tss_plot`:
  - Fix examples/tests after Sera updated the arguments.
  - Pass upstream/downstream to `ChIPseeker::getTagMatrix`.
  - Make interactive.
  - Name plots in list.
  - Remove unnecessary extra level of list nesting.
- Make documentation width <80 characters where possible.
- EpiCompare.Rmd:
  - Remove `methods::show` from all parts.
  - Name all chunks.
  - Make explanations more clear.
  - Add table of contents for main 3 sections.
  - Fix header levels.
  - Set `results='asis'` globally instead of in each chunk header.
  - Automatically number sections with yaml arg: `number_sections: true`
  - Omit specific headers from numbering system with `{-}` tags.
  - Add custom.css.
- `plot_chromHMM`:
  - `Error in (function (classes, fdef, mtable) unable to find an inherited method for function ‘annotateWithFeatures’ for signature ‘"SimpleGRangesList", "list"’`
  - Misleading error message; was actually due to `chromHMM_annotation` not being converted from a list to a `GRangesList`.
- Change yaml arg `peakfile` --> `peakfiles` to be consistent with other variables.
- Replace `badger` with `rworkflows`:
  - Use `rworkflows::use_badges`.
- New helper functions:
  - `precision_recall_matrix`
  - `report_time`
- `overlap_upset_plot`:
  - Switched out `UpSetR` for `ComplexUpset` to show percentages.
  - Moved up dep checks to beginning of function.
- Handle bug with `heatmaply` by checking args where it might be used: `check_heatmap_args`
- `tss_plot`:
  - Add unit tests.
  - Drastically reduce example/test runtime by setting `upstream=50`.
- `compute_corr`:
  - Reduce example runtime by setting `bin_size = 200000` (takes <2s).
- Fix typo in `EpiCompare` docs: "hg38 blacklist dataset"
- Avoid explicitly specifying "/" in paths to help cross-platform testing.
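
As a rough illustration of the path-handling change above, a test can assemble paths from components with base R's `file.path()`, which joins them using `.Platform$file.sep`, instead of pasting `"/"` by hand. The directory and file names here are hypothetical and are not taken from the package's test suite.

```r
# Build a path from components rather than hard-coding a separator.
# `out_dir` and the file name are made up for illustration only.
out_dir <- file.path(tempdir(), "EpiCompare_test", "plots")

# Create the nested folders in one call, silently if they already exist.
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Compose the final file path from the directory and a file name.
png_path <- file.path(out_dir, "example_plot.png")
basename(png_path)
```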
- `tss_plot`:
  - Use `parallel::detectCores() - 1` by default to set workers, but set to 1 in examples/tests to meet CRAN/Bioc standards.
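
A minimal sketch of that kind of default, assuming a helper named `default_workers()` that is invented here and is not an EpiCompare function. It uses all but one core when no value is supplied, and guards against `parallel::detectCores()` returning `NA`, which it can do on some platforms.

```r
# Hypothetical helper (not part of EpiCompare): pick a default worker count.
default_workers <- function(workers = NULL) {
  if (is.null(workers)) {
    n <- parallel::detectCores()
    # detectCores() may return NA; fall back to a single worker in that case.
    workers <- if (is.na(n)) 1L else max(1L, n - 1L)
  }
  as.integer(workers)
}

default_workers()   # all-but-one core, never fewer than 1
default_workers(1)  # explicit value, as one would use in examples/tests
```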
- Add back example report html:
  - Put it in the main dir.
  - Add it to the .Rbuildignore: report/EpiCompare_example.html
  - Add a new vignette that renders the HTML from the pre-saved file.
- Remove Dockerfile, as it's no longer necessary with the updated version of `rworkflows`.
- Add `@returns` to `group_files` function.
- Add all authors to vignettes.
- New function: `predict_precision_recall`
- Added unit tests.
- `compute_corr` and `precision_recall` now save outputs, including when run via `EpiCompare` Rmarkdown script.
- Make subfunctions for `plot_precision_recall`:
  - `plot_precision_recall_prcurve`
  - `plot_precision_recall_f1`
- `rebin_peaks`:
  - Allow users to specify `sep` between genomic coordinates in rownames.
- Update `gather_files` to match new Picard file scheme in nf-core/cutandrun 3.0.
- `rebin_peaks`:
  - Added arg `drop_empty_chr` to automatically drop chroms that aren't in any of the `peakfiles`.
  - Added "score" as one of the default `intensity_cols` in all relevant functions.
  - Make examples use 5000bp bins to speed up.
- `translate_genome`:
  - Add `default_genome` arg to handle `genome=NULL`.
- `bpplapply`:
  - New exported function to automate handling of known issues with `BiocParallel` across OS platforms.
  - Enable users to specify their own apply function.
- `get_bpparam`: Add args to allow users to choose which `BiocParallel` func to use.
- `checkCache`: Make default arg `cache=BiocFileCache::BiocFileCache(ask = FALSE)` to skip user input during runtime.
- `precision_recall`:
  - Change `increment_threshold` arg to `n_threshold` arg, using the `seq(length.out=)` feature to avoid accidentally choosing an inappropriately large `increment_threshold`.
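
The difference between a step-based and a count-based threshold grid can be seen with base R's `seq()`. This is only a sketch: the 0 to 1 range and the specific values are assumptions for illustration, and the variable names mirror the argument names without reproducing the package's internals.

```r
initial_threshold <- 0

# Step-based grid: the number of thresholds depends on the step chosen.
# With a large step only three points are produced and the endpoint is missed.
by_step <- seq(from = initial_threshold, to = 1, by = 0.4)

# Count-based grid: always exactly `n_threshold` evenly spaced values.
n_threshold <- 5
by_count <- seq(from = initial_threshold, to = 1, length.out = n_threshold)

length(by_step)   # 3
length(by_count)  # 5
```

With `length.out`, the resolution of the curve is fixed up front, whatever the range of the thresholds.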
- `gather_files`:
  - Replace iterator with `bpplapply`.
  - Pass up args from `bpplapply`.
  - Provide warning message, not error, when 0 files found. Returns `NULL`.
  - Add "multiqc" as a search option.
  - Add dedicated subfunctions for reading in a variety of nf-core/cutandrun output files: `read_picard`, `read_multiqc`, `read_bowtie`, `read_trimgalore`, `read_bam`, `read_peaks`
  - Add file paths to each object.
  - Add new arg `rbind_list`.
- `rebin_peaks`/`compute_corr`:
  - Change default `bin_size` from 100bp --> 5kb to improve efficiency and align with the defaults of other packages (e.g. `Signac`).
- `tss_plot`:
  - Pass up more args for specifying upstream/downstream.
- `EpiCompare`: Pass up new args:
  - `bin_size`
  - `n_threshold`
  - `workers`
- Fix `rebin_peaks` unit tests.
- Fix pkg size issue by adding inst/report to .Rbuildignore.
- `EpiCompare` wasn't being run when reference was a single unlisted `GRanges` object because it was indeed length>1, but the `names` were all `NULL`. Now fixed.
- `plot_precision_recall`: Set default `initial_threshold=` to 0.
- Switch from `BiocParallel` to `parallel`, as the former is extremely buggy and inconsistent.
- `check_genome_build`: Add `translate_genome` as prestep.
- `rebin_peaks`:
  - Move all steps that could be done just once (e.g. creating the genome-wide tiles object) outside of the `BiocParallel::bpmapply` iterator.
  - Ensure all outputs of `BiocParallel::bpmapply` are of the same length, within the exact same bins, so that we can return just the bare minimum data needed to create the matrix (1 numeric vector/sample).
  - Instead of `rbind`ing the results and then casting them back into a matrix (which is safer bc it can handle vectors of different lengths), simply `cbind` all vectors into one matrix directly and name the rows using the predefined genome-wide tiles.
  - Because we are no longer `rbind`ing a series of very long tables, this avoids the issue encountered here #103. This means this function is now much more scalable to many hundreds/thousands of samples (cells) even at very small bin sizes (e.g. 100bp).
  - A new argument `keep_chr` allows users to specify whether they want to restrict which chromosomes are used during binning. By default, all chromosomes in the reference genome are used (`keep_chr=NULL`), but specifying a subset of chromosomes (e.g. `paste0("chr",seq_len(12))`) can drastically speed up compute time and reduce memory usage. It can also be useful for removing non-standard chromosomes (e.g. "chr21_gl383579_alt", "chrUns...", "chrRand...").
  - As a bonus, `rebin_peaks` now reports the final binned matrix dimensions and a sparsity metric.
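
The column-binding idea described above can be sketched in base R with toy data. This is not the package's implementation: the tile names, sample names and values are invented, and the sparsity measure shown (fraction of zero entries) is an assumption, since the notes do not say how the reported metric is defined.

```r
# Toy data: three predefined bins shared by every sample (names are made up).
tiles <- c("chr1:1-100", "chr1:101-200", "chr1:201-300")
per_sample <- list(
  sampleA = c(0, 2.5, 1),
  sampleB = c(3, 0, 0)
)

# Every vector must cover the same bins in the same order.
stopifnot(all(lengths(per_sample) == length(tiles)))

# Bind columns directly: one numeric vector per sample, no long table.
mat <- do.call(cbind, per_sample)
rownames(mat) <- tiles

dim(mat)                     # bins x samples
sparsity <- mean(mat == 0)   # fraction of zero entries (assumed definition)
```

Because each sample contributes a single fixed-length vector, memory use grows with bins times samples rather than with the length of a stacked long-format table.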
- `compute_corr`:
  - Added unit tests at different bin sizes.
  - Allow `reference` to be `NULL`.
- Updated README to reflect latest version of `EpiCompare` with `gather_files`.
- Bumped version to align with Bioc devel (currently 1.1.0).
- `compute_percentiles`:
  - Making default `initial_threshold=0`, so as not to assume any particular threshold.
- `rebin_peaks`:
  - Addressed error that occurs when there are many samples/cells with small bins.
- `plot_precision_recall`: Don't plot the reference as part of the PR curve.
- Changed terminology from "epigenetic" to "epigenomic"
- Updated README to include precision_recall_plot and corr_plot
- Removed bugs in html report
- Added example EpiCompare report in inst/report/EpiCompare_example.html
- Upgraded `liftover_grl` and added genome standardization.
  - Enable cross-species liftover from/to mm10 and mm9 --> hg19 and hg38.
  - Subfunctionalize `get_chain_file`.
  - Add `merge_all` option.
- Support mm10/mm9 as `output_build` options.
- Removed `dplyr`.
- Moved `plyranges` to Suggests.
- `plot_precision_recall`:
  - New exported function to create precision-recall plots from MACS2, MACS3, HOMER, or SEACR peak files.
  - Added unit tests.
  - Added to EpiCompare html report.
  - Added `EpiCompare(precision_recall_plot=)` param and documented.
- Add ISSUE templates.
- Include code in html report (collapsed by default).
- Add correlation matrix/plot functionality.
- Add `compute_consensus_peaks()` as function for preprocessing peak files.
  - Add `group_files()` function to help assign each peakfile to a group based on substring searches.
- `EpiCompare`:
  - Return paths to HTML reports.
  - Automatically open report in browser or RStudio.
- Add Docker vignette and advertise in README.
- Made `BiocParallel` functions compatible with Windows.
- Organize author fields in DESCRIPTION.
- Fix typos in README.
- Remove threshold=1 from list of thresholds to test in precision-recall curves.
- Set first chunk in EpiCompare.Rmd as `echo=FALSE` instead of `include=FALSE` so the output messages will still be printed (without showing the code).
- Remove `here` from Suggests.
- Fix directory creation in `EpiCompare::EpiCompare`.
- Simplified loops with `mapply`/`lapply`.
- `EpiCompare`: Accepts multiple reference files and creates an individual report for each reference. Added timing feature.
- `save_output()`: This function saves all plots and tables generated by EpiCompare. Also saves interactive heatmaps. Used in EpiCompare.Rmd.
- `fig_length()`: This function outputs dynamic figure height/width depending on the number of items. Used in EpiCompare.Rmd.
- `prepare_reference`: Validate reference input before passing to next step.
- Pass named list to `genome_build` to allow for different builds between `reference` and `peaklist`.
- Liftover `blacklist` to match GRanges list it's being used to filter in `tidy_peakfile`.
- Ensure all names are unique in `peaklist` and `reference`.
- `gather_files`:
  - Avoid gathering duplicate peak files from `nf-core/cutandrun`.
  - Add progress bar.
  - Add report at the end.
  - Add extra arg `return_paths` to return only the paths without actually reading in the files.
- Overhaul how EpiCompare handles genome builds:
  - New argument `genome_build_output` allows users to specify which genome build to standardise all inputs to.
  - `genome_build` can now take a named list to specify different genome builds for `peakfiles`, `reference`, and `blacklist`.
  - Added functions to parse and validate all genome build-related arguments.
- Remove unnecessary deps.
- Use `data.table` to read/write tables.
- `prepare_peaklist`:
  - Simplified code.
  - Added arg `remove_empty` to automatically drop any empty elements.
  - Embed `check_list_names` within.
- `plot_chromHMM`:
  - Can return data as well with `return_data`.
  - Performs liftover on chromHMM data instead of the `peaklist`.
- Make `output_dir` creation recursive and without warnings.
- Add new params to Code section of rmarkdown output.
- Add new `peaklist` length check to `prepare_peaklist`.
- New check functions:
  - `check_genomebuild`: ensure necessary packages installed and that "genomebuild" is valid.
  - `check_cell_lines`
- `liftover_grlist`: Dedicated liftover function, exported.
- Document `checkCache`.
- `get_chromHMM_annotation` can now take a list of cell lines as an argument.
- Fix GHA pkgdown building:
  - The newest version of git introduced bugs when building pkgdown sites from within Docker containers (e.g. via my Linux GHA workflow). Adjusting GHA to fix this.
- New functions with examples/unit tests:
  - `import_narrowPeak`: Import narrowPeak files, with automated header annotation using metadata from ENCODE.
  - `gather_files`: Automatically find peak/picard/bed files and read them in as a list of `GRanges` objects.
  - `write_example_peaks`: Write example peak data to disk.
- Update .gitignore
- Update .Rbuildignore
- New parameter in EpiCompare:
  - `genome_build`: Specify the genome build, either "hg19" or "hg38". This parameter is also included in `plot_chromHMM`, `plot_ChIPseeker_annotation`, `tss_plot` and `plot_enrichment`.
- `EpiCompare` submitted to Bioconductor.