Details on how HiFiCNV auxiliary data files were generated.
Excluded regions can optionally be specified with a BED file. Two example exclusion files for hg38/GRCh38 are provided here:
- cnv.excluded_regions.hg38.bed.gz - Contains regions that are known to cause artifacts during data processing (e.g. centromeres). The script used to generate this file can be found here.
- cnv.excluded_regions.common_50.hg38.bed.gz - Contains all of the regions in the above file, plus regions that were frequently called as a duplication or deletion in a population. The additional regions were generated by running HiFiCNV on our population (N=97) and then keeping every bin where more than 50% of the samples had a duplication or deletion overlapping that bin.
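
The population filter described for the second file can be sketched roughly as follows. This is a minimal illustration, not the script that produced the file: the function name `common_cnv_bins`, the `bin_size` parameter, and the choice to count duplications and deletions together are all assumptions made for the example.

```python
from collections import Counter

def common_cnv_bins(sample_calls, bin_size, min_fraction=0.5):
    """Return bins overlapped by a CNV call in more than min_fraction of samples.

    sample_calls: one list per sample of (chrom, start, end) half-open intervals,
                  covering both duplications and deletions (an assumption).
    bin_size:     width of the fixed-size bins (hypothetical parameter).
    """
    counts = Counter()
    for calls in sample_calls:
        hit = set()  # each sample counts at most once per bin
        for chrom, start, end in calls:
            if end <= start:
                continue  # skip empty or malformed intervals
            first_bin = start // bin_size
            last_bin = (end - 1) // bin_size
            for b in range(first_bin, last_bin + 1):
                hit.add((chrom, b))
        counts.update(hit)

    n_samples = len(sample_calls)
    return sorted(
        (chrom, b * bin_size, (b + 1) * bin_size)
        for (chrom, b), c in counts.items()
        if c > min_fraction * n_samples
    )

# Toy population of three samples: only the 1000-2000 bin is hit by two of them.
toy_calls = [
    [("chr1", 0, 2500)],
    [("chr1", 1500, 2000)],
    [],
]
common = common_cnv_bins(toy_calls, bin_size=1000)
```

The resulting intervals would then be merged with the artifact regions from the first file to form the combined exclusion BED.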
All depth bins intersecting an excluded region are removed from the depth bins track. All minor allele frequency (MAF) evidence intersecting an excluded region is removed from the MAF track.
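
As a rough sketch of that filtering, the snippet below drops any record whose half-open interval intersects an excluded region. The record layout `(chrom, start, end, value)` and the helper names are assumptions for illustration only; they do not reflect HiFiCNV's internal data structures.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def drop_excluded(records, excluded):
    """Keep only records that do not intersect any excluded region.

    records:  list of (chrom, start, end, value) tuples (hypothetical layout).
    excluded: list of (chrom, start, end) tuples, as read from a BED file.
    """
    by_chrom = {}
    for chrom, start, end in excluded:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end, value in records:
        regions = by_chrom.get(chrom, [])
        if any(overlaps(start, end, r_start, r_end) for r_start, r_end in regions):
            continue  # intersects an excluded region, so it is removed
        kept.append((chrom, start, end, value))
    return kept

excluded = [("chr1", 1500, 2100)]

# Depth bins: (chrom, start, end, depth)
depth_bins = [
    ("chr1", 0, 1000, 30.0),
    ("chr1", 1000, 2000, 31.5),
    ("chr1", 2000, 3000, 29.0),
    ("chr2", 0, 1000, 28.0),
]
kept_bins = drop_excluded(depth_bins, excluded)

# MAF sites treated as 1 bp intervals: (chrom, pos, pos + 1, maf)
maf_sites = [("chr1", 1499, 1500, 0.48), ("chr1", 1500, 1501, 0.10)]
kept_sites = drop_excluded(maf_sites, excluded)
```

A linear scan per record is enough for a toy example; a real implementation over whole-genome tracks would more likely sort the regions or use an interval tree.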
Segmentation treats any depth bin intersecting an excluded region as having a small bias in favor of a special "unknown" copy-number state: within such a bin, all other copy-number states are equally probable, but each is less probable than the unknown state. This means that a copy-number change can span a short excluded region if there is sufficient evidence on the left or right flank, whereas longer excluded regions should be segmented into the unknown state.
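
The effect of that small bias can be illustrated with a toy Viterbi segmentation. This is only a sketch of the idea under stated assumptions: the state set, the unnormalised log scores (`bias`, `mismatch`, `switch_penalty`), and the function names are invented for the example and are not HiFiCNV's actual model or parameters.

```python
STATES = [1, 2, 3, "unknown"]

def observed_bin(copy_number, mismatch=5.0, unknown_cost=20.0):
    """Log scores for a normal bin whose depth supports `copy_number`."""
    scores = {s: -mismatch for s in STATES}
    scores[copy_number] = 0.0
    scores["unknown"] = -unknown_cost
    return scores

def excluded_bin(bias=0.5):
    """Log scores for an excluded bin: all copy-number states are equal,
    and the unknown state is slightly favoured."""
    scores = {s: -bias for s in STATES}
    scores["unknown"] = 0.0
    return scores

def viterbi(emissions, switch_penalty=3.0):
    """Best state path given per-bin log scores and a fixed cost per state change."""
    score = dict(emissions[0])
    back = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state, paying the penalty only when the state changes.
            prev = max(STATES, key=lambda p: score[p] - (0.0 if p == s else switch_penalty))
            step = 0.0 if prev == s else switch_penalty
            new_score[s] = score[prev] - step + em[s]
            pointers[s] = prev
        back.append(pointers)
        score = new_score

    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

flank = [observed_bin(3) for _ in range(5)]

# Short excluded gap: cheaper to stay in copy number 3 than to switch twice.
short_path = viterbi(flank + [excluded_bin() for _ in range(2)] + flank)

# Long excluded gap: the accumulated bias outweighs the two switches.
long_path = viterbi(flank + [excluded_bin() for _ in range(20)] + flank)
```

With these illustrative scores, the duplication call runs straight through the two-bin gap, while the twenty-bin gap is segmented as unknown with a copy-number-3 segment on each side.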
By default, HiFiCNV expects two full copies of each chromosome, as in a diploid organism. When reporting variants to the output VCF file, it only reports deviations from this expectation. However, this expectation is undesirable for some chromosomes (e.g. sex chromosomes) and for non-diploid organisms. It can be overridden by providing a BED file with expected copy-number values. Two examples corresponding to male/female in human hg38/GRCh38 are provided here: