add support for STRICT intra-shard boundary #201

deflaux · 2016-06-14T20:50:17Z

When the genomic region we want to examine is divided into shards, we use a STRICT shard boundary so that a record overlapping both the end of the current shard and the beginning of the next shard is not emitted twice.
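The sketch below contrasts the two boundary requirements on a single record that crosses a shard boundary. It assumes half-open `[start, end)` coordinates. It also assumes that STRICT keeps only records whose start lies inside the shard, while OVERLAPS keeps any record that overlaps the shard. `ShardFilterSketch`, `Rec`, `keep`, and `timesEmitted` are hypothetical names for illustration, not the project's API.

```java
// Hypothetical sketch; names and semantics are assumptions, not the dataflow-java API.
public class ShardFilterSketch {

    /** Hypothetical boundary requirements mirroring the STRICT/OVERLAPS terms in this issue. */
    enum Requirement { STRICT, OVERLAPS }

    /** A record occupying the half-open interval [start, end) on one chromosome. */
    static final class Rec {
        final long start;
        final long end;
        Rec(long start, long end) { this.start = start; this.end = end; }
    }

    /** Returns true if the record should be emitted for the shard [shardStart, shardEnd). */
    static boolean keep(Rec r, long shardStart, long shardEnd, Requirement req) {
        if (req == Requirement.OVERLAPS) {
            // Any overlap with the shard counts.
            return r.start < shardEnd && r.end > shardStart;
        }
        // STRICT (assumed): only the shard containing the record's start emits it,
        // so a record spanning a shard boundary is emitted exactly once.
        return r.start >= shardStart && r.start < shardEnd;
    }

    /** Counts how many shards of width shardSize over [regionStart, regionEnd) emit the record. */
    static int timesEmitted(Rec r, long regionStart, long regionEnd, long shardSize, Requirement req) {
        int count = 0;
        for (long s = regionStart; s < regionEnd; s += shardSize) {
            long e = Math.min(s + shardSize, regionEnd);
            if (keep(r, s, e, req)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A record crossing the boundary at 1000 between shards [0,1000) and [1000,2000).
        Rec spanning = new Rec(990, 1010);
        System.out.println(timesEmitted(spanning, 0, 2000, 1000, Requirement.OVERLAPS)); // 2
        System.out.println(timesEmitted(spanning, 0, 2000, 1000, Requirement.STRICT));   // 1
    }
}
```

Under these assumptions OVERLAPS emits the spanning record once per shard it touches, whereas STRICT emits it only from the shard where it starts.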

This works fine when we are working over an entire chromosome.
BUT when we shard only a subset of a chromosome, STRICT also filters out records that overlap the beginning of the very first shard, even though those records would not be duplicated in any other shard.
- Sometimes we do want to make use of those records that overlap the start of the first shard.
- We need a way to use OVERLAPS for the first shard and STRICT for all subsequent shards.
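The following sketch illustrates the proposed policy: OVERLAPS for the first shard and STRICT for every later shard. It uses the same assumptions as a plain STRICT/OVERLAPS filter, namely half-open coordinates, STRICT meaning "starts inside the shard", and OVERLAPS meaning "overlaps the shard". All names, including `requirementFor`, are hypothetical and are not taken from the project's code.

```java
// Hypothetical sketch of "OVERLAPS for the first shard, STRICT for the rest"; not the project's API.
public class FirstShardOverlapsSketch {

    enum Requirement { STRICT, OVERLAPS }

    /** A record occupying the half-open interval [start, end). */
    static final class Rec {
        final long start;
        final long end;
        Rec(long start, long end) { this.start = start; this.end = end; }
    }

    /** Assumed semantics: OVERLAPS = any overlap; STRICT = record starts inside the shard. */
    static boolean keep(Rec r, long shardStart, long shardEnd, Requirement req) {
        if (req == Requirement.OVERLAPS) {
            return r.start < shardEnd && r.end > shardStart;
        }
        return r.start >= shardStart && r.start < shardEnd;
    }

    /** Hypothetical helper: the first shard uses OVERLAPS, all later shards use STRICT. */
    static Requirement requirementFor(int shardIndex) {
        return shardIndex == 0 ? Requirement.OVERLAPS : Requirement.STRICT;
    }

    /** Counts emissions of one record; firstShardOverlaps toggles the proposed policy. */
    static int timesEmitted(Rec r, long regionStart, long regionEnd, long shardSize, boolean firstShardOverlaps) {
        int count = 0;
        int index = 0;
        for (long s = regionStart; s < regionEnd; s += shardSize, index++) {
            long e = Math.min(s + shardSize, regionEnd);
            Requirement req = firstShardOverlaps ? requirementFor(index) : Requirement.STRICT;
            if (keep(r, s, e, req)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Region [500, 2000) is a subset of a chromosome; shards are [500,1500) and [1500,2000).
        Rec beforeRegion = new Rec(490, 510); // starts before the region but overlaps its start
        Rec spanning = new Rec(1490, 1510);   // crosses the internal shard boundary at 1500
        System.out.println(timesEmitted(beforeRegion, 500, 2000, 1000, false)); // 0: dropped by STRICT
        System.out.println(timesEmitted(beforeRegion, 500, 2000, 1000, true));  // 1: kept by first-shard OVERLAPS
        System.out.println(timesEmitted(spanning, 500, 2000, 1000, true));      // 1: still emitted once
    }
}
```

Because every shard after the first stays STRICT, a record that crosses an internal boundary is still emitted only once. The looser check applies only at the outer edge of the requested region.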

Confirm this functionality with a JoinNonVariantSegmentsWithVariants integration test that operates over a small genomic region, with the region specified both via normal sharding and via SitesToShards.

deflaux self-assigned this Jun 14, 2016
