Contains multiple implementations of a 'region join', an operation that joins two sets of regions based on the spatial overlap between the regions.
Extends the ShuffleRegionJoin trait to implement a full outer join.
Partition a genome into a set of bins.
Note that this class does not tolerate invalid input, so filter your data before using it.
The size of each bin in nucleotides
A map containing the length of each contig
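To illustrate how a bin size and a map of contig lengths can define a set of bins, here is a minimal sketch. `SimpleBins`, `numBins`, and `binOf` are hypothetical names and not part of this package's API. The name-ordered contig layout and the 0-based positions are assumptions made for the example.

```scala
// Hypothetical sketch of genome binning; not the implementation documented here.
final case class SimpleBins(binSize: Long, contigLengths: Map[String, Long]) {
  require(binSize > 0, "binSize must be positive")

  // Assumption: contigs are ordered by name to get a stable bin numbering.
  private val contigs: Seq[String] = contigLengths.keys.toSeq.sorted

  // Number of bins needed to cover one contig (ceiling division).
  private def binsIn(contig: String): Long =
    (contigLengths(contig) + binSize - 1) / binSize

  // Index of the first bin of each contig in the global numbering.
  private val offsets: Map[String, Long] =
    contigs.zip(contigs.scanLeft(0L)((acc, c) => acc + binsIn(c))).toMap

  // Total number of bins across all contigs.
  val numBins: Long = contigs.map(binsIn).sum

  // Global bin index for a 0-based position; throws on invalid input.
  def binOf(contig: String, pos: Long): Long = {
    require(contigLengths.contains(contig), s"unknown contig: $contig")
    require(pos >= 0 && pos < contigLengths(contig), s"position out of range: $pos")
    offsets(contig) + pos / binSize
  }
}
```

Like the class described above, this sketch rejects invalid input (unknown contigs, out-of-range positions) rather than trying to recover from it.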
GenomicPositionPartitioner partitions ReferencePosition objects into separate, spatially-coherent regions of the genome.
This can be used to organize genomic data for computation that is spatially distributed (e.g. GATK and Queue's "scatter-and-gather" for locus-parallelizable walkers).
The number of equally sized regions into which the total genomic space is partitioned. The total number of partitions is numParts + 1; the extra partition captures null or UNMAPPED values of the ReferencePosition type.
A map from sequence name to sequence length, covering every sequence in the genome.
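The numParts + 1 layout can be sketched as follows. This is an illustration only: `SimplePositionPartitioner` is a hypothetical name, positions are modeled as `Option[(String, Long)]` rather than ReferencePosition, and laying sequences end to end in name order is an assumption.

```scala
// Hypothetical sketch; not the GenomicPositionPartitioner implementation.
final case class SimplePositionPartitioner(numParts: Int, seqLengths: Map[String, Long]) {
  require(numParts > 0, "numParts must be positive")
  require(seqLengths.values.sum > 0, "genome must have non-zero length")

  // Assumption: sequences are laid end to end in name order.
  private val names: Seq[String] = seqLengths.keys.toSeq.sorted
  private val starts: Map[String, Long] =
    names.zip(names.scanLeft(0L)((acc, n) => acc + seqLengths(n))).toMap
  private val totalLength: Long = seqLengths.values.sum

  // numParts genomic partitions plus one extra for unmapped positions.
  val numPartitions: Int = numParts + 1

  // None models a null or unmapped position.
  def getPartition(pos: Option[(String, Long)]): Int = pos match {
    case Some((name, offset)) if seqLengths.contains(name) =>
      val global = starts(name) + offset
      // Scale the global coordinate into [0, numParts).
      math.min(((global * numParts) / totalLength).toInt, numParts - 1)
    case _ =>
      // Assumption: unknown sequences also go to the extra partition.
      numParts
  }
}
```

Reserving the last index for unmapped values keeps the mapped partitions equally sized in genomic coordinates.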
Extends the BroadcastRegionJoin trait to implement an inner join.
Extends the ShuffleRegionJoin trait to implement an inner join.
Extends the ShuffleRegionJoin trait to implement an inner join followed by grouping by the left value.
Extends the ShuffleRegionJoin trait to implement a left outer join.
Repartitions objects that are keyed by a ReferencePosition or ReferenceRegion into a single partition per contig.
Extends the BroadcastRegionJoin trait to implement a right outer join.
Extends the ShuffleRegionJoin trait to implement a right outer join.
Extends the ShuffleRegionJoin trait to implement a right outer join followed by grouping by all non-null left values.
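The join variants listed above differ only in how they treat values with no overlapping partner. The single-machine sketch below illustrates those semantics with nested loops. It is not the broadcast or shuffle strategy these classes implement, and `Region`, `NaiveRegionJoin`, and its methods are hypothetical names. Half-open intervals are an assumption.

```scala
// Hypothetical, single-machine sketch of region join semantics; not this package's API.
final case class Region(contig: String, start: Long, end: Long) {
  // Assumption: half-open intervals [start, end) that must share a contig.
  def overlaps(that: Region): Boolean =
    contig == that.contig && start < that.end && that.start < end
}

object NaiveRegionJoin {
  type Keyed[T] = Seq[(Region, T)]

  // Inner join: only pairs whose regions overlap.
  def inner[T, U](left: Keyed[T], right: Keyed[U]): Seq[(T, U)] =
    for ((lr, l) <- left; (rr, r) <- right if lr.overlaps(rr)) yield (l, r)

  // Inner join followed by grouping: one row per left record that has matches.
  def innerGroupByLeft[T, U](left: Keyed[T], right: Keyed[U]): Seq[(T, Seq[U])] =
    left.map { case (lr, l) =>
      (l, right.collect { case (rr, r) if lr.overlaps(rr) => r })
    }.filter(_._2.nonEmpty)

  // Left outer join: every left value, paired with None when nothing overlaps.
  def leftOuter[T, U](left: Keyed[T], right: Keyed[U]): Seq[(T, Option[U])] =
    left.flatMap { case (lr, l) =>
      val hits = right.collect { case (rr, r) if lr.overlaps(rr) => r }
      if (hits.isEmpty) Seq((l, Option.empty[U])) else hits.map(r => (l, Option(r)))
    }

  // Right outer join: the mirror image of the left outer join.
  def rightOuter[T, U](left: Keyed[T], right: Keyed[U]): Seq[(Option[T], U)] =
    leftOuter(right, left).map(_.swap)

  // Full outer join: left outer results plus right values that matched nothing.
  def fullOuter[T, U](left: Keyed[T], right: Keyed[U]): Seq[(Option[T], Option[U])] = {
    val matchedOrLeft = leftOuter(left, right).map { case (l, r) => (Option(l), r) }
    val rightOnly = right.collect {
      case (rr, r) if !left.exists(_._1.overlaps(rr)) => (Option.empty[T], Option(r))
    }
    matchedOrLeft ++ rightOnly
  }
}
```

The nested loops compare every left region with every right region, which is only practical for small inputs.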
Helper object to merge sharded files together.
Different region join implementations have different performance characteristics, and new implementations will likely be added in the future. See the notes on each individual method for more details.