@Deprecated
public class IndexKmerByHammingPlugin
Create the fullGenomeKmerToRefIdMap mutable map:
Call createKmerToRefRangeMap() with the mutable map from above.
Kmers are created from the PHG graph nodes.
If a kmer is seen in multiple reference ranges, its refRangeId is set to -1. If it is seen more than once within a single range, it is not.
Find purgeable kmers:
Call markKmersForPurgeUsingHammingDistance() to mark kmers for purging.
This creates a data class with 3 primitive arrays whose indexes correspond to each other: kmerArray, refRangeIDArray, and purgeArray.
The Hamming distance is computed twice: once for the first half of the kmer and once for the second half.
purgeArray is set to 1 at the kmer's index if the Hamming distance check fails.
Purge the kmers using the markKmersForPurgeUsingHammingDistance() output:
Purge a kmer if its purgeArray entry is 1 or if its refRangeId is -1.
Return the list of kmers not purged (the "keep" list).
Second pass: create the refRangeID-to-haplotypesIds map:
If a given kmer is in the "keep" list, create a map associating the kmer with all the haplotypes that contain it.
Write the data to a tab-delimited file containing each kmer and the list of hapIds containing that kmer.
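The first pass and the purge step described above can be illustrated with a small sketch. This is not the plugin's code: the class KmerRangeSketch, its method names, the use of String kmers, and the byte type for the purge flags are all assumptions made for illustration. Only the -1 marker and the three parallel arrays come from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the kmer-to-reference-range bookkeeping.
final class KmerRangeSketch {
    // Marker for a kmer seen in more than one reference range (from the description).
    static final int MULTI_RANGE = -1;

    // Record that a kmer was seen in refRangeId. Seeing it again in the same
    // range changes nothing; seeing it in a different range marks it as -1.
    static void addKmer(Map<String, Integer> kmerToRefRange, String kmer, int refRangeId) {
        Integer previous = kmerToRefRange.putIfAbsent(kmer, refRangeId);
        if (previous != null && previous != refRangeId) {
            kmerToRefRange.put(kmer, MULTI_RANGE);
        }
    }

    // Build the "keep" list from three parallel arrays: a kmer is purged when
    // its purge flag is 1 or its reference range id is -1.
    static List<String> keep(String[] kmerArray, int[] refRangeIdArray, byte[] purgeArray) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < kmerArray.length; i++) {
            boolean purge = purgeArray[i] == 1 || refRangeIdArray[i] == MULTI_RANGE;
            if (!purge) {
                kept.add(kmerArray[i]);
            }
        }
        return kept;
    }
}
```

In this sketch, a kmer added twice for range 1 keeps id 1, while a kmer added for ranges 1 and 2 ends up with id -1 and is dropped by keep().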
public IndexKmerByHammingPlugin(@Nullable java.awt.Frame parentFrame, boolean isInteractive)
@Nullable public net.maizegenetics.plugindef.DataSet processData(@Nullable net.maizegenetics.plugindef.DataSet input)
@Nullable public javax.swing.ImageIcon getIcon()
@NotNull public java.lang.String getButtonName()
@NotNull public java.lang.String getToolTipText()
@NotNull public java.lang.String kmerMapFile()
Binary file of kmer and haplotype ids
@NotNull public IndexKmerByHammingPlugin kmerMapFile(@NotNull java.lang.String value)
Set Kmer Map File. Binary file of kmer and haplotype ids
value - Kmer Map File
@NotNull public java.lang.String duplicateSetFile()
Binary file of duplicate kmers
@NotNull public IndexKmerByHammingPlugin duplicateSetFile(@NotNull java.lang.String value)
Set Duplicate Set File. Binary file of duplicate kmers
value - Duplicate Set File
public int kmerSize()
kmer size for indexing genome. Maximum size is 32. Use the default of 32 unless you know what you are doing.
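The source does not say why 32 is the maximum. One common reason for such a limit, assumed here and not confirmed for this plugin, is that each base is packed into 2 bits so that a kmer fits in a single 64-bit long. The class KmerEncodingSketch and the A=0, C=1, G=2, T=3 coding below are hypothetical.

```java
// Hypothetical 2-bit packing: 32 bases x 2 bits = 64 bits, the width of a long.
final class KmerEncodingSketch {
    static final int MAX_KMER_SIZE = 32;

    // Pack a kmer into a long, most significant base first.
    static long encode(String kmer) {
        if (kmer.length() > MAX_KMER_SIZE) {
            throw new IllegalArgumentException("kmer longer than " + MAX_KMER_SIZE + " bases");
        }
        long packed = 0L;
        for (int i = 0; i < kmer.length(); i++) {
            packed = (packed << 2) | code(kmer.charAt(i));
        }
        return packed;
    }

    // Assumed coding of the four bases; any other character is rejected.
    static long code(char base) {
        switch (base) {
            case 'A': return 0L;
            case 'C': return 1L;
            case 'G': return 2L;
            case 'T': return 3L;
            default: throw new IllegalArgumentException("unsupported base: " + base);
        }
    }
}
```

Under this assumption a 33-base kmer would need 66 bits and cannot be represented, which would explain a hard ceiling of 32.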
@NotNull public IndexKmerByHammingPlugin kmerSize(int value)
Set Kmer Size. kmer size for indexing genome. Maximum size is 32. Use the default of 32 unless you know what you are doing.
value - Kmer Size
@NotNull public IndexKmerByHammingPlugin kmerStepSize(int value)
Set Kmer Step Size. kmer step size for the sliding window when indexing genome. Minimum size is 1. A higher step size will give you fewer kmers.
value - Kmer Step Size
public int kmerStepSize()
kmer step size for the sliding window when indexing genome. Minimum size is 1. A higher step size will give you fewer kmers.
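The effect of the step size can be seen in a minimal sliding-window sketch. The class KmerWindowSketch and its method are hypothetical and only illustrate the idea; the plugin's actual extraction code is not shown in the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sliding-window kmer extraction.
final class KmerWindowSketch {
    // Return every kmer of length kmerSize whose start position is a multiple
    // of stepSize, as long as the window still fits inside the sequence.
    static List<String> kmers(String sequence, int kmerSize, int stepSize) {
        if (stepSize < 1) {
            throw new IllegalArgumentException("step size must be at least 1");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + kmerSize <= sequence.length(); start += stepSize) {
            result.add(sequence.substring(start, start + kmerSize));
        }
        return result;
    }
}
```

For an 8-base sequence and a kmer size of 4, a step size of 1 yields 5 kmers and a step size of 2 yields 3.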
@Nullable public java.lang.String indexKmersPrefix()
Only kmers that start with this prefix are kept. This is to reduce the number of kmers by 1/4.
@NotNull public IndexKmerByHammingPlugin indexKmersPrefix(@NotNull java.lang.String value)
Set Kmer Prefix. Only kmers that start with this prefix are kept. This is to reduce the number of kmers by 1/4.
value - Kmer Prefix
public boolean revCompliment()
Create kmers on both strands. If true, each haplotype node's sequence is reverse complemented and kmers are created for both the original and the reverse-complemented sequence.
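As a reminder of what reverse complementing a sequence means, here is a minimal sketch. The class ReverseComplementSketch is hypothetical, and mapping any non-ACGT character to N is an assumption of this example, not documented plugin behavior.

```java
// Hypothetical reverse-complement helper for DNA strings.
final class ReverseComplementSketch {
    // Walk the sequence from the last base to the first, complementing each base.
    static String reverseComplement(String sequence) {
        StringBuilder result = new StringBuilder(sequence.length());
        for (int i = sequence.length() - 1; i >= 0; i--) {
            result.append(complement(sequence.charAt(i)));
        }
        return result.toString();
    }

    // A pairs with T and C pairs with G; anything else becomes N (assumption).
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: return 'N';
        }
    }
}
```

With the option enabled, kmers would be drawn from both the node sequence and the string returned by a helper like this one.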
@NotNull public IndexKmerByHammingPlugin revCompliment(boolean value)
Set Reverse Compliment. Create kmers on both strands. If true, each haplotype node's sequence is reverse complemented and kmers are created for both the original and the reverse-complemented sequence.
value - Reverse Compliment
public int minAllowedHamming()
Minimum Hamming distance required to define a kmer as unique. Within a reference range, this filter criterion is not applied. Hamming distance is taken into account only when comparing kmers across reference ranges. A higher number here should result in fewer kmers, as more will be marked as repetitive.
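The Hamming distance between two kmers of equal length is the number of positions at which they differ. The sketch below computes it separately for the first and second halves, as the class description mentions. The class HammingSketch is hypothetical, and the rule used in failsFilter (sum the two halves and compare against the minimum) is an assumption: the source does not state how the plugin combines the two values.

```java
// Hypothetical Hamming-distance helpers for equal-length kmers.
final class HammingSketch {
    // Count mismatching positions in the index range [from, to).
    static int hamming(String a, String b, int from, int to) {
        int distance = 0;
        for (int i = from; i < to; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Distances for the first half and the second half of the kmer.
    static int[] halfDistances(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("kmers must have equal length");
        }
        int mid = a.length() / 2;
        return new int[] {hamming(a, b, 0, mid), hamming(a, b, mid, a.length())};
    }

    // Assumed rule: two kmers from different reference ranges are too similar
    // when their total distance is below minAllowedHamming.
    static boolean failsFilter(String a, String b, int minAllowedHamming) {
        int[] halves = halfDistances(a, b);
        return halves[0] + halves[1] < minAllowedHamming;
    }
}
```

Under this assumed rule, raising minAllowedHamming makes more kmer pairs fail the check, which matches the statement that a higher value leaves fewer kmers.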
@NotNull public IndexKmerByHammingPlugin minAllowedHamming(int value)
Set Min Allowed Hamming. Minimum Hamming distance required to define a kmer as unique. Within a reference range, this filter criterion is not applied. Hamming distance is taken into account only when comparing kmers across reference ranges. A higher number here should result in fewer kmers, as more will be marked as repetitive.
value - Min Allowed Hamming
public int minKmerCountPerRefRange()
Minimum kmer count to be included in processing. This is to reduce the number of one-off kmers.
@NotNull public IndexKmerByHammingPlugin minKmerCountPerRefRange(int value)
Set Min Kmer Count Per Range. Minimum kmer count to be included in processing. This is to reduce the number of one-off kmers.
value - Min Kmer Count Per Range
public int maxKmerCountPerRefRange()
Maximum kmer count to be included in processing. This is to reduce the number of highly repetitive kmers.
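The minimum and maximum count settings can be pictured as a simple range filter on how often each kmer occurs within one reference range. The class KmerCountFilterSketch is hypothetical, and treating both bounds as inclusive is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical count-based filter for the kmers of a single reference range.
final class KmerCountFilterSketch {
    // Keep kmers whose occurrence count lies in [minCount, maxCount] (inclusive, assumed).
    static Set<String> filterByCount(List<String> kmersInRange, int minCount, int maxCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String kmer : kmersInRange) {
            counts.merge(kmer, 1, Integer::sum);
        }
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int count = entry.getValue();
            if (count >= minCount && count <= maxCount) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

A low minimum lets one-off kmers through, while a low maximum removes kmers that repeat many times within the range.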
@NotNull public IndexKmerByHammingPlugin maxKmerCountPerRefRange(int value)
Set Max Kmer Count Per Range. Maximum kmer count to be included in processing. This is to reduce the number of highly repetitive kmers.
value - Max Kmer Count Per Range
@Nullable public java.lang.String kmerDistFile()
Kmer to Distance files.
@NotNull public IndexKmerByHammingPlugin kmerDistFile(@NotNull java.lang.String value)
Set Kmer Dist Export. Kmer to Distance files.
value - Kmer Dist Export
@Nullable public java.lang.String purgeOutputFile()
File for exporting the purge array after marking.
@NotNull public IndexKmerByHammingPlugin purgeOutputFile(@NotNull java.lang.String value)
Set Purge Array Export. File for exporting the purge array after marking.
value - Purge Array Export