public class BestHaplotypePathPlugin
Plugin that takes a haplotype graph and a set of read mappings and infers the best (most likely) path through the graph given those mappings. Read mappings are a list of reads, each with the set of haplotypes to which that read aligned.
The plugin can (1) take a file of read mappings and return a file with a list of haplotypes or (2) take read mappings from a PHG DB and store the resulting list of haplotypes in the DB.
If (1) the input is a file, then the plugin can take either a file or a directory containing multiple files. If a directory is given, all read mapping files will be processed and the haplotype lists output as separate files to an output directory. If the output directory is not specified, then the lists will be written to the input directory. Existing path files of the same name will not be overwritten, and a message to that effect will be written to the log, unless the overwrite flag is set to true.
If (2) the input comes from a PHG DB, an input read map method and the output path method must be supplied. In addition, a specific taxon or list of taxa for which paths are to be imputed can be supplied. If paths for any of the taxa and methods exist, the paths will not be imputed and a warning message will be written to the log. If the overwrite flag is set to true, any existing paths will be overwritten and a message to that effect written to the log.
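The overwrite rule for path files can be illustrated with a minimal sketch. It is not the plugin's code: the class PathFileWriterSketch and the method writeUnlessPresent are hypothetical names, and the log message is written to standard error for simplicity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of PHG: shows the "skip unless overwrite" rule.
final class PathFileWriterSketch {

    /**
     * Writes the lines to target unless the file already exists and overwrite is false.
     * Returns true if the file was written and false if it was skipped.
     */
    static boolean writeUnlessPresent(Path target, List<String> lines, boolean overwrite) {
        if (Files.exists(target) && !overwrite) {
            // Stand-in for the log message described in the text.
            System.err.println("Skipping " + target + ": path file exists and overwrite is false");
            return false;
        }
        try {
            Files.write(target, lines); // creates the file or truncates an existing one
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

With overwrite = false, a second call for the same target returns false and leaves the first file in place.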
public BestHaplotypePathPlugin(@Nullable java.awt.Frame parentFrame, boolean isInteractive)
protected void preProcessParameters(@Nullable net.maizegenetics.plugindef.DataSet input)
protected void postProcessParameters()
@Nullable public net.maizegenetics.plugindef.DataSet processData(@Nullable net.maizegenetics.plugindef.DataSet input)
public void processReadFile(@NotNull java.lang.String readFileName, @NotNull HaplotypeGraph graph)
Method to process a single read file and write the resulting path to the output directory. If overwrite = false, then the method will not overwrite an existing path file.
readFileName
- The full path of the file with read mappings
graph
- The HaplotypeGraph that will be used to infer a path as a list of haplotype ids
public void processReadDirectory(@NotNull java.io.File readDir, @NotNull HaplotypeGraph graph)
Method that gets a list of read mapping files from the read directory and calls processReadFile for each of the files.
readDir
- The directory containing the read mapping files to be processed
graph
- The HaplotypeGraph used to infer paths
See also: processReadFile
public void findPathsFromDBReadMappings(@NotNull HaplotypeGraph graph, @NotNull java.lang.String keyFile)
Method that gets a list of read mappings from the database for a specific method and writes a path to the database for each read mapping record.
graph
- The HaplotypeGraph used to infer paths
public void findPathsFromDBReadMappingsMultithread(@NotNull HaplotypeGraph graph, @NotNull java.lang.String keyFile)
Method that finds paths by extracting the read mappings from the DB, aggregating the counts by summing them for each hapId set, and then finding a path for the aggregated read mappings.
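The aggregation step can be pictured as a map merge keyed on the hapId set. The sketch below is an illustration only: it assumes each read mapping is represented as a Map from a set of haplotype ids to a count, which may differ from the types the plugin actually uses, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not PHG code.
final class ReadMappingAggregationSketch {

    /** Sums the counts of every hapId set across several read mappings. */
    static Map<Set<Integer>, Integer> sumCounts(List<Map<Set<Integer>, Integer>> mappings) {
        Map<Set<Integer>, Integer> total = new HashMap<>();
        for (Map<Set<Integer>, Integer> mapping : mappings) {
            // Sets compare by content, so {1, 2} and {2, 1} share one key.
            mapping.forEach((hapIds, count) -> total.merge(hapIds, count, Integer::sum));
        }
        return total;
    }
}
```

Because the key is a Set, the order in which haplotype ids were recorded does not matter when counts are combined.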
@NotNull public java.lang.String getToolTipText()
@Nullable public javax.swing.ImageIcon getIcon()
@NotNull public java.lang.String getButtonName()
@NotNull public java.lang.String pluginDescription()
@NotNull public java.lang.String keyFile()
Key file name. Must be a tab-separated file with the headers SampleName, ReadMappingIds, and LikelyParents. ReadMappingIds and LikelyParents must be comma separated when they hold multiple values.
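A minimal sketch of reading a key file in this layout follows. It is not the plugin's parser: the class KeyFileSketch and its Entry type are hypothetical, and the sketch assumes the header row is the first line and that an empty cell means no values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the key file layout described above; not PHG code.
final class KeyFileSketch {

    static final class Entry {
        final String sampleName;
        final List<String> readMappingIds;
        final List<String> likelyParents;

        Entry(String sampleName, List<String> readMappingIds, List<String> likelyParents) {
            this.sampleName = sampleName;
            this.readMappingIds = readMappingIds;
            this.likelyParents = likelyParents;
        }
    }

    /** Parses the lines of a key file; the first line must be the header row. */
    static List<Entry> parse(List<String> lines) {
        List<String> header = Arrays.asList(lines.get(0).split("\t", -1));
        int sample = header.indexOf("SampleName");
        int ids = header.indexOf("ReadMappingIds");
        int parents = header.indexOf("LikelyParents");
        if (sample < 0 || ids < 0 || parents < 0) {
            throw new IllegalArgumentException("Missing required header in: " + header);
        }
        List<Entry> entries = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue; // ignore blank lines
            String[] cols = line.split("\t", -1); // -1 keeps trailing empty cells
            entries.add(new Entry(cols[sample], splitList(cols[ids]), splitList(cols[parents])));
        }
        return entries;
    }

    /** Splits a comma-separated cell into trimmed values; an empty cell gives an empty list. */
    private static List<String> splitList(String cell) {
        List<String> values = new ArrayList<>();
        if (cell.trim().isEmpty()) return values;
        for (String v : cell.split(",")) values.add(v.trim());
        return values;
    }
}
```

Columns are located by header name rather than position, so the sketch tolerates a different column order.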
@NotNull public BestHaplotypePathPlugin keyFile(@NotNull java.lang.String value)
Set KeyFile. Key file name. Must be a tab-separated file with the headers SampleName, ReadMappingIds, and LikelyParents. ReadMappingIds and LikelyParents must be comma separated when they hold multiple values.
value
- KeyFile
@Nullable public java.lang.String readMapFilename()
Filename of read mappings. Do not supply both a filename and a directory.
@NotNull public BestHaplotypePathPlugin readMapFilename(@Nullable java.lang.String value)
Set Read Map File. Filename of read mappings. Do not supply both a filename and a directory.
value
- Read Map File
@Nullable public java.lang.String readMapDirectory()
Directory of read mapping files. If this is supplied, do not also assign a read filename.
@NotNull public BestHaplotypePathPlugin readMapDirectory(@Nullable java.lang.String value)
Set Read Map Directory. Directory of read mapping files. If this is supplied, do not also assign a read filename.
value
- Read Map Directory
@Nullable public java.lang.String pathOutDirectory()
Directory to which path files will be written.
@NotNull public BestHaplotypePathPlugin pathOutDirectory(@NotNull java.lang.String value)
Set Path Out Directory. Directory to which path files will be written.
value
- Path Out Directory
@Nullable public java.lang.String readMethodName()
The name of the read method in the PHG DB.
@NotNull public BestHaplotypePathPlugin readMethodName(@NotNull java.lang.String value)
Set Read Method. The name of the read method in the PHG DB.
value
- Read Method
@Nullable public java.lang.String pathMethodName()
The name of the path method used to write the results to the PHG DB.
@NotNull public BestHaplotypePathPlugin pathMethodName(@NotNull java.lang.String value)
Set Path Method. The name of the path method used to write the results to the PHG DB.
value
- Path Method
@Nullable public java.lang.String pathMethodDescription()
An additional description that will be stored with the path method name, if desired.
@NotNull public BestHaplotypePathPlugin pathMethodDescription(@NotNull java.lang.String value)
Set Path Method Description. An additional description that will be stored with the path method name, if desired.
value
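- Path Method Description
public boolean overwrite()
If true, existing path files or paths in the DB will be overwritten. Otherwise existing paths are kept and a message is written to the log.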
@NotNull public BestHaplotypePathPlugin overwrite(boolean value)
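Set Overwrite. If true, existing path files or paths in the DB will be overwritten. Otherwise existing paths are kept and a message is written to the log.
value
- Overwrite
public int minTaxaPerRange()
Minimum number of taxa per anchor reference range. Ranges with fewer taxa will not be included in the output node list.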
@NotNull public BestHaplotypePathPlugin minTaxaPerRange(int value)
Set Min Taxa. Minimum number of taxa per anchor reference range. Ranges with fewer taxa will not be included in the output node list.
value
- Min Taxa
public int minReads()
Minimum number of reads per anchor reference range. Ranges with fewer reads will not be included in the output node list.
@NotNull public BestHaplotypePathPlugin minReads(int value)
Set Min Reads. Minimum number of reads per anchor reference range. Ranges with fewer reads will not be included in the output node list.
value
- Min Reads
public int maxReadsPerKB()
Maximum number of include counts per kb of anchor reference range. Ranges with more reads will not be included in the output node list.
@NotNull public BestHaplotypePathPlugin maxReadsPerKB(int value)
Set Max Reads. Maximum number of include counts per kb of anchor reference range. Ranges with more reads will not be included in the output node list.
value
- Max Reads
public int maxNodesPerRange()
Maximum number of nodes per reference range. Ranges with more nodes will not be included in the output node list.
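Taken together, minTaxaPerRange, minReads, maxReadsPerKB, and maxNodesPerRange act as a per-range filter. The sketch below shows one way such a filter could be combined. It is an assumption-laden illustration, not the plugin's logic: RangeStats is a hypothetical summary type, reads per kb is computed here as readCount * 1000 / lengthBp, and lengthBp is assumed to be positive.

```java
// Hypothetical range filter; the names and the reads-per-kb formula are assumptions.
final class RangeFilterSketch {

    /** Hypothetical per-range summary; not a PHG type. */
    static final class RangeStats {
        final int taxaCount;
        final int readCount;
        final int nodeCount;
        final int lengthBp;

        RangeStats(int taxaCount, int readCount, int nodeCount, int lengthBp) {
            this.taxaCount = taxaCount;
            this.readCount = readCount;
            this.nodeCount = nodeCount;
            this.lengthBp = lengthBp;
        }
    }

    /** Returns true if the range passes all four thresholds. */
    static boolean keep(RangeStats r, int minTaxaPerRange, int minReads,
                        int maxReadsPerKB, int maxNodesPerRange) {
        double readsPerKb = r.readCount * 1000.0 / r.lengthBp;
        return r.taxaCount >= minTaxaPerRange
                && r.readCount >= minReads
                && readsPerKb <= maxReadsPerKB
                && r.nodeCount <= maxNodesPerRange;
    }
}
```

For example, a 2000 bp range with 20 reads has 10 reads per kb, so it passes maxReadsPerKB = 100 but fails maxReadsPerKB = 5.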
@NotNull public BestHaplotypePathPlugin maxNodesPerRange(int value)
Set Max Nodes. Maximum number of nodes per reference range. Ranges with more nodes will not be included in the output node list.
value
- Max Nodes
public double minTransitionProb()
Minimum probability of a transition between nodes at adjacent reference ranges.
@NotNull public BestHaplotypePathPlugin minTransitionProb(double value)
Set Min Transition Prob. Minimum probability of a transition between nodes at adjacent reference ranges.
value
- Min Transition Prob
public double probReadMappedCorrectly()
The probability that a mapped read was mapped correctly.
@NotNull public BestHaplotypePathPlugin probReadMappedCorrectly(double value)
Set Prob Correct. The probability that a mapped read was mapped correctly.
value
- Prob Correct
public boolean splitConsensusNodes()
Split consensus nodes into one node per taxon.
@NotNull public BestHaplotypePathPlugin splitConsensusNodes(boolean value)
Set Split Nodes. Split consensus nodes into one node per taxon.
value
- Split Nodes
public double splitTransitionProb()
When the consensus nodes are split by taxa, this is the transition probability for moving from a node to the next node of the same taxon. It equals 1 minus the probability of a recombination between adjacent nodes.
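As an illustration of that relationship, the sketch below builds a transition matrix between taxa at adjacent ranges. The diagonal holds splitTransitionProb, as the description states. Spreading the remaining recombination probability evenly over the other taxa is an assumption made for this example, and the class name is hypothetical.

```java
// Hypothetical illustration; the uniform split of the recombination probability is an assumption.
final class SplitTransitionSketch {

    /**
     * Builds a numTaxa x numTaxa matrix where t[i][j] is the probability of moving
     * from the node of taxon i to the node of taxon j at the next range.
     */
    static double[][] transitionMatrix(int numTaxa, double splitTransitionProb) {
        if (numTaxa < 2) {
            throw new IllegalArgumentException("Need at least two taxa");
        }
        // Probability of recombination, shared equally among the other taxa.
        double switchProb = (1.0 - splitTransitionProb) / (numTaxa - 1);
        double[][] t = new double[numTaxa][numTaxa];
        for (int i = 0; i < numTaxa; i++) {
            for (int j = 0; j < numTaxa; j++) {
                t[i][j] = (i == j) ? splitTransitionProb : switchProb;
            }
        }
        return t;
    }
}
```

Each row sums to 1, so a value close to 1 makes paths that stay on one taxon strongly preferred.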
@NotNull public BestHaplotypePathPlugin splitTransitionProb(double value)
Set Split Prob. When the consensus nodes are split by taxa, this is the transition probability for moving from a node to the next node of the same taxon. It equals 1 minus the probability of a recombination between adjacent nodes.
value
- Split Prob
public boolean useBackwardForward()
Use the Backward-Forward algorithm instead of the Viterbi algorithm for the HMM.
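For context, the Viterbi algorithm returns the single most likely state sequence of an HMM. The sketch below is a textbook log-space version, not the plugin's implementation. It simplifies the problem by assuming the same set of states at every position, whereas a haplotype graph can have different nodes in each reference range. All names are hypothetical.

```java
// Textbook Viterbi in log space; a simplified illustration, not PHG code.
final class ViterbiSketch {

    /**
     * @param logInit  logInit[s] is the log probability of starting in state s
     * @param logTrans logTrans[from][to] is the log transition probability
     * @param logEmit  logEmit[t][s] is the log probability of the observation at step t in state s
     * @return the most likely state index at each step
     */
    static int[] mostLikelyPath(double[] logInit, double[][] logTrans, double[][] logEmit) {
        int steps = logEmit.length;
        int states = logInit.length;
        double[][] score = new double[steps][states];
        int[][] back = new int[steps][states];

        for (int s = 0; s < states; s++) {
            score[0][s] = logInit[s] + logEmit[0][s];
        }
        for (int t = 1; t < steps; t++) {
            for (int s = 0; s < states; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < states; p++) {
                    double candidate = score[t - 1][p] + logTrans[p][s];
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = p;
                    }
                }
                score[t][s] = bestScore + logEmit[t][s];
                back[t][s] = best; // remember the best predecessor
            }
        }

        // Pick the best final state, then follow the back pointers.
        int last = 0;
        for (int s = 1; s < states; s++) {
            if (score[steps - 1][s] > score[steps - 1][last]) last = s;
        }
        int[] path = new int[steps];
        path[steps - 1] = last;
        for (int t = steps - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long path.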
@NotNull public BestHaplotypePathPlugin useBackwardForward(boolean value)
Set Usebf. Use the Backward-Forward algorithm instead of the Viterbi algorithm for the HMM.
value
- Usebf
public double minProbBF()
Only nodes with probability minP or greater will be kept in the path when using the Backward-Forward algorithm.
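The thresholding idea can be sketched with a standard forward-backward pass that computes the posterior probability of each state at each step and then keeps the states at or above a cutoff. This is a generic textbook version under simplifying assumptions (the same states at every step, per-step normalization for numerical stability), not the plugin's code. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Textbook forward-backward with per-step normalization; an illustration, not PHG code.
final class ForwardBackwardSketch {

    /** Returns gamma[t][s], the posterior probability of state s at step t. */
    static double[][] posteriors(double[] init, double[][] trans, double[][] emit) {
        int steps = emit.length;
        int states = init.length;
        double[][] alpha = new double[steps][states];
        double[][] beta = new double[steps][states];

        // Forward pass.
        for (int s = 0; s < states; s++) alpha[0][s] = init[s] * emit[0][s];
        normalize(alpha[0]);
        for (int t = 1; t < steps; t++) {
            for (int s = 0; s < states; s++) {
                double sum = 0.0;
                for (int p = 0; p < states; p++) sum += alpha[t - 1][p] * trans[p][s];
                alpha[t][s] = sum * emit[t][s];
            }
            normalize(alpha[t]);
        }

        // Backward pass.
        for (int s = 0; s < states; s++) beta[steps - 1][s] = 1.0;
        for (int t = steps - 2; t >= 0; t--) {
            for (int s = 0; s < states; s++) {
                double sum = 0.0;
                for (int n = 0; n < states; n++) sum += trans[s][n] * emit[t + 1][n] * beta[t + 1][n];
                beta[t][s] = sum;
            }
            normalize(beta[t]);
        }

        // Combine and renormalize so each step sums to 1.
        double[][] gamma = new double[steps][states];
        for (int t = 0; t < steps; t++) {
            for (int s = 0; s < states; s++) gamma[t][s] = alpha[t][s] * beta[t][s];
            normalize(gamma[t]);
        }
        return gamma;
    }

    /** Lists, for each step, the states whose posterior is at least minProbBF. */
    static List<List<Integer>> statesAbove(double[][] gamma, double minProbBF) {
        List<List<Integer>> kept = new ArrayList<>();
        for (double[] row : gamma) {
            List<Integer> states = new ArrayList<>();
            for (int s = 0; s < row.length; s++) {
                if (row[s] >= minProbBF) states.add(s);
            }
            kept.add(states);
        }
        return kept;
    }

    private static void normalize(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        if (sum == 0.0) return; // leave an all-zero row unchanged
        for (int i = 0; i < v.length; i++) v[i] /= sum;
    }
}
```

Unlike Viterbi, this approach can keep more than one state per step, or none, depending on the cutoff.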
@NotNull public BestHaplotypePathPlugin minProbBF(double value)
Set Min P. Only nodes with probability minP or greater will be kept in the path when using the Backward-Forward algorithm.
value
- Min P
@Nullable public java.lang.String bfInfoFilename()
The base name of the file to which node probabilities from the Backward-Forward algorithm will be written. taxonName.txt will be appended to each file name.
@NotNull public BestHaplotypePathPlugin bfInfoFilename(@Nullable java.lang.String value)
Set Bf Info File. The base name of the file to which node probabilities from the Backward-Forward algorithm will be written. taxonName.txt will be appended to each file name.
value
- Bf Info File
public boolean removeRangesWithEqualCounts()
Whether ranges with equal read counts for all haplotypes should be removed from the graph. Defaults to true but will always be false if minReads = 0.
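The interaction with minReads can be sketched as follows. This is an illustration of the stated rule only, with hypothetical names, and it assumes the per-haplotype read counts for a range are available as an int array.

```java
// Hypothetical illustration of the rule above; not PHG code.
final class EqualCountsSketch {

    /** True if every haplotype in the range has the same read count. */
    static boolean allEqual(int[] counts) {
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] != counts[0]) return false;
        }
        return true;
    }

    /** The flag only takes effect when minReads is greater than zero. */
    static boolean shouldRemove(int[] counts, boolean removeRangesWithEqualCounts, int minReads) {
        boolean effectiveFlag = removeRangesWithEqualCounts && minReads > 0;
        return effectiveFlag && allEqual(counts);
    }
}
```

A range where every haplotype has the same count carries no information for choosing among them, which is presumably why such ranges can be dropped.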
@NotNull public BestHaplotypePathPlugin removeRangesWithEqualCounts(boolean value)
Set Remove Equal. Whether ranges with equal read counts for all haplotypes should be removed from the graph. Defaults to true but will always be false if minReads = 0.
value
- Remove Equal
public int numThreads()
Number of threads used to upload.
@NotNull public BestHaplotypePathPlugin numThreads(int value)
Set Num Threads. Number of threads used to upload.
value
- Num Threads
@Nullable public net.maizegenetics.taxa.TaxaList requiredTaxaList()
Optional list of taxa required to have haplotypes. Any reference range that does not have a haplotype for one of these taxa will not be used for path finding. This can be a comma separated list of taxa (no spaces unless surrounded by quotes), file (.txt) with list of taxa names to include, or a taxa list file (.json or .json.gz). By default, all taxa will be included.
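One reading of this rule is that a reference range is usable only if it has a haplotype for every required taxon. The sketch below encodes that reading. It is an interpretation, not the plugin's code: taxa are plain strings, ranges are identified by hypothetical names, and the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; assumes a range must contain every required taxon.
final class RequiredTaxaSketch {

    /**
     * @param taxaByRange  for each range name, the taxa that have a haplotype in that range
     * @param requiredTaxa the taxa that must all be present
     * @return the names of the ranges that can be used for path finding
     */
    static List<String> usableRanges(Map<String, Set<String>> taxaByRange, Set<String> requiredTaxa) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : taxaByRange.entrySet()) {
            if (e.getValue().containsAll(requiredTaxa)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

With an empty required set, containsAll is always true, so every range is kept.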
@NotNull public BestHaplotypePathPlugin requiredTaxaList(@NotNull net.maizegenetics.taxa.TaxaList value)
Set Required Taxa. Optional list of taxa required to have haplotypes. Any reference range that does not have a haplotype for one of these taxa will not be used for path finding. This can be a comma separated list of taxa (no spaces unless surrounded by quotes), file (.txt) with list of taxa names to include, or a taxa list file (.json or .json.gz). By default, all taxa will be included.
value
- Required Taxa
@NotNull public net.maizegenetics.pangenome.hapCalling.BestHaplotypePathPlugin.ALGORITHM_TYPE algorithmType()
The type of algorithm. Choices are classic, which is the original implementation described by Rabiner (1989), or efficient, which is modified for improved computational efficiency.
@NotNull public BestHaplotypePathPlugin algorithmType(@NotNull net.maizegenetics.pangenome.hapCalling.BestHaplotypePathPlugin.ALGORITHM_TYPE value)
Set Algorithm Type. The type of algorithm. Choices are classic, which is the original implementation described by Rabiner (1989), or efficient, which is modified for improved computational efficiency.
value
- Algorithm Type
public int maxParents()
To restrict path finding to the most likely parents, the number of parents used will not be greater than maxParents. The number of parents used will be the minimum of maxParents and the number of parents needed to reach minCoverage. If both maxParents and minCoverage are left at the default, all parents in the input HaplotypeGraph will be used.
@NotNull public BestHaplotypePathPlugin maxParents(int value)
Set Max Parents. To restrict path finding to the most likely parents, the number of parents used will not be greater than maxParents. The number of parents used will be the minimum of maxParents and the number of parents needed to reach minCoverage. If both maxParents and minCoverage are left at the default, all parents in the input HaplotypeGraph will be used.
value
- Max Parents
public double minCoverage()
To restrict path finding to the most likely parents, the smallest number of parents needed to provide read coverage greater than or equal to minCoverage will be used to find paths. If maxParents is smaller, that number of parents will be used.
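How maxParents and minCoverage might interact can be sketched with a greedy selection: repeatedly take the parent that covers the most not-yet-covered reads, and stop once coverage reaches minCoverage or maxParents parents have been chosen. The greedy strategy, the definition of coverage as the fraction of reads hitting at least one chosen parent, and all names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical greedy selection; the strategy and coverage definition are assumptions.
final class LikelyParentsSketch {

    /**
     * @param readsToParents for each read, the parents it maps to
     * @param maxParents     upper bound on the number of parents returned
     * @param minCoverage    stop once this fraction of reads is covered
     */
    static List<String> select(List<Set<String>> readsToParents, int maxParents, double minCoverage) {
        List<Set<String>> remaining = new ArrayList<>(readsToParents);
        List<String> chosen = new ArrayList<>();
        int total = readsToParents.size();
        int covered = 0;

        while (chosen.size() < maxParents && !remaining.isEmpty()
                && (double) covered / total < minCoverage) {
            // Count uncovered reads per parent; TreeMap gives a deterministic tie-break by name.
            Map<String, Integer> counts = new TreeMap<>();
            for (Set<String> read : remaining) {
                for (String parent : read) counts.merge(parent, 1, Integer::sum);
            }
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            if (best == null) break; // remaining reads map to no parent
            chosen.add(best);
            covered += bestCount;
            final String picked = best;
            remaining.removeIf(read -> read.contains(picked));
        }
        return chosen;
    }
}
```

Whichever limit is reached first ends the selection, which matches the statement that the number of parents used is the minimum of maxParents and the number needed to reach minCoverage.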
@NotNull public BestHaplotypePathPlugin minCoverage(double value)
Set Min Coverage. To restrict path finding to the most likely parents, the smallest number of parents needed to provide read coverage greater than or equal to minCoverage will be used to find paths. If maxParents is smaller, that number of parents will be used.
value
- Min Coverage
@Nullable public java.lang.String likelyParentFile()
The name and path of the file of likely parents and their read counts.
@NotNull public BestHaplotypePathPlugin likelyParentFile(@NotNull java.lang.String value)
Set Parent Output File. The name and path of the file of likely parents and their read counts.
value
- Parent Output File
public boolean isTestMethod()
Indicates whether the data is to be loaded against a test method. Data loaded with test methods are not cached with the PHG ktor server.
@NotNull public BestHaplotypePathPlugin isTestMethod(boolean value)
Set Is Test Method. Indicates whether the data is to be loaded against a test method. Data loaded with test methods are not cached with the PHG ktor server.
value
- Is Test Method