All Implemented Interfaces:
java.lang.Runnable, net.maizegenetics.plugindef.Plugin, net.maizegenetics.plugindef.PluginListener, net.maizegenetics.util.ProgressListener
@Deprecated() public class AssemblyHaplotypesPlugin extends AbstractPlugin
This class is DEPRECATED and should no longer be used. It will not create or store the genome_file_data table entries.

Process Assemblies, both anchor and interanchor. This class exercises the following mummer4 scripts: nucmer, delta-filter, show-coords, show-snps.

Algorithm when processing all steps:
1. Run mummer4:nucmer with the parameters -c 250 --mum
2. Run mummer4:show-coords with parameters -T -r (tab-delimited, sorted by ref)
3. Run mummer4:show-snps with parameters -T -r (tab-delimited, sorted by ref ID and SNP positions)
4. Interanchors:
   a. Processed the same as the anchors. The VariantContext code grabs sequence aligning to both. We expect a certain linearity to the alignments. There will be large indels between sequences that are picked up as inter-genic regions.
5. The DB is loaded in batches of 3000 reference_ranges at a time.

Assumptions/requirements:
1. Alignments are on a per-chromosome basis: fastas are expected to be broken down by chromosome, 1 chromosome per fasta.
2. Id lines in the fasta must start with >X, >chrX or >chromosomeX, where X is the chromosome number. Additional data is allowed after a space, but the first data must identify the chromosome as above.
3. The assembly and reference chromosomes need to be named in a similar fashion to allow for db matching of genome_interval_ids. That is, if the ref chrom name has a number without leading 0's, so must the assembly: chr1 for ref, then chr1 for assembly.
4. For populating the DB per Ed: ploidy=1, genes/chroms phased=TRUE (put the confidence at 1).

NOTE: Entry points
There are now 3 places where the user may enter this code. The default is to run all steps.
- "all": entryPoint parameter is "all". All steps are run.
- "refilter": entryPoint parameter is "refilter". Processing begins with the call to MummerScriptProcessing.refilterCoords(). It is assumed the alignment directory contains mummer output for nucmer, delta-filter and show-coords on both the original and filtered delta files.
- "haplotypes": entryPoint parameter is "haplotypes". Processing begins after all mummer4 scripts have finished, and the coordinates and snps files have been filtered. Only the VariantContext/sequence creation and db loading steps are performed. It is assumed the alignment directory contains the necessary mummer files for processing.
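The mummer4 steps listed in the algorithm can be sketched as plain command lines. The helper below is hypothetical and not part of this plugin: the class name, the method names and the use of nucmer's -p output prefix are assumptions made for illustration. Only the flags -c, --mum, -T and -r come from the description above.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper (not part of the plugin): builds the mummer4 command lines. */
final class MummerCommands {
    private MummerCommands() {}

    /** nucmer with the cluster size (-c) and --mum; -p (assumed here) sets the output prefix. */
    static List<String> nucmer(Path mummer4Dir, int clusterSize, String prefix,
                               Path refFasta, Path assemblyFasta) {
        return List.of(mummer4Dir.resolve("nucmer").toString(),
                "-c", Integer.toString(clusterSize), "--mum",
                "-p", prefix,
                refFasta.toString(), assemblyFasta.toString());
    }

    /** show-coords: -T tab-delimited, -r sorted by reference. */
    static List<String> showCoords(Path mummer4Dir, Path deltaFile) {
        return List.of(mummer4Dir.resolve("show-coords").toString(),
                "-T", "-r", deltaFile.toString());
    }

    /** show-snps: -T tab-delimited, -r sorted by reference ID and SNP position. */
    static List<String> showSnps(Path mummer4Dir, Path deltaFile) {
        return List.of(mummer4Dir.resolve("show-snps").toString(),
                "-T", "-r", deltaFile.toString());
    }
}
```

Each returned list can be passed to java.lang.ProcessBuilder. show-coords and show-snps write to standard output, so a caller would typically redirect that output to a file.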
-
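The fasta id-line requirement (>X, >chrX or >chromosomeX, with optional data after a space) can be checked before running the plugin. The following is a minimal sketch, not the plugin's own parsing code: the class and method names are hypothetical, and it assumes case-sensitive prefixes and a purely numeric chromosome. Leading zeros are preserved because the reference and assembly names must match exactly.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator for fasta id lines of the form >X, >chrX or >chromosomeX. */
final class FastaIdLine {
    // '>' + optional "chromosome"/"chr" prefix + digits, then optional whitespace-separated extra data.
    private static final Pattern ID_LINE =
            Pattern.compile("^>(?:chromosome|chr)?(\\d+)(?:\\s.*)?$");

    private FastaIdLine() {}

    /** Returns the chromosome number exactly as written (leading zeros kept), or empty if the line does not conform. */
    static Optional<String> chromosomeNumber(String idLine) {
        Matcher m = ID_LINE.matcher(idLine);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```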
-
Nested Class Summary
Nested Classes
public enum AssemblyHaplotypesPlugin.ASSEMBLY_ENTRY_POINT
-
Field Summary
Fields
public final static String DEFAULT_CITATION
public final static String POSITION_LIST_NONE
public final static String TAXA_LIST_NONE
-
Constructor Summary
Constructors
AssemblyHaplotypesPlugin()
AssemblyHaplotypesPlugin(Frame parentFrame)
AssemblyHaplotypesPlugin(Frame parentFrame, boolean isInteractive)
-
Method Summary
DataSet processData(DataSet input)
static Map<Integer, AnchorDataPHG> createAndLoadAssemblyData(Map<Integer, ReferenceRange> idByRangeMap, List<VariantContext> fullVC, GenomeSequence refSequence, Connection dbConn, String assemblyName, String chrom, int clusterSize, String method, Map<String, String> pluginParams)
    Create the assembly genotype/haplotype data and load to the PHG database
ImageIcon getIcon()
String getButtonName()
String getToolTipText()
static void main(Array<String> args)
String ref()
    Input reference fasta file for single chromosome
AssemblyHaplotypesPlugin ref(String value)
    Set Reference Fasta File.
String assembly()
    Assembly fasta file for a single chromosome to align against the reference
AssemblyHaplotypesPlugin assembly(String value)
    Set Assembly Fasta File.
String outputDir()
    Output directory including trailing / for writing files
AssemblyHaplotypesPlugin outputDir(String value)
    Set Output Directory.
String dbConfigFile()
    File containing lines with data for host=, user=, password= and DB=, DBtype= used for db connection
AssemblyHaplotypesPlugin dbConfigFile(String value)
    Set DB Config File.
String assemblyName()
    Name of Assembly Taxon, to be stored as taxon name in the DB
AssemblyHaplotypesPlugin assemblyName(String value)
    Set Assembly Name.
String chrom()
    Name of chromosome as it appears both for the reference in the db reference_ranges table, and in the fasta file idLine for the assembly
AssemblyHaplotypesPlugin chrom(String value)
    Set Chromosome Name.
String mummer4Path()
    Path to mummer4 binaries
AssemblyHaplotypesPlugin mummer4Path(String value)
    Set Mummer4 binary path
Integer clusterSize()
    Cluster size to use with mummer4 nucmer script.
AssemblyHaplotypesPlugin clusterSize(Integer value)
    Set Mummer4 Nucmer Cluster Size.
String gVCFOutputDir()
    Directory for gvcf files to be output for later use
AssemblyHaplotypesPlugin gVCFOutputDir(String value)
    Set GVCF Output Dir.
AssemblyHaplotypesPlugin.ASSEMBLY_ENTRY_POINT entryPoint()
    Where to begin processing.
AssemblyHaplotypesPlugin entryPoint(AssemblyHaplotypesPlugin.ASSEMBLY_ENTRY_POINT value)
    Set Assembly Entry Point.
Integer minInversionLen()
    Minimum length of inversion for it to be kept as part of the alignment.
AssemblyHaplotypesPlugin minInversionLen(Integer value)
    Set Minimum Inversion Length.
String assemblyMethod()
    Method name to load to db for assembly processing, default is mummer4
AssemblyHaplotypesPlugin assemblyMethod(String value)
    Set Assembly Method.
-
Methods inherited from class net.maizegenetics.plugindef.AbstractPlugin
addListener, cancel, convert, dataSetReturned, getCitation, getInputs, getListeners, getMenu, getPanel, getParameter, getParentFrame, getUsage, getUsageHTML, hasListeners, isInteractive, isPluginParameter, performFunction, pluginDescription, pluginParameters, pluginUserManualURL, progress, receiveInput, reverseTrace, run, setConfigParameters, setParameter, setParameters, setParametersToDefault, setThreaded, trace, usageParameters, wasCancelled
-
Methods inherited from class net.maizegenetics.plugindef.Plugin
getPluginInstance, isPlugin
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
-
Method Detail
-
processData
DataSet processData(DataSet input)
-
createAndLoadAssemblyData
static Map<Integer, AnchorDataPHG> createAndLoadAssemblyData(Map<Integer, ReferenceRange> idByRangeMap, List<VariantContext> fullVC, GenomeSequence refSequence, Connection dbConn, String assemblyName, String chrom, int clusterSize, String method, Map<String, String> pluginParams)
Create the assembly genotype/haplotype data and load to the PHG database
-
getButtonName
String getButtonName()
-
getToolTipText
String getToolTipText()
-
ref
AssemblyHaplotypesPlugin ref(String value)
Set Reference Fasta File. Input reference fasta file for single chromosome
- Parameters:
value
- Reference Fasta File
-
assembly
String assembly()
Assembly fasta file for a single chromosome to align against the reference
-
assembly
AssemblyHaplotypesPlugin assembly(String value)
Set Assembly Fasta File. Assembly fasta file for a single chromosome to align against the reference
- Parameters:
value
- Assembly Fasta File
-
outputDir
AssemblyHaplotypesPlugin outputDir(String value)
Set Output Directory. Output directory including trailing / for writing files
- Parameters:
value
- Output Directory
-
dbConfigFile
String dbConfigFile()
File containing lines with data for host=, user=, password= and DB=, DBtype= used for db connection
-
dbConfigFile
AssemblyHaplotypesPlugin dbConfigFile(String value)
Set DB Config File. File containing lines with data for host=, user=, password= and DB=, DBtype= used for db connection
- Parameters:
value
- DB Config File
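A config file of this shape can be sanity-checked before it is handed to the plugin. The sketch below assumes the file uses simple key=value lines that java.util.Properties can read; whether the plugin itself parses the file this way is not stated here. The class and method names are hypothetical. The key names are taken from the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for a db config file with key=value lines. */
final class DbConfigCheck {
    /** Keys named in the parameter description. */
    static final List<String> REQUIRED_KEYS = List.of("host", "user", "password", "DB", "DBtype");

    private DbConfigCheck() {}

    /** Parses key=value text (for example, the contents of the config file). */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required keys that are absent, in declaration order. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

The file contents can be read with java.nio.file.Files.readString and passed to parse; an empty result from missingKeys means all five keys are present.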
-
assemblyName
String assemblyName()
Name of Assembly Taxon, to be stored as taxon name in the DB
-
assemblyName
AssemblyHaplotypesPlugin assemblyName(String value)
Set Assembly Name. Name of Assembly Taxon, to be stored as taxon name in the DB
- Parameters:
value
- Assembly Name
-
chrom
String chrom()
Name of chromosome as it appears both for the reference in the db reference_ranges table, and in the fasta file idLine for the assembly
-
chrom
AssemblyHaplotypesPlugin chrom(String value)
Set Chromosome Name. Name of chromosome as it appears both for the reference in the db reference_ranges table, and in the fasta file idLine for the assembly
- Parameters:
value
- Chromosome Name
-
mummer4Path
String mummer4Path()
Path to mummer4 binaries
-
mummer4Path
AssemblyHaplotypesPlugin mummer4Path(String value)
Set Mummer4 binary path
- Parameters:
value
- Mummer4 binary path
-
clusterSize
Integer clusterSize()
Cluster size to use with mummer4 nucmer script.
-
clusterSize
AssemblyHaplotypesPlugin clusterSize(Integer value)
Set Mummer4 Nucmer Cluster Size. Cluster size to use with mummer4 nucmer script.
- Parameters:
value
- Mummer4 Nucmer Cluster Size
-
gVCFOutputDir
String gVCFOutputDir()
Directory for gvcf files to be output for later use
-
gVCFOutputDir
AssemblyHaplotypesPlugin gVCFOutputDir(String value)
Set GVCF Output Dir. Directory for gvcf files to be output for later use
- Parameters:
value
- GVCF Output Dir
-
entryPoint
AssemblyHaplotypesPlugin.ASSEMBLY_ENTRY_POINT entryPoint()
Where to begin processing. "all" runs everything. "refilter" starts from the re-filtering of the coords files. "haplotypes" runs only the haplotypes processing.
-
entryPoint
AssemblyHaplotypesPlugin entryPoint(AssemblyHaplotypesPlugin.ASSEMBLY_ENTRY_POINT value)
Set Assembly Entry Point. Where to begin processing. "all" runs everything. "refilter" starts from the re-filtering of the coords files. "haplotypes" runs only the haplotypes processing.
- Parameters:
value
- Assembly Entry Point
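The relationship between the three entry points can be pictured as a pipeline in which each entry point skips the stages before it. The sketch below uses stand-in enums, not the real AssemblyHaplotypesPlugin.ASSEMBLY_ENTRY_POINT, whose constant names are not listed on this page. The coarse stage names are also assumptions.

```java
import java.util.List;

/** Hypothetical model of which coarse stages run for each entry point. */
final class EntryPointSketch {
    /** Stand-in for the plugin's entry point enum ("all", "refilter", "haplotypes"). */
    enum EntryPoint { ALL, REFILTER, HAPLOTYPES }

    /** Coarse stages, in execution order. */
    enum Stage { MUMMER_ALIGNMENT, REFILTER_COORDS, HAPLOTYPES_AND_DB_LOAD }

    private EntryPointSketch() {}

    /** Later entry points assume the output of the skipped stages already exists on disk. */
    static List<Stage> stagesFor(EntryPoint entry) {
        switch (entry) {
            case ALL:
                return List.of(Stage.MUMMER_ALIGNMENT, Stage.REFILTER_COORDS, Stage.HAPLOTYPES_AND_DB_LOAD);
            case REFILTER:
                return List.of(Stage.REFILTER_COORDS, Stage.HAPLOTYPES_AND_DB_LOAD);
            case HAPLOTYPES:
                return List.of(Stage.HAPLOTYPES_AND_DB_LOAD);
            default:
                throw new IllegalArgumentException("Unknown entry point: " + entry);
        }
    }
}
```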
-
minInversionLen
Integer minInversionLen()
Minimum length of inversion for it to be kept as part of the alignment. Default is 7500.
-
minInversionLen
AssemblyHaplotypesPlugin minInversionLen(Integer value)
Set Minimum Inversion Length. Minimum length of inversion for it to be kept as part of the alignment. Default is 7500.
- Parameters:
value
- Minimum Inversion Length
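One way to read this threshold is as a filter over alignment records. The sketch below is an illustration under stated assumptions, not the plugin's actual rule: it treats a record as inverted when its query start exceeds its query end (the way show-coords reports reverse-strand matches), measures length on the query span, and keeps an inversion when its length is at least the minimum. The class and method names are hypothetical.

```java
/** Hypothetical inversion filter; the plugin's real criteria may differ. */
final class InversionFilter {
    private InversionFilter() {}

    /**
     * Forward alignments are always kept. Inverted alignments (queryStart > queryEnd)
     * are kept only if their inclusive query span is at least minInversionLen.
     */
    static boolean keep(int queryStart, int queryEnd, int minInversionLen) {
        boolean inverted = queryStart > queryEnd;
        int length = Math.abs(queryEnd - queryStart) + 1; // coordinates are inclusive
        return !inverted || length >= minInversionLen;
    }
}
```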
-
assemblyMethod
String assemblyMethod()
Method name to load to db for assembly processing, default is mummer4
-
assemblyMethod
AssemblyHaplotypesPlugin assemblyMethod(String value)
Set Assembly Method. Method name to load to db for assembly processing, default is mummer4
- Parameters:
value
- Assembly Method
-