Package 

Class AssemblyHaplotypesPlugin

  • All Implemented Interfaces:
    java.lang.Runnable, net.maizegenetics.plugindef.Plugin, net.maizegenetics.plugindef.PluginListener, net.maizegenetics.util.ProgressListener

    @Deprecated() 
    public class AssemblyHaplotypesPlugin
    extends AbstractPlugin
                        

    This class is DEPRECATED and should no longer be used. It will not create or store the genome_file_data table entries.

    Processes assemblies, both anchor and interanchor regions. This class exercises the following mummer4 scripts: nucmer, delta-filter, show-coords, show-snps.

    Algorithm when processing all steps:
    1. Run mummer4:nucmer with the parameters -c 250 --mum
    2. Run mummer4:show-coords with the parameters -T -r (tab-delimited, sorted by ref)
    3. Run mummer4:show-snps with the parameters -T -r (tab-delimited, sorted by ref ID and SNP positions)
    4. Interanchors
       a. Processed the same as the anchors. The VariantContext code grabs sequence aligning to both. We expect a certain linearity to the alignments. There will be large indels between sequences that are picked up as inter-genic regions.
    5. The DB is loaded in batches of 3000 reference_ranges at a time.

    Assumptions/requirements:
    1. Alignments are on a per-chromosome basis: fastas are expected to be broken down by chromosome, 1 chromosome per fasta.
    2. ID lines in the fasta must start with >X, >chrX or >chromosomeX, where X is the chromosome number. Additional data is allowed after a space, but the first data must identify the chromosome as above.
    3. The assembly and reference chromosomes need to be named in a similar fashion to allow for db matching of genome_interval_ids. That is, if the ref chromosome name has a number without leading 0's, so must the assembly. For example, if the ref uses chr1, the assembly must also use chr1.
    4. For populating the DB, per Ed: ploidy=1, genes/chroms phased=TRUE (put the confidence at 1).

    NOTE: Entry points
    There are now 3 places where the user may enter this code. The default is to run all steps.
    - "all": entryPoint parameter is "all". All steps are run.
    - "refilter": entryPoint parameter is "refilter". Processing begins with the call to MummerScriptProcessing.refilterCoords(). It is assumed the alignment directory contains mummer output for nucmer, delta-filter and show-coords on both the original and filtered delta files.
    - "haplotypes": entryPoint parameter is "haplotypes". Processing begins after all mummer4 scripts have finished and the coordinates and snps files have been filtered. Only the VariantContext/sequence creation and db loading steps are performed. It is assumed the alignment directory contains the necessary mummer files for processing.
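    The mummer4 invocations listed in the algorithm can be sketched as argument lists. This is only an illustration, not the plugin's own code: the flags (-c 250 --mum, -T -r) come from the description above, while the class name, method names, file names and the use of nucmer's -p output prefix are assumptions made for the example. delta-filter is left out because its parameters are not stated here.

```java
import java.util.List;

// Hypothetical helper: builds mummer4 command lines as argument lists.
// Only the flags are taken from the class description; all file
// arguments and names are illustrative assumptions.
class MummerCommandSketch {

    // nucmer -c 250 --mum, writing <outPrefix>.delta (the -p prefix is an assumption)
    static List<String> nucmer(String refFasta, String asmFasta, String outPrefix) {
        return List.of("nucmer", "-c", "250", "--mum", "-p", outPrefix, refFasta, asmFasta);
    }

    // show-coords -T -r: tab-delimited, sorted by reference
    static List<String> showCoords(String deltaFile) {
        return List.of("show-coords", "-T", "-r", deltaFile);
    }

    // show-snps -T -r: tab-delimited, sorted by reference ID and SNP position
    static List<String> showSnps(String deltaFile) {
        return List.of("show-snps", "-T", "-r", deltaFile);
    }

    public static void main(String[] args) {
        String prefix = "align/chr1"; // hypothetical output prefix
        System.out.println(String.join(" ", nucmer("ref_chr1.fa", "asm_chr1.fa", prefix)));
        System.out.println(String.join(" ", showCoords(prefix + ".delta")));
        System.out.println(String.join(" ", showSnps(prefix + ".delta")));
    }
}
```

    Each list could be handed to java.lang.ProcessBuilder, with the standard output of show-coords and show-snps redirected to a file, since both tools print their results to stdout.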
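    The fasta ID line and naming requirements (items 2 and 3 of the assumptions) can be checked before any alignment is run. The sketch below is a minimal illustration under stated assumptions: the class and method names are hypothetical, X is taken to be a run of digits, and "named in a similar fashion" is interpreted strictly as identical name tokens. The plugin itself may be more lenient.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-flight check for fasta ID lines; not part of the plugin.
class FastaIdLineCheck {

    // Accepts ">X", ">chrX" or ">chromosomeX" (X = digits, an assumption),
    // optionally followed by a space and additional data.
    private static final Pattern ID_LINE =
            Pattern.compile("^>(?:chromosome|chr)?(\\d+)(?: .*)?$");

    // Returns the chromosome number exactly as written (leading zeros kept),
    // or empty if the ID line does not follow the required form.
    static Optional<String> chromosomeNumber(String idLine) {
        Matcher m = ID_LINE.matcher(idLine);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // The name token is everything between '>' and the first space.
    static String nameToken(String idLine) {
        return idLine.substring(1).split(" ", 2)[0];
    }

    // Strict interpretation (an assumption): both lines must be valid and
    // carry identical name tokens, so "chr1" matches "chr1" but not "chr01" or "1".
    static boolean sameChromosomeName(String refIdLine, String asmIdLine) {
        return chromosomeNumber(refIdLine).isPresent()
                && chromosomeNumber(asmIdLine).isPresent()
                && nameToken(refIdLine).equals(nameToken(asmIdLine));
    }

    public static void main(String[] args) {
        System.out.println(chromosomeNumber(">chr1 some extra data"));
        System.out.println(chromosomeNumber(">scaffold_12"));
        System.out.println(sameChromosomeName(">chr1", ">chr01"));
    }
}
```

    Comparing the full name token rather than the parsed number keeps a leading-zero mismatch visible, which is the case the requirement warns about.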
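    Step 5 of the algorithm loads the DB in batches of 3000 reference_ranges. A generic way to split work into fixed-size batches is sketched below. The class and method names are hypothetical and the integer IDs stand in for whatever per-range data the plugin actually loads; only the batch size of 3000 comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching helper; not the plugin's own loading code.
class ReferenceRangeBatcher {

    // Batch size stated in the class description.
    static final int BATCH_SIZE = 3000;

    // Splits items into consecutive batches of at most batchSize elements,
    // preserving order. The last batch may be smaller.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rangeIds = new ArrayList<>();
        for (int i = 0; i < 7000; i++) {
            rangeIds.add(i); // stand-in for reference_range entries
        }
        for (List<Integer> batch : batches(rangeIds, BATCH_SIZE)) {
            // A real loader would write this batch to the DB here.
            System.out.println("batch of " + batch.size());
        }
    }
}
```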