Package 

Class MummerScriptProcessing

    public class MummerScriptProcessing
    
                        

    This class contains methods that run mummer4 scripts, e.g. nucmer, delta-filter, show-coords and show-snps. In addition, there are methods that process the output from these scripts. NOTE: processing here is relative to the needs of AssemblyHaplotypesPlugin.

    • Constructor Detail

    • Method Detail

      • alignWithNucmer

         static void alignWithNucmer(String refFasta, String asmFasta, String outputDeltaFilePrefix, String outputDir, String mummer4Path, int clusterSize)

        Calls the mummer4 nucmer program to align the sequences. Parameters are:
          -c 250: Set the minimum cluster length to 250
          --mum: Use anchor matches that are unique in both the reference and query
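
        As an illustration of the call described above, the following is a minimal sketch of how the nucmer argument list might be assembled. It is not the class's actual implementation: the class name NucmerCommandSketch, the helper buildNucmerCommand, the use of nucmer's -p option for the output prefix, and the assumption that mummer4Path is the directory holding the executables are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MummerScriptProcessing.
final class NucmerCommandSketch {

    // Builds the argument list for a nucmer call; nothing is executed here.
    static List<String> buildNucmerCommand(String mummer4Path, String refFasta, String asmFasta,
                                           String outputDeltaFilePrefix, int clusterSize) {
        List<String> cmd = new ArrayList<>();
        // Assumption: mummer4Path is the directory that contains the nucmer executable.
        cmd.add(mummer4Path + "/nucmer");
        cmd.add("-c");                          // minimum cluster length
        cmd.add(Integer.toString(clusterSize));
        cmd.add("--mum");                       // anchors unique in both reference and query
        cmd.add("-p");                          // assumption: output prefix is passed via -p
        cmd.add(outputDeltaFilePrefix);
        cmd.add(refFasta);                      // reference comes first, then the query
        cmd.add(asmFasta);
        return cmd;
    }
}
```

        The resulting list could be handed to java.lang.ProcessBuilder and started, with the exit code checked before any downstream step reads the delta file.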

      • runDeltaFilter

         static void runDeltaFilter(String deltaFilePrefix, String outputDir, String mummer4Path)

        Calls the mummer4 delta-filter program with the parameter:
          -g: 1-to-1 global alignment not allowing rearrangements
        NOTE: the -g option filters out many alignments, including inversions. Some of these will be added back when the "refilterCoords" method is run later.

      • refilterCoordsFileMinusG

         static void refilterCoordsFileMinusG(String outputDeltaFilePrefix, String coordsDelta, String coordsDeltaG, String coordsGNoEmbedded, String chrom)

        This method post-processes the filtered and original coords files when the mummer coords file came from a delta file filtered with the -g option. It will:
          1. Create a list of entries to add back, based on groups of at least 3 adjacent ascending/descending alignments whose distance from each other is less than a specified amount.
          2. Add the entries above (in sorted order) to the filtered coords list.

      • refilterCoordsFile

         static void refilterCoordsFile(String outputDeltaFilePrefix, String coordsDelta, String coordsDelta1, String coordsGNoEmbedded, String chrom, int scoreThreshold)

        Takes a mummer delta file filtered via the -1 option and determines which entries to keep. From the remaining entries, it removes those that are embedded.

      • filterCoordsOverlaps

         static void filterCoordsOverlaps(String coordsNoEmbedded, String snpFile, String coordsFinal)

        Takes a mummer coords file and searches for overlaps. All of the overlap goes to the first entry. If splitting in this manner results in a split in the middle of an assembly deletion, then the split is adjusted so the deletion is contained in the first entry of the overlapped pair. The snpFile is used to determine indel positions. For now, assembly insertions may be split. This has not been a problem for variant context processing.

        Parameters:
        snpFile - SNP file used to determine assembly indels
      • splitOverlappingCoordsEntries

         static Tuple<List<String>, Boolean> splitOverlappingCoordsEntries(List<String> sortedList, List<String> snpList, boolean splitByRef)

        Splits overlapping entries. The mummer4 coords file entries have these tab-delimited columns:
          S1 E1 S2 E2 Len1 Len2 %ID refID asmID
        The files processed were sorted by reference coordinates via the show-coords -r parameter, so S1/E1 are the reference coordinates and S2/E2 are the assembly coordinates. When 2 entries are found to overlap, the first entry keeps its coordinates and the second is truncated by the amount of the overlap. This may not be completely accurate, as the position of indels is not considered.

        Parameters:
        sortedList - Sorted list of mummer coords file entries to be filtered
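
        The truncation rule described above can be sketched as follows. This is an illustrative simplification, not the method's actual code: it handles only overlap in reference coordinates, assumes embedded entries were already removed, shifts the assembly start by the same number of bases as the reference start (ignoring indels, as the description notes), and uses the hypothetical names OverlapTruncationSketch and truncateSecond.

```java
// Hypothetical helper, not part of MummerScriptProcessing.
final class OverlapTruncationSketch {

    // Returns the second entry truncated so it starts after the first entry's
    // reference end, or the second entry unchanged when there is no overlap.
    // Columns (tab-delimited): S1 E1 S2 E2 Len1 Len2 %ID refID asmID
    static String truncateSecond(String first, String second) {
        String[] a = first.split("\t");
        String[] b = second.split("\t");
        int firstRefEnd = Integer.parseInt(a[1]);
        int refStart = Integer.parseInt(b[0]);
        int refEnd = Integer.parseInt(b[1]);
        int asmStart = Integer.parseInt(b[2]);
        int asmEnd = Integer.parseInt(b[3]);

        int overlap = firstRefEnd - refStart + 1;
        if (overlap <= 0) {
            return second; // no reference overlap, nothing to do
        }
        // Assembly runs forward when S2 <= E2, otherwise it aligned on the reverse strand.
        boolean forward = asmStart <= asmEnd;
        refStart += overlap;
        asmStart += forward ? overlap : -overlap;

        b[0] = Integer.toString(refStart);
        b[2] = Integer.toString(asmStart);
        b[4] = Integer.toString(refEnd - refStart + 1);           // recomputed Len1
        b[5] = Integer.toString(Math.abs(asmEnd - asmStart) + 1); // recomputed Len2
        return String.join("\t", b);
    }
}
```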
      • checkForEmbedded

         static List<String> checkForEmbedded(List<String> sortedList, boolean splitByRef)

        Checks the entries in a list of mummer4 coords file entries and removes those that are embedded.

        Parameters:
        sortedList - List of sorted mummer coords file entries.
        splitByRef - Boolean - if true, check ref embedded.
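
        A minimal sketch of an embedded-entry check on reference coordinates is shown here for orientation. It assumes the list is sorted by reference start and that "embedded" means an entry's reference range lies entirely within an earlier kept entry; the class EmbeddedFilterSketch and the method dropRefEmbedded are hypothetical, and the real method may apply different rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MummerScriptProcessing.
final class EmbeddedFilterSketch {

    // Keeps entries whose reference range is not contained in an earlier kept entry.
    // Assumes tab-delimited entries sorted by reference start (column S1).
    static List<String> dropRefEmbedded(List<String> sortedList) {
        List<String> kept = new ArrayList<>();
        int keptEnd = Integer.MIN_VALUE; // largest reference end among kept entries
        for (String entry : sortedList) {
            String[] cols = entry.split("\t");
            int refEnd = Integer.parseInt(cols[1]);
            if (refEnd <= keptEnd) {
                continue; // starts at or after the kept entry's start and ends within it
            }
            kept.add(entry);
            keptEnd = refEnd;
        }
        return kept;
    }
}
```

        Because the input is sorted by start, only the largest end seen so far needs to be tracked. This sketch compares each entry against earlier ones only.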
      • findAlignmentsToReturnFromAfterMinusGFilter

         static List<String> findAlignmentsToReturnFromAfterMinusGFilter(List<String> removedList, List<String> gList)

        Takes a list of mummer4 coordinates removed during delta-filter with -g and determines which coordinates should be returned. Those returned have reference range coordinates that fall between 2 existing kept alignment reference range coordinates, and assembly coordinates that fall between the assembly coordinates of the 2 ranges. Most often, these will be alignments where the assembly aligned on the reverse strand.

        Nothing is added back that falls before the start of the coordinates created with delta-filter -g, or after the end of the delta-filter -g list of coordinates. If a removed entry falls before the first of the pair of gList entries and extends into the second gList entry, it is not added. The entries returned are limited to alignments where the ref length is at least 1000 bp. Any entries added that overlap each other will be handled in later processing, when all overlaps are handled.

        The mummer4 delta-filter with -g performed 1-to-1 global alignment with no rearrangements. The purpose of findAlignmentsToReturn() is to return inversions that appear on the diagonal.

        NOTE: This method is obsolete, but is kept for users who want to process with mummer -g filtering.
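
        The "falls between 2 kept alignments" test described above might look roughly like the following sketch. It is an assumption-laden illustration rather than the method's code: the names ReturnCandidateSketch and fitsBetween are hypothetical, the kept pair is assumed to be ordered ascending on both reference and assembly, and entries are assumed to be tab-delimited with S1 E1 S2 E2 in the first four columns.

```java
// Hypothetical helper, not part of MummerScriptProcessing.
final class ReturnCandidateSketch {

    static final int MIN_REF_LEN = 1000; // minimum reference length, per the description

    // True when the removed entry lies strictly between the two kept entries
    // on both reference and assembly, and is at least MIN_REF_LEN long on the reference.
    static boolean fitsBetween(String prevKept, String nextKept, String removed) {
        int[] p = coords(prevKept);
        int[] n = coords(nextKept);
        int[] r = coords(removed);

        boolean refBetween = r[0] > p[1] && r[1] < n[0];

        // Reverse-strand alignments have S2 > E2, so compare on min/max.
        int asmLow = Math.min(r[2], r[3]);
        int asmHigh = Math.max(r[2], r[3]);
        boolean asmBetween = asmLow > Math.max(p[2], p[3]) && asmHigh < Math.min(n[2], n[3]);

        boolean longEnough = r[1] - r[0] + 1 >= MIN_REF_LEN;
        return refBetween && asmBetween && longEnough;
    }

    // Parses S1, E1, S2, E2 from a tab-delimited coords entry.
    private static int[] coords(String entry) {
        String[] c = entry.split("\t");
        return new int[] {
            Integer.parseInt(c[0]), Integer.parseInt(c[1]),
            Integer.parseInt(c[2]), Integer.parseInt(c[3])
        };
    }
}
```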

      • findAlignmentsToRemove_fromMinus1Filter

         static List<String> findAlignmentsToRemove_fromMinus1Filter(List<String> origList)

        Traverse a list of mummer coords entries. Create lists of alignments that have a minimum of 3 in a row where assembly is in the same direction (forward or reverse) and the distance between the end of one entry and the beginning of the next is >0.01 percent. Return a concatenated list of all the entries kept from the original file. This method is expected to be used with a mummer coords file created from a delta file filtered with the -1 option.

      • findNonOverlappingGlistPair

         static int findNonOverlappingGlistPair(List<String> gList, int glistIdx)

        Given a list of mummer coords entries and an index, finds the entry where this entry and the one before it have no overlaps in either the ref or assembly coordinates. The index returned is for the second of the 2 entries. This method is used when searching for mummer delta file entries removed during -g delta filtering that we would like to return.

      • runShowSNPsWithCat

         static void runShowSNPsWithCat(String deltaFilePrefix, String coordsForShowSnps, String outputDir, String mummer4Path, String chrom)

        Runs the mummer4 show-snps program against a delta file, using a coords file as additional input. From the command line, this would look like:
          cat coords_file | show-snps -T -r -H -S deltaFile
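
        In Java, the "cat coords_file |" part of that command line can be reproduced by redirecting the process's standard input from the coords file. The sketch below only configures the process and does not start it. The class ShowSnpsSketch, the method buildShowSnps, the snpOutFile parameter, and the assumption that mummer4Path is the directory containing show-snps are hypothetical and not taken from the class itself.

```java
import java.io.File;

// Hypothetical helper, not part of MummerScriptProcessing.
final class ShowSnpsSketch {

    // Configures (but does not start) a show-snps process that reads the
    // coords file on standard input and writes SNPs to snpOutFile.
    static ProcessBuilder buildShowSnps(String mummer4Path, String deltaFile,
                                        String coordsForShowSnps, String snpOutFile) {
        // Assumption: mummer4Path is the directory that contains show-snps.
        ProcessBuilder pb = new ProcessBuilder(
                mummer4Path + "/show-snps", "-T", "-r", "-H", "-S", deltaFile);
        pb.redirectInput(new File(coordsForShowSnps));   // stands in for "cat coords_file |"
        pb.redirectOutput(new File(snpOutFile));         // show-snps writes to standard output
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }
}
```

        Calling start() and then waitFor() on the returned builder would run the command. A non-zero exit code should be treated as a failure before the SNP file is read.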

      • runShowSNPs

         static void runShowSNPs(String deltaFilePrefix, String outputDir, String mummer4Path, String chrom)

        This method calls show-snps using only a delta file as input

      • verifySNPEntries

         static List<String> verifySNPEntries(List<String> deltaSNPs, String coordsFile, String chrom)

        This method takes a list of Mummer SNP file entries and verifies that the SNP positions are represented in the filtered/overlap-merged coords file. Additional filtering is performed to remove SNPs that occurred in overlapped coordinates entries, resulting in duplicate SNPs with differing assembly positions for the same reference position. A list of "represented" SNPs is returned. NOTE: SNP entries must be added in order.

        Parameters:
        deltaSNPs - list of SNPs from a Mummer filtered delta file
        coordsFile - coordinates file to use when checking for positions.
      • checkNoListEntriesInRange

         static boolean checkNoListEntriesInRange(HashSet<Integer> hashSet, List<Integer> testList)

        Takes a set (HashSet) of Integers and a list of Integers. Returns "false" if any value from the list is present in the set, and "true" if no value from the list is present in the set.
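
        The behaviour described is the same as testing whether the two collections are disjoint. A minimal sketch using the standard java.util.Collections.disjoint call follows; the wrapper class DisjointCheckSketch is hypothetical, and the actual method may be implemented with an explicit loop instead.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper, not part of MummerScriptProcessing.
final class DisjointCheckSketch {

    // Returns true when no value in testList is contained in hashSet.
    static boolean checkNoListEntriesInRange(HashSet<Integer> hashSet, List<Integer> testList) {
        return Collections.disjoint(hashSet, testList);
    }
}
```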