public class FindRampSeqContigsInAssemblies
This method takes a fasta of rampSeq short sequences and looks for them in an assembly genome. This is for Dan. It looks for exact matches of the 9000 short sequences across all entries in the fasta file, checking both the original sequence and its reverse complement.

This one works well: it runs each assembly in sequence. When processing an assembly, it parallelizes over every rampSeq contig in the rampSeq map (the file is read into a map). This speeds things up considerably compared with parallel processing over the assemblies alone. Using indexOf(seq,startPos) still seems quicker than the Knuth-Morris-Pratt method, perhaps because of the overhead of the latter.

INPUT:
- fasta of rampSeq short contigs
- directory path, including trailing /, where assembly genome fasta files live
- directory path, including trailing /, to which output files will be written

OUTPUT:
- tab-delimited files without headers; the columns are BED file positions (0-based, inclusive/exclusive):
  ContigName AssemblyIDLine startPos endPos Strand

In the above, Strand indicates whether the forward sequence (as presented in the file) or its reverse complement matched in the assembly file. The start/end positions are 0-based, inclusive/exclusive, as for BED files. One tab-delimited file is generated for each assembly. The file name reflects the assembly name.
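The matching strategy described above (repeated indexOf(seq,startPos) calls for the sequence and for its reverse complement, reported in BED-style coordinates) could look roughly like the following. This is only a sketch and not the class's actual code: the names RampSeqMatchSketch, Hit, reverseComplement and findMatches are hypothetical, the strand labels "forward" and "reverse" are assumed, and sequences are assumed to be upper-case.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; class, method, and field names are hypothetical.
class RampSeqMatchSketch {

    /** One match in BED-style coordinates: start is 0-based inclusive, end is exclusive. */
    static final class Hit {
        final int start;
        final int end;
        final String strand; // "forward" or "reverse" (assumed labels)

        Hit(int start, int end, String strand) {
            this.start = start;
            this.end = end;
            this.strand = strand;
        }
    }

    /** Reverse complement of a DNA string; any character other than A/C/G/T becomes N. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = Character.toUpperCase(seq.charAt(i));
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append('N'); break;
            }
        }
        return sb.toString();
    }

    /** Collects every exact occurrence of query in assembly, including overlapping ones. */
    private static void scan(String query, String assembly, String strand, List<Hit> hits) {
        int pos = assembly.indexOf(query, 0);
        while (pos >= 0) {
            hits.add(new Hit(pos, pos + query.length(), strand));
            // Advance by one position so overlapping matches are not skipped.
            pos = assembly.indexOf(query, pos + 1);
        }
    }

    /** Searches for shortSeq and its reverse complement in one assembly sequence. */
    static List<Hit> findMatches(String shortSeq, String assembly) {
        List<Hit> hits = new ArrayList<>();
        if (shortSeq.isEmpty()) {
            return hits; // an empty query would match at every position
        }
        scan(shortSeq, assembly, "forward", hits);
        String rc = reverseComplement(shortSeq);
        // A sequence equal to its own reverse complement is reported once, as forward.
        if (!rc.equals(shortSeq)) {
            scan(rc, assembly, "reverse", hits);
        }
        return hits;
    }
}
```

Because the end coordinate is start plus the query length, each hit maps directly onto the startPos and endPos columns of the output.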
-
-
Method Summary
Modifier and Type    Method    Description
static void    processData(String shortSeqs, String assemblyDir, String outputDir)
static void    searchSeqsInFasta(Map<String, String> shortSeqs, String fastaFile, String outputDir)
static List<Tuple<String, String>>    checkForSeqMatch(String idLine, String shortSeq, String assemblySequence)
static void    writeDataToFile(List<Tuple<String, String>> results, String assemblyIDLine, int seqLen, BufferedWriter writer)
static void    main(Array<String> args)
-
-
Method Detail
-
processData
static void processData(String shortSeqs, String assemblyDir, String outputDir)
-
searchSeqsInFasta
static void searchSeqsInFasta(Map<String, String> shortSeqs, String fastaFile, String outputDir)
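The class description says that, for each assembly, the work is parallelized over the entries of the rampSeq map. One way this might be done is with a parallel stream over the map entries, as in the sketch below. It is an assumption about the approach, not the real body of searchSeqsInFasta: ParallelScanSketch, forwardRows and scanAll are hypothetical names, only the forward strand is searched to keep the example short, and the rows are sorted purely to make the output order reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only; names are hypothetical and only the forward strand is searched.
class ParallelScanSketch {

    /** Builds one tab-delimited row per exact forward match of a single short sequence. */
    static List<String> forwardRows(String contigName, String shortSeq,
                                    String assemblyId, String assemblySeq) {
        List<String> rows = new ArrayList<>();
        if (shortSeq.isEmpty()) {
            return rows; // an empty query would match at every position
        }
        int pos = assemblySeq.indexOf(shortSeq, 0);
        while (pos >= 0) {
            // Columns: ContigName, AssemblyIDLine, startPos, endPos, Strand (BED-style coordinates).
            rows.add(String.join("\t", contigName, assemblyId,
                    Integer.toString(pos),
                    Integer.toString(pos + shortSeq.length()),
                    "forward"));
            pos = assemblySeq.indexOf(shortSeq, pos + 1);
        }
        return rows;
    }

    /** Scans one assembly sequence with every short sequence in the map, in parallel. */
    static List<String> scanAll(Map<String, String> shortSeqs,
                                String assemblyId, String assemblySeq) {
        return shortSeqs.entrySet()
                .parallelStream() // each map entry is searched independently
                .flatMap(e -> forwardRows(e.getKey(), e.getValue(), assemblyId, assemblySeq).stream())
                .sorted()         // lexicographic order, only for reproducible output
                .collect(Collectors.toList());
    }
}
```

Each map entry is independent of the others and the assembly string is only read, so no locking is needed until the rows are collected.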
-
checkForSeqMatch
static List<Tuple<String, String>> checkForSeqMatch(String idLine, String shortSeq, String assemblySequence)
-
writeDataToFile
static void writeDataToFile(List<Tuple<String, String>> results, String assemblyIDLine, int seqLen, BufferedWriter writer)