Class ReferenceSequenceFileFactory

java.lang.Object
htsjdk.samtools.reference.ReferenceSequenceFileFactory

public class ReferenceSequenceFileFactory extends Object
Factory class for creating ReferenceSequenceFile instances for reading reference sequences stored in various formats.
  • Field Details

  • Constructor Details

    • ReferenceSequenceFileFactory

      public ReferenceSequenceFileFactory()
  • Method Details

    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(File file)
      Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at the first whitespace character, if any.
      Parameters:
      file - the reference sequence file on disk
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace)
      Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
      Parameters:
      file - the reference sequence file on disk
      truncateNamesAtWhitespace - if true, only include the first word of the sequence name
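      The effect of truncateNamesAtWhitespace on a FASTA header line can be illustrated with a minimal, stdlib-only sketch. The class SequenceNames and its method are hypothetical and are not part of htsjdk; the sketch assumes that "the first word" means everything before the first whitespace character.

```java
// Hypothetical helper illustrating truncateNamesAtWhitespace; not htsjdk's implementation.
final class SequenceNames {
    private SequenceNames() {}

    /** Extracts the sequence name from a FASTA header line such as ">chr1 Homo sapiens chromosome 1". */
    static String nameFromHeader(String headerLine, boolean truncateNamesAtWhitespace) {
        if (!headerLine.startsWith(">")) {
            throw new IllegalArgumentException("Not a FASTA header: " + headerLine);
        }
        String name = headerLine.substring(1);
        if (!truncateNamesAtWhitespace) {
            return name; // keep the full description line
        }
        // Keep everything up to (but not including) the first whitespace character.
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                return name.substring(0, i);
            }
        }
        return name;
    }

    public static void main(String[] args) {
        String header = ">chr1 Homo sapiens chromosome 1";
        System.out.println(nameFromHeader(header, true));  // chr1
        System.out.println(nameFromHeader(header, false)); // chr1 Homo sapiens chromosome 1
    }
}
```

      Truncation matters because other files (for example, alignment records) usually refer to a contig only by its first word.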
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace, boolean preferIndexed)
      Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
      Parameters:
      file - the reference sequence file on disk
      truncateNamesAtWhitespace - if true, only include the first word of the sequence name
      preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise, return a non-indexed reader
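      An indexed reader can support non-linear traversal because an index lets it compute where a base lives in the file instead of scanning to it. The sketch below assumes the samtools-style .fai layout (name, length, byte offset of the first base, bases per line, bytes per line) and shows only that arithmetic. The class FaiEntry is hypothetical and is not htsjdk's FastaSequenceIndex.

```java
// Hypothetical sketch, not htsjdk code: one record of a samtools-style .fai index.
final class FaiEntry {
    final String name;
    final long length;    // number of bases in the sequence
    final long offset;    // byte offset of the sequence's first base in the FASTA file
    final int lineBases;  // bases per full line
    final int lineWidth;  // bytes per full line, including the line terminator

    FaiEntry(String name, long length, long offset, int lineBases, int lineWidth) {
        this.name = name;
        this.length = length;
        this.offset = offset;
        this.lineBases = lineBases;
        this.lineWidth = lineWidth;
    }

    /** Parses one tab-separated .fai line: name, length, offset, line bases, line width. */
    static FaiEntry parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 5) {
            throw new IllegalArgumentException("Malformed .fai line: " + line);
        }
        return new FaiEntry(f[0], Long.parseLong(f[1]), Long.parseLong(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]));
    }

    /** Byte offset of the base at 0-based position pos, computed without reading the FASTA file. */
    long byteOffsetOf(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("Position " + pos + " outside " + name);
        }
        // Skip whole lines (lineWidth bytes each), then move within the final line.
        return offset + (pos / lineBases) * lineWidth + (pos % lineBases);
    }

    public static void main(String[] args) {
        // ">chr1\n" occupies 6 bytes, so the first base is at byte 6; lines hold 60 bases plus '\n'.
        FaiEntry chr1 = FaiEntry.parse("chr1\t1000\t6\t60\t61");
        System.out.println(chr1.byteOffsetOf(0));   // 6
        System.out.println(chr1.byteOffsetOf(60));  // 67
        System.out.println(chr1.byteOffsetOf(125)); // 133
    }
}
```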
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(Path path)
      Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at the first whitespace character, if any.
      Parameters:
      path - the reference sequence file path
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace)
      Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
      Parameters:
      path - the reference sequence file path
      truncateNamesAtWhitespace - if true, only include the first word of the sequence name
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace, boolean preferIndexed)
      Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
      Parameters:
      path - the reference sequence file path
      truncateNamesAtWhitespace - if true, only include the first word of the sequence name
      preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise, return a non-indexed reader
    • canCreateIndexedFastaReader

      public static boolean canCreateIndexedFastaReader(Path fastaFile)
      Checks if the provided FASTA file can be opened as indexed.

      For a FASTA file to be opened as indexed, it must have:

      Parameters:
      fastaFile - the reference sequence file path.
      Returns:
      true if the file can be opened as indexed; false otherwise.
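      A check of this kind can be approximated with java.nio alone. This is a minimal sketch, assuming the only requirement is a sibling .fai index named by appending ".fai" to the FASTA file name (the samtools faidx convention); block-compressed FASTA files typically also need a .gzi index, which this sketch ignores. IndexCheck and its methods are hypothetical, not htsjdk API.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical approximation; assumes a sibling "<fasta>.fai" is the only requirement.
final class IndexCheck {
    /** ref.fasta -> ref.fasta.fai in the same directory. */
    static Path faiFor(Path fasta) {
        return fasta.resolveSibling(fasta.getFileName().toString() + ".fai");
    }

    /** True if both the FASTA file and its .fai index exist as regular files. */
    static boolean canOpenAsIndexed(Path fasta) {
        return Files.isRegularFile(fasta) && Files.isRegularFile(faiFor(fasta));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("ref");
        Path fasta = Files.writeString(dir.resolve("ref.fasta"), ">chr1\nACGT\n");
        System.out.println(canOpenAsIndexed(fasta)); // false: no index yet
        Files.writeString(faiFor(fasta), "chr1\t4\t6\t4\t5\n");
        System.out.println(canOpenAsIndexed(fasta)); // true
    }
}
```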
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index)
      Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and no sequence dictionary.
      Parameters:
      source - The named source of the reference file (used in error messages).
      in - The input stream to read the FASTA file from.
      index - The index, or null to return a non-indexed reader.
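      Without an index, a reader has no way to jump to a particular sequence and must read the stream from the beginning. The following stdlib-only sketch shows that sequential pattern on a plain InputStream rather than htsjdk's SeekableStream; SequentialFasta is a hypothetical class, not htsjdk's non-indexed reader, and it always truncates names at the first whitespace.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not htsjdk code: sequential (non-indexed) FASTA reading.
final class SequentialFasta {
    /** Reads every record in file order; names are cut at the first whitespace. The caller closes the stream. */
    static Map<String, String> readAll(InputStream in) {
        Map<String, String> sequences = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        String name = null;
        StringBuilder bases = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (name != null) {
                        sequences.put(name, bases.toString()); // finish the previous record
                    }
                    name = line.substring(1).trim().split("\\s+", 2)[0];
                    bases.setLength(0);
                } else if (name != null) {
                    bases.append(line.trim()); // join wrapped sequence lines
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (name != null) {
            sequences.put(name, bases.toString());
        }
        return sequences;
    }

    public static void main(String[] args) {
        String fasta = ">chr1 first contig\nACGT\nAC\n>chr2\nGGG\n";
        InputStream in = new ByteArrayInputStream(fasta.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readAll(in)); // {chr1=ACGTAC, chr2=GGG}
    }
}
```

      Reaching chr2 here requires reading all of chr1 first, which is the cost that passing a non-null index avoids.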
    • getReferenceSequenceFile

      public static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index, SAMSequenceDictionary dictionary, boolean truncateNamesAtWhitespace)
      Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream and an optional index and sequence dictionary.
      Parameters:
      source - The named source of the reference file (used in error messages).
      in - The input stream to read the FASTA file from.
      index - The index, or null to return a non-indexed reader.
      dictionary - The sequence dictionary, or null if there isn't one.
      truncateNamesAtWhitespace - if true, only include the first word of the sequence name
    • getDefaultDictionaryForReferenceSequence

      public static File getDefaultDictionaryForReferenceSequence(File file)
      Returns the default sequence dictionary file for a FASTA file.
      Parameters:
      file - the reference sequence file on disk.
    • getDefaultDictionaryForReferenceSequence

      public static Path getDefaultDictionaryForReferenceSequence(Path path)
      Returns the default sequence dictionary path for a FASTA file.
      Parameters:
      path - the reference sequence file path.
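      A widely used convention (for example, in Picard and GATK) places the dictionary next to the FASTA file with the FASTA extension replaced by ".dict", so ref.fasta pairs with ref.dict. A minimal sketch of that naming, assuming this convention; the caller passes in the FASTA extension (such as the value returned by getFastaExtension). DictionaryNames is hypothetical, not htsjdk API.

```java
import java.nio.file.Path;

// Hypothetical sketch of the ".dict" naming convention; not htsjdk's implementation.
final class DictionaryNames {
    /** ref.fasta (".fasta") -> ref.dict; hg38.fa.gz (".fa.gz") -> hg38.dict, in the same directory. */
    static Path defaultDictionaryFor(Path fasta, String fastaExtension) {
        String fileName = fasta.getFileName().toString();
        if (!fileName.endsWith(fastaExtension)) {
            throw new IllegalArgumentException(fileName + " does not end with " + fastaExtension);
        }
        String base = fileName.substring(0, fileName.length() - fastaExtension.length());
        return fasta.resolveSibling(base + ".dict");
    }

    public static void main(String[] args) {
        System.out.println(defaultDictionaryFor(Path.of("/refs/hg38.fa.gz"), ".fa.gz")); // /refs/hg38.dict on Unix-like systems
    }
}
```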
    • loadDictionary

      public static SAMSequenceDictionary loadDictionary(InputStream in)
      Loads the sequence dictionary from an input stream containing the FASTA file's sequence dictionary (SAM header text).
      Parameters:
      in - the sequence dictionary input stream.
      Returns:
      the sequence dictionary, or null if the header has no dictionary or it was empty.
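      The return value is described in terms of a header: sequence dictionaries are conventionally stored as SAM-style header text, as in .dict files, with one @SQ line per sequence carrying SN: (name) and LN: (length) tags. Assuming that format, the following stdlib-only sketch parses such a stream into an ordered name-to-length map and, mirroring the documented behavior, returns null when no sequences are found. SequenceDictionarySketch is hypothetical; htsjdk returns a SAMSequenceDictionary instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not htsjdk code: read @SQ lines from SAM-style header text.
final class SequenceDictionarySketch {
    /** Returns sequence name -> length in file order, or null if there are no @SQ lines. */
    static Map<String, Long> load(InputStream in) {
        Map<String, Long> lengths = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("@SQ\t")) {
                    continue; // skip @HD and any other header lines
                }
                String name = null;
                Long length = null;
                for (String field : line.split("\t")) {
                    if (field.startsWith("SN:")) {
                        name = field.substring(3);
                    } else if (field.startsWith("LN:")) {
                        length = Long.parseLong(field.substring(3));
                    }
                }
                if (name == null || length == null) {
                    throw new IllegalArgumentException("@SQ line missing SN or LN: " + line);
                }
                lengths.put(name, length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lengths.isEmpty() ? null : lengths;
    }

    public static void main(String[] args) {
        String dict = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrM\tLN:16569\n";
        System.out.println(load(new ByteArrayInputStream(dict.getBytes(StandardCharsets.UTF_8))));
        // {chr1=248956422, chrM=16569}
    }
}
```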
    • getFastaExtension

      public static String getFastaExtension(Path path)
      Returns the FASTA extension for the path.
      Parameters:
      path - the reference sequence file path.
      Throws:
      IllegalArgumentException - if the file is not a supported reference file.
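      Extension detection of this kind reduces to a suffix match against a list of accepted extensions, with an IllegalArgumentException when nothing matches. In the minimal sketch below, the extension list is an assumption made for illustration; the set htsjdk actually accepts may differ. FastaExtensions is a hypothetical class, not htsjdk API.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of suffix-based FASTA extension detection; not htsjdk code.
final class FastaExtensions {
    // Assumed list for illustration only. No entry is a suffix of another,
    // so at most one entry can match a given file name.
    private static final List<String> EXTENSIONS =
            List.of(".fasta", ".fasta.gz", ".fa", ".fa.gz", ".fna", ".fna.gz");

    static String fastaExtension(Path path) {
        String fileName = path.getFileName().toString();
        for (String ext : EXTENSIONS) {
            if (fileName.endsWith(ext)) {
                return ext;
            }
        }
        throw new IllegalArgumentException("Not a supported reference file: " + path);
    }

    public static void main(String[] args) {
        System.out.println(fastaExtension(Path.of("/refs/hg38.fa.gz"))); // .fa.gz
        System.out.println(fastaExtension(Path.of("ref.fasta")));        // .fasta
    }
}
```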
    • getFastaIndexFileName

      public static Path getFastaIndexFileName(Path fastaFile)
      Returns the index file path for a FASTA file.
      Parameters:
      fastaFile - the reference sequence file path.