Package htsjdk.samtools.reference
Class ReferenceSequenceFileFactory
java.lang.Object
htsjdk.samtools.reference.ReferenceSequenceFileFactory
Factory class for creating ReferenceSequenceFile instances for reading reference
sequences stored in various formats.
-
Field Summary
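Modifier and Type    Field    Description
static final Set<String>    FASTA_EXTENSIONS    Deprecated. Since June 2019; use FileExtensions.FASTA instead.
static final String    FASTA_INDEX_EXTENSION    Deprecated. Since June 2019; use FileExtensions.FASTA_INDEX instead.
-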
Constructor Summary
ReferenceSequenceFileFactory()
-
Method Summary
Modifier and Type    Method    Description
static boolean    canCreateIndexedFastaReader(Path fastaFile)
    Checks whether the provided FASTA file can be opened as indexed.
static File    getDefaultDictionaryForReferenceSequence(File file)
    Returns the default dictionary name for a FASTA file.
static Path    getDefaultDictionaryForReferenceSequence(Path path)
    Returns the default dictionary name for a FASTA file.
static String    getFastaExtension(Path path)
    Returns the FASTA extension for the path.
static Path    getFastaIndexFileName(Path fastaFile)
    Returns the index name for a FASTA file.
static ReferenceSequenceFile    getReferenceSequenceFile(File file)
    Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile    getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace)
    Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile    getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace, boolean preferIndexed)
    Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile    getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index)
    Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and no sequence dictionary.
static ReferenceSequenceFile    getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index, SAMSequenceDictionary dictionary, boolean truncateNamesAtWhitespace)
    Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and an optional sequence dictionary.
static ReferenceSequenceFile    getReferenceSequenceFile(Path path)
    Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile    getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace)
    Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile    getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace, boolean preferIndexed)
    Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static SAMSequenceDictionary    loadDictionary(InputStream in)
    Loads the sequence dictionary from a FASTA file input stream.
-
Field Details
-
FASTA_EXTENSIONS
Deprecated. Since June 2019; use FileExtensions.FASTA instead.
-
FASTA_INDEX_EXTENSION
Deprecated. Since June 2019; use FileExtensions.FASTA_INDEX instead.
-
-
Constructor Details
-
ReferenceSequenceFileFactory
public ReferenceSequenceFileFactory()
-
-
Method Details
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(File file)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at the first whitespace, if any.
Parameters:
file - the reference sequence file on disk
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
Parameters:
file - the reference sequence file on disk
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace, boolean preferIndexed)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
Parameters:
file - the reference sequence file on disk
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise, return the non-indexed reader
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(Path path)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at the first whitespace, if any.
Parameters:
path - the reference sequence file on disk
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
Parameters:
path - the reference sequence file on disk
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace, boolean preferIndexed)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
Parameters:
path - the reference sequence file path
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise, return the non-indexed reader
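An indexed reader can jump straight to any base because the .fai index records, for each sequence, where its bases start and how its lines are laid out. The sketch below assumes the five leading tab-separated columns of the samtools faidx format (name, length, byte offset of the first base, bases per line, bytes per line including the line terminator) and assumes every line except the last has the same width. The class FaiEntrySketch and its methods are hypothetical illustrations, not htsjdk's FastaSequenceIndex API.

```java
/**
 * Hypothetical model of one line of a samtools-style .fai index:
 * NAME  LENGTH  OFFSET  LINEBASES  LINEWIDTH
 * This is an illustration, not htsjdk code.
 */
final class FaiEntrySketch {
    final String name;
    final long length;   // number of bases in the sequence
    final long offset;   // byte offset of the first base in the FASTA file
    final int lineBases; // bases on each full line
    final int lineWidth; // bytes on each full line, including the line terminator

    FaiEntrySketch(String name, long length, long offset, int lineBases, int lineWidth) {
        this.name = name;
        this.length = length;
        this.offset = offset;
        this.lineBases = lineBases;
        this.lineWidth = lineWidth;
    }

    /** Parses one tab-separated .fai line; any columns after the fifth are ignored. */
    static FaiEntrySketch parse(String faiLine) {
        String[] f = faiLine.split("\t");
        if (f.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 columns: " + faiLine);
        }
        return new FaiEntrySketch(f[0], Long.parseLong(f[1]), Long.parseLong(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]));
    }

    /** Byte offset in the FASTA file of the base at 0-based position pos. */
    long byteOffsetOf(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("Position " + pos + " is outside " + name);
        }
        // Skip whole lines (bases plus terminator), then step into the current line.
        return offset + (pos / lineBases) * lineWidth + (pos % lineBases);
    }
}
```

For an entry `chr1	1000	6	60	61` (a 6-byte `>chr1` header line followed by 60-base lines), base 0 sits at byte 6 and base 60, the first base of the second line, at byte 67, so a seekable reader can fetch a region without reading the lines before it.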
-
canCreateIndexedFastaReader
public static boolean canCreateIndexedFastaReader(Path fastaFile)
Checks whether the provided FASTA file can be opened as indexed. For a FASTA file to be indexed, it must have:
- an associated .fai index (FastaSequenceIndex);
- an associated .gzi index if it is block-compressed (GZIIndex).
Parameters:
fastaFile - the reference sequence file path.
Returns:
true if the file can be opened as indexed; false otherwise.
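The two requirements above amount to checks for sibling index files. The hypothetical sketch below assumes the samtools naming convention of appending .fai and .gzi to the full FASTA file name (so ref.fa.gz needs ref.fa.gz.fai and ref.fa.gz.gzi), and it treats a .gz suffix as a stand-in for block compression. A faithful check would need to inspect the file contents, since an ordinary gzip file is not block-compressed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the indexed-FASTA prerequisites; not htsjdk code. */
final class IndexedFastaCheckSketch {
    private IndexedFastaCheckSketch() {}

    /** Sibling index files needed for indexed access (naming convention is an assumption). */
    static List<Path> requiredIndexFiles(Path fasta, boolean blockCompressed) {
        String name = fasta.getFileName().toString();
        List<Path> required = new ArrayList<>();
        required.add(fasta.resolveSibling(name + ".fai")); // per-sequence index
        if (blockCompressed) {
            required.add(fasta.resolveSibling(name + ".gzi")); // BGZF block index
        }
        return required;
    }

    /**
     * Returns true only if the FASTA file and every required index exist.
     * Simplification: a ".gz" suffix is taken to mean block-compressed.
     */
    static boolean canOpenIndexed(Path fasta) {
        if (!Files.exists(fasta)) {
            return false;
        }
        boolean blockCompressed = fasta.getFileName().toString().endsWith(".gz");
        for (Path index : requiredIndexFiles(fasta, blockCompressed)) {
            if (!Files.exists(index)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller might use `canOpenIndexed` to decide between random access and a sequential scan, which mirrors the choice the preferIndexed flag describes.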
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index)
Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and no sequence dictionary.
Parameters:
source - the named source of the reference file (used in error messages).
in - the input stream to read the FASTA file from.
index - the index, or null to return a non-indexed reader.
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index, SAMSequenceDictionary dictionary, boolean truncateNamesAtWhitespace)
Returns an instance of ReferenceSequenceFile using the given FASTA sequence file stream, an optional index, and an optional sequence dictionary.
Parameters:
source - the named source of the reference file (used in error messages).
in - the input stream to read the FASTA file from.
index - the index, or null to return a non-indexed reader.
dictionary - the sequence dictionary, or null if there isn't one.
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
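When index is null the factory returns a non-indexed reader, which can only walk the records in file order. The sketch below illustrates that linear traversal over a plain InputStream (standing in for the SeekableStream) together with the truncateNamesAtWhitespace rule. The class and method names are hypothetical, and holding whole sequences in memory is a simplification that a real reader for large references would avoid.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of sequential (non-indexed) FASTA reading; not htsjdk code. */
final class SequentialFastaSketch {
    private SequentialFastaSketch() {}

    /**
     * Reads every record in file order and returns name -> bases, preserving that order.
     * The stream is closed when reading finishes. A repeated name overwrites the earlier record.
     */
    static Map<String, String> readAll(InputStream in, boolean truncateNamesAtWhitespace) {
        Map<String, String> sequences = new LinkedHashMap<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            String name = null;
            StringBuilder bases = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    // A new header ends the previous record.
                    if (name != null) {
                        sequences.put(name, bases.toString());
                    }
                    String header = line.substring(1).trim();
                    // Keep only the first word when truncation is requested.
                    name = truncateNamesAtWhitespace ? header.split("\\s+", 2)[0] : header;
                    bases.setLength(0);
                } else if (name != null) {
                    bases.append(line.trim()); // sequence lines are concatenated
                }
            }
            if (name != null) {
                sequences.put(name, bases.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sequences;
    }
}
```

With truncation enabled, a header such as `>chr1 first contig` is stored under `chr1`; with it disabled, the full `chr1 first contig` is kept.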
-
getDefaultDictionaryForReferenceSequence
public static File getDefaultDictionaryForReferenceSequence(File file)
Returns the default dictionary name for a FASTA file.
Parameters:
file - the reference sequence file on disk.
-
getDefaultDictionaryForReferenceSequence
public static Path getDefaultDictionaryForReferenceSequence(Path path)
Returns the default dictionary name for a FASTA file.
Parameters:
path - the reference sequence file path.
-
loadDictionary
public static SAMSequenceDictionary loadDictionary(InputStream in)
Loads the sequence dictionary from a FASTA file input stream.
Parameters:
in - the FASTA file input stream.
Returns:
the sequence dictionary, or null if the header has no dictionary or it was empty.
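A sequence dictionary is conventionally written as SAM-style header lines, one @SQ line per sequence with tab-separated SN: (name) and LN: (length) tags, as in a .dict file. Assuming the stream carries lines in that form, the hypothetical sketch below collects those entries and, mirroring the documented return value, yields null when no sequences are found. It returns a plain map rather than a SAMSequenceDictionary and ignores every tag except SN and LN, so it is an illustration only, not htsjdk's parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of reading @SQ header lines; not htsjdk code. */
final class DictionarySketch {
    private DictionarySketch() {}

    /** Returns name -> length for each @SQ line in input order, or null if there are none. */
    static Map<String, Long> loadSequenceLengths(InputStream in) {
        Map<String, Long> lengths = new LinkedHashMap<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("@SQ\t")) {
                    continue; // skip @HD and any other header lines
                }
                String name = null;
                Long length = null;
                for (String field : line.split("\t")) {
                    if (field.startsWith("SN:")) {
                        name = field.substring(3);
                    } else if (field.startsWith("LN:")) {
                        length = Long.parseLong(field.substring(3));
                    }
                }
                if (name == null || length == null) {
                    throw new IllegalArgumentException("@SQ line lacks SN or LN: " + line);
                }
                lengths.put(name, length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // An empty result is reported as null, matching the documented contract.
        return lengths.isEmpty() ? null : lengths;
    }
}
```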
-
getFastaExtension
public static String getFastaExtension(Path path)
Returns the FASTA extension for the path.
Parameters:
path - the reference sequence file path.
Throws:
IllegalArgumentException - if the file is not a supported reference file.
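Because the default dictionary name is derived from the FASTA extension, the two helpers below are sketched together. They assume the convention that the dictionary replaces the whole FASTA extension with .dict (so ref.fa.gz becomes ref.dict), and the extension list is an assumed, illustrative subset rather than the full contents of FileExtensions.FASTA. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of FASTA extension and dictionary naming; not htsjdk code. */
final class FastaNamingSketch {
    private FastaNamingSketch() {}

    // Assumed, illustrative subset of recognized FASTA extensions.
    private static final List<String> FASTA_EXTENSIONS =
            Arrays.asList(".fasta", ".fasta.gz", ".fa", ".fa.gz", ".fna", ".fna.gz");

    /**
     * Returns the longest recognized extension of the file name (longest, in case one
     * recognized extension is a suffix of another), or throws if none matches.
     */
    static String fastaExtension(Path path) {
        String name = path.getFileName().toString();
        String best = null;
        for (String ext : FASTA_EXTENSIONS) {
            if (name.endsWith(ext) && (best == null || ext.length() > best.length())) {
                best = ext;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("Not a supported reference file: " + path);
        }
        return best;
    }

    /** Replaces the FASTA extension with ".dict" in the same directory (assumed convention). */
    static Path defaultDictionary(Path fasta) {
        String name = fasta.getFileName().toString();
        String stem = name.substring(0, name.length() - fastaExtension(fasta).length());
        return fasta.resolveSibling(stem + ".dict");
    }
}
```

Under these assumptions, `defaultDictionary(Paths.get("refs", "hg38.fasta"))` gives `refs/hg38.dict`, while `fastaExtension(Paths.get("reads.bam"))` throws IllegalArgumentException.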
-
getFastaIndexFileName
public static Path getFastaIndexFileName(Path fastaFile)
Returns the index name for a FASTA file.
Parameters:
fastaFile - the reference sequence file path.
-