All Classes and Interfaces
Class
Description
Factory class to get Providers for substitution matrices that are provided by
the AAINDEX database.
Title: ABITrace
The details of a Compound
A feature is currently any descriptive item that can be associated with a sequence position(s)
A feature has a type and a source which is currently a string to allow flexibility for the user
Ideally well defined features should have a class to describe attributes of that feature
Base abstraction of a location which encodes for the majority of important
features about a location such as the start, end and strand
The base class for DNA, RNA and Protein sequences.
A location which is bound to an AccessionID.
Indicates an entity is accessioned
Used in Sequences as the unique identifier.
Defines a data structure for a Sequence within an alignment.
Defines an alignment step in order to pass alignment information from an Aligner to a constructor.
Ambiguity set for hybrid DNA/RNA sequences.
Used to describe an Amino Acid.
Set of proteinogenic amino acids.
Stores a Sequence as a collection of compounds in an ArrayList
Bare bones version of the Sequence object to be used sparingly.
An implementation of the popular bit encodings.
The logic of working with a bit has been separated out into this class
to help developers create the bit data structures without having to
put the code into an intermediate format and to also use the format
without the need to copy this code.
Designed by Paolo Pavan.
Designed by Paolo Pavan.
This class models a Blast/Blast plus result.
Designed by Paolo Pavan.
Designed by Paolo Pavan.
Re-designed by Paolo Pavan on the footprint of:
org.biojava.nbio.genome.query.BlastXMLQuery by Scooter Willis
You may want to find my contacts on Github and LinkedIn for code info
or discuss major changes.
Need to keep track of actual bytes read and take advantage of buffered reader
performance.
Attempts to wrap compounds so it is possible to view them
in a case insensitive manner
A sequence creator which preserves the case of its input string in
the user collection of the returned ProteinSequence.
Represents an exon or coding sequence in a gene.
A ChromosomeSequence is a DNASequence but keeps track of geneSequences
This object represents a classpath resource on the local system.
Define a codon
For a given sequence this class will create a view over the top of it
and for every request the code will return the complement of the underlying
base e.g.
Static utility to easily share a thread pool for concurrent/parallel/lazy execution.
Utility class that calculates a CRC64 checksum on a stream of bytes.
If a SequenceProxyReader implements this interface then that external source
has a list of cross reference id(s)
GenBank gi|gi-number|gb|accession|locus
ENA Data Library gi|gi-number|emb|accession|locus
DDBJ, DNA Database of Japan gi|gi-number|dbj|accession|locus
NBRF PIR pir||entry
Protein Research Foundation prf||name
SWISS-PROT UNIPROT sp|accession|name
Brookhaven Protein Data Bank (1) pdb|entry|chain
Brookhaven Protein Data Bank (2) entry:chain|PDBID|CHAIN|SEQUENCE
Patents pat|country|number
GenInfo Backbone Id bbs|number
General database identifier gnl|database|identifier
NCBI Reference Sequence ref|accession|locus
Local Sequence identifier lcl|identifier
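The header layouts listed above can be told apart by their leading tag. The following is a minimal sketch of that idea in plain Java, not BioJava's own parser: the class name `FastaHeaderSketch`, the method `accession`, and the sample header are hypothetical, and only a few of the listed layouts are handled.

```java
import java.util.Optional;

// Hypothetical helper, for illustration only; it is not part of BioJava.
class FastaHeaderSketch {

    /** Returns the accession-like field for some of the header layouts listed above. */
    public static Optional<String> accession(String header) {
        // A limit of -1 keeps empty fields, so "pir||entry" yields ["pir", "", "entry"].
        String[] f = header.split("\\|", -1);
        switch (f[0]) {
            case "gi":   // gi|gi-number|gb|accession|locus (also emb and dbj)
                return f.length >= 4 ? Optional.of(f[3]) : Optional.empty();
            case "sp":   // sp|accession|name
            case "ref":  // ref|accession|locus
            case "lcl":  // lcl|identifier
                return f.length >= 2 ? Optional.of(f[1]) : Optional.empty();
            case "pir":  // pir||entry
            case "prf":  // prf||name
                return f.length >= 3 ? Optional.of(f[2]) : Optional.empty();
            default:     // layout not covered by this sketch
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Placeholder header; the accession and name are made up.
        System.out.println(accession("sp|ACC123|NAME_X").orElse("unknown"));
    }
}
```

Returning an `Optional` keeps unrecognised headers from being mistaken for a valid identifier; a real parser would also need the remaining layouts such as `pdb|entry|chain` and `gnl|database|identifier`.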
If you have a uniprot ID then it is possible to get a collection
of other id(s) that the protein is known by.
The default provider for AAINDEX loads substitution matrices from the AAINDEX file in the resources directory
Created by andreas on 8/10/15.
This class should model the attributes associated with a DNA sequence
The type of DNA sequence
A helper class that allows different ways to read a string and create a DNA sequence.
Performs the first stage of transcription by going from DNA to RNA.
Interface for carrying out edit operations on a Sequence.
Abstract class which defines all edit operations as a call to discover
what 5' and 3' ends of an editing Sequence should be joined together
with a target Sequence.
Implementation which allows for the deletion of bases from a Sequence
Edit implementation which allows us to insert a base at any position
in a Sequence.
Allows for the substitution of bases into an existing Sequence.
This class contains the processed data of embl file
Primary accession number
Sequence version number
Topology: 'circular' or 'linear'
Molecule type
Data class
Taxonomic division
Sequence length
This class should process the data of embl file
This class contains the parsed data of embl file
This class contains the processed data of embl file that
contains the referenceNumber, referenceComment, referencePosition
referenceCrossReference, referenceGroup, referenceAuthor
referenceTitle, referenceLocation
A set of helper methods which return true if the two parameters are
equal to each other.
Sorts Exons. It is a little confusing whether exons should always be ordered left to right
or whether a negative stranded gene should go the other direction.
A gene contains a collection of Exon sequences
A Gene sequence has a Positive or Negative Strand where we want to write out to a stream the 5 to 3 prime version.
Use FastaReaderHelper as an example of how to use this class where FastaReaderHelper should be the
primary class used to read Fasta files
Used to parse a stream of a fasta file to get the sequence
The FastaWriter writes a collection of sequences to an outputStream.
The class that should be used to write out fasta file of a sequence collection
It is DBReferenceInfo which implements FeatureInterface.
Interface class to handle describing arbitrary features.
If a SequenceProxyReader implements this interface then that external source
has a list of features
Models the keywords that are annotated for a protein sequence at Uniprot.
This class is a good example of using the SequenceCreatorInterface where during parsing of the stream
the sequence and the offset index are passed to create a Protein sequence that will be loaded in lazily.
This class is a good example of using the SequenceCreatorInterface where during parsing of the stream
the sequence and the offset index are passed to create a Protein sequence that will be loaded in lazily.
This class is a good example of using the SequenceCreatorInterface where during parsing of the stream
the sequence and the offset index are passed to create a Protein sequence that will be loaded in lazily.
Provides a cache for storing multiple small files in memory.
Four bit encoding of the bit formats.
A four bit per compound implementation of the bit array worker code.
Indicates a way of translating a sequence.
Implementation for resolving fuzzy locations.
Use GenbankReaderHelper as an example of how to use this class where GenbankReaderHelper
should be the primary class used to read Genbank files.
For Genbank format file only.
The class that should be used to write out genbank file of a sequence
collection
We store the original header if the sequence is parsed from a fasta file and will use that exact
sequence if we write out the sequences to a fasta file.
The default fasta header parser where some headers are well defined based on the source
database which allows us to set the source of the protein sequence and the identifier
that can be used in future implementations to load features from external sources
If the user has a custom header with local data then they can create their own implementation
of a FastaHeaderParserInterface
Contains helper methods for generating a HashCode without having to resort to
the commons lang hashcode builders.
This class models a search Hit.
This class models a search Hsp.
A class that provides an InputStream from a File.
A collection of locations which are used whenever we work with INSDC; some
of which could be deprecated (from INSDC's point of view) yet appear
in records.
Used to represent bond locations equivalent to bond(7,8) or bond(7).
Deprecated in INSDC yet still appears; equivalent to the order()
directive except no 5' to 3' ordering is defined.
Deprecated in INSDC; refers to a set of locations of which one
location could be valid e.g.
Used to describe a 5' to 3' ordering but no firm assurance it is correct
Parser for working with INSDC style locations.
Closure interface used when working with IOUtils#processReader(String).
Available translations
1 - UNIVERSAL
2 - VERTEBRATE_MITOCHONDRIAL
3 - YEAST_MITOCHONDRIAL
4 - MOLD_MITOCHONDRIAL
5 - INVERTEBRATE_MITOCHONDRIAL
6 - CILIATE_NUCLEAR
9 - ECHINODERM_MITOCHONDRIAL
10 - EUPLOTID_NUCLEAR
11 - BACTERIAL
12 - ALTERNATIVE_YEAST_NUCLEAR
13 - ASCIDIAN_MITOCHONDRIAL
14 - FLATWORM_MITOCHONDRIAL
15 - BLEPHARISMA_MACRONUCLEAR
16 - CHLOROPHYCEAN_MITOCHONDRIAL
21 - TREMATODE_MITOCHONDRIAL
23 - SCENEDESMUS_MITOCHONDRIAL
Taken from NCBI with slight modification and put into the classpath resource.
Holds the concept of a codon table from the IUPAC format
This reader actually proxies onto multiple types of sequence in order
to allow a number of sequence objects to act as if they are one sequence.
Defines a minimal data structure for reading and writing a sequence alignment.
List of output formats.
Sets of integers used to represent the location of features on sequence.
Helper methods for use with the Location classes.
Helper methods for use with the Location classes.
Implements a minimal data structure for reading and writing a sequence alignment.
Defines a mutable (editable) data structure for an AlignedSequence.
Defines a mutable (editable) data structure for a Profile.
Defines a mutable (editable) data structure for a ProfilePair.
Defines a mutable (editable) data structure for the results of pairwise sequence alignment.
Created by andreas on 6/17/15.
General abstraction of different parsing errors
The plain fasta header takes everything in the header as a single entity.
Holds a single point part of a location
Used to resolve a position about a point
Implementation of XMLWriter which emits nicely formatted documents
to a PrintWriter.
Defines a data structure for the results of sequence alignment.
List of output formats.
Defines a data structure for the results of the alignment of a pair of Profiles.
Defines a data structure for a view of sequence alignment.
The representation of a ProteinSequence
Used to create a ProteinSequence from a String to allow for details
about the location of the sequence etc.
DNA Sequences produced by modern sequencers usually have quality information
attached to them.
It is common to have a numerical value or values associated with a feature.
This class models a search result.
Designed by Paolo Pavan.
For a given sequence this class will return the base at the reversed
position i.e.
RNASequence where RNACompoundSet are the allowed values
Used to create a RNA sequence
Attempts to do on the fly translation of RNA by not requesting the compounds
until asked.
Takes a Sequence of NucleotideCompound which should represent an RNA sequence
(RNASequence is good for this) and returns a list of Sequence which hold AminoAcidCompound.
The biojava-alignment module represents substitution matrices with short
values.
Designed by Paolo Pavan.
Main interface for defining a collection of Compounds and accessing them
using biological indexes
This is a common method that can be used across multiple storage/proxy implementations to
handle Negative strand and other interesting elements of sequence data.
Used to sort sequences in ascending order of bioBegin property.
This class represents the storage container of a sequence stored in a fasta file where,
during the initial parsing of the file, we store the offset and length of the sequence.
A location in a sequence that keeps a reference to its parent sequence
Provides a set of static methods to be used as static imports when needed
across multiple Sequence implementations but inheritance gets in the way.
A basic sequence iterator which iterates over the given Sequence by
biological index.
A static class that provides optimization hints for memory or performance handling of sequence data.
Defines a data structure for the results of pairwise sequence alignment.
Implements a data structure for a Sequence within an alignment.
Very basic implementation of the Location interface which defines a series
of simple constructors.
Basic implementation of the Point interface.
Implements a data structure for the results of sequence alignment.
Implements a data structure for the results of the alignment of a pair of Profiles.
Implements a data structure for the results of pairwise sequence alignment.
Implements a data structure which holds the score (penalty or bonus) given during alignment for the exchange of one Compound in a sequence for another.
An implementation of the SequenceReader interface which for every
call will return only 1 compound (given to it during construction; a String
is also valid but will require a CompoundSet).
An implementation of a single linkage clusterer
See http://en.wikipedia.org/wiki/Single-linkage_clustering
An in-memory cache using soft references.
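As a rough illustration of what a soft-reference cache involves, here is a minimal sketch built only on `java.lang.ref.SoftReference`. It is not BioJava's class: the name `SoftCacheSketch` and its methods are hypothetical, and it makes no attempt at thread safety.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the BioJava implementation.
class SoftCacheSketch<K, V> {

    // Values are held softly, so the garbage collector may reclaim them under memory pressure.
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if it was never stored or has been reclaimed. */
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // drop the stale entry left behind by a reclaimed value
        }
        return value;
    }
}
```

Callers have to treat a `null` result as a cache miss and reload the data, since an entry can disappear at any time once nothing else references its value.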
Used to map the start codon feature on a gene
Used to map the stop codon sequence on a gene
Provides a way of representing the strand of a sequence, location
hit or feature.
A utility class for common String manipulation tasks.
An example of a ProxySequenceReader that is created from a String.
Defines a data structure which holds the score (penalty or bonus) given during alignment for the exchange of one Compound in a sequence for another.
Static utility to access substitution matrices that come bundled with BioJava.
Provides a way of separating us from the specific IUPACParser.IUPACTable even
though this is the only implementing class for the interface.
Class used to hold three nucleotides together and allow for equality
to be assessed in a case insensitive manner.
Instance of a Codon which is 3 NucleotideCompounds, its
corresponding AminoAcidCompound and if it is a start or stop codon.
A sequence can be associated with a species or Taxonomy ID
An implementation of AbstractFeature
Used as a way of encapsulating the data structures required to parse DNA to a
Protein sequence.
This class is the way to create a TranslationEngine.
This is the sequence if you want to go from a gene sequence to a protein sequence.
Thrown from AbstractCompoundTranslator
Implementation of the 2bit encoding.
Extension of the BitArrayWorker which provides the 2bit implementation
code.
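To make the idea of a 2bit encoding concrete, the sketch below packs a DNA string into `int` words at two bits per base and reads bases back by index. It is a sketch under stated assumptions, not BioJava's worker class: the name `TwoBitSketch`, the base ordering (A=0, C=1, G=2, T=3) and the 16-bases-per-`int` layout are all choices made for this example.

```java
// Hypothetical illustration only; BioJava's own bit layout may differ.
class TwoBitSketch {

    // Assumed ordering for this sketch: A=0, C=1, G=2, T=3.
    private static final String BASES = "ACGT";

    /** Packs the sequence into ints, 16 bases per 32-bit word. */
    public static int[] encode(String seq) {
        int[] words = new int[(seq.length() + 15) / 16];
        for (int i = 0; i < seq.length(); i++) {
            int code = BASES.indexOf(Character.toUpperCase(seq.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not encodable in 2 bits: " + seq.charAt(i));
            }
            words[i / 16] |= code << ((i % 16) * 2);
        }
        return words;
    }

    /** Reads the base stored at the given zero-based index. */
    public static char baseAt(int[] words, int index) {
        int code = (words[index / 16] >>> ((index % 16) * 2)) & 0b11;
        return BASES.charAt(code);
    }

    /** Unpacks the first 'length' bases back into a string. */
    public static String decode(int[] words, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(words, i));
        }
        return sb.toString();
    }
}
```

Two bits can only distinguish four symbols, so ambiguity codes such as N cannot be stored this way, which is where a four bit per compound encoding comes in. The original length has to be kept separately because trailing padding bits decode as the base with code 0.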
Uncompresses a single tarred or zipped file, writing output to standard out
This class decompresses an input stream containing data compressed with
the unix "compress" utility (LZC, a LZW variant).
Pass in a Uniprot ID and this ProxySequenceReader when passed to a ProteinSequence will get the sequence data and other data elements
associated with the ProteinSequence by Uniprot.
A sliding window view of a sequence which does not implement any
interfaces like Sequence because they do not fit how this works.
Helper methods to simplify boilerplate XML parsing code for org.w3c.dom XML objects
Simple interface for building XML documents.