Class Gff3Codec

All Implemented Interfaces:
FeatureCodec<Gff3Feature,LineIterator>

public class Gff3Codec extends AbstractFeatureCodec<Gff3Feature,LineIterator>
Codec for parsing GFF3 files, as defined in https://github.com/The-Sequence-Ontology/Specifications/blob/31f62ad469b31769b43af42e0903448db1826925/gff3.md

Note that while the spec states that all feature types must be defined in the Sequence Ontology, this implementation does not check feature types and accepts any string as a feature type.

Each feature line in the GFF3 file is emitted as a separate feature. Features linked together through the "Parent" attribute are linked through Gff3Feature.getParents(), Gff3Feature.getChildren(), Gff3Feature.getAncestors(), Gff3Feature.getDescendents(), and Gff3Feature.flatten().

This linking is not guaranteed to be comprehensive when the file is read only for features overlapping a particular region, using a tribble index. In that case, a feature is linked only to those features it is linked to in the input file that also overlap the given region.
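The Parent-based linking described above can be pictured with a small standalone sketch. It is not the htsjdk implementation: the class ParentLinkSketch, its Node type, and the helper names are hypothetical, and the sketch assumes well-formed nine-column lines in which every feature carries an ID. It shows why links go missing when only a subset of lines (for example, those overlapping a query region) is seen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; hypothetical names, not htsjdk code.
final class ParentLinkSketch {

    // Hypothetical minimal feature node holding parent and child links.
    static final class Node {
        final String id;
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String id) {
            this.id = id;
        }
    }

    // Returns the value of one key in a GFF3 attributes column, or null if absent.
    static String attribute(String attributes, String key) {
        for (String pair : attributes.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals(key)) {
                return kv[1].trim();
            }
        }
        return null;
    }

    // Links features through their Parent attribute. Only parents that are
    // present in featureLines can be linked; others are silently skipped.
    // Assumes each line has nine tab-separated columns.
    static Map<String, Node> link(List<String> featureLines) {
        Map<String, Node> byId = new LinkedHashMap<>();
        // First pass: create one node per feature line that carries an ID.
        for (String line : featureLines) {
            String id = attribute(line.split("\t")[8], "ID");
            if (id != null) {
                byId.put(id, new Node(id));
            }
        }
        // Second pass: connect each node to the parents that were seen.
        for (String line : featureLines) {
            String attrs = line.split("\t")[8];
            String id = attribute(attrs, "ID");
            String parentList = attribute(attrs, "Parent");
            if (id == null || parentList == null) {
                continue;
            }
            Node child = byId.get(id);
            for (String parentId : parentList.split(",")) {
                Node parent = byId.get(parentId.trim());
                if (parent == null) {
                    continue; // parent lies outside the lines that were read
                }
                child.parents.add(parent);
                parent.children.add(child);
            }
        }
        return byId;
    }
}
```

Passing all lines of a gene, mRNA, and exon hierarchy yields a fully linked tree, whereas passing only the exon line leaves that exon with no parents, which mirrors the region-query caveat.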
  • Constructor Details

    • Gff3Codec

      public Gff3Codec()
    • Gff3Codec

      public Gff3Codec(Gff3Codec.DecodeDepth decodeDepth)
    • Gff3Codec

      public Gff3Codec(Gff3Codec.DecodeDepth decodeDepth, Predicate<String> filterOutAttribute)
      Parameters:
      decodeDepth - a value from DecodeDepth
      filterOutAttribute - filter to remove keys from the EXTRA_FIELDS column
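As a rough illustration of what a key-removing predicate does, the sketch below applies a Predicate<String> to the keys of a GFF3 attributes column. It is standalone and hypothetical (AttributeFilterSketch and parseAttributes are not htsjdk names), it ignores percent-encoding and multi-valued attributes, and it assumes that the predicate returning true means the key is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch only; hypothetical names, not htsjdk code.
final class AttributeFilterSketch {

    // Parses "key=value;key=value" pairs, dropping every key for which
    // filterOutAttribute returns true (assumed meaning of the predicate).
    static Map<String, String> parseAttributes(String column,
                                               Predicate<String> filterOutAttribute) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String pair : column.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                continue; // skips "." and malformed entries
            }
            String key = kv[0].trim();
            if (filterOutAttribute.test(key)) {
                continue; // key is filtered out
            }
            kept.put(key, kv[1].trim());
        }
        return kept;
    }
}
```

Under the same assumption, a predicate such as `key -> key.equals("Note")` would be a plausible value for the filterOutAttribute constructor argument when a bulky attribute is not needed.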
  • Method Details

    • decode

      public Gff3Feature decode(LineIterator lineIterator) throws IOException
      Description copied from interface: FeatureCodec
      Decode a single Feature from the SOURCE (here a LineIterator), reading no further in the underlying source than the end of that feature.
      Parameters:
      lineIterator - the input stream from which to decode the next record
      Returns:
      Return the Feature encoded by the line, or null if the line does not represent a feature (e.g. is a comment)
      Throws:
      IOException
    • getSequenceRegions

      public List<SequenceRegion> getSequenceRegions()
      Get list of sequence regions parsed by the codec.
      Returns:
      list of sequence regions
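In the GFF3 specification, sequence regions are declared with `##sequence-region seqid start end` directives. The following standalone sketch parses one such line; SequenceRegionSketch and its Region holder are hypothetical names used for illustration and are not the htsjdk SequenceRegion class.

```java
// Illustrative sketch only; hypothetical names, not htsjdk code.
final class SequenceRegionSketch {

    // Hypothetical holder for one declared region (1-based, inclusive coordinates).
    static final class Region {
        final String seqId;
        final int start;
        final int end;

        Region(String seqId, int start, int end) {
            this.seqId = seqId;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the region declared by a "##sequence-region" directive,
    // or null if the line is not a well-formed directive of that kind.
    static Region parse(String line) {
        String prefix = "##sequence-region";
        if (!line.startsWith(prefix)) {
            return null;
        }
        String[] parts = line.substring(prefix.length()).trim().split("\\s+");
        if (parts.length != 3) {
            return null;
        }
        try {
            return new Region(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        } catch (NumberFormatException e) {
            return null; // non-numeric coordinates
        }
    }
}
```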
    • getCommentsWithLineNumbers

      public Map<Integer,String> getCommentsWithLineNumbers()
      Gets map from line number to comment found on that line. The text of the comment EXCLUDES the leading # which indicates a comment line.
      Returns:
      Map from line number to comment found on line
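To make the shape of the returned map concrete, here is a minimal standalone sketch that collects comment lines keyed by line number with the leading # removed. The class name is hypothetical, and two details are assumptions rather than documented behaviour: line numbers are taken to be 1-based, and lines starting with ## are treated as directives instead of comments.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; hypothetical names, not htsjdk code.
final class CommentMapSketch {

    // Maps line number to comment text, excluding the leading '#'.
    static Map<Integer, String> commentsWithLineNumbers(List<String> lines) {
        Map<Integer, String> comments = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Assumption: "##" introduces a directive, a single "#" a comment.
            if (line.startsWith("#") && !line.startsWith("##")) {
                comments.put(i + 1, line.substring(1)); // assumption: 1-based numbering
            }
        }
        return comments;
    }
}
```

Only the single leading # is stripped in this sketch, so any whitespace that follows it stays part of the comment text.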
    • getCommentTexts

      public List<String> getCommentTexts()
      Gets list of comments parsed by the codec. Excludes leading # which indicates a comment line.
      Returns:
      list of comment texts
    • decodeLoc

      public Feature decodeLoc(LineIterator lineIterator) throws IOException
      Description copied from interface: FeatureCodec
      Decode a line to obtain just its FeatureLoc for indexing -- contig, start, and stop.
      Specified by:
      decodeLoc in interface FeatureCodec<Gff3Feature,LineIterator>
      Overrides:
      decodeLoc in class AbstractFeatureCodec<Gff3Feature,LineIterator>
      Parameters:
      lineIterator - the input stream from which to decode the next record
      Returns:
      Return the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is a comment)
      Throws:
      IOException
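The idea of decoding only a location can be sketched without the library. In GFF3, the contig (seqid) is column 1 and the start and end coordinates are columns 4 and 5, so a location-only decode never needs to touch the attributes column. LocSketch and Loc below are hypothetical names, and the sketch works on a single line string instead of a LineIterator.

```java
// Illustrative sketch only; hypothetical names, not htsjdk code.
final class LocSketch {

    // Hypothetical location holder: contig plus 1-based inclusive start and end.
    static final class Loc {
        final String contig;
        final int start;
        final int end;

        Loc(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the location encoded by a feature line, or null for blank lines,
    // comments, directives, and lines with fewer than five columns.
    static Loc decodeLoc(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return null;
        }
        String[] cols = line.split("\t");
        if (cols.length < 5) {
            return null;
        }
        // Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        return new Loc(cols[0], Integer.parseInt(cols[3]), Integer.parseInt(cols[4]));
    }
}
```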
    • canDecode

      public boolean canDecode(String inputFilePath)
      Description copied from interface: FeatureCodec

      This function returns true if and only if the file at inputFilePath can be parsed by this codec. Note that checking the file's extension is a perfectly acceptable implementation of this method, and file contents only rarely need to be checked.

      There is an assumption that there's never a situation where two different Codecs return true for the same file. If this occurs, the recommendation would be to error out.

      Note this function must never throw an error. All errors should be trapped and false returned.
      Parameters:
      inputFilePath - the file to test for parsability with this codec
      Returns:
      true if the file at inputFilePath can be parsed, false otherwise
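The contract above (an extension check is acceptable, and the method must never throw) might be satisfied along the following lines. This is a sketch of the contract only, not of Gff3Codec's actual check: the class name is hypothetical and the accepted extensions (.gff3 and .gff3.gz) are an assumption.

```java
import java.util.Locale;

// Illustrative sketch only; hypothetical names, not htsjdk code.
final class CanDecodeSketch {

    // Extension-based check that traps every runtime error and returns false.
    static boolean canDecode(String inputFilePath) {
        try {
            String lower = inputFilePath.toLowerCase(Locale.ROOT);
            // Assumption: these two extensions identify GFF3 input.
            return lower.endsWith(".gff3") || lower.endsWith(".gff3.gz");
        } catch (RuntimeException e) {
            return false; // e.g. a null path; never propagate an error
        }
    }
}
```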
    • readHeader

      public FeatureCodecHeader readHeader(LineIterator lineIterator)
      Description copied from interface: FeatureCodec
      Read and return the header, or null if there is no header. Note: Implementers of this method must be careful to read exactly as much from the SOURCE as needed to parse the header, and no more. Otherwise, data that would have been used to parse a Feature may be lost.
      Parameters:
      lineIterator - the source from which to decode the header
      Returns:
      header object
    • makeSourceFromStream

      public LineIterator makeSourceFromStream(InputStream bufferedInputStream)
      Description copied from interface: FeatureCodec
      Generates a reader of type SOURCE (here LineIterator) appropriate for use by this codec from the generic input stream. Implementers should assume the stream is buffered.
    • makeIndexableSourceFromStream

      public LocationAware makeIndexableSourceFromStream(InputStream bufferedInputStream)
      Description copied from interface: FeatureCodec
      Return a SOURCE for this FeatureCodec that implements LocationAware, and is thus suitable for use during indexing. Like FeatureCodec.makeSourceFromStream(java.io.InputStream), except the LocationAware compatibility is required for creating indexes.

      Implementers of this method must return a type that is both LocationAware and SOURCE. Note that this requirement cannot be enforced via the method signature due to limitations in Java's generic typing system. Instead, consumers should cast the call result into a SOURCE when applicable.

      NOTE: During the indexing process, the indexer passes the SOURCE to the codec to consume Features from the underlying SOURCE, one at a time, recording the Feature location via the SOURCE's LocationAware interface. Therefore, it is essential that the SOURCE implementation, the FeatureCodec.readHeader(SOURCE) method, and the FeatureCodec.decodeLoc(SOURCE) method, which are used during indexing, not introduce any buffering that would advance the SOURCE more than a single feature (or more than the size of the header, in the case of FeatureCodec.readHeader(SOURCE)).
    • isDone

      public boolean isDone(LineIterator lineIterator)
      Description copied from interface: FeatureCodec
      Adapter method that assesses whether the provided SOURCE has more data. True if it does, false otherwise.
    • close

      public void close(LineIterator lineIterator)
      Description copied from interface: FeatureCodec
      Adapter method that closes the provided SOURCE.
    • getTabixFormat

      public TabixFormat getTabixFormat()
      Description copied from interface: FeatureCodec
      Define the tabix format for the feature, used for indexing. Default implementation throws an exception. Note that only AsciiFeatureCodec can read tabix files, as defined in AbstractFeatureReader.getFeatureReader(String, String, FeatureCodec, boolean, java.util.function.Function, java.util.function.Function).
      Returns:
      the format to use with tabix