Tribble codec to read data from a GENCODE GTF file.
GENCODE GTF files are defined here: https://www.gencodegenes.org/pages/data_format.html
This codec scans through a GENCODE GTF file and returns GencodeGtfFeature objects.
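To make the input format concrete, here is a minimal sketch of splitting one GTF data line into its nine tab-separated columns and parsing the key "value"; attribute column. The class GtfLine and its parse method are hypothetical illustrations, not part of GATK or htsjdk. The sketch assumes well-formed lines with no semicolons or quotes inside attribute values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: this is NOT GATK's GencodeGtfFeature.
final class GtfLine {
    final String contig;       // column 1: seqname, e.g. chr1
    final String source;       // column 2: annotation source, e.g. HAVANA or ENSEMBL
    final String featureType;  // column 3: gene, transcript, exon, CDS, UTR, ...
    final int start;           // column 4: 1-based, inclusive
    final int end;             // column 5: 1-based, inclusive
    final String strand;       // column 7: + or -
    final Map<String, String> attributes; // column 9, kept in file order
    // Columns 6 (score) and 8 (frame) are ignored in this sketch.

    private GtfLine(String[] cols, Map<String, String> attributes) {
        this.contig = cols[0];
        this.source = cols[1];
        this.featureType = cols[2];
        this.start = Integer.parseInt(cols[3]);
        this.end = Integer.parseInt(cols[4]);
        this.strand = cols[6];
        this.attributes = attributes;
    }

    /** Parses one tab-separated GTF data line; "#" header lines are rejected. */
    static GtfLine parse(String line) {
        String[] cols = line.split("\t");
        if (cols.length != 9) {
            throw new IllegalArgumentException("Expected 9 columns, got " + cols.length);
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        // Column 9 looks like: gene_id "ENSG00000223972.5"; level 2;
        for (String pair : cols[8].split(";")) {
            String trimmed = pair.trim();
            int space = trimmed.indexOf(' ');
            if (space < 0) {
                continue; // skip empty or malformed entries
            }
            String key = trimmed.substring(0, space);
            String value = trimmed.substring(space + 1).trim().replace("\"", "");
            // Repeated keys (e.g. several "tag" entries) overwrite each other here.
            attrs.put(key, value);
        }
        return new GtfLine(cols, attrs);
    }
}
```

A real GENCODE parser must keep repeated keys such as tag, which a plain Map cannot do; the sketch trades that away for brevity.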
GencodeGtfFeature objects can hold sub-features as fields (for example, a gene holds its transcripts). All features are
grouped by gene, which is the natural layout of a GENCODE GTF file.
All fields are defined in the abstract class GencodeGtfFeature. The subclasses represent the logical
data hierarchy as it appears in the feature file itself, preserving the natural grouping by gene.
The GencodeGtfFeature logical data hierarchy (NOT the class hierarchy) is as follows
(with | representing a "has a" relationship):

+--> GencodeGtfGeneFeature
   |
   +--> GencodeGtfTranscriptFeature
      |
      +--> GencodeGtfSelenocysteineFeature
      +--> GencodeGtfUTRFeature
      +--> GencodeGtfExonFeature
         |
         +--> GencodeGtfCDSFeature
         +--> GencodeGtfStartCodonFeature
         +--> GencodeGtfStopCodonFeature
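The "has a" tree can be mirrored with plain nested types. The records below (Span, Exon, Transcript, Gene) are hypothetical, heavily simplified stand-ins that show only the containment shape; the real GATK classes carry many more fields and share one abstract base class. Records require Java 16 or later.

```java
import java.util.List;

// Hypothetical leaf feature: UTR, Selenocysteine, CDS, start_codon or stop_codon.
record Span(String type, int start, int end) {}

// An exon owns its coding pieces (CDS, start codon, stop codon).
record Exon(int start, int end, List<Span> codingFeatures) {}

// A transcript owns its exons plus its UTR and selenocysteine features.
record Transcript(String id, List<Exon> exons, List<Span> otherFeatures) {}

// The gene is the root of one group, matching the gene-grouped file layout.
record Gene(String id, List<Transcript> transcripts) {

    /** Counts the gene plus every feature nested beneath it. */
    int totalFeatureCount() {
        int n = 1; // the gene itself
        for (Transcript t : transcripts) {
            n += 1 + t.otherFeatures().size();
            for (Exon e : t.exons()) {
                n += 1 + e.codingFeatures().size();
            }
        }
        return n;
    }
}
```

For example, a gene with one transcript, two UTRs, and two exons that each carry two coding features yields a total of 10 features.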
Tribble indexing has been tested and works as expected.
TabixIndex indexing is not supported.
Unlike many other Tribble codecs, this one scans multiple input file lines to produce
a single feature. This is due to how GENCODE GTF files are structured (essentially grouped by contig and gene).
For this reason, GencodeGtfCodec inherits from AbstractFeatureCodec rather than AsciiFeatureCodec:
AsciiFeatureCodecs read a single line at a time, whereas AbstractFeatureCodecs are not tied to line-at-a-time decoding.
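One way to picture multi-line decoding is an iterator with one line of lookahead: each group starts at a gene line and absorbs every following line until the next gene line. GeneLineGrouper is a hypothetical sketch of that idea, not the GATK or htsjdk API. It assumes the first data line is a gene line and that each gene's child lines directly follow it, as in GENCODE files.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: groups raw GTF lines by gene using one-line lookahead.
final class GeneLineGrouper implements Iterator<List<String>> {
    private final Iterator<String> lines;
    private String pending; // next unconsumed data line, or null at end of input

    GeneLineGrouper(Iterator<String> lines) {
        this.lines = lines;
        this.pending = nextDataLine();
    }

    /** Returns the next line that is not a "#" header/comment line, or null. */
    private String nextDataLine() {
        while (lines.hasNext()) {
            String line = lines.next();
            if (!line.startsWith("#")) {
                return line;
            }
        }
        return null;
    }

    /** Column 3 of a GTF line holds the feature type (gene, transcript, exon, ...). */
    private static String featureType(String line) {
        String[] cols = line.split("\t");
        return cols.length > 2 ? cols[2] : "";
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public List<String> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        List<String> group = new ArrayList<>();
        group.add(pending); // the gene line that opens this group
        pending = nextDataLine();
        // Consume child lines until the next gene line or end of input.
        while (pending != null && !featureType(pending).equals("gene")) {
            group.add(pending);
            pending = nextDataLine();
        }
        return group;
    }
}
```

Each returned group could then be parsed line by line and assembled into one gene-rooted feature. Because the boundary of a group is only known after reading the first line of the next gene, a lookahead is needed, which is why a strictly line-at-a-time decoder does not fit.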
Created by jonn on 7/21/17.