Class VCFCodec
- All Implemented Interfaces:
FeatureCodec<VariantContext,LineIterator>, NameAwareCodec
VCF is a text file format (most likely stored in a compressed manner). It contains meta-information lines, a header line, and then data lines each containing information about a position in the genome.
One of the main uses of next-generation sequencing is to discover variation amongst large populations of related samples. Recently the format for storing next-generation read alignments has been standardised by the SAM/BAM file format specification. This has significantly improved the interoperability of next-generation tools for alignment, visualisation, and variant calling. We propose the Variant Call Format (VCF) as a standardised format for storing the most prevalent types of sequence variation, including SNPs, indels and larger structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP, or the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl and Python API. The VCF specification and VCFtools are available from http://vcftools.sourceforge.net.
See also: VCF specification
See also: VCF spec. publication
File format example
##fileformat=VCFv4.0
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878
chr1 109 . A T 0 PASS AC=1 GT:AD:DP:GL:GQ 0/1:610,327:308:-316.30,-95.47,-803.03:99
chr1 147 . C A 0 PASS AC=1 GT:AD:DP:GL:GQ 0/1:294,49:118:-57.87,-34.96,-338.46:99
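To make the column layout of the example concrete, here is a minimal sketch that pairs the FORMAT keys of a data line with the values in the first sample column. It is an illustration only and does not use this codec; the class and method names (VcfLineSketch, firstSampleFields) are hypothetical, and it assumes the columns are tab-separated as in a real VCF file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of htsjdk: pairs the FORMAT keys of one
// VCF data line with the corresponding values of the first sample column.
class VcfLineSketch {

    static Map<String, String> firstSampleFields(String dataLine) {
        // Assumption: columns are tab-separated.
        String[] cols = dataLine.split("\t");
        // Columns 0..7 are CHROM..INFO, column 8 is FORMAT, column 9 is the first sample.
        if (cols.length < 10) {
            throw new IllegalArgumentException("expected at least 10 tab-separated columns");
        }
        String[] keys = cols[8].split(":");
        String[] values = cols[9].split(":");
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < keys.length && i < values.length; i++) {
            fields.put(keys[i], values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "chr1\t109\t.\tA\tT\t0\tPASS\tAC=1\tGT:AD:DP:GL:GQ\t"
                + "0/1:610,327:308:-316.30,-95.47,-803.03:99";
        // Prints the key/value pairs for sample NA12878 in FORMAT order.
        System.out.println(firstSampleFields(line));
    }
}
```

For real data, decoding through the codec is preferable, since it also validates the record against the header.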
- Since:
- 2010
-
Field Summary
Fields inherited from class htsjdk.variant.vcf.AbstractVCFCodec
alleleMap, doOnTheFlyModifications, filterHash, genotypeParts, header, lineNo, locParts, MAX_ALLELE_SIZE_BEFORE_WARNING, name, NUM_STANDARD_FIELDS, parts, remappedSampleName, stringCache, validate, version, warnedAboutNoEqualsForNonFlag
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
boolean canDecode(String potentialInput): This function returns true iff the File potentialInput can be parsed by this codec.
parseFilters(String filterString): Parses the filter string, first checking whether it has already been parsed in a previous attempt.
readActualHeader(LineIterator lineIterator): Reads all of the header from the provided iterator, but reads no further.
Methods inherited from class htsjdk.variant.vcf.AbstractVCFCodec
canDecodeFile, createGenotypeMap, decode, decodeLoc, disableOnTheFlyModifications, generateException, generateException, getAltHeaderLine, getCachedString, getHeader, getMetaHeaderLine, getName, getPedigreeHeaderLine, getSampleHeaderLine, getTabixFormat, getVersion, oneAllele, parseAlleles, parseGenotypeAlleles, parseHeaderFromLines, parseQual, setName, setRemappedSampleName, setVCFHeader
Methods inherited from class htsjdk.tribble.AsciiFeatureCodec
close, decode, isDone, makeIndexableSourceFromStream, makeSourceFromStream, readHeader
Methods inherited from class htsjdk.tribble.AbstractFeatureCodec
decodeLoc, getFeatureType
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface htsjdk.tribble.FeatureCodec
getPathToDataFile
-
Field Details
-
VCF4_MAGIC_HEADER
-
-
Constructor Details
-
VCFCodec
public VCFCodec()
-
-
Method Details
-
readActualHeader
Reads all of the header from the provided iterator, but reads no further.
- Specified by:
readActualHeader in class AsciiFeatureCodec<VariantContext>
- Parameters:
lineIterator - the line reader to take header lines from
- Returns:
The parsed header
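The "reads no further" contract can be illustrated with a peek-before-consume loop. The following is a sketch under stated assumptions, not the codec's implementation: Peekable is a hypothetical stand-in for a line iterator that supports peeking, and header lines are assumed to be exactly those that start with "#".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration: collect header lines and leave the first
// data line unconsumed for whoever reads next.
class HeaderReadSketch {

    // Minimal peekable wrapper; it buffers one line from the underlying iterator.
    static final class Peekable {
        private final Iterator<String> source;
        private String buffered;

        Peekable(Iterator<String> source) {
            this.source = source;
            advance();
        }

        private void advance() {
            buffered = source.hasNext() ? source.next() : null;
        }

        boolean hasNext() {
            return buffered != null;
        }

        String peek() {
            return buffered;
        }

        String next() {
            if (buffered == null) {
                throw new NoSuchElementException();
            }
            String line = buffered;
            advance();
            return line;
        }
    }

    // Assumption: every header line starts with '#'; the first line that
    // does not is data and is only peeked at, never consumed.
    static List<String> readHeaderLines(Peekable lines) {
        List<String> header = new ArrayList<>();
        while (lines.hasNext() && lines.peek().startsWith("#")) {
            header.add(lines.next());
        }
        return header;
    }

    public static void main(String[] args) {
        Peekable lines = new Peekable(
                List.of("##fileformat=VCFv4.0", "#CHROM\tPOS\tID", "chr1\t109\t.").iterator());
        System.out.println(readHeaderLines(lines).size() + " header lines read");
        System.out.println("next unread line: " + lines.peek());
    }
}
```

Peeking rather than consuming matters because the same iterator is later handed to the record decoder, which must still see the first data line.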
-
parseFilters
Parses the filter string, first checking whether it has already been parsed in a previous attempt.
- Specified by:
parseFilters in class AbstractVCFCodec
- Parameters:
filterString - the string to parse
- Returns:
a set of the filters applied, or null if filters were not applied to the record (e.g. as per the missing value in a VCF)
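The parse-with-cache behaviour described above might look roughly like the sketch below. It is not the codec's code: FilterParseSketch is a hypothetical class, and it assumes the VCF conventions that "." is the missing value, "PASS" means no filter failed (returned here as an empty set), and multiple filters are separated by semicolons.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of parsing a FILTER column value with a cache
// of previously parsed strings.
class FilterParseSketch {

    private final Map<String, Set<String>> cache = new HashMap<>();

    Set<String> parseFilters(String filterString) {
        // Missing value: filters were not applied to this record.
        if (filterString.equals(".")) {
            return null;
        }
        // Assumption: PASS is represented as "no failed filters".
        if (filterString.equals("PASS")) {
            return Collections.emptySet();
        }
        // Reuse the result if this exact string was parsed before.
        Set<String> cached = cache.get(filterString);
        if (cached != null) {
            return cached;
        }
        Set<String> parsed = Collections.unmodifiableSet(
                new LinkedHashSet<>(Arrays.asList(filterString.split(";"))));
        cache.put(filterString, parsed);
        return parsed;
    }

    public static void main(String[] args) {
        FilterParseSketch sketch = new FilterParseSketch();
        Set<String> first = sketch.parseFilters("q10;s50");
        Set<String> second = sketch.parseFilters("q10;s50");
        System.out.println(first + " served from cache: " + (first == second));
    }
}
```

Caching pays off because the same few filter strings typically repeat across many records in a file.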
-
canDecode
Description copied from interface: FeatureCodec
This function returns true iff the File potentialInput can be parsed by this codec. Note that checking the file's extension is a perfectly acceptable implementation of this method and file contents only rarely need to be checked.
There is an assumption that there's never a situation where two different Codecs return true for the same file. If this occurs, the recommendation would be to error out.
Note this function must never throw an error. All errors should be trapped and false returned.
- Parameters:
potentialInput - the file to test for parsability with this codec
- Returns:
true if potentialInput can be parsed, false otherwise
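The "never throw, return false instead" contract can be sketched as follows. This is an assumption-laden illustration rather than this class's implementation: CanDecodeSketch is a hypothetical class, the prefix "##fileformat=VCFv4" is assumed from the file format example above, and only plain (uncompressed) text files are handled.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical illustration of a canDecode-style check that traps every
// exception and reports false instead of propagating it.
class CanDecodeSketch {

    // Assumed prefix of the first line of a VCF 4.x file.
    static final String ASSUMED_MAGIC = "##fileformat=VCFv4";

    static boolean canDecode(String potentialInput) {
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get(potentialInput), StandardCharsets.UTF_8)) {
            String firstLine = reader.readLine();
            return firstLine != null && firstLine.startsWith(ASSUMED_MAGIC);
        } catch (Exception e) {
            // Missing file, null or invalid path, unreadable content: not decodable.
            return false;
        }
    }

    public static void main(String[] args) {
        // A path that does not exist yields false rather than an exception.
        System.out.println(canDecode("no-such-file.vcf"));
    }
}
```

The broad catch is deliberate here: a caller probing several codecs for the same file should get false from the ones that do not match, not a failure.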
-