public class VariantContext extends java.lang.Object implements HtsRecord, Feature, java.io.Serializable
The class system works by defining segregating alleles, creating a variant context representing the segregating information at a locus, and potentially creating and associating genotypes with individuals in the context.
All of the classes are highly validating -- call validate() if you modify them -- so you can rely on the self-consistency of the data once you have a VariantContext in hand. The system has a rich set of accessor and manipulator routines, as well as more complex static support routines in VariantContextUtils.
The VariantContext (and Genotype) objects are attributed (supporting addition of arbitrary key/value pairs) and filtered (can represent a variation that is viewed as suspect).
VariantContexts are dynamically typed, so whether a VariantContext is a SNP, Indel, or NoVariant depends on the properties of the alleles in the context. See the detailed documentation on the Type parameter below.
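To make the idea of dynamic typing concrete, here is a minimal sketch that derives a site type purely from its alleles. It is a simplified, hypothetical model: alleles are plain base strings rather than Allele objects, the class and method names (`VariantTypeSketch`, `typeOfSite`) are invented for illustration, and symbolic alleles are ignored entirely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

// Hypothetical, simplified model: not the library's own type logic.
class VariantTypeSketch {
    enum Type { NO_VARIATION, SNP, MNP, INDEL, MIXED }

    // Type of one alternate allele relative to the reference allele.
    static Type typeOfAlt(String ref, String alt) {
        if (ref.length() != alt.length()) return Type.INDEL;
        return ref.length() == 1 ? Type.SNP : Type.MNP;
    }

    // Type of the whole site, computed from the alleles rather than stored up front.
    static Type typeOfSite(String ref, List<String> alts) {
        if (alts.isEmpty()) return Type.NO_VARIATION;
        EnumSet<Type> seen = EnumSet.noneOf(Type.class);
        for (String alt : alts) seen.add(typeOfAlt(ref, alt));
        // Alternate alleles of different kinds make the site MIXED.
        return seen.size() == 1 ? seen.iterator().next() : Type.MIXED;
    }

    public static void main(String[] args) {
        System.out.println(typeOfSite("A", Arrays.asList("T")));
        System.out.println(typeOfSite("A", Collections.<String>emptyList()));
        System.out.println(typeOfSite("A", Arrays.asList("T", "ATC")));
    }
}
```

Because the type is a function of the alleles, changing the allele list changes the answer without any separate type field being updated by hand.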
It's also easy to create subcontexts based on selected genotypes.
MutableVariantContexts and MutableGenotypes.
Allele A, Aref, T, Tref;
Allele del, delRef, ATC, ATCref;
// A [ref] / T at 10
GenomeLoc snpLoc = GenomeLocParser.createGenomeLoc("chr1", 10, 10);
// A / ATC [ref] from 20-23
GenomeLoc delLoc = GenomeLocParser.createGenomeLoc("chr1", 20, 22);
// A [ref] / ATC immediately after 20
GenomeLoc insLoc = GenomeLocParser.createGenomeLoc("chr1", 20, 20);
See the Allele class itself for details. Alleles can be either reference or non-reference.
Examples of alleles used here:
A = new Allele("A");
Aref = new Allele("A", true);
T = new Allele("T");
ATC = new Allele("ATC");
VariantContext vc = new VariantContext(name, snpLoc, Arrays.asList(Aref, T));
If you want to create a non-variant site, just put in a single reference allele:
VariantContext vc = new VariantContext(name, snpLoc, Arrays.asList(Aref));
A deletion is just as easy:
VariantContext vc = new VariantContext(name, delLoc, Arrays.asList(ATCref, del));
The only thing that distinguishes an insertion from a deletion is which allele is the reference. An insertion has a reference allele that is shorter than the non-reference allele, and vice versa for deletions.
VariantContext vc = new VariantContext(name, insLoc, Arrays.asList(delRef, ATC));
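The length rule above can be sketched in a few lines. This is an illustrative model only: alleles are plain strings, the empty string stands in for the null `del` allele, and `IndelDirectionSketch` is a hypothetical name.

```java
// Hypothetical helper: classifies an event from the allele lengths alone.
class IndelDirectionSketch {
    static String describe(String ref, String alt) {
        // Reference shorter than the alternate: bases were inserted.
        if (ref.length() < alt.length()) return "insertion";
        // Reference longer than the alternate: bases were deleted.
        if (ref.length() > alt.length()) return "deletion";
        return "not an indel";
    }

    public static void main(String[] args) {
        System.out.println(describe("", "ATC"));   // mirrors (delRef, ATC)
        System.out.println(describe("ATC", ""));   // mirrors (ATCref, del)
    }
}
```

Swapping which allele is flagged as the reference is enough to turn one event into the other, which is why the two constructor calls above look nearly identical.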
VariantContexts
VariantContextAdaptors.convertToVariantContext(name, myObject)
dbSNP and VCFs, for example, can be passed in as myObject, and a VariantContext corresponding to that object will be returned. A null return value indicates that the type isn't yet supported. This is the best and easiest way to create contexts using RODs.
List<Allele> alleles = Arrays.asList(Aref, T);
Genotype g1 = new Genotype(Arrays.asList(Aref, Aref), "g1", 10);
Genotype g2 = new Genotype(Arrays.asList(Aref, T), "g2", 10);
Genotype g3 = new Genotype(Arrays.asList(T, T), "g3", 10);
VariantContext vc = new VariantContext(snpLoc, alleles, Arrays.asList(g1, g2, g3));
At this point we have 3 genotypes in our context, g1-g3. You can access a good deal of information about the genotypes through the VariantContext:
vc.hasGenotypes()
vc.isMonomorphicInSamples()
vc.isPolymorphicInSamples()
vc.getSamples().size()
vc.getGenotypes()
vc.getGenotypes().get("g1")
vc.hasGenotype("g1")
vc.getCalledChrCount()
vc.getCalledChrCount(Aref)
vc.getCalledChrCount(T)
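As a rough illustration of what the chromosome-count queries report for g1-g3, the following sketch models genotypes as lists of allele strings keyed by sample name. Everything here is an assumption made for the example: the class name, the map-based representation, and the use of "." as a stand-in for an uncalled allele.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of three diploid genotypes at an A/T site.
class CalledChrCountSketch {
    static final String NO_CALL = "."; // assumed marker for an undetermined allele

    // Chromosomes carrying any called allele.
    static int calledChrCount(Map<String, List<String>> genotypes) {
        int n = 0;
        for (List<String> alleles : genotypes.values())
            for (String a : alleles)
                if (!a.equals(NO_CALL)) n++;
        return n;
    }

    // Chromosomes carrying one specific allele.
    static int calledChrCount(Map<String, List<String>> genotypes, String allele) {
        int n = 0;
        for (List<String> alleles : genotypes.values())
            for (String a : alleles)
                if (a.equals(allele)) n++;
        return n;
    }

    static Map<String, List<String>> example() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("g1", Arrays.asList("A", "A")); // hom ref
        g.put("g2", Arrays.asList("A", "T")); // het
        g.put("g3", Arrays.asList("T", "T")); // hom var
        return g;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = example();
        System.out.println(calledChrCount(g));      // 6
        System.out.println(calledChrCount(g, "A")); // 3
        System.out.println(calledChrCount(g, "T")); // 3
    }
}
```

With three diploid samples there are six called chromosomes, split evenly between the reference and alternate allele, so a site like this would be polymorphic in its samples.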
Genotypes can carry special NO_CALL alleles that aren't present in the set of context alleles and that represent undetermined alleles in a genotype:
Genotype g4 = new Genotype(Arrays.asList(Allele.NO_CALL, Allele.NO_CALL), "NO_DATA_FOR_SAMPLE", 10);
VariantContext vc12 = vc.subContextFromGenotypes(Arrays.asList(g1,g2));
VariantContext vc1 = vc.subContextFromGenotypes(Arrays.asList(g1));
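The effect of taking a subcontext can be sketched as two steps: keep only the selected samples, then optionally re-derive the allele list from the genotypes that remain. The code below is a hypothetical model using plain collections; `SubContextSketch`, `subset`, and `allelesInUse` are invented names, and always retaining the reference allele is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of subsetting a context down to selected samples.
class SubContextSketch {
    // Keeps only the requested samples, preserving the original sample order.
    static Map<String, List<String>> subset(Map<String, List<String>> genotypes, Set<String> samples) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : genotypes.entrySet())
            if (samples.contains(e.getKey())) out.put(e.getKey(), e.getValue());
        return out;
    }

    // Re-derives the site alleles: the reference plus every allele a kept sample carries.
    static Set<String> allelesInUse(String ref, Map<String, List<String>> genotypes) {
        Set<String> alleles = new LinkedHashSet<>();
        alleles.add(ref); // assumption: the reference allele is always retained
        for (List<String> g : genotypes.values()) alleles.addAll(g);
        return alleles;
    }

    public static void main(String[] args) {
        Map<String, List<String>> all = new LinkedHashMap<>();
        all.put("g1", Arrays.asList("A", "A"));
        all.put("g2", Arrays.asList("A", "T"));
        all.put("g3", Arrays.asList("T", "T"));
        Map<String, List<String>> onlyG1 = subset(all, Collections.singleton("g1"));
        System.out.println(allelesInUse("A", onlyG1)); // [A]
    }
}
```

In this model a subcontext containing only g1 becomes a non-variant site once alleles are re-derived, while one containing g1 and g2 keeps both A and T.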
VariantContexts allow some fields, particularly those stored as generic attributes, to be of any type. For example, a field AB might naturally be a floating point number, 0.51, but when it's read into a VC it's not decoded into the Java representation but left as the string "0.51". A fully decoded VariantContext is one where all values have been converted to their corresponding Java object types, based on the types declared in a VCFHeader.
The fullyDecode(...) method takes a header object and creates a new fully decoded VariantContext where all fields are converted to their true Java representation. The VCBuilder can be told that all fields are fully decoded, in which case no work is done when asking for a fully decoded version of the VC.
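The decoding step can be pictured as a pass over the attribute map that converts string values according to declared types. The sketch below is only a model of that idea: `FieldType` and the `declared` map are hypothetical stand-ins for the type information a header would supply, and the real method's handling of lenient decoding, lists, and flags is not represented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "fully decoding" generic attributes.
class FullyDecodeSketch {
    enum FieldType { INTEGER, FLOAT, STRING }

    // Returns a new map; the input attributes are left untouched.
    static Map<String, Object> fullyDecode(Map<String, Object> attributes,
                                           Map<String, FieldType> declared) {
        Map<String, Object> decoded = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : attributes.entrySet()) {
            Object value = e.getValue();
            FieldType type = declared.get(e.getKey());
            // Only undecoded (String) values with a declared type are converted.
            if (value instanceof String && type != null) {
                String s = (String) value;
                switch (type) {
                    case INTEGER: value = Integer.valueOf(s); break;
                    case FLOAT:   value = Double.valueOf(s);  break;
                    default:      break; // STRING stays as-is
                }
            }
            decoded.put(e.getKey(), value);
        }
        return decoded;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("AB", "0.51");
        Map<String, FieldType> declared = new LinkedHashMap<>();
        declared.put("AB", FieldType.FLOAT);
        Object ab = fullyDecode(raw, declared).get("AB");
        System.out.println(ab.getClass().getSimpleName()); // Double
    }
}
```

Returning a new map rather than mutating the input mirrors the description above, where fullyDecode(...) creates a new VariantContext instead of changing the existing one.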
Modifier and Type | Class and Description |
---|---|
static class | VariantContext.Type |
static class | VariantContext.Validation |
Modifier and Type | Field and Description |
---|---|
protected java.util.List<Allele> | alleles - A set of the alleles segregating in this context |
protected CommonInfo | commonInfo |
protected java.lang.String | contig - The location of this VariantContext |
protected int[] | genotypeCounts - Counts for each of the possible Genotype types in this context |
protected GenotypesContext | genotypes - A mapping from sampleName -> genotype objects for all genotypes associated with this context |
static GenotypesContext | NO_GENOTYPES |
static double | NO_LOG10_PERROR |
static java.util.Set<java.lang.String> | PASSES_FILTERS |
static long | serialVersionUID |
protected long | start |
protected long | stop |
protected VariantContext.Type | type - The type (cached for performance reasons) of this context |
protected VariantContext.Type | typeIgnoringNonRef - The type of this context, cached separately if ignoreNonRef is true |
static java.util.regex.Pattern | VALID_FILTER |
Modifier | Constructor and Description |
---|---|
protected | VariantContext(java.lang.String source, java.lang.String ID, java.lang.String contig, long start, long stop, java.util.Collection<Allele> alleles, GenotypesContext genotypes, double log10PError, java.util.Set<java.lang.String> filters, java.util.Map<java.lang.String,java.lang.Object> attributes, boolean fullyDecoded, java.util.EnumSet<VariantContext.Validation> validationToPerform) - The actual constructor. |
protected | VariantContext(VariantContext other) - Copy constructor |
Modifier and Type | Method and Description |
---|---|
java.util.List<java.lang.String> | calcVCFGenotypeKeys(VCFHeader header) |
boolean | emptyID() |
void | extraStrictValidation(Allele reportedReference, Allele observedReference, java.util.Set<java.lang.String> rsIDs) - Run all extra-strict validation tests on a Variant Context object |
boolean | filtersWereApplied() |
VariantContext | fullyDecode(VCFHeader header, boolean lenientDecoding) - Return a VC equivalent to this one but where all fields are fully decoded. See VariantContext document about fully decoded |
Allele | getAllele(byte[] allele) |
Allele | getAllele(java.lang.String allele) |
int | getAlleleIndex(Allele allele) - Look up the index of allele in this variant context |
java.util.List<java.lang.Integer> | getAlleleIndices(java.util.Collection<Allele> alleles) - Return the allele index (see getAlleleIndex) for each allele in alleles |
java.util.List<Allele> | getAlleles() - Gets the alleles. |
Allele | getAltAlleleWithHighestAlleleCount() |
Allele | getAlternateAllele(int i) |
java.util.List<Allele> | getAlternateAlleles() - Gets the alternate alleles. |
java.lang.Object | getAttribute(java.lang.String key) |
java.lang.Object | getAttribute(java.lang.String key, java.lang.Object defaultValue) |
boolean | getAttributeAsBoolean(java.lang.String key, boolean defaultValue) |
double | getAttributeAsDouble(java.lang.String key, double defaultValue) |
java.util.List<java.lang.Double> | getAttributeAsDoubleList(java.lang.String key, double defaultValue) |
int | getAttributeAsInt(java.lang.String key, int defaultValue) |
java.util.List<java.lang.Integer> | getAttributeAsIntList(java.lang.String key, int defaultValue) |
java.util.List<java.lang.Object> | getAttributeAsList(java.lang.String key) - Returns the value as an empty list if the key was not found, as a java.util.List if the value is a List or an Array, or as a Collections.singletonList if there is only one value |
java.lang.String | getAttributeAsString(java.lang.String key, java.lang.String defaultValue) |
java.util.List<java.lang.String> | getAttributeAsStringList(java.lang.String key, java.lang.String defaultValue) |
java.util.Map<java.lang.String,java.lang.Object> | getAttributes() |
int | getCalledChrCount() - Returns the number of chromosomes carrying any allele in the genotypes (i.e., excluding NO_CALLS) |
int | getCalledChrCount(Allele a) - Returns the number of chromosomes carrying allele A in the genotypes |
int | getCalledChrCount(Allele a, java.util.Set<java.lang.String> sampleIds) - Returns the number of chromosomes carrying allele A in the genotypes |
int | getCalledChrCount(java.util.Set<java.lang.String> sampleIds) - Returns the number of chromosomes carrying any allele in the genotypes (i.e., excluding NO_CALLS) |
CommonInfo | getCommonInfo() |
java.lang.String | getContig() - Gets the contig name for the contig this is mapped to. |
int | getEnd() |
java.util.Set<java.lang.String> | getFilters() |
java.util.Set<java.lang.String> | getFiltersMaybeNull() |
Genotype | getGenotype(int ith) |
Genotype | getGenotype(java.lang.String sample) |
GenotypesContext | getGenotypes() |
protected GenotypesContext | getGenotypes(java.util.Collection<java.lang.String> sampleNames) - Returns a map from sampleName -> Genotype for each sampleName in sampleNames. |
GenotypesContext | getGenotypes(java.util.Set<java.lang.String> sampleNames) |
GenotypesContext | getGenotypes(java.lang.String sampleName) - Returns a map from sampleName -> Genotype for the genotype associated with sampleName. |
java.lang.Iterable<Genotype> | getGenotypesOrderedBy(java.lang.Iterable<java.lang.String> sampleOrdering) |
java.lang.Iterable<Genotype> | getGenotypesOrderedByName() |
int[] | getGLIndecesOfAlternateAllele(Allele targetAllele) - Deprecated. 7/18 use getGLIndicesOfAlternateAllele(Allele) instead |
int[] | getGLIndicesOfAlternateAllele(Allele targetAllele) |
int | getHetCount() - Genotype-specific functions -- how many het calls are there in the genotypes? |
int | getHomRefCount() - Genotype-specific functions -- how many hom ref calls are there in the genotypes? |
int | getHomVarCount() - Genotype-specific functions -- how many hom var calls are there in the genotypes? |
java.lang.String | getID() |
java.util.List<java.lang.Integer> | getIndelLengths() - Gets the sizes of the alternate alleles if they are insertion/deletion events, and returns a list of their sizes |
double | getLog10PError() |
int | getMaxPloidy(int defaultPloidy) - Returns the maximum ploidy of all samples in this VC, or default if there are no genotypes. This function is caching, so it's only expensive on the first call |
int | getMixedCount() - Genotype-specific functions -- how many mixed calls are there in the genotypes? |
int | getNAlleles() |
int | getNoCallCount() - Genotype-specific functions -- how many no-calls are there in the genotypes? |
int | getNSamples() |
double | getPhredScaledQual() |
Allele | getReference() |
java.util.Set<java.lang.String> | getSampleNames() |
java.util.List<java.lang.String> | getSampleNamesOrderedByName() |
java.lang.String | getSource() |
int | getStart() - Returns 1-based inclusive start position of the variant. |
StructuralVariantType | getStructuralVariantType() - Search for the INFO=SVTYPE and return the type of Structural Variant |
VariantContext.Type | getType() - Determines (if necessary) and returns the type of this variation by examining the alleles it contains. |
VariantContext.Type | getType(boolean ignoreNonRef) - Determines (if necessary) and returns the type of this variation by examining the alleles it contains. |
boolean | hasAllele(Allele allele) |
boolean | hasAllele(Allele allele, boolean ignoreRefState) |
boolean | hasAlternateAllele(Allele allele) |
boolean | hasAlternateAllele(Allele allele, boolean ignoreRefState) |
boolean | hasAttribute(java.lang.String key) |
boolean | hasGenotype(java.lang.String sample) |
boolean | hasGenotypes() |
boolean | hasGenotypes(java.util.Collection<java.lang.String> sampleNames) |
boolean | hasID() |
boolean | hasLog10PError() |
boolean | hasSameAllelesAs(VariantContext other) |
boolean | hasSameAlternateAllelesAs(VariantContext other) |
boolean | hasSymbolicAlleles() |
static boolean | hasSymbolicAlleles(java.util.List<Allele> alleles) |
boolean | isBiallelic() |
boolean | isComplexIndel() |
boolean | isFiltered() |
boolean | isFullyDecoded() - See VariantContext document about fully decoded |
boolean | isIndel() - convenience method for indels |
boolean | isMixed() - convenience method for mixed events |
boolean | isMNP() |
boolean | isMonomorphicInSamples() - Genotype-specific functions -- are the genotypes monomorphic w.r.t. |
boolean | isNotFiltered() |
boolean | isPointEvent() - convenience method for point events |
boolean | isPolymorphicInSamples() - Genotype-specific functions -- are the genotypes polymorphic w.r.t. |
boolean | isReferenceBlock() |
boolean | isSimpleDeletion() |
boolean | isSimpleIndel() |
boolean | isSimpleInsertion() |
boolean | isSNP() - convenience method for SNPs |
boolean | isStructuralIndel() |
boolean | isSymbolic() |
boolean | isSymbolicOrSV() |
boolean | isVariant() - convenience method for variants |
VariantContext | subContextFromSample(java.lang.String sampleName) |
VariantContext | subContextFromSamples(java.util.Set<java.lang.String> sampleNames) |
VariantContext | subContextFromSamples(java.util.Set<java.lang.String> sampleNames, boolean rederiveAllelesFromGenotypes) - This method subsets down to a set of samples. |
java.lang.String | toString() |
java.lang.String | toStringDecodeGenotypes() |
java.lang.String | toStringWithoutGenotypes() |
void | validateAlternateAlleles() |
void | validateChromosomeCounts() |
void | validateReferenceBases(Allele reportedReference, Allele observedReference) |
void | validateRSIDs(java.util.Set<java.lang.String> rsIDs) |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Locatable: contains, contigsMatch, getLengthOnReference, overlaps, withinDistanceOf
public static final long serialVersionUID
protected CommonInfo commonInfo
public static final double NO_LOG10_PERROR
public static final java.util.Set<java.lang.String> PASSES_FILTERS
protected final java.lang.String contig
protected final long start
protected final long stop
protected VariantContext.Type type
protected VariantContext.Type typeIgnoringNonRef
protected final java.util.List<Allele> alleles
protected GenotypesContext genotypes
protected int[] genotypeCounts
public static final GenotypesContext NO_GENOTYPES
public static final java.util.regex.Pattern VALID_FILTER
protected VariantContext(VariantContext other)
other - the VariantContext to copy
protected VariantContext(java.lang.String source, java.lang.String ID, java.lang.String contig, long start, long stop, java.util.Collection<Allele> alleles, GenotypesContext genotypes, double log10PError, java.util.Set<java.lang.String> filters, java.util.Map<java.lang.String,java.lang.Object> attributes, boolean fullyDecoded, java.util.EnumSet<VariantContext.Validation> validationToPerform)
source - source
contig - the contig
start - the start base (one based)
stop - the stop reference base (one based)
alleles - alleles
genotypes - genotypes map
log10PError - qual
filters - filters: use null for unfiltered and empty set for passes filters
attributes - attributes
validationToPerform - set of validation steps to take
public java.util.List<java.lang.String> calcVCFGenotypeKeys(VCFHeader header)
public VariantContext subContextFromSamples(java.util.Set<java.lang.String> sampleNames, boolean rederiveAllelesFromGenotypes)
sampleNames - the sample names
rederiveAllelesFromGenotypes - if true, restricts the returned alleles to just those in use by the samples; true should be the default
public VariantContext subContextFromSamples(java.util.Set<java.lang.String> sampleNames)
sampleNames - the sample names; equivalent to calling with rederiveAllelesFromGenotypes = true
public VariantContext subContextFromSample(java.lang.String sampleName)
public VariantContext.Type getType()
public VariantContext.Type getType(boolean ignoreNonRef)
ignoreNonRef - If set to true, symbolic NON_REF alleles will not be considered for the type determination, which is required for handling GVCF files.
public boolean isSNP()
public boolean isVariant()
public boolean isPointEvent()
public boolean isIndel()
public boolean isSimpleInsertion()
public boolean isSimpleDeletion()
public boolean isSimpleIndel()
public boolean isComplexIndel()
public boolean isSymbolic()
public boolean isStructuralIndel()
public boolean isSymbolicOrSV()
public boolean isMNP()
public boolean isMixed()
public boolean hasID()
public boolean emptyID()
public java.lang.String getID()
public java.lang.String getSource()
public java.util.Set<java.lang.String> getFiltersMaybeNull()
public java.util.Set<java.lang.String> getFilters()
public boolean isFiltered()
public boolean isNotFiltered()
public boolean filtersWereApplied()
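The constructor documentation states that a null filter set means unfiltered and an empty set means the record passes filters. A minimal sketch of how the three filter queries could relate under that convention follows; `FilterStateSketch` is a hypothetical class operating on a bare `Set<String>`, not the library's implementation.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical three-state model: null = never filtered, empty = PASS, non-empty = failed.
class FilterStateSketch {
    static boolean filtersWereApplied(Set<String> filters) {
        return filters != null;
    }

    static boolean isFiltered(Set<String> filters) {
        return filters != null && !filters.isEmpty();
    }

    static boolean isNotFiltered(Set<String> filters) {
        return !isFiltered(filters);
    }

    public static void main(String[] args) {
        System.out.println(filtersWereApplied(null));                         // false
        System.out.println(isFiltered(Collections.<String>emptySet()));       // false
        System.out.println(isFiltered(Collections.singleton("LowQual")));     // true
    }
}
```

The distinction matters when writing records back out: "no filtering was run" and "filtering was run and the record passed" are different states even though neither counts as filtered.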
public boolean hasLog10PError()
public double getLog10PError()
public double getPhredScaledQual()
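The two quality accessors above expose the same quantity on different scales. Assuming the usual Phred convention, Q = -10 * log10(P error), the relationship can be sketched as follows; `PhredQualSketch` is a hypothetical helper, and how the library treats a missing quality (NO_LOG10_PERROR) is not modeled here.

```java
// Hypothetical helper illustrating the standard Phred conversion.
class PhredQualSketch {
    // log10PError is the log10 of the error probability, so it is zero or negative.
    static double phredScaled(double log10PError) {
        return -10.0 * log10PError;
    }

    public static void main(String[] args) {
        // An error probability of 1e-3 has log10 = -3.
        System.out.println(phredScaled(-3.0));
    }
}
```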
public java.util.Map<java.lang.String,java.lang.Object> getAttributes()
public boolean hasAttribute(java.lang.String key)
public java.lang.Object getAttribute(java.lang.String key)
public java.lang.Object getAttribute(java.lang.String key, java.lang.Object defaultValue)
public java.lang.String getAttributeAsString(java.lang.String key, java.lang.String defaultValue)
public int getAttributeAsInt(java.lang.String key, int defaultValue)
public double getAttributeAsDouble(java.lang.String key, double defaultValue)
public boolean getAttributeAsBoolean(java.lang.String key, boolean defaultValue)
public java.util.List<java.lang.Object> getAttributeAsList(java.lang.String key)
public java.util.List<java.lang.String> getAttributeAsStringList(java.lang.String key, java.lang.String defaultValue)
public java.util.List<java.lang.Integer> getAttributeAsIntList(java.lang.String key, int defaultValue)
public java.util.List<java.lang.Double> getAttributeAsDoubleList(java.lang.String key, double defaultValue)
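The method summary describes getAttributeAsList as returning an empty list for a missing key, the value itself when it is a List or an array, and a singleton list otherwise. A self-contained sketch of that normalization is shown below; `AttributeAsListSketch.asList` is a hypothetical helper that takes the already looked-up value, with null standing in for a missing key.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical normalization of an attribute value into a List.
class AttributeAsListSketch {
    @SuppressWarnings("unchecked")
    static List<Object> asList(Object value) {
        if (value == null) return Collections.emptyList();      // key not found
        if (value instanceof List) return (List<Object>) value; // already a list
        if (value.getClass().isArray()) {                       // object or primitive array
            int n = Array.getLength(value);
            List<Object> out = new ArrayList<>(n);
            for (int i = 0; i < n; i++) out.add(Array.get(value, i)); // boxes primitives
            return out;
        }
        return Collections.singletonList(value);                // single value
    }

    public static void main(String[] args) {
        System.out.println(asList(null));
        System.out.println(asList("0.51"));
        System.out.println(asList(new int[] {1, 2}));
    }
}
```

Using java.lang.reflect.Array lets one branch handle both object arrays and primitive arrays, at the cost of boxing each element.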
public CommonInfo getCommonInfo()
public Allele getReference()
public boolean isBiallelic()
public int getNAlleles()
public int getMaxPloidy(int defaultPloidy)
defaultPloidy - the default ploidy, if all samples are no-called
public Allele getAllele(java.lang.String allele)
public Allele getAllele(byte[] allele)
public boolean hasAllele(Allele allele)
public boolean hasAllele(Allele allele, boolean ignoreRefState)
public boolean hasAlternateAllele(Allele allele)
public boolean hasAlternateAllele(Allele allele, boolean ignoreRefState)
public java.util.List<Allele> getAlleles()
public java.util.List<Allele> getAlternateAlleles()
public java.util.List<java.lang.Integer> getIndelLengths()
public Allele getAlternateAllele(int i)
i - the ith allele (from 0 to n - 2 for a context with n alleles including a reference allele)
java.lang.IllegalArgumentException - if i is invalid
public boolean hasSameAllelesAs(VariantContext other)
other - VariantContext whose alleles to compare against
public boolean hasSameAlternateAllelesAs(VariantContext other)
other - VariantContext whose alternate alleles to compare against
public int getNSamples()
public boolean hasGenotypes()
public boolean hasGenotypes(java.util.Collection<java.lang.String> sampleNames)
public GenotypesContext getGenotypes()
public java.lang.Iterable<Genotype> getGenotypesOrderedByName()
public java.lang.Iterable<Genotype> getGenotypesOrderedBy(java.lang.Iterable<java.lang.String> sampleOrdering)
public GenotypesContext getGenotypes(java.lang.String sampleName)
sampleName - the sample name
java.lang.IllegalArgumentException - if sampleName isn't bound to a genotype
protected GenotypesContext getGenotypes(java.util.Collection<java.lang.String> sampleNames)
sampleNames - a unique list of sample names
java.lang.IllegalArgumentException - if sampleName isn't bound to a genotype
public GenotypesContext getGenotypes(java.util.Set<java.lang.String> sampleNames)
public java.util.Set<java.lang.String> getSampleNames()
public java.util.List<java.lang.String> getSampleNamesOrderedByName()
public Genotype getGenotype(java.lang.String sample)
sample - the sample name
public boolean hasGenotype(java.lang.String sample)
public Genotype getGenotype(int ith)
ith - the sample index
public int getCalledChrCount()
public int getCalledChrCount(java.util.Set<java.lang.String> sampleIds)
sampleIds - IDs of samples to take into account. If empty then all samples are included.
public int getCalledChrCount(Allele a)
a - allele
public int getCalledChrCount(Allele a, java.util.Set<java.lang.String> sampleIds)
a - allele
sampleIds - IDs of samples to take into account. If empty then all samples are included.
public boolean isMonomorphicInSamples()
public boolean isPolymorphicInSamples()
public int getNoCallCount()
public int getHomRefCount()
public int getHetCount()
public int getHomVarCount()
public int getMixedCount()
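The five counters above each tally one category of genotype call. As a rough sketch of how a single genotype might be assigned to a category, the following hypothetical classifier works on allele strings, with "." assumed as the no-call marker; it is an illustration of the categories, not the library's own rules (for example, it has no notion of an unavailable genotype).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical classifier for the genotype categories being counted.
class GenotypeTypeSketch {
    enum GType { NO_CALL, HOM_REF, HET, HOM_VAR, MIXED }
    static final String NO_CALL = "."; // assumed marker for an undetermined allele

    static GType classify(String ref, List<String> alleles) {
        long noCalls = alleles.stream().filter(NO_CALL::equals).count();
        if (noCalls == alleles.size()) return GType.NO_CALL; // nothing was called
        if (noCalls > 0) return GType.MIXED;                 // some called, some not
        boolean allSame = alleles.stream().distinct().count() == 1;
        if (!allSame) return GType.HET;
        return alleles.get(0).equals(ref) ? GType.HOM_REF : GType.HOM_VAR;
    }

    public static void main(String[] args) {
        System.out.println(classify("A", Arrays.asList("A", "A")));
        System.out.println(classify("A", Arrays.asList("A", "T")));
        System.out.println(classify("A", Arrays.asList("T", "T")));
        System.out.println(classify("A", Arrays.asList(".", ".")));
        System.out.println(classify("A", Arrays.asList("A", ".")));
    }
}
```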
public void extraStrictValidation(Allele reportedReference, Allele observedReference, java.util.Set<java.lang.String> rsIDs)
reportedReference - the reported reference allele
observedReference - the observed reference allele
rsIDs - the true dbSNP IDs
public void validateReferenceBases(Allele reportedReference, Allele observedReference)
public void validateRSIDs(java.util.Set<java.lang.String> rsIDs)
public void validateAlternateAlleles()
public void validateChromosomeCounts()
public java.lang.String toString()
toString in class java.lang.Object
public java.lang.String toStringDecodeGenotypes()
public java.lang.String toStringWithoutGenotypes()
public VariantContext fullyDecode(VCFHeader header, boolean lenientDecoding)
header - the header containing the types of all fields in this VC
public boolean isFullyDecoded()
public java.lang.String getContig()
getContig in interface Locatable
public int getStart()
INDEL events usually start on the first unaltered reference base before the INDEL.
Warning: be aware that the start position of the VariantContext is defined in terms of the start position specified in the underlying VCF file. VariantContexts representing the same biological event may have different start positions depending on the specifics of the VCF file they are derived from.
Warning: note also that the VCF spec allows 0 and N + 1 in the POS field for telomeric events, where N is the length of the chromosome. A returned value of 0 should be interpreted as a telomere and does not violate the "1-based" convention above. Code consuming the returned start should be prepared for such out-of-the-ordinary values.
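One way for consuming code to be "prepared" for telomeric positions is to validate and clamp the start before using it to index into a contig. The helper below is a hypothetical sketch of that defensive step; whether clamping, skipping, or special-casing telomeres is appropriate depends on the caller, and `contigLength` is assumed to be known from elsewhere (for example, a sequence dictionary).

```java
// Hypothetical guard for start positions that may be 0 or N + 1 (telomeric events).
class TelomericStartSketch {
    // Maps a VCF POS in [0, N + 1] onto a valid 1-based coordinate in [1, N].
    static int clampToContig(int start, int contigLength) {
        if (start < 0 || start > contigLength + 1) {
            throw new IllegalArgumentException(
                "start " + start + " outside [0, " + (contigLength + 1) + "]");
        }
        return Math.max(1, Math.min(start, contigLength));
    }

    public static void main(String[] args) {
        System.out.println(clampToContig(0, 100));   // 1
        System.out.println(clampToContig(101, 100)); // 100
        System.out.println(clampToContig(50, 100));  // 50
    }
}
```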
public int getEnd()
getEnd in interface Locatable
public boolean isReferenceBlock()
public boolean hasSymbolicAlleles()
public static boolean hasSymbolicAlleles(java.util.List<Allele> alleles)
public Allele getAltAlleleWithHighestAlleleCount()
public int getAlleleIndex(Allele allele)
allele - the allele whose index we want to get
public java.util.List<java.lang.Integer> getAlleleIndices(java.util.Collection<Allele> alleles)
alleles - the alleles we want to look up
@Deprecated public int[] getGLIndecesOfAlternateAllele(Allele targetAllele)
Deprecated. Use getGLIndicesOfAlternateAllele(Allele) instead.
public int[] getGLIndicesOfAlternateAllele(Allele targetAllele)
public StructuralVariantType getStructuralVariantType()