Class AbstractVCFCodec

All Implemented Interfaces:
FeatureCodec<VariantContext,LineIterator>, NameAwareCodec
Direct Known Subclasses:
VCF3Codec, VCFCodec

public abstract class AbstractVCFCodec extends AsciiFeatureCodec<VariantContext> implements NameAwareCodec
  • Field Details

    • MAX_ALLELE_SIZE_BEFORE_WARNING

      public static final int MAX_ALLELE_SIZE_BEFORE_WARNING
    • NUM_STANDARD_FIELDS

      protected static final int NUM_STANDARD_FIELDS
    • version

      protected VCFHeaderVersion version
    • alleleMap

      protected Map<String,List<Allele>> alleleMap
    • validate

      public static boolean validate
    • parts

      protected String[] parts
    • genotypeParts

      protected String[] genotypeParts
    • locParts

      protected final String[] locParts
    • filterHash

      protected HashMap<String,List<String>> filterHash
    • name

      protected String name
    • lineNo

      protected int lineNo
    • stringCache

      protected Map<String,String> stringCache
    • warnedAboutNoEqualsForNonFlag

      protected boolean warnedAboutNoEqualsForNonFlag
    • doOnTheFlyModifications

      protected boolean doOnTheFlyModifications
      If true, VCF headers are fixed up on the fly as they are read in.
    • remappedSampleName

      protected String remappedSampleName
      If non-null, we will replace the sample name read from the VCF header with this sample name. This feature works only for single-sample VCFs.
  • Constructor Details

    • AbstractVCFCodec

      protected AbstractVCFCodec()
  • Method Details

    • parseFilters

      protected abstract List<String> parseFilters(String filterString)
      parse the filter string, first checking to see if we already have parsed it in a previous attempt
      Parameters:
      filterString - the string to parse
      Returns:
      a list of the filters applied
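      The check-the-cache-first behaviour described above can be sketched as follows. This is a minimal, stand-alone illustration and not the htsjdk implementation: the class FilterParseSketch, the field filterCache (standing in for filterHash), and the choice of null for "." are assumptions, and the "PASS" and ";" conventions are taken from VCF 4.x.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not the htsjdk implementation.
final class FilterParseSketch {
    // Cache of FILTER strings that were already split (plays the role of filterHash).
    private final Map<String, List<String>> filterCache = new HashMap<>();

    /** Returns null for ".", an empty list for "PASS", otherwise the filter names. */
    List<String> parseFilters(String filterString) {
        if (filterString.equals(".")) {
            return null; // assumption: null marks "filters were not applied"
        }
        if (filterString.equals("PASS")) {
            return Collections.emptyList();
        }
        // Reuse the result of an earlier call with the same string, if any.
        List<String> cached = filterCache.get(filterString);
        if (cached != null) {
            return cached;
        }
        List<String> parsed = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList(filterString.split(";"))));
        filterCache.put(filterString, parsed);
        return parsed;
    }
}
```

      Because the cached list is shared between records, the sketch wraps it as unmodifiable so that one caller cannot alter what later callers see.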
    • parseHeaderFromLines

      protected VCFHeader parseHeaderFromLines(List<String> headerStrings, VCFHeaderVersion version)
      create a VCF header from a set of header record lines
      Parameters:
      headerStrings - a list of strings that represent all the ## and # entries
      version - the VCF header version of the header lines
      Returns:
      a VCFHeader object
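      To illustrate the two kinds of entries in headerStrings, the sketch below separates the "##" metadata lines from the single "#" column line and pulls the sample names out of the latter. It is only an illustration: the class HeaderLinesSketch and its fields are hypothetical, and the real method returns a VCFHeader rather than plain lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real method builds a VCFHeader object.
final class HeaderLinesSketch {
    final List<String> metaLines = new ArrayList<>();   // "##" lines with the prefix removed
    final List<String> sampleNames = new ArrayList<>(); // columns after FORMAT in the "#" line

    static HeaderLinesSketch parse(List<String> headerStrings) {
        HeaderLinesSketch result = new HeaderLinesSketch();
        for (String line : headerStrings) {
            if (line.startsWith("##")) {
                result.metaLines.add(line.substring(2));
            } else if (line.startsWith("#")) {
                String[] columns = line.substring(1).split("\t");
                // Eight fixed columns plus FORMAT come before the sample columns.
                for (int i = 9; i < columns.length; i++) {
                    result.sampleNames.add(columns[i]);
                }
            } else {
                throw new IllegalArgumentException("Not a header line: " + line);
            }
        }
        return result;
    }
}
```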
    • getHeader

      public VCFHeader getHeader()
      Returns:
      the header that was either explicitly set on this codec, or read from the file. May be null. The returned value should not be modified.
    • getVersion

      public VCFHeaderVersion getVersion()
      Returns:
      the version number that was either explicitly set on this codec, or read from the file. May be null.
    • setVCFHeader

      public VCFHeader setVCFHeader(VCFHeader newHeader, VCFHeaderVersion newVersion)
      Explicitly set the VCFHeader on this codec. This will overwrite the header read from the file and the version state stored in this instance; conversely, reading the header from a file will overwrite whatever is set here.
      Parameters:
      newHeader - the header to set on this codec
      newVersion - the header version to set on this codec
      Returns:
      the actual header for this codec. The returned header may not be identical to the header argument since the header lines may be "repaired" (i.e., rewritten) if doOnTheFlyModifications is set.
      Throws:
      TribbleException - if the requested header version is not compatible with the existing version
    • getAltHeaderLine

      public VCFAltHeaderLine getAltHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
      Create and return a VCFAltHeaderLine object from a header line string that conforms to the sourceVersion
      Parameters:
      headerLineString - VCF header line being parsed without the leading "##ALT="
      sourceVersion - the VCF header version of the source from which the line was retrieved. The resulting header line object should be validated against this header version.
      Returns:
      a VCFAltHeaderLine object
    • getPedigreeHeaderLine

      public VCFPedigreeHeaderLine getPedigreeHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
      Create and return a VCFPedigreeHeaderLine object from a header line string that conforms to the sourceVersion
      Parameters:
      headerLineString - VCF header line being parsed without the leading "##PEDIGREE="
      sourceVersion - the VCF header version of the source from which the line was retrieved. The resulting header line object should be validated against this header version.
      Returns:
      a VCFPedigreeHeaderLine object
    • getMetaHeaderLine

      public VCFMetaHeaderLine getMetaHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
      Create and return a VCFMetaHeaderLine object from a header line string that conforms to the sourceVersion
      Parameters:
      headerLineString - VCF header line being parsed without the leading "##META="
      sourceVersion - the VCF header version of the source from which the line was retrieved. The resulting header line object should be validated against this header version.
      Returns:
      a VCFMetaHeaderLine object
    • getSampleHeaderLine

      public VCFSampleHeaderLine getSampleHeaderLine(String headerLineString, VCFHeaderVersion sourceVersion)
      Create and return a VCFSampleHeaderLine object from a header line string that conforms to the sourceVersion
      Parameters:
      headerLineString - VCF header line being parsed without the leading "##SAMPLE="
      sourceVersion - the VCF header version of the source from which the line was retrieved. The resulting header line object should be validated against this header version.
      Returns:
      a VCFSampleHeaderLine object
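      The four get*HeaderLine methods above all receive the part of a structured header line that follows the "##KEY=" prefix, for example <ID=DEL,Description="Deletion">. As a rough sketch of what parsing such a string involves, the code below splits it into key/value pairs while respecting quoted values. The class StructuredHeaderLineSketch and the method parseFields are hypothetical; the real methods return typed header line objects and apply version-specific validation that is not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits "<K1=V1,K2=\"quoted, value\">" into ordered key/value pairs.
final class StructuredHeaderLineSketch {
    static Map<String, String> parseFields(String headerLineString) {
        String s = headerLineString.trim();
        if (!s.startsWith("<") || !s.endsWith(">")) {
            throw new IllegalArgumentException("Expected <...>: " + headerLineString);
        }
        String body = s.substring(1, s.length() - 1);
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        boolean inValue = false;  // true after the '=' of the current pair
        boolean inQuotes = false; // true inside a double-quoted value
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < body.length()) {
                    value.append(body.charAt(++i)); // keep the escaped character
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    value.append(c); // commas inside quotes belong to the value
                }
            } else if (c == '"' && inValue) {
                inQuotes = true;
            } else if (c == '=' && !inValue) {
                inValue = true;
            } else if (c == ',') {
                fields.put(key.toString(), value.toString());
                key.setLength(0);
                value.setLength(0);
                inValue = false;
            } else if (inValue) {
                value.append(c);
            } else {
                key.append(c);
            }
        }
        if (key.length() > 0) {
            fields.put(key.toString(), value.toString());
        }
        return fields;
    }
}
```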
    • decodeLoc

      public Feature decodeLoc(String line)
      the fast decode function
      Parameters:
      line - the line of text for the record
      Returns:
      a feature (not guaranteed to be complete) that has the correct start and stop
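      The idea behind a fast location-only decode can be sketched as splitting just the leading columns and leaving the rest of the line unparsed. The class LocSketch is hypothetical, and the sketch assumes the stop position follows from the length of REF alone; records whose end is defined elsewhere (for example by an END value in INFO) are not handled.

```java
// Hypothetical sketch of a location-only decode; not the htsjdk implementation.
final class LocSketch {
    final String contig;
    final int start;
    final int stop;

    private LocSketch(String contig, int start, int stop) {
        this.contig = contig;
        this.start = start;
        this.stop = stop;
    }

    static LocSketch decodeLoc(String line) {
        // Split only CHROM, POS, ID and REF; everything after REF stays in one unparsed chunk.
        String[] parts = line.split("\t", 5);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Too few columns: " + line);
        }
        int start = Integer.parseInt(parts[1]);
        // Assumption: the record spans exactly the reference allele.
        int stop = start + parts[3].length() - 1;
        return new LocSketch(parts[0], start, stop);
    }
}
```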
    • decode

      public VariantContext decode(String line)
      decode the line into a feature (VariantContext)
      Specified by:
      decode in class AsciiFeatureCodec<VariantContext>
      Parameters:
      line - the line
      Returns:
      a VariantContext
    • getName

      public String getName()
      get the name of this codec
      Specified by:
      getName in interface NameAwareCodec
      Returns:
      our set name
    • setName

      public void setName(String name)
      set the name of this codec
      Specified by:
      setName in interface NameAwareCodec
      Parameters:
      name - new name
    • getCachedString

      protected String getCachedString(String str)
      Return a cached copy of the supplied string.
      Parameters:
      str - string
      Returns:
      interned string
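      A cache of this kind can be sketched with a plain map from each string to its first-seen instance, so that equal strings read from many records share one object. The class StringCacheSketch is hypothetical and the map-based approach is an assumption suggested by the stringCache field; it is not taken from the htsjdk source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-codec string cache.
final class StringCacheSketch {
    private final Map<String, String> stringCache = new HashMap<>();

    /** Returns the first instance seen that is equal to str. */
    String getCachedString(String str) {
        String cached = stringCache.get(str);
        if (cached == null) {
            cached = str;
            stringCache.put(str, str);
        }
        return cached;
    }
}
```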
    • oneAllele

      protected static Allele oneAllele(String index, List<Allele> alleles)
      create an allele from an index and a list of alleles
      Parameters:
      index - the index
      alleles - the alleles
      Returns:
      an Allele
    • parseGenotypeAlleles

      protected static List<Allele> parseGenotypeAlleles(String GT, List<Allele> alleles, Map<String,List<Allele>> cache)
      parse genotype alleles from the genotype string
      Parameters:
      GT - GT string
      alleles - list of possible alleles
      cache - cache of alleles for GT
      Returns:
      the allele list for the GT string
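      The relationship between oneAllele and parseGenotypeAlleles can be sketched as below, using plain strings in place of Allele objects. The class GenotypeAllelesSketch and the NO_CALL constant are hypothetical; the "/" and "|" separators and "." for a missing call follow the VCF GT convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; alleles are plain strings instead of Allele objects.
final class GenotypeAllelesSketch {
    static final String NO_CALL = "."; // assumption: marker for a missing call

    /** Resolves one GT index ("0", "1", ... or ".") against the record's alleles. */
    static String oneAllele(String index, List<String> alleles) {
        if (index.equals(".")) {
            return NO_CALL;
        }
        int i = Integer.parseInt(index);
        if (i < 0 || i >= alleles.size()) {
            throw new IllegalArgumentException("Allele index " + i + " is out of range");
        }
        return alleles.get(i);
    }

    /** Splits a GT string such as "0/1" or "1|2" and resolves each index, with caching. */
    static List<String> parseGenotypeAlleles(String gt, List<String> alleles,
                                             Map<String, List<String>> cache) {
        List<String> cached = cache.get(gt);
        if (cached != null) {
            return cached;
        }
        List<String> gtAlleles = new ArrayList<>();
        for (String index : gt.split("[/|]")) {
            gtAlleles.add(oneAllele(index, alleles));
        }
        cache.put(gt, gtAlleles);
        return gtAlleles;
    }
}
```

      In this sketch the cache is keyed by the GT string only, so it is valid for a single allele list and would need to be cleared when moving to a record with different alleles.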
    • parseQual

      protected static Double parseQual(String qualString)
      parse out the qual value
      Parameters:
      qualString - the quality string
      Returns:
      the quality value as a Double
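      As a sketch of what parsing QUAL can involve: the VCF QUAL column is Phred-scaled (QUAL = -10 * log10 of the error probability), so dividing by -10 recovers the log10 error probability. The class QualSketch is hypothetical, and both the conversion to log10 scale and the use of null for a missing value (".") are assumptions of this sketch rather than documented behaviour of the method above.

```java
// Hypothetical sketch; the returned scale and the null-for-missing choice are assumptions.
final class QualSketch {
    /** Converts a Phred-scaled QUAL string to a log10 error probability, or null for ".". */
    static Double parseQual(String qualString) {
        if (qualString.equals(".")) {
            return null; // missing quality
        }
        return Double.parseDouble(qualString) / -10.0;
    }
}
```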
    • parseAlleles

      protected static List<Allele> parseAlleles(String ref, String alts, int lineNo)
      parse out the alleles
      Parameters:
      ref - the reference base
      alts - a string of alternates to break into alleles
      lineNo - the line number for this record
      Returns:
      a list of alleles
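      A minimal sketch of this step, with alleles represented as strings: the reference comes first, followed by each comma-separated alternate. The class AllelesSketch is hypothetical, the treatment of "." as "no alternates" follows the VCF convention for a missing ALT, and the validation shown is only an example of how lineNo might be used in error messages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; alleles are plain strings instead of Allele objects.
final class AllelesSketch {
    static List<String> parseAlleles(String ref, String alts, int lineNo) {
        if (ref.isEmpty()) {
            throw new IllegalArgumentException("Empty REF allele at line " + lineNo);
        }
        List<String> alleles = new ArrayList<>();
        alleles.add(ref); // the reference allele is always first
        if (!alts.equals(".")) {
            // Limit -1 keeps trailing empty strings so that "A," is reported as an error.
            for (String alt : alts.split(",", -1)) {
                if (alt.isEmpty()) {
                    throw new IllegalArgumentException("Empty ALT allele at line " + lineNo);
                }
                alleles.add(alt);
            }
        }
        return alleles;
    }
}
```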
    • canDecodeFile

      public static boolean canDecodeFile(String potentialInput, String MAGIC_HEADER_LINE)
    • createGenotypeMap

      public LazyGenotypesContext.LazyData createGenotypeMap(String str, List<Allele> alleles, String chr, int pos)
      create a genotype map
      Parameters:
      str - the string
      alleles - the list of alleles
      chr - the chromosome (contig) of the record
      pos - the position of the record
      Returns:
      a mapping of sample name to genotype object
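      The shape of such a mapping can be sketched as follows, assuming the input string holds the FORMAT column followed by one tab-separated column per sample. The class GenotypeMapSketch, the explicit sampleNames parameter, and the nested-map result are all hypothetical simplifications; the real method returns LazyGenotypesContext.LazyData and takes the sample names from the header.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps each sample name to its FORMAT key/value pairs.
final class GenotypeMapSketch {
    static Map<String, Map<String, String>> createGenotypeMap(String str, List<String> sampleNames) {
        String[] columns = str.split("\t");
        if (columns.length - 1 != sampleNames.size()) {
            throw new IllegalArgumentException(
                    "Expected " + sampleNames.size() + " sample columns, found " + (columns.length - 1));
        }
        String[] keys = columns[0].split(":"); // FORMAT column, e.g. "GT:DP"
        Map<String, Map<String, String>> bySample = new LinkedHashMap<>();
        for (int s = 0; s < sampleNames.size(); s++) {
            String[] values = columns[s + 1].split(":");
            Map<String, String> fields = new LinkedHashMap<>();
            // VCF allows trailing fields to be dropped, so pair up only what is present.
            for (int k = 0; k < keys.length && k < values.length; k++) {
                fields.put(keys[k], values[k]);
            }
            bySample.put(sampleNames.get(s), fields);
        }
        return bySample;
    }
}
```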
    • disableOnTheFlyModifications

      public final void disableOnTheFlyModifications()
      Forces all VCFCodecs to not perform any on-the-fly modifications to the VCF header of VCF records. Useful primarily for raw comparisons, such as when comparing raw VCF records.
    • setRemappedSampleName

      public void setRemappedSampleName(String remappedSampleName)
      Replaces the sample name read from the VCF header with the remappedSampleName. Works only for single-sample VCFs -- attempting to perform sample name remapping for multi-sample VCFs will produce an Exception.
      Parameters:
      remappedSampleName - replacement sample name for the sample specified in the VCF header
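      The single-sample restriction can be sketched as a check on the header's sample list before the replacement is applied. The class SampleRemapSketch and the method remap are hypothetical, and the sketch rejects any header that does not contain exactly one sample, which is an assumption about how the restriction is enforced.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of single-sample name remapping.
final class SampleRemapSketch {
    /** Returns the sample names to use, given the names read from the header. */
    static List<String> remap(List<String> headerSampleNames, String remappedSampleName) {
        if (remappedSampleName == null) {
            return headerSampleNames; // remapping not requested
        }
        if (headerSampleNames.size() != 1) {
            throw new IllegalStateException(
                    "Sample name remapping requires exactly one sample, found "
                            + headerSampleNames.size());
        }
        return Collections.singletonList(remappedSampleName);
    }
}
```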
    • generateException

      protected void generateException(String message)
    • generateException

      protected static void generateException(String message, int lineNo)
    • getTabixFormat

      public TabixFormat getTabixFormat()
      Description copied from interface: FeatureCodec
      Define the tabix format for the feature, used for indexing. Default implementation throws an exception. Note that only AsciiFeatureCodec can read tabix files as defined in AbstractFeatureReader.getFeatureReader(String, String, FeatureCodec, boolean, java.util.function.Function, java.util.function.Function)
      Specified by:
      getTabixFormat in interface FeatureCodec<VariantContext,LineIterator>
      Returns:
      the format to use with tabix