java.lang.Object

htsjdk.variant.bcf2.BCF2Utils

public final class BCF2Utils extends Object

Common utilities for working with BCF2 files. Includes convenience methods for encoding and decoding BCF2 type descriptors (size + type).

Since: 5/12

Field Summary

Fields

Modifier and Type

Field

Description

static final BCF2Type[]

ID_TO_ENUM

static final BCF2Type[]

INTEGER_TYPES_BY_SIZE

static final int

MAX_ALLELES_IN_GENOTYPES

static final int

MAX_INLINE_ELEMENTS

static final int

OVERFLOW_ELEMENT_MARKER
Method Summary

Modifier and Type

Method

Description

static String

collapseStringList(List<String> strings)

Collapse multiple strings into a single comma-separated string with a leading comma: ["s1", "s2", "s3"] => ",s1,s2,s3".

static int

decodeSize(byte typeDescriptor)

static BCF2Type

decodeType(byte typeDescriptor)

static int

decodeTypeID(byte typeDescriptor)

static BCF2Type

determineIntegerType(int value)

static BCF2Type

determineIntegerType(int[] values)

static BCF2Type

determineIntegerType(List<Integer> values)

static byte

encodeTypeDescriptor(int nElements, BCF2Type type)

static List<String>

explodeStringList(String collapsed)

Inverse operation of collapseStringList.

static boolean

headerLinesAreOrderedConsistently(VCFHeader outputHeader, VCFHeader genotypesBlockHeader)

Are the elements and their order in the output and input headers consistent, so that we can write out the raw genotypes block without decoding and recoding it? If the order of INFO, FILTER, or contig elements in the output header differs from the order in the input header, we must decode the blocks using the input header and then recode them based on the new output order.

static boolean

isCollapsedString(String s)

static ArrayList<String>

makeDictionary(VCFHeader header)

Create a strings dictionary from the VCF header. The dictionary is an ordered list of the identifiers of common VCF header fields (FILTER, INFO, and FORMAT).

static BCF2Type

maxIntegerType(BCF2Type t1, BCF2Type t2)

Returns the larger of the two BCF2 integer types t1 and t2. For example, if t1 == INT8 and t2 == INT16, returns INT16.

static byte

readByte(InputStream stream)

static final File

shadowBCF(File vcfFile)

Returns a good name for a shadow BCF file for vcfFile.

static boolean

sizeIsOverflow(byte typeDescriptor)

static <T> List<T>

toList(Class<T> c, Object o)

Helper function that takes an object and returns a list representation of it: o == null => []; o is a list => o; otherwise => [o].

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- MAX_ALLELES_IN_GENOTYPES
  
  public static final int MAX_ALLELES_IN_GENOTYPES
  See Also:
  
  Constant Field Values
- OVERFLOW_ELEMENT_MARKER
  
  public static final int OVERFLOW_ELEMENT_MARKER
  See Also:
  
  Constant Field Values
- MAX_INLINE_ELEMENTS
  
  public static final int MAX_INLINE_ELEMENTS
  See Also:
  
  Constant Field Values
- INTEGER_TYPES_BY_SIZE
  
  public static final BCF2Type[] INTEGER_TYPES_BY_SIZE
- ID_TO_ENUM
  
  public static final BCF2Type[] ID_TO_ENUM
Method Details
- makeDictionary
  
  public static ArrayList<String> makeDictionary(VCFHeader header)
  
  Create a strings dictionary from the VCF header. The dictionary is an ordered list of the identifiers of common VCF header fields (FILTER, INFO, and FORMAT). Note that it's critical that the list be deduplicated and sorted in a consistent manner each time: the BCF2 offsets are encoded relative to this dictionary, so if it isn't determined exactly the same way as in the header each time, those offsets no longer point at the intended strings.
  
  Parameters:
  
  header - the VCFHeader from which to build the dictionary
  
  Returns:
  
  a non-null dictionary of elements, may be empty
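  
  The deduplication and stable-ordering requirement can be illustrated with a minimal sketch. It is not the library's implementation: the class `DictionarySketch` is hypothetical, it takes a plain list of ID strings rather than a VCFHeader, and it assumes first-seen order as the "consistent manner", which this page does not specify.
  
  ```java
  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.LinkedHashSet;
  import java.util.List;
  
  // Hypothetical stand-in: works on ID strings, not on a real VCFHeader.
  final class DictionarySketch {
      /** Removes duplicate IDs while keeping first-seen order (an assumed ordering rule). */
      static ArrayList<String> makeDictionary(List<String> idsInHeaderOrder) {
          // LinkedHashSet drops repeats but preserves insertion order,
          // so the same input always yields the same offsets.
          return new ArrayList<>(new LinkedHashSet<>(idsInHeaderOrder));
      }
  
      public static void main(String[] args) {
          List<String> ids = Arrays.asList("PASS", "q10", "DP", "AF", "DP", "GT");
          ArrayList<String> dict = makeDictionary(ids);
          System.out.println(dict);               // [PASS, q10, DP, AF, GT]
          System.out.println(dict.indexOf("AF")); // 3, the offset a record would store for "AF"
      }
  }
  ```
  
  Because records store only the offset, rebuilding the dictionary in any other order would make offset 3 refer to a different identifier.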
- encodeTypeDescriptor
  
  public static byte encodeTypeDescriptor(int nElements, BCF2Type type)
- decodeSize
  
  public static int decodeSize(byte typeDescriptor)
- decodeTypeID
  
  public static int decodeTypeID(byte typeDescriptor)
- decodeType
  
  public static BCF2Type decodeType(byte typeDescriptor)
- sizeIsOverflow
  
  public static boolean sizeIsOverflow(byte typeDescriptor)
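  
  The four descriptor methods above carry no description on this page, so the following is a sketch of how a one-byte "size + type" descriptor can be packed and unpacked. It assumes the layout given in the BCF2 specification (element count in the upper four bits, type ID in the lower four, and a count of 15 meaning the real size follows separately). The class `TypeDescriptorSketch`, its constant, and the use of a raw `int` type ID instead of BCF2Type are hypothetical.
  
  ```java
  // Hypothetical sketch of a BCF2-style type descriptor byte; not the htsjdk source.
  final class TypeDescriptorSketch {
      // Assumed value: a 4-bit size of 15 signals that the real size is stored elsewhere.
      static final int OVERFLOW_MARKER = 15;
  
      /** Packs the element count (capped at the overflow marker) and a 4-bit type ID into one byte. */
      static byte encode(int nElements, int typeId) {
          int size = Math.min(nElements, OVERFLOW_MARKER);
          return (byte) ((size << 4) | (typeId & 0x0F));
      }
  
      /** Upper four bits: the inline element count, or the overflow marker. */
      static int decodeSize(byte typeDescriptor) {
          return (typeDescriptor & 0xF0) >> 4;
      }
  
      /** Lower four bits: the numeric type ID. */
      static int decodeTypeId(byte typeDescriptor) {
          return typeDescriptor & 0x0F;
      }
  
      static boolean sizeIsOverflow(byte typeDescriptor) {
          return decodeSize(typeDescriptor) == OVERFLOW_MARKER;
      }
  
      public static void main(String[] args) {
          byte small = encode(3, 1);   // three elements of type ID 1
          byte large = encode(20, 7);  // too many elements to store inline
          System.out.println(decodeSize(small) + " " + decodeTypeId(small)); // 3 1
          System.out.println(sizeIsOverflow(large));                         // true
      }
  }
  ```
  
  Masking before shifting matters in Java because `byte` is signed: without `& 0xF0`, a descriptor with the high bit set would sign-extend and decode to a negative size.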
- readByte
  
  public static byte readByte(InputStream stream) throws IOException
  
  Throws:
  
  IOException
- collapseStringList
  
  public static String collapseStringList(List<String> strings)
  
  Collapse multiple strings into a single comma-separated string with a leading comma: ["s1", "s2", "s3"] => ",s1,s2,s3".
  
  Parameters:
  
  strings - a list of strings with size > 1
  
  Returns:
  
  the collapsed string
- explodeStringList
  
  public static List<String> explodeStringList(String collapsed)
  
  Inverse operation of collapseStringList: ",s1,s2,s3" => ["s1", "s2", "s3"].
  
  Parameters:
  
  collapsed - the collapsed string to split
  
  Returns:
  
  the list of individual strings
- isCollapsedString
  
  public static boolean isCollapsedString(String s)
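  
  The collapse/explode round trip documented above can be sketched in a few lines. This is an illustration under assumptions, not the library code: the class `CollapsedStringSketch` is hypothetical, and the leading-comma test used for `isCollapsed` is inferred from the ",s1,s2,s3" format rather than stated on this page.
  
  ```java
  import java.util.Arrays;
  import java.util.List;
  
  // Hypothetical sketch of the ",s1,s2,s3" format; not the htsjdk source.
  final class CollapsedStringSketch {
      /** ["s1", "s2", "s3"] => ",s1,s2,s3": every element is prefixed with a comma. */
      static String collapse(List<String> strings) {
          StringBuilder sb = new StringBuilder();
          for (String s : strings) {
              sb.append(',').append(s);
          }
          return sb.toString();
      }
  
      /** Assumed check: a collapsed string is non-empty and starts with a comma. */
      static boolean isCollapsed(String s) {
          return !s.isEmpty() && s.charAt(0) == ',';
      }
  
      /** ",s1,s2,s3" => ["s1", "s2", "s3"]: drop the leading comma, then split on commas. */
      static List<String> explode(String collapsed) {
          if (!isCollapsed(collapsed)) {
              throw new IllegalArgumentException("not a collapsed string: " + collapsed);
          }
          // A limit of -1 keeps trailing empty elements instead of silently dropping them.
          return Arrays.asList(collapsed.substring(1).split(",", -1));
      }
  
      public static void main(String[] args) {
          String collapsed = collapse(Arrays.asList("s1", "s2", "s3"));
          System.out.println(collapsed);          // ,s1,s2,s3
          System.out.println(explode(collapsed)); // [s1, s2, s3]
      }
  }
  ```
  
  The leading comma is what lets a reader distinguish a collapsed list from an ordinary single string without any extra metadata.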
- shadowBCF
  
  public static final File shadowBCF(File vcfFile)
  
  Returns a good name for a shadow BCF file for vcfFile: foo.vcf => foo.bcf, and foo.xxx => foo.xxx.bcf. If the resulting BCF file cannot be written, returns null. This happens when vcfFile = /dev/null, for example.
  
  Parameters:
  
  vcfFile - the VCF file for which to name a shadow BCF file
  
  Returns:
  
  the BCF file, or null if it cannot be written
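  
  The naming rule above (foo.vcf => foo.bcf, foo.xxx => foo.xxx.bcf) can be sketched on plain path strings. The class `ShadowNameSketch` is hypothetical, and the sketch deliberately leaves out the writability check that makes the real method return null.
  
  ```java
  // Hypothetical sketch of the shadow-name rule only; it does not touch the file system.
  final class ShadowNameSketch {
      /** Swaps a trailing ".vcf" for ".bcf"; any other name simply gets ".bcf" appended. */
      static String shadowName(String vcfPath) {
          final String vcfExt = ".vcf";
          if (vcfPath.endsWith(vcfExt)) {
              return vcfPath.substring(0, vcfPath.length() - vcfExt.length()) + ".bcf";
          }
          return vcfPath + ".bcf";
      }
  
      public static void main(String[] args) {
          System.out.println(shadowName("foo.vcf")); // foo.bcf
          System.out.println(shadowName("foo.xxx")); // foo.xxx.bcf
      }
  }
  ```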
- determineIntegerType
  
  public static BCF2Type determineIntegerType(int value)
- determineIntegerType
  
  public static BCF2Type determineIntegerType(int[] values)
- maxIntegerType
  
  public static BCF2Type maxIntegerType(BCF2Type t1, BCF2Type t2)
  
  Returns the larger of the two BCF2 integer types t1 and t2. For example, if t1 == INT8 and t2 == INT16, returns INT16.
  
  Parameters:
  
  t1 - the first integer type
  
  t2 - the second integer type
  
  Returns:
  
  the larger of t1 and t2
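  
  A minimal sketch of the comparison, assuming the integer types can be ordered from smallest to largest. The enum `IntType` is a hypothetical stand-in for the integer members of BCF2Type, and relying on declaration order is a choice made for this illustration only.
  
  ```java
  // Hypothetical sketch; IntType stands in for the integer members of BCF2Type.
  final class MaxIntegerTypeSketch {
      // Declared smallest to largest so that enum ordering matches storage size.
      enum IntType { INT8, INT16, INT32 }
  
      /** Returns whichever of the two types is wider. */
      static IntType maxIntegerType(IntType t1, IntType t2) {
          return t1.compareTo(t2) >= 0 ? t1 : t2;
      }
  
      public static void main(String[] args) {
          System.out.println(maxIntegerType(IntType.INT8, IntType.INT16));  // INT16
          System.out.println(maxIntegerType(IntType.INT32, IntType.INT16)); // INT32
      }
  }
  ```
  
  Folding this comparison over the type needed by each element gives the narrowest type that can hold every value in a collection.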
- determineIntegerType
  
  public static BCF2Type determineIntegerType(List<Integer> values)
- toList
  
  public static <T> List<T> toList(Class<T> c, Object o)
  
  Helper function that takes an object and returns a list representation of it: o == null => []; o is a list => o; otherwise => [o].
  
  Parameters:
  
  c - the class of the object
  
  o - the object to convert to a Java List
  
  Returns:
  
  a list representation of o
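  
  The three cases above translate almost directly into code. This sketch covers only those three cases; the class `ToListSketch` is hypothetical, and the library method may treat other inputs (for example arrays) differently.
  
  ```java
  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;
  
  // Hypothetical sketch of the three documented cases; not the htsjdk source.
  final class ToListSketch {
      @SuppressWarnings("unchecked")
      static <T> List<T> toList(Class<T> c, Object o) {
          if (o == null) {
              return Collections.emptyList();      // o == null => []
          }
          if (o instanceof List) {
              return (List<T>) o;                  // o is a list => o (element types are not checked)
          }
          return Collections.singletonList(c.cast(o)); // otherwise => [o]
      }
  
      public static void main(String[] args) {
          System.out.println(toList(String.class, null));                    // []
          System.out.println(toList(String.class, Arrays.asList("a", "b"))); // [a, b]
          System.out.println(toList(Integer.class, 5));                      // [5]
      }
  }
  ```
  
  Passing the class token lets the single-object case fail fast with a ClassCastException when o is not a T, whereas the list case returns the same instance unchecked.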
- headerLinesAreOrderedConsistently
  
  public static boolean headerLinesAreOrderedConsistently(VCFHeader outputHeader, VCFHeader genotypesBlockHeader)
  
  Are the elements and their order in the output and input headers consistent, so that we can write out the raw genotypes block without decoding and recoding it? If the order of INFO, FILTER, or contig elements in the output header differs from the order in the input header, we must decode the blocks using the input header and then recode them based on the new output order. If they are consistent, we can simply pass through the raw genotypes block bytes, which is a *huge* performance win for large blocks.
  
  Many common operations on BCF2 files (merging them for -nt, selecting a subset of records, etc.) don't modify the ordering of the header fields and so can safely pass through the genotypes undecoded. Some operations, namely those that add filters or info fields, can change the ordering of the header fields and so produce invalid BCF2 files if the genotypes aren't decoded.
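  
  The pass-through decision can be sketched on plain lists of header IDs. This is one plausible reading of "consistent", stated as an assumption: every ID of the header that wrote the genotypes block sits at the same position in the output header, so offsets stored in the raw bytes still resolve to the same strings. The class `HeaderOrderSketch` is hypothetical and works on strings rather than VCFHeader objects.
  
  ```java
  import java.util.Arrays;
  import java.util.List;
  
  // Hypothetical sketch; compares ordered ID lists instead of real VCFHeader objects.
  final class HeaderOrderSketch {
      /**
       * Assumed rule: the block header's IDs must form a prefix of the output header's IDs,
       * so each stored offset keeps pointing at the same identifier.
       */
      static boolean orderedConsistently(List<String> outputIds, List<String> blockIds) {
          if (outputIds.size() < blockIds.size()) {
              return false; // the output header is missing lines the block refers to
          }
          return outputIds.subList(0, blockIds.size()).equals(blockIds);
      }
  
      public static void main(String[] args) {
          List<String> block = Arrays.asList("DP", "AF");
          // Appending a new field keeps existing offsets valid: raw bytes can pass through.
          System.out.println(orderedConsistently(Arrays.asList("DP", "AF", "q10"), block)); // true
          // Reordering changes what each offset means: the block must be decoded and recoded.
          System.out.println(orderedConsistently(Arrays.asList("AF", "DP"), block));        // false
      }
  }
  ```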
