CrosscheckFingerprints (gatk 4.1.4.0 API)

java.lang.Object
- picard.cmdline.CommandLineProgram
- - picard.fingerprint.CrosscheckFingerprints

Direct Known Subclasses:

CrosscheckReadGroupFingerprints
```
@DocumentedFeature
public class CrosscheckFingerprints
extends CommandLineProgram
```
Checks that all data in the set of input files appear to come from the same individual. Can be used to compare according to readgroups, libraries, samples, or files. Operates on bams/sams and vcfs (including gvcfs).
Summary
Checks if all the genetic data within a set of files appear to come from the same individual. It quickly determines whether a "group's" genotype matches that of an input SAM/BAM/VCF by selective sampling, and has been designed to work well even for low-depth SAM/BAMs.
The tool collects "fingerprints" (essentially genotype information from different parts of the genome) at the finest level available in the data (readgroup for SAM files and sample for VCF files) and then optionally aggregates them by library, sample or file, to increase power and provide results at the desired resolution. Output is in a "Moltenized" format, one row per comparison, and is emitted into a metrics file for the class CrosscheckMetric. Each row includes the LOD score and also a tumor-aware LOD score, which can help assess identity even in the presence of severe loss of heterozygosity with high purity (a situation in which the standard LOD score could otherwise fail to show that the samples are from the same individual). A matrix output is also available to facilitate visual inspection of crosscheck results.
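To make the difference between the molten output and the matrix output concrete, here is a minimal sketch that pivots one-row-per-comparison data into a nested map. `ComparisonRow` and `MoltenToMatrix` are hypothetical names invented for this illustration, not Picard classes, and the group names and LOD values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one molten row: a single pairwise comparison. */
class ComparisonRow {
    final String leftGroup;
    final String rightGroup;
    final double lodScore;

    ComparisonRow(String leftGroup, String rightGroup, double lodScore) {
        this.leftGroup = leftGroup;
        this.rightGroup = rightGroup;
        this.lodScore = lodScore;
    }
}

class MoltenToMatrix {
    /** Pivots one-row-per-comparison data into left group -> (right group -> LOD). */
    static Map<String, Map<String, Double>> pivot(List<ComparisonRow> rows) {
        Map<String, Map<String, Double>> matrix = new LinkedHashMap<>();
        for (ComparisonRow row : rows) {
            matrix.computeIfAbsent(row.leftGroup, k -> new LinkedHashMap<>())
                  .put(row.rightGroup, row.lodScore);
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Made-up group names and LOD values, for illustration only.
        List<ComparisonRow> rows = new ArrayList<>();
        rows.add(new ComparisonRow("rg1", "rg1", 30.0));
        rows.add(new ComparisonRow("rg1", "rg2", 12.5));
        rows.add(new ComparisonRow("rg2", "rg1", 12.5));
        rows.add(new ComparisonRow("rg2", "rg2", 28.0));
        pivot(rows).forEach((left, byRight) -> System.out.println(left + "\t" + byRight));
    }
}
```

The molten form scales to many groups and can carry extra columns per comparison, while the pivoted form is easier to scan by eye.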
Since there can be many rows of output in the metric file, we recommend the use of ClusterCrosscheckMetrics as a follow-up step to running CrosscheckFingerprints.
There are cases where one would like to identify a few groups out of a collection of many possible groups (say, to link a bam to its correct sample in a multi-sample vcf). In this case one would not care about cross-checking the various samples in the VCF against each other, but only about checking the identity of the bam against the various samples in the vcf. SECOND_INPUT is provided for this use case. With SECOND_INPUT provided, CrosscheckFingerprints does the following:
- Aggregation of data happens independently for the input files in INPUT and SECOND_INPUT.
- Aggregation of data happens at the SAMPLE level.
- Each sample from INPUT is only compared to that same sample in SECOND_INPUT.
- MATRIX_OUTPUT is disabled.

Examples

Check that all the readgroups from a sample match each other:

     java -jar picard.jar CrosscheckFingerprints \
          INPUT=sample.with.many.readgroups.bam \
          HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
          LOD_THRESHOLD=-5 \
          OUTPUT=sample.crosscheck_metrics

Check that all the readgroups match as expected when providing reads from two samples from the same individual:

     java -jar picard.jar CrosscheckFingerprints \
          INPUT=sample.one.with.many.readgroups.bam \
          INPUT=sample.two.with.many.readgroups.bam \
          HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
          LOD_THRESHOLD=-5 \
          EXPECT_ALL_GROUPS_TO_MATCH=true \
          OUTPUT=sample.crosscheck_metrics

Detailed Explanation

CrosscheckMetric.FingerprintResult

Field Summary

Fields
Modifier and Type	Field and Description
`boolean`	`ALLOW_DUPLICATE_READS`
`boolean`	`CALCULATE_TUMOR_AWARE_RESULTS`
`CrosscheckMetric.DataType`	`CROSSCHECK_BY`
`picard.fingerprint.Fingerprint.CrosscheckMode`	`CROSSCHECK_MODE`
`int`	`EXIT_CODE_WHEN_MISMATCH`
`int`	`EXIT_CODE_WHEN_NO_VALID_CHECKS`
`boolean`	`EXPECT_ALL_GROUPS_TO_MATCH`
`double`	`GENOTYPING_ERROR_RATE`
`java.io.File`	`HAPLOTYPE_MAP`
`java.util.List<java.lang.String>`	`INPUT`
`java.io.File`	`INPUT_SAMPLE_FILE_MAP`
`java.io.File`	`INPUT_SAMPLE_MAP`
`double`	`LOD_THRESHOLD`
`double`	`LOSS_OF_HET_RATE`
`java.io.File`	`MATRIX_OUTPUT`
`int`	`NUM_THREADS`
`java.io.File`	`OUTPUT`
`boolean`	`OUTPUT_ERRORS_ONLY`
`java.util.List<java.lang.String>`	`SECOND_INPUT`
`java.io.File`	`SECOND_INPUT_SAMPLE_MAP`
`boolean`	`TEST_INPUT_READABILITY`

Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY

Constructor Summary

Constructors
Constructor and Description
`CrosscheckFingerprints()`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected java.lang.String[]`	`customCommandLineValidation()` Put any custom command-line validation in an override of this method.
`protected int`	`doWork()` Do the work after command line has been parsed.

Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

INPUT

@Argument(shortName="I",
          doc="One or more input files (or lists of files) with which to compare fingerprints.",
          minElements=1)
public java.util.List<java.lang.String> INPUT

INPUT_SAMPLE_MAP

@Argument(doc="A tsv with two columns representing the sample as it appears in the INPUT data (in column 1) and the sample as it should be used for comparisons to SECOND_INPUT (in the second column). Need only include the samples that change. Values in column 1 should be unique. Values in column 2 should be unique even in union with the remaining unmapped samples. Should only be used with SECOND_INPUT. ",
          optional=true,
          mutex="INPUT_SAMPLE_FILE_MAP")
public java.io.File INPUT_SAMPLE_MAP
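
The uniqueness rules in the description above can be checked before running the tool. The following is a minimal sketch, assuming the map is available as a list of tab-separated lines and the sample names in the data are already known and distinct. `SampleMapCheck` and `applyMap` are hypothetical helpers written for this illustration and are not part of Picard.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SampleMapCheck {
    /**
     * Applies a two-column rename map (column 1 -> column 2) to the distinct sample
     * names found in the data and returns the names that would be used for comparison.
     * Throws IllegalArgumentException if a uniqueness rule is violated.
     */
    static Set<String> applyMap(Collection<String> samplesInData, List<String> tsvLines) {
        Map<String, String> rename = new LinkedHashMap<>();
        Set<String> targets = new HashSet<>();
        for (String line : tsvLines) {
            String[] cols = line.split("\t", -1);
            if (cols.length != 2) {
                throw new IllegalArgumentException("Expected 2 columns: " + line);
            }
            if (rename.containsKey(cols[0])) {
                throw new IllegalArgumentException("Column 1 value repeated: " + cols[0]);
            }
            if (!targets.add(cols[1])) {
                throw new IllegalArgumentException("Column 2 value repeated: " + cols[1]);
            }
            rename.put(cols[0], cols[1]);
        }
        // Unmapped samples keep their name, so a mapped name must not collide with them.
        Set<String> finalNames = new LinkedHashSet<>();
        for (String sample : samplesInData) {
            String finalName = rename.getOrDefault(sample, sample);
            if (!finalNames.add(finalName)) {
                throw new IllegalArgumentException("Name collision after mapping: " + finalName);
            }
        }
        return finalNames;
    }
}
```

For example, with samples `S1` and `S2` in the data, a map line `S1<TAB>S2` would be rejected by this sketch, because the unmapped `S2` already uses that name.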

INPUT_SAMPLE_FILE_MAP

@Argument(doc="A tsv with two columns representing the sample as it should be used for comparisons to SECOND_INPUT (in the first column) and  the source file (in INPUT) for the fingerprint (in the second column). Need only to include the samples that change. Values in column 1 should be unique even in union with the remaining unmapped samples. Values in column 2 should be unique in the file. Will error if more than one sample is found in a file (multi-sample vcf) pointed to in column 2. Should only be used in the presence of SECOND_INPUT. ",
          optional=true,
          mutex="INPUT_SAMPLE_MAP")
public java.io.File INPUT_SAMPLE_FILE_MAP

SECOND_INPUT

@Argument(shortName="SI",
          optional=true,
          mutex="MATRIX_OUTPUT",
          doc="A second set of input files (or lists of files) with which to compare fingerprints. If this option is provided the tool compares each sample in INPUT with the sample from SECOND_INPUT that has the same sample ID. In addition, data will be grouped by SAMPLE regardless of the value of CROSSCHECK_BY. When operating in this mode, each sample in INPUT must also have a corresponding sample in SECOND_INPUT. If this is violated, the tool will proceed to check the matching samples, but report the missing samples and return a non-zero error-code.")
public java.util.List<java.lang.String> SECOND_INPUT
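
The pairing rule described above (compare only samples with the same ID, and report INPUT samples that have no counterpart) can be sketched as plain set arithmetic. `SecondInputPairing` is a hypothetical helper for illustration only; it does not reflect how the tool is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SecondInputPairing {
    /** Sample IDs present in both inputs; each one is compared only against itself. */
    static List<String> comparableSamples(Set<String> input, Set<String> secondInput) {
        Set<String> shared = new TreeSet<>(input);
        shared.retainAll(secondInput);
        return new ArrayList<>(shared);
    }

    /** Sample IDs from INPUT that have no counterpart in SECOND_INPUT. */
    static List<String> missingFromSecondInput(Set<String> input, Set<String> secondInput) {
        Set<String> missing = new TreeSet<>(input);
        missing.removeAll(secondInput);
        return new ArrayList<>(missing);
    }
}
```

A non-empty result from `missingFromSecondInput` corresponds to the situation in which the tool is documented to report the missing samples and return a non-zero exit code.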

SECOND_INPUT_SAMPLE_MAP

@Argument(doc="A tsv with two columns representing the sample as it appears in the SECOND_INPUT data (in column 1) and the sample as it should be used for comparisons to INPUT (in the second column). Note that in case of unrolling files (file-of-filenames) one would need to reference the final file, i.e. the file that contains the genomic data. Need only include the samples that change. Values in column 1 should be unique. Values in column 2 should be unique even in union with the remaining unmapped samples. Should only be used with SECOND_INPUT. ",
          optional=true)
public java.io.File SECOND_INPUT_SAMPLE_MAP

CROSSCHECK_MODE

@Argument(doc="An argument that controls how crosschecking with both INPUT and SECOND_INPUT should occur. ")
public picard.fingerprint.Fingerprint.CrosscheckMode CROSSCHECK_MODE

OUTPUT

@Argument(shortName="O",
          optional=true,
          doc="Optional output file to write metrics to. Default is to write to stdout.")
public java.io.File OUTPUT

MATRIX_OUTPUT

@Argument(shortName="MO",
          optional=true,
          doc="Optional output file to write matrix of LOD scores to. This is less informative than the metrics output and only contains Normal-Normal LOD score (i.e. doesn\'t account for Loss of Heterozygosity). It is however sometimes easier to use visually.",
          mutex="SECOND_INPUT")
public java.io.File MATRIX_OUTPUT

HAPLOTYPE_MAP

@Argument(shortName="H",
          doc="The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.")
public java.io.File HAPLOTYPE_MAP

LOD_THRESHOLD

@Argument(shortName="LOD",
          doc="If any two groups (with the same sample name) match with a LOD score lower than the threshold the tool will exit with a non-zero code to indicate error. Program will also exit with an error if it finds two groups with different sample names that match with a LOD score greater than -LOD_THRESHOLD.\n\nLOD score 0 means equal likelihood that the groups match vs. come from different individuals, a negative LOD score -N means 10^N times more likely that the groups are from different individuals, and +N means 10^N times more likely that the groups are from the same individual. ")
public double LOD_THRESHOLD
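
The two rules in this description can be written out as a small sketch. `LodScores` is a hypothetical helper, not a Picard class. It covers only the two error conditions stated above and does not model EXPECT_ALL_GROUPS_TO_MATCH or how the tool labels scores that fall between the two bounds.

```java
class LodScores {
    /** Likelihood ratio (same individual : different individuals) implied by a LOD score. */
    static double likelihoodRatio(double lod) {
        return Math.pow(10.0, lod);
    }

    /**
     * Returns true when a comparison violates the documented expectation:
     * same sample name with LOD below the threshold, or
     * different sample names with LOD above the negated threshold.
     */
    static boolean isError(double lod, double lodThreshold, boolean sameSampleName) {
        if (sameSampleName) {
            return lod < lodThreshold;
        }
        return lod > -lodThreshold;
    }
}
```

With `LOD_THRESHOLD=-5`, as in the examples, two groups with the same sample name are flagged when their LOD falls below -5, and two groups with different sample names are flagged when their LOD rises above 5.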

CROSSCHECK_BY

@Argument(doc="Specifies which data-type should be used as the basic comparison unit. Fingerprints from readgroups can be \"rolled-up\" to the LIBRARY, SAMPLE, or FILE level before being compared. Fingerprints from VCF can be compared by SAMPLE or FILE.")
public CrosscheckMetric.DataType CROSSCHECK_BY
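
As an illustration of what "rolling up" means, the sketch below groups readgroup IDs by the chosen unit. `RollUp`, `Unit` and `ReadGroup` are hypothetical types made up for this example. The sketch assumes that library names are unique across samples, and it only shows which readgroups would be pooled together, not how fingerprints themselves are merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RollUp {
    /** Hypothetical mirror of the comparison units named in the description above. */
    enum Unit { READGROUP, LIBRARY, SAMPLE, FILE }

    /** Hypothetical holder for the labels attached to one readgroup. */
    static class ReadGroup {
        final String id;
        final String library;
        final String sample;
        final String file;

        ReadGroup(String id, String library, String sample, String file) {
            this.id = id;
            this.library = library;
            this.sample = sample;
            this.file = file;
        }
    }

    /** Picks the label that identifies a readgroup's group at the given unit. */
    static String keyFor(ReadGroup rg, Unit unit) {
        switch (unit) {
            case LIBRARY: return rg.library;
            case SAMPLE:  return rg.sample;
            case FILE:    return rg.file;
            default:      return rg.id;
        }
    }

    /** Maps each group key to the IDs of the readgroups pooled under it. */
    static Map<String, List<String>> group(List<ReadGroup> readGroups, Unit unit) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (ReadGroup rg : readGroups) {
            groups.computeIfAbsent(keyFor(rg, unit), k -> new ArrayList<>()).add(rg.id);
        }
        return groups;
    }
}
```

Coarser units pool more data per group, which is what increases power when individual readgroups have low coverage.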

NUM_THREADS

@Argument(doc="The number of threads to use to process files and generate fingerprints.")
public int NUM_THREADS

CALCULATE_TUMOR_AWARE_RESULTS

@Argument(doc="Specifies whether the tumor-aware results should be calculated. These are time consuming and can roughly double the runtime of the tool. When crosschecking many groups, not calculating the tumor-aware results can result in a significant speedup.")
public boolean CALCULATE_TUMOR_AWARE_RESULTS

ALLOW_DUPLICATE_READS

@Argument(doc="Allow the use of duplicate reads in performing the comparison. Can be useful when duplicate marking has been overly aggressive and coverage is low.")
public boolean ALLOW_DUPLICATE_READS

GENOTYPING_ERROR_RATE

@Argument(doc="Assumed genotyping error rate that provides a floor on the probability that a genotype comes from the expected sample. Must be greater than zero. ")
public double GENOTYPING_ERROR_RATE

OUTPUT_ERRORS_ONLY

@Argument(doc="If true then only groups that do not relate to each other as expected will have their LODs reported.")
public boolean OUTPUT_ERRORS_ONLY

LOSS_OF_HET_RATE

@Argument(doc="The rate at which a heterozygous genotype in a normal sample turns into a homozygous (via loss of heterozygosity) in the tumor (model assumes independent events, so this needs to be larger than reality).",
          optional=true)
public double LOSS_OF_HET_RATE

EXPECT_ALL_GROUPS_TO_MATCH

@Argument(doc="Expect all groups\' fingerprints to match, irrespective of their sample names.  By default (with this value set to false), groups (readgroups, libraries, files, or samples) with different sample names are expected to mismatch, and those with the same sample name are expected to match. ")
public boolean EXPECT_ALL_GROUPS_TO_MATCH

EXIT_CODE_WHEN_MISMATCH

@Argument(doc="When one or more mismatches between groups is detected, exit with this value instead of 0.")
public int EXIT_CODE_WHEN_MISMATCH

EXIT_CODE_WHEN_NO_VALID_CHECKS

@Argument(doc="When all LOD scores are zero, exit with this value.")
public int EXIT_CODE_WHEN_NO_VALID_CHECKS
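
Read together, the two exit-code arguments suggest a selection rule along the following lines. This is a sketch only: `ExitCodeChoice` is a hypothetical helper, and checking for mismatches before checking for all-zero LOD scores is an assumption of this example, not something the descriptions above state.

```java
class ExitCodeChoice {
    /**
     * Chooses an exit status from the comparison results.
     * unexpectedResults is the number of comparisons that did not relate as expected.
     */
    static int choose(double[] lodScores, int unexpectedResults,
                      int exitCodeWhenMismatch, int exitCodeWhenNoValidChecks) {
        if (unexpectedResults > 0) {
            return exitCodeWhenMismatch;
        }
        boolean allZero = true;
        for (double lod : lodScores) {
            if (lod != 0.0) {
                allZero = false;
                break;
            }
        }
        // All-zero LOD scores mean no comparison carried any evidence either way.
        return allZero ? exitCodeWhenNoValidChecks : 0;
    }
}
```

Setting the two arguments to different values lets a calling script tell "samples disagree" apart from "nothing could be checked".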

TEST_INPUT_READABILITY

@Hidden
@Argument(doc="When true code will check for readability on input files (this can be slow on cloud access)")
public boolean TEST_INPUT_READABILITY

Constructor Detail
- CrosscheckFingerprints
```
public CrosscheckFingerprints()
```

Method Detail
- customCommandLineValidation
```
protected java.lang.String[] customCommandLineValidation()
```
  Description copied from class: CommandLineProgram
  
  Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
  
  Overrides:
  
  customCommandLineValidation in class CommandLineProgram
  
  Returns:
  
  null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
- doWork
```
protected int doWork()
```
  Description copied from class: CommandLineProgram
  
  Do the work after command line has been parsed. A RuntimeException may be thrown by this method and is reported appropriately.
  
  Specified by:
  
  doWork in class CommandLineProgram
  
  Returns:
  
  program exit status.
