public class VCFMetricsPlugin
This plugin calculates metrics for gvcf files generated by MAFToVCFPlugin. Summary statistics are written to a tab-delimited file, with one row for the whole-gvcf summary followed by one row per chromosome. Metrics include:
- Number of SNPs, insertions, and deletions
- Number of bases inserted and deleted
- Total bases
- Percentage of bases aligned relative to the reference (reference blocks and SNPs are considered "aligned")
- Percentage of bases covered relative to the reference (reference blocks, SNPs, and indels are considered "covered"; unmapped ranges are not)
- Percentage of reference length inserted and deleted
- Indel sizes: mean, quartiles, and largest
- Insertion and deletion sizes: mean, quartiles, and largest
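The percentage metrics above are ratios against the reference length. The following is a minimal sketch, not the plugin's code: the `BaseCounts` record and its fields are hypothetical, and it assumes "covered" bases can be expressed as reference-block bases plus SNP bases plus the reference positions spanned by indel records (`indelRefBases`).

```java
/** Hypothetical per-chromosome base counts; field names are illustrative only. */
record BaseCounts(long refLength, long refBlockBases, long snpBases,
                  long indelRefBases, long insertedBases, long deletedBases) {}

final class CoverageMetrics {
    private CoverageMetrics() {}

    /** Percentage of numerator relative to the reference length (0 for an empty reference). */
    static double percentOfRef(long numerator, long refLength) {
        return refLength == 0 ? 0.0 : 100.0 * numerator / refLength;
    }

    /** "Aligned" bases: reference blocks plus SNPs. */
    static double percentAligned(BaseCounts c) {
        return percentOfRef(c.refBlockBases() + c.snpBases(), c.refLength());
    }

    /** "Covered" bases: reference blocks, SNPs, and (assumed) reference positions spanned by indels. */
    static double percentCovered(BaseCounts c) {
        return percentOfRef(c.refBlockBases() + c.snpBases() + c.indelRefBases(), c.refLength());
    }

    static double percentInserted(BaseCounts c) {
        return percentOfRef(c.insertedBases(), c.refLength());
    }

    static double percentDeleted(BaseCounts c) {
        return percentOfRef(c.deletedBases(), c.refLength());
    }
}
```

Unmapped ranges simply never enter any numerator, so they lower both percentages without needing special handling.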
Optionally, an additional file can be written containing the sizes of the indels in each gvcf, again separated by chromosome. Each line starts with the sample and chromosome name, followed by a tab-delimited list of indel sizes.
Assumptions: gvcf files end with .gvcf, or else are gzipped and end with .gvcf.gz.
public VCFMetricsPlugin(@Nullable java.awt.Frame parentFrame, boolean isInteractive)
public VCFMetricsPlugin()
@Nullable public net.maizegenetics.plugindef.DataSet processData(@Nullable net.maizegenetics.plugindef.DataSet input)
public void writeSummaryToFile(@NotNull net.maizegenetics.pangenome.hapCalling.VCFMetricsPlugin.VCFSummary summary, @NotNull java.io.BufferedWriter writer)
Given a VCFSummary object, calculates its statistics and writes them to the file (see the class description for the list of statistics).
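A tab-delimited row of this kind can be sketched as below. This is a minimal illustration, not the plugin's implementation: the `SummaryRowWriter` class, the `name` header label, and the idea of passing metrics as an ordered map are all hypothetical, since the actual column names are not listed here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.util.Map;
import java.util.StringJoiner;

final class SummaryRowWriter {
    private SummaryRowWriter() {}

    /** Header row: a label column followed by one column per metric name. */
    static void writeHeader(Iterable<String> metricNames, BufferedWriter writer) throws IOException {
        StringJoiner row = new StringJoiner("\t");
        row.add("name"); // hypothetical label column
        for (String name : metricNames) row.add(name);
        writer.write(row + "\n");
    }

    /** Data row: the row label, then each metric value in the map's iteration order. */
    static void writeRow(String label, Map<String, ? extends Number> metrics, BufferedWriter writer)
            throws IOException {
        StringJoiner row = new StringJoiner("\t");
        row.add(label);
        for (Number value : metrics.values()) row.add(String.valueOf(value));
        writer.write(row + "\n");
    }
}
```

Using an insertion-ordered map (for example `LinkedHashMap`) for both the header and every row keeps the columns aligned across the whole-gvcf row and the per-chromosome rows.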
public void writeIndelsToFile(@NotNull net.maizegenetics.pangenome.hapCalling.VCFMetricsPlugin.VCFSummary summary, @NotNull java.io.BufferedWriter writer)
Writes the lengths of indels to the file. Deletions are represented as negative numbers and insertions as positive numbers.
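The sign convention falls out naturally from the VCF allele lengths. A minimal sketch, assuming biallelic, left-padded VCF records (REF and ALT share one anchor base, as in REF `A`, ALT `ATG`); the `IndelLength` class is hypothetical and not part of the plugin:

```java
import java.util.ArrayList;
import java.util.List;

final class IndelLength {
    private IndelLength() {}

    /**
     * Signed indel length: positive for insertions, negative for deletions,
     * 0 for SNPs or equal-length substitutions. Assumes the shared padding
     * base, so the allele-length difference equals the bases inserted/deleted.
     */
    static int signedLength(String ref, String alt) {
        return alt.length() - ref.length();
    }

    /** Collects signed sizes for every {REF, ALT} pair that is an indel. */
    static List<Integer> indelSizes(List<String[]> refAltPairs) {
        List<Integer> sizes = new ArrayList<>();
        for (String[] pair : refAltPairs) {
            int len = signedLength(pair[0], pair[1]);
            if (len != 0) sizes.add(len);
        }
        return sizes;
    }
}
```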
@NotNull public net.maizegenetics.pangenome.hapCalling.VCFMetricsPlugin.IndelDistributions getIndelSizeStats(@NotNull java.util.List<java.lang.Integer> indelSizes)
Given a list of indel sizes, returns the count, mean length, length quartiles, and longest length for all indels, for insertions, and for deletions.
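The statistics can be sketched over absolute sizes, splitting insertions and deletions by sign. This is an illustration only: `SizeStats` and `IndelStats` are hypothetical, and the quartiles use linear interpolation between closest ranks (the common "type 7" definition), which may not be the quantile method the plugin actually uses.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical container: count, mean, quartiles, and longest size. */
record SizeStats(int count, double mean, double q1, double median, double q3, int longest) {}

final class IndelStats {
    private IndelStats() {}

    /** Linear-interpolation quantile on a sorted, non-empty array. */
    static double quantile(int[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Stats over absolute sizes; all zeros for an empty list. */
    static SizeStats summarize(List<Integer> sizes) {
        int[] abs = sizes.stream().mapToInt(Integer::intValue).map(Math::abs).sorted().toArray();
        if (abs.length == 0) return new SizeStats(0, 0, 0, 0, 0, 0);
        double mean = Arrays.stream(abs).average().orElse(0);
        return new SizeStats(abs.length, mean, quantile(abs, 0.25), quantile(abs, 0.5),
                quantile(abs, 0.75), abs[abs.length - 1]);
    }

    /** Insertions are the positive sizes. */
    static SizeStats insertions(List<Integer> sizes) {
        return summarize(sizes.stream().filter(s -> s > 0).toList());
    }

    /** Deletions are the negative sizes, reported by magnitude. */
    static SizeStats deletions(List<Integer> sizes) {
        return summarize(sizes.stream().filter(s -> s < 0).toList());
    }
}
```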
@NotNull public java.util.Map<java.lang.String,net.maizegenetics.pangenome.hapCalling.VCFMetricsPlugin.VCFSummary> getVCFStats(@NotNull java.io.File file)
Calculates summary statistics for the given gvcf file and returns them as a map
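Per-chromosome tallies of this kind can be sketched by reading the tab-delimited VCF body. This is a hypothetical sketch, not the plugin's parser: `ChromCounts` and `GvcfTally` are illustrative names, and it assumes reference blocks have ALT `.` or `<NON_REF>` with their span given by `END=` in INFO, that only the first ALT allele matters, and that any equal-length REF/ALT record counts as one SNP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;

/** Hypothetical per-chromosome tallies; not the plugin's actual VCFSummary. */
final class ChromCounts {
    long refBlockBases, snps, insertions, deletions, insertedBases, deletedBases;
}

final class GvcfTally {
    private GvcfTally() {}

    /** Opens a .gvcf or .gvcf.gz file for line-by-line reading. */
    static BufferedReader open(Path path) throws IOException {
        InputStream in = Files.newInputStream(path);
        if (path.toString().endsWith(".gz")) in = new GZIPInputStream(in);
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Tallies records by chromosome; header lines starting with '#' are skipped. */
    static Map<String, ChromCounts> tally(Iterable<String> lines) {
        Map<String, ChromCounts> byChrom = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue;
            // VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
            String[] f = line.split("\t");
            ChromCounts c = byChrom.computeIfAbsent(f[0], k -> new ChromCounts());
            long pos = Long.parseLong(f[1]);
            String ref = f[3];
            String alt = f[4].split(",")[0];
            if (alt.equals(".") || alt.equals("<NON_REF>")) {
                c.refBlockBases += endFromInfo(f[7], pos) - pos + 1; // assumed ref block POS..END
            } else if (alt.length() == ref.length()) {
                c.snps++;
            } else if (alt.length() > ref.length()) {
                c.insertions++;
                c.insertedBases += alt.length() - ref.length();
            } else {
                c.deletions++;
                c.deletedBases += ref.length() - alt.length();
            }
        }
        return byChrom;
    }

    /** Reads END=<n> from the INFO column, defaulting to pos for single-base records. */
    static long endFromInfo(String info, long pos) {
        for (String kv : info.split(";")) {
            if (kv.startsWith("END=")) return Long.parseLong(kv.substring(4));
        }
        return pos;
    }
}
```

A caller could pass `reader.lines().toList()` from `GvcfTally.open(path)`; the whole-gvcf row would then be the sum of the per-chromosome tallies.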
@NotNull public net.maizegenetics.pangenome.hapCalling.VCFMetricsPlugin.VCFSummary getAllChrVCFStats(@NotNull java.util.Map<java.lang.String,net.maizegenetics.pangenome.hapCalling.VCFMetricsPlugin.VCFSummary> summaries)
@Nullable public javax.swing.ImageIcon getIcon()
@NotNull public java.lang.String getButtonName()
@NotNull public java.lang.String getToolTipText()
@Nullable public java.lang.String vcfFile()
A single .gvcf file (optionally gzipped) to process. Do not include the file path; the file is assumed to reside within vcfDir.
@NotNull public VCFMetricsPlugin vcfFile(@NotNull java.lang.String value)
Set vcf file. A single .gvcf file (optionally gzipped) to process. Do not include the file path; the file is assumed to reside within vcfDir.
value - vcf file
@Nullable public java.lang.String vcfDir()
vcf directory. Directory containing one or more gvcf files to process. Files are assumed to end with .gvcf, or with .gvcf.gz if gzipped; all other files in this directory are ignored.
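The interplay of vcfFile and vcfDir can be sketched as follows. This is a hypothetical helper, not the plugin's code, and it assumes that a non-empty vcfFile takes precedence and is resolved inside vcfDir, while otherwise every regular file directly in vcfDir (no recursion) whose name ends in .gvcf or .gvcf.gz is selected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class GvcfInputs {
    private GvcfInputs() {}

    /** True for names ending in .gvcf or .gvcf.gz, the documented naming assumption. */
    static boolean isGvcf(String fileName) {
        return fileName.endsWith(".gvcf") || fileName.endsWith(".gvcf.gz");
    }

    /** Resolves vcfFile inside vcfDir if given; otherwise lists the gvcf files in vcfDir, sorted. */
    static List<Path> select(Path vcfDir, String vcfFile) throws IOException {
        if (vcfFile != null && !vcfFile.isEmpty()) {
            return List.of(vcfDir.resolve(vcfFile));
        }
        // Files.list holds an open directory handle, so close it with try-with-resources.
        try (Stream<Path> entries = Files.list(vcfDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> isGvcf(p.getFileName().toString()))
                    .sorted()
                    .toList();
        }
    }
}
```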
@NotNull public VCFMetricsPlugin vcfDir(@NotNull java.lang.String value)
Set vcf directory. Directory containing one or more gvcf files to process. Files are assumed to end with .gvcf, or with .gvcf.gz if gzipped; all other files in this directory are ignored.
value - vcf directory
@NotNull public java.lang.String outFile()
output metrics file. The tab-delimited file of summary statistics calculated from one or more gvcf files. Each file has a summary line in the table, as well as a line for each contig/chromosome.
@NotNull public VCFMetricsPlugin outFile(@NotNull java.lang.String value)
Set output metrics file. The tab-delimited file of summary statistics calculated from one or more gvcf files. Each file has a summary line in the table, as well as a line for each contig/chromosome.
value - output metrics file
@NotNull public java.lang.String indelFile()
Optional file to write all indel lengths to. If not provided, indel lengths are not written. Each line begins with the gvcf and chromosome names (separated by an underscore), followed by the indel sizes, separated by tabs.
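One line of the indel file can be sketched as below. The `IndelLine` helper is hypothetical, and it assumes the name and the first size are separated by a tab and that `gvcfName` is whatever name the plugin uses for the file (whether the extension is stripped is not stated here).

```java
import java.util.List;
import java.util.StringJoiner;

final class IndelLine {
    private IndelLine() {}

    /** Formats "<gvcfName>_<chrom>\t<size>\t<size>..." with signed sizes (deletions negative). */
    static String format(String gvcfName, String chrom, List<Integer> sizes) {
        StringJoiner line = new StringJoiner("\t");
        line.add(gvcfName + "_" + chrom);
        for (int size : sizes) line.add(Integer.toString(size));
        return line.toString();
    }
}
```

For example, `IndelLine.format("sampleA", "chr1", List.of(2, -3))` yields `sampleA_chr1`, `2`, and `-3` separated by tabs.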
@NotNull public VCFMetricsPlugin indelFile(@NotNull java.lang.String value)
Set indel file. Optional file to write all indel lengths to. If not provided, indel lengths are not written. Each line begins with the gvcf and chromosome names (separated by an underscore), followed by the indel sizes, separated by tabs.
value - indel file