public final class HDF5SVDReadCountPanelOfNormals extends java.lang.Object implements SVDReadCountPanelOfNormals
An HDF5-backed SVD read-count panel of normals (PoN), as created by CreateReadCountPanelOfNormals.
Data is stored in the following HDF5 paths:
Most attributes are stored as wide matrices (i.e., more columns than rows) when possible.
This avoids a very slow write time in HDF5, since HDF5 writes wide matrices much faster than tall matrices.
See HDF5Utils.writeIntervals(org.broadinstitute.hdf5.HDF5File, java.lang.String, java.util.List<T>)
for details on the representation of intervals.
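The wide-matrix convention can be illustrated with a small, self-contained sketch. The class and method names below (WideLayoutSketch, toWide) are hypothetical and are not part of GATK or the HDF5 library. The sketch only shows how a tall matrix could be transposed so that it has at least as many columns as rows before it is handed to a writer.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the GATK implementation.
final class WideLayoutSketch {

    /**
     * Returns the input unchanged if it already has at least as many columns
     * as rows; otherwise returns a newly allocated transpose.
     */
    static double[][] toWide(double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        if (cols >= rows) {
            return matrix; // already wide (or square)
        }
        double[][] transposed = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                transposed[j][i] = matrix[i][j];
            }
        }
        return transposed;
    }

    public static void main(String[] args) {
        double[][] tall = {{1, 2}, {3, 4}, {5, 6}}; // 3 rows x 2 columns
        System.out.println(Arrays.deepToString(toWide(tall)));
    }
}
```

A reader of such a file needs to know whether a transpose was applied, so the orientation of each stored matrix has to be fixed by convention.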
Modifier and Type | Method and Description |
---|---|
static void | create(java.io.File outFile, java.lang.String commandLine, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary, org.apache.commons.math3.linear.RealMatrix originalReadCounts, java.util.List<java.lang.String> originalSampleFilenames, java.util.List<SimpleInterval> originalIntervals, double[] intervalGCContent, double minimumIntervalMedianPercentile, double maximumZerosInSamplePercentage, double maximumZerosInIntervalPercentage, double extremeSampleMedianPercentile, boolean doImputeZeros, double extremeOutlierTruncationPercentile, int numEigensamplesRequested, int maximumChunkSize, org.apache.spark.api.java.JavaSparkContext ctx) Create the panel of normals and write it to an HDF5 file. |
double[][] | getEigensampleVectors() Returns a modifiable copy of an array containing the orthonormal matrix of eigensample vectors. |
int | getNumEigensamples() Returns the number of eigensamples. |
double[] | getOriginalIntervalGCContent() Returns a modifiable copy of an array containing the GC content of the original intervals (in the same order as in SVDReadCountPanelOfNormals.getOriginalIntervals()). |
java.util.List<SimpleInterval> | getOriginalIntervals() Returns a modifiable copy of the list of the original intervals that were used to build this PoN (no filtering will have been applied). |
double[][] | getOriginalReadCounts() Returns a modifiable copy of the original matrix of integer read-counts (represented as doubles) used to build the PoN (no filtering will have been applied). |
double[] | getPanelIntervalFractionalMedians() Returns a modifiable copy of an array containing the median (across all samples, before filtering) of the fractional coverage at each panel interval (in the same order as in SVDReadCountPanelOfNormals.getPanelIntervals()). |
java.util.List<SimpleInterval> | getPanelIntervals() Returns a modifiable copy of the list of the intervals contained in this PoN after all filtering has been applied. |
htsjdk.samtools.SAMSequenceDictionary | getSequenceDictionary() Returns the sequence dictionary common to all of the read counts used to build the PoN. |
double[] | getSingularValues() Returns a modifiable copy of an array of the singular values of the eigensamples in decreasing order. |
double | getVersion() Returns the PoN version. |
static HDF5SVDReadCountPanelOfNormals | read(org.broadinstitute.hdf5.HDF5File file) Create an interface to an HDF5 file. |
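Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface SVDReadCountPanelOfNormals: denoise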
public double getVersion()
Returns the PoN version.
Specified by: getVersion in interface SVDReadCountPanelOfNormals
public int getNumEigensamples()
Returns the number of eigensamples.
Specified by: getNumEigensamples in interface SVDReadCountPanelOfNormals
public htsjdk.samtools.SAMSequenceDictionary getSequenceDictionary()
Returns the sequence dictionary common to all of the read counts used to build the PoN.
Specified by: getSequenceDictionary in interface SVDReadCountPanelOfNormals
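public double[][] getOriginalReadCounts()
Returns a modifiable copy of the original matrix of integer read-counts (represented as doubles) used to build the PoN (no filtering will have been applied). This matrix has dimensions M_original x N_original, where M_original is the number of original intervals and N_original is the number of original samples.
Specified by: getOriginalReadCounts in interface SVDReadCountPanelOfNormals
public java.util.List<SimpleInterval> getOriginalIntervals()
Returns a modifiable copy of the list of the original intervals that were used to build this PoN (no filtering will have been applied). This list has length M_original.
Specified by: getOriginalIntervals in interface SVDReadCountPanelOfNormals
public double[] getOriginalIntervalGCContent()
Returns a modifiable copy of an array containing the GC content of the original intervals (in the same order as in SVDReadCountPanelOfNormals.getOriginalIntervals()). This array has length M_original.
Specified by: getOriginalIntervalGCContent in interface SVDReadCountPanelOfNormals
public java.util.List<SimpleInterval> getPanelIntervals()
Returns a modifiable copy of the list of the intervals contained in this PoN after all filtering has been applied. This list has length M.
Specified by: getPanelIntervals in interface SVDReadCountPanelOfNormals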
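public double[] getPanelIntervalFractionalMedians()
Returns a modifiable copy of an array containing the median (across all samples, before filtering) of the fractional coverage at each panel interval (in the same order as in SVDReadCountPanelOfNormals.getPanelIntervals()). This is used to standardize samples. This array has length M.
Specified by: getPanelIntervalFractionalMedians in interface SVDReadCountPanelOfNormals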
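As a rough illustration of how per-interval fractional medians could be used to standardize a sample, the sketch below divides a sample's fractional coverage by the panel median at each interval. This is an assumption about one step of standardization, not the documented GATK procedure; the class and method names (StandardizeSketch, divideByIntervalMedians) are hypothetical, and any further steps such as log transformation or centering are not shown.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the GATK implementation.
final class StandardizeSketch {

    /**
     * Divides each entry of a sample's fractional coverage by the panel
     * median at the same interval. Both arrays are assumed to have length M
     * and to be in the same interval order.
     */
    static double[] divideByIntervalMedians(double[] sampleFractionalCoverage,
                                            double[] panelIntervalFractionalMedians) {
        if (sampleFractionalCoverage.length != panelIntervalFractionalMedians.length) {
            throw new IllegalArgumentException("Sample and panel must cover the same number of intervals.");
        }
        double[] result = new double[sampleFractionalCoverage.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = sampleFractionalCoverage[i] / panelIntervalFractionalMedians[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] sample = {0.25, 0.5, 0.25};
        double[] medians = {0.5, 0.25, 0.25};
        System.out.println(Arrays.toString(divideByIntervalMedians(sample, medians)));
    }
}
```

A value near 1 at an interval then indicates coverage close to the panel median for that interval.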
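public double[] getSingularValues()
Returns a modifiable copy of an array of the singular values of the eigensamples in decreasing order. This array has length K.
Specified by: getSingularValues in interface SVDReadCountPanelOfNormals
public double[][] getEigensampleVectors()
Returns a modifiable copy of an array containing the orthonormal matrix of eigensample vectors. This matrix has dimensions M x K, where M is the number of panel intervals (after filtering) and K is the number of eigensamples. Columns are sorted by singular value in decreasing order.
Specified by: getEigensampleVectors in interface SVDReadCountPanelOfNormals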
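Because the eigensample matrix is orthonormal and has dimensions M x K, a standard linear-algebra use of it is to remove from a length-M vector its projection onto the K columns. The sketch below shows that operation on plain arrays. It is not taken from the GATK denoise implementation, whose details are not given here; the names EigensampleProjectionSketch and subtractProjection are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the GATK implementation.
final class EigensampleProjectionSketch {

    /**
     * Computes x - U * (U^T * x) for an M x K matrix U whose columns are
     * assumed to be orthonormal, and a vector x of length M.
     */
    static double[] subtractProjection(double[][] u, double[] x) {
        int m = u.length;
        int k = m == 0 ? 0 : u[0].length;
        if (x.length != m) {
            throw new IllegalArgumentException("Vector length must equal the number of rows of U.");
        }
        double[] result = x.clone();
        for (int col = 0; col < k; col++) {
            // Coefficient of x along this column.
            double coefficient = 0.0;
            for (int row = 0; row < m; row++) {
                coefficient += u[row][col] * x[row];
            }
            // Remove that component from the result.
            for (int row = 0; row < m; row++) {
                result[row] -= coefficient * u[row][col];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] u = {{1, 0}, {0, 1}, {0, 0}}; // M = 3, K = 2
        double[] x = {3, 4, 5};
        System.out.println(Arrays.toString(subtractProjection(u, x)));
    }
}
```

Since the columns are sorted by decreasing singular value, restricting the loop to the first few columns would remove only the leading components.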
public static HDF5SVDReadCountPanelOfNormals read(org.broadinstitute.hdf5.HDF5File file)
Create an interface to an HDF5 file.
public static void create(java.io.File outFile, java.lang.String commandLine, htsjdk.samtools.SAMSequenceDictionary sequenceDictionary, org.apache.commons.math3.linear.RealMatrix originalReadCounts, java.util.List<java.lang.String> originalSampleFilenames, java.util.List<SimpleInterval> originalIntervals, double[] intervalGCContent, double minimumIntervalMedianPercentile, double maximumZerosInSamplePercentage, double maximumZerosInIntervalPercentage, double extremeSampleMedianPercentile, boolean doImputeZeros, double extremeOutlierTruncationPercentile, int numEigensamplesRequested, int maximumChunkSize, org.apache.spark.api.java.JavaSparkContext ctx)
Create the panel of normals and write it to an HDF5 file. originalReadCounts should be samples x intervals. To reduce memory footprint, originalReadCounts is modified in place. If intervalGCContent is null, GC-bias correction will not be performed.
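Since the input matrix is modified in place, a caller that still needs the raw counts afterwards has to keep its own copy. The sketch below illustrates this on a plain samples x intervals array. The in-place routine is a hypothetical stand-in (it converts each sample row to fractions of its total) and does not reproduce what create actually does to its input; all names are illustrative.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the GATK implementation.
final class InPlaceModificationSketch {

    /** Stand-in for an in-place step: divides each sample row by its total. */
    static void toFractionalCoverageInPlace(double[][] samplesByIntervals) {
        for (double[] sample : samplesByIntervals) {
            double total = 0.0;
            for (double count : sample) {
                total += count;
            }
            if (total == 0.0) {
                continue; // leave all-zero samples untouched
            }
            for (int i = 0; i < sample.length; i++) {
                sample[i] /= total;
            }
        }
    }

    /** Copies every row so that later in-place changes do not affect the copy. */
    static double[][] deepCopy(double[][] matrix) {
        double[][] copy = new double[matrix.length][];
        for (int i = 0; i < matrix.length; i++) {
            copy[i] = matrix[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        double[][] counts = {{1, 3}, {2, 2}}; // 2 samples x 2 intervals
        double[][] backup = deepCopy(counts);
        toFractionalCoverageInPlace(counts);
        System.out.println(Arrays.deepToString(counts)); // now modified
        System.out.println(Arrays.deepToString(backup)); // raw counts preserved
    }
}
```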