org.bdgenomics.adam.rdd.read

AlignedReadRDD

case class AlignedReadRDD(rdd: RDD[AlignmentRecord], sequences: SequenceDictionary, recordGroups: RecordGroupDictionary) extends AvroReadGroupGenomicRDD[AlignmentRecord, AlignmentRecordRDD] with AlignmentRecordRDD with Product with Serializable

Linear Supertypes
Product, Equals, AlignmentRecordRDD, AvroReadGroupGenomicRDD[AlignmentRecord, AlignmentRecordRDD], AvroGenomicRDD[AlignmentRecord, AlignmentRecordRDD], GenomicRDD[AlignmentRecord, AlignmentRecordRDD], ADAMRDDFunctions[AlignmentRecord], Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new AlignedReadRDD(rdd: RDD[AlignmentRecord], sequences: SequenceDictionary, recordGroups: RecordGroupDictionary)

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def broadcastRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(AlignmentRecord, X), Z]](genomicRdd: GenomicRDD[X, Y])(implicit tTag: ClassTag[AlignmentRecord], xTag: ClassTag[X]): GenomicRDD[(AlignmentRecord, X), Z]

    Definition Classes
    GenomicRDD
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def convertToSam(isSorted: Boolean = false): (RDD[SAMRecordWritable], SAMFileHeader)

    Converts an RDD of ADAM read records into SAM records.

    returns

    Returns a SAM/BAM formatted RDD of reads, as well as the file header.

    Definition Classes
    AlignmentRecordRDD
  10. def countKmers(kmerLength: Int): RDD[(String, Long)]

    Cuts reads into _k_-mers, and then counts the number of occurrences of each _k_-mer.

    kmerLength

    The value of _k_ to use for cutting _k_-mers.

    returns

    Returns an RDD containing k-mer/count pairs.

    Definition Classes
    AlignmentRecordRDD
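
The k-mer counting described for countKmers can be illustrated with a minimal sketch on plain Scala collections. This is not ADAM's implementation: `KmerSketch` and its use of `Seq[String]` in place of `RDD[AlignmentRecord]` are assumptions made so the example runs without Spark.

```scala
object KmerSketch {
  // Counts every length-k substring across a collection of read sequences.
  // Plain Scala collections stand in for the RDD used by the real method.
  def countKmers(reads: Seq[String], kmerLength: Int): Map[String, Long] = {
    require(kmerLength > 0, "kmerLength must be positive")
    reads
      // sliding yields one short window for reads shorter than k, so drop those
      .flatMap(_.sliding(kmerLength).filter(_.length == kmerLength))
      .groupBy(identity)
      .map { case (kmer, hits) => kmer -> hits.size.toLong }
  }
}
```

For example, `KmerSketch.countKmers(Seq("ACGT", "CGTA"), 3)` counts `CGT` twice, because it occurs in both reads.
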
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def filterByOverlappingRegion(query: ReferenceRegion): AlignmentRecordRDD

    Definition Classes
    GenomicRDD
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. def flagStat(): (FlagStatMetrics, FlagStatMetrics)

    Runs a quality control pass akin to the Samtools FlagStat tool.

    returns

    Returns a tuple of (failedQualityMetrics, passedQualityMetrics)

    Definition Classes
    AlignmentRecordRDD
  15. def flattenRddByRegions(): RDD[(ReferenceRegion, AlignmentRecord)]

    Attributes
    protected
    Definition Classes
    GenomicRDD
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def getReferenceRegions(elem: AlignmentRecord): Seq[ReferenceRegion]

    Returns all reference regions that overlap this read.

    If a read is unaligned, it covers no reference region. If a read is aligned we expect it to cover a single region. A chimeric read would cover multiple regions, but we store chimeric reads in a way similar to BAM, where the split alignments are stored in multiple separate reads.

    elem

    Read to produce regions for.

    returns

    The seq of reference regions this read covers.

    Attributes
    protected
    Definition Classes
    AlignmentRecordRDD → GenomicRDD
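
The rule stated for getReferenceRegions (an unaligned read covers no region, an aligned read covers exactly one) can be sketched as follows. `Region` and `Read` are hypothetical stand-ins for ReferenceRegion and AlignmentRecord, reduced to the fields this example needs.

```scala
object ReferenceRegionSketch {
  // Hypothetical stand-in for ReferenceRegion.
  final case class Region(contig: String, start: Long, end: Long)

  // Hypothetical stand-in for AlignmentRecord: all fields are None when unaligned.
  final case class Read(contig: Option[String], start: Option[Long], end: Option[Long])

  // Returns zero regions for an unaligned read and one region for an aligned read.
  def regions(read: Read): List[Region] =
    (for {
      c <- read.contig
      s <- read.start
      e <- read.end
    } yield Region(c, s, e)).toList
}
```
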
  18. def groupReadsByFragment(): RDD[SingleReadBucket]

    Groups all reads by record group and read name.

    returns

    SingleReadBuckets with primary, secondary and unmapped reads

    Definition Classes
    AlignmentRecordRDD
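
A rough sketch of the grouping described for groupReadsByFragment, under the assumption that each bucket is keyed by the (record group, read name) pair. `Read` and `Bucket` are hypothetical stand-ins for AlignmentRecord and SingleReadBucket, and `Seq` replaces the RDD so the example runs without Spark.

```scala
object FragmentGroupingSketch {
  // Hypothetical stand-in for AlignmentRecord with only the fields used here.
  final case class Read(recordGroup: String, readName: String, primary: Boolean, mapped: Boolean)

  // Hypothetical stand-in for SingleReadBucket.
  final case class Bucket(primary: Seq[Read], secondary: Seq[Read], unmapped: Seq[Read])

  // Groups reads by (record group, read name), then splits each group
  // into primary, secondary, and unmapped reads.
  def groupByFragment(reads: Seq[Read]): Map[(String, String), Bucket] =
    reads.groupBy(r => (r.recordGroup, r.readName)).map { case (key, group) =>
      val (mapped, unmapped) = group.partition(_.mapped)
      val (primary, secondary) = mapped.partition(_.primary)
      key -> Bucket(primary, secondary, unmapped)
    }
}
```
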
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  21. lazy val jrdd: JavaRDD[AlignmentRecord]

    Definition Classes
    GenomicRDD
  22. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  23. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  30. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  31. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def markDuplicates(): AlignmentRecordRDD

    Marks reads as possible fragment duplicates.

    returns

    A new RDD where reads have the duplicate read flag set. Duplicate reads are NOT filtered out.

    Definition Classes
    AlignmentRecordRDD
  35. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  36. final def notify(): Unit

    Definition Classes
    AnyRef
  37. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  38. val rdd: RDD[AlignmentRecord]

    Definition Classes
    AlignedReadRDD → GenomicRDD → ADAMRDDFunctions
  39. def realignIndels(consensusModel: ConsensusGenerator = new ConsensusGeneratorFromReads, isSorted: Boolean = false, maxIndelSize: Int = 500, maxConsensusNumber: Int = 30, lodThreshold: Double = 5.0, maxTargetSize: Int = 3000): AlignmentRecordRDD

    Realigns indels using a consensus-based heuristic.

    consensusModel

    The model to use for generating consensus sequences to realign against.

    isSorted

    If the input data is sorted, setting this parameter to true avoids a second sort.

    maxIndelSize

    The size of the largest indel to use for realignment.

    maxConsensusNumber

    The maximum number of consensus sequences to realign against per target region.

    lodThreshold

    Log-odds threshold to use when realigning; realignments are only finalized if the log-odds threshold is exceeded.

    maxTargetSize

    The maximum width of a single target region for realignment.

    returns

    Returns an RDD of mapped reads which have been realigned.

    Definition Classes
    AlignmentRecordRDD
  40. def reassembleReadPairs(secondPairRdd: RDD[AlignmentRecord], validationStringency: ValidationStringency = ValidationStringency.LENIENT): AlignmentRecordRDD

    Reassembles read pairs from two sets of unpaired reads. The assumption is that the two sets were _originally_ paired together.

    secondPairRdd

    The rdd containing the second read from the pairs.

    validationStringency

    How stringently to validate the reads.

    returns

    Returns an RDD with the pair information recomputed.

    Definition Classes
    AlignmentRecordRDD
    Note

    The RDD that this is called on should be the RDD with the first read from the pair.

  41. def recalibateBaseQualities(knownSnps: Broadcast[SnpTable], observationDumpFile: Option[String] = None, validationStringency: ValidationStringency = ValidationStringency.LENIENT): AlignmentRecordRDD

    Runs base quality score recalibration on a set of reads. Uses a table of known SNPs to mask true variation during the recalibration process.

    knownSnps

    A table of known SNPs to mask valid variants.

    observationDumpFile

    An optional local path to dump recalibration observations to.

    returns

    Returns an RDD of recalibrated reads.

    Definition Classes
    AlignmentRecordRDD
  42. val recordGroups: RecordGroupDictionary

  43. def replaceRdd(newRdd: RDD[AlignmentRecord]): AlignedReadRDD

    Attributes
    protected
    Definition Classes
    AlignedReadRDD → GenomicRDD
  44. def replaceRddAndSequences(newRdd: RDD[AlignmentRecord], newSequences: SequenceDictionary): AlignmentRecordRDD

    Replaces the underlying RDD and SequenceDictionary and emits a new object.

    newRdd

    New RDD to replace current RDD.

    newSequences

    New sequence dictionary to replace current dictionary.

    returns

    Returns a new AlignmentRecordRDD.

    Attributes
    protected
    Definition Classes
    AlignedReadRDD → AlignmentRecordRDD
  45. def save(filePath: String, isSorted: Boolean): Boolean

    Saves this RDD to disk, with the type identified by the extension.

    filePath

    Path to save the file at.

    isSorted

    Whether the file is sorted or not.

    returns

    Returns true if saving succeeded.

    Definition Classes
    AlignmentRecordRDD
  46. def save(args: ADAMSaveAnyArgs, isSorted: Boolean = false): Boolean

    Saves AlignmentRecords as a directory of Parquet files or as SAM/BAM.

    This method infers the output format from the file extension. Filenames ending in .sam/.bam are saved as SAM/BAM, and all other files are saved as Parquet.

    args

    Save configuration arguments.

    isSorted

    If the output is sorted, this will modify the SAM/BAM header.

    returns

    Returns true if saving succeeded.

    Definition Classes
    AlignmentRecordRDD
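
The extension-based format inference described for save can be sketched as below. The `OutputFormat` type and `inferFormat` helper are hypothetical names introduced for illustration. Only the rule itself (.sam and .bam select SAM and BAM, anything else falls back to Parquet) comes from the description above.

```scala
object SaveFormatSketch {
  // Hypothetical enumeration of the output formats named in the description.
  sealed trait OutputFormat
  case object Sam extends OutputFormat
  case object Bam extends OutputFormat
  case object Parquet extends OutputFormat

  // Picks the format from the file extension; any other path maps to Parquet.
  def inferFormat(path: String): OutputFormat =
    if (path.endsWith(".sam")) Sam
    else if (path.endsWith(".bam")) Bam
    else Parquet
}
```
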
  47. def saveAsFastq(fileName: String, fileName2Opt: Option[String] = None, outputOriginalBaseQualities: Boolean = false, sort: Boolean = false, validationStringency: ValidationStringency = ValidationStringency.LENIENT, persistLevel: Option[StorageLevel] = None): Unit

    Saves reads in FASTQ format.

    fileName

    Path to save files at.

    fileName2Opt

    Optional second path for saving files. If set, two files will be saved.

    outputOriginalBaseQualities

    If true, writes out reads with the base qualities from the original qualities (SAM "OQ") field. If false, writes out reads with the base qualities from the qual field. Default is false.

    sort

    Whether to sort the FASTQ files by read name or not. Defaults to false. Sorting the output will recover pair order, if desired.

    validationStringency

    Iff strict, throw an exception if any read in this RDD is not accompanied by its mate.

    persistLevel

    An optional persistence level to set. If this level is set, then reads will be cached (at the given persistence level) between passes.

    Definition Classes
    AlignmentRecordRDD
  48. def saveAsPairedFastq(fileName1: String, fileName2: String, outputOriginalBaseQualities: Boolean = false, validationStringency: ValidationStringency = ValidationStringency.LENIENT, persistLevel: Option[StorageLevel] = None): Unit

    Saves these AlignmentRecords to two FASTQ files.

    The files are one for the first mate in each pair, and the other for the second mate in the pair.

    fileName1

    Path at which to save a FASTQ file containing the first mate of each pair.

    fileName2

    Path at which to save a FASTQ file containing the second mate of each pair.

    outputOriginalBaseQualities

    If true, writes out reads with the base qualities from the original qualities (SAM "OQ") field. If false, writes out reads with the base qualities from the qual field. Default is false.

    validationStringency

    Iff strict, throw an exception if any read in this RDD is not accompanied by its mate.

    persistLevel

    An optional persistence level to set. If this level is set, then reads will be cached (at the given persistence level) between passes.

    Definition Classes
    AlignmentRecordRDD
  49. def saveAsParquet(filePath: String): Unit

    Saves this RDD to disk as a Parquet file.

    filePath

    Path to save the file at.

    Definition Classes
    AvroGenomicRDD
  50. def saveAsParquet(filePath: String, blockSize: Integer, pageSize: Integer, compressCodec: CompressionCodecName, disableDictionaryEncoding: Boolean): Unit

    Saves this RDD to disk as a Parquet file.

    filePath

    Path to save the file at.

    blockSize

    Size per block.

    pageSize

    Size per page.

    compressCodec

    Name of the compression codec to use.

    disableDictionaryEncoding

    Whether or not to disable dictionary encoding.

    Definition Classes
    AvroGenomicRDD
  51. def saveAsParquet(filePath: String, blockSize: Int = 128 * 1024 * 1024, pageSize: Int = 1 * 1024 * 1024, compressCodec: CompressionCodecName = CompressionCodecName.GZIP, disableDictionaryEncoding: Boolean = false): Unit

    Saves this RDD to disk as a Parquet file.

    filePath

    Path to save the file at.

    blockSize

    Size per block.

    pageSize

    Size per page.

    compressCodec

    Name of the compression codec to use.

    disableDictionaryEncoding

    Whether or not to disable dictionary encoding. Default is false.

    Definition Classes
    AvroGenomicRDD
  52. def saveAsParquet(args: SaveArgs): Unit

    Saves RDD as a directory of Parquet files.

    The RDD is written as a directory of Parquet files, with Parquet configuration described by the input param args. The provided sequence dictionary is written at args.outputPath/_seqdict.avro as Avro binary.

    args

    Save configuration arguments.

    Definition Classes
    AvroGenomicRDD
  53. def saveAsSam(filePath: String, asSam: Boolean, asSingleFile: Boolean, isSorted: Boolean): Unit

    Saves this RDD to disk as a SAM/BAM file.

    filePath

    Path to save the file at.

    asSam

    If true, saves as SAM. If false, saves as BAM.

    asSingleFile

    If true, saves output as a single file.

    isSorted

    If the output is sorted, this will modify the header.

    Definition Classes
    AlignmentRecordRDD
  54. def saveAsSam(filePath: String, asSam: Boolean = true, asSingleFile: Boolean = false, isSorted: Boolean = false): Unit

    Saves an RDD of ADAM read data into the SAM/BAM format.

    filePath

    Path to save files to.

    asSam

    Selects whether to save as SAM or BAM. The default value is true (save in SAM format).

    asSingleFile

    If true, saves output as a single file.

    isSorted

    If the output is sorted, this will modify the header.

    Definition Classes
    AlignmentRecordRDD
  55. def saveAsSamString(): String

    Converts an RDD into the SAM spec string it represents.

    This method converts an RDD of AlignmentRecords back to an RDD of SAMRecordWritables and a SAMFileHeader, and then maps this RDD into a string on the driver that represents this file in SAM.

    returns

    A string on the driver representing this RDD of reads in SAM format.

    Definition Classes
    AlignmentRecordRDD
  56. def saveAvro[U <: SpecificRecordBase](filename: String, sc: SparkContext, schema: Schema, avro: Seq[U])(implicit tUag: ClassTag[U]): Unit

    Saves Avro data to a Hadoop file system.

    This method uses a SparkContext to identify our underlying file system, which we then save to.

    Frustratingly enough, although all records generated by the Avro IDL compiler have a static SCHEMA$ field, this field does not belong to the SpecificRecordBase abstract class, or the SpecificRecord interface. As such, we must force the user to pass in the schema.

    U

    The type of the specific record we are saving.

    filename

    Path to save records to.

    sc

    SparkContext used for identifying underlying file system.

    schema

    Schema of records we are saving.

    avro

    Seq of records we are saving.

    Attributes
    protected
    Definition Classes
    ADAMRDDFunctions
  57. def saveMetadata(filePath: String): Unit

    Called in saveAsParquet after saving RDD to Parquet to save metadata.

    Writes any necessary metadata to disk. If not overridden, writes the sequence dictionary to disk as Avro.

    Attributes
    protected
    Definition Classes
    AvroReadGroupGenomicRDD → AvroGenomicRDD
  58. def saveRddAsParquet(filePath: String, blockSize: Int = 128 * 1024 * 1024, pageSize: Int = 1 * 1024 * 1024, compressCodec: CompressionCodecName = CompressionCodecName.GZIP, disableDictionaryEncoding: Boolean = false, schema: Option[Schema] = None): Unit

    Attributes
    protected
    Definition Classes
    ADAMRDDFunctions
  59. def saveRddAsParquet(args: SaveArgs): Unit

    Attributes
    protected
    Definition Classes
    ADAMRDDFunctions
  60. val sequences: SequenceDictionary

    Definition Classes
    AlignedReadRDD → GenomicRDD
  61. def shuffleRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(AlignmentRecord, X), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[AlignmentRecord], xTag: ClassTag[X]): GenomicRDD[(AlignmentRecord, X), Z]

    Definition Classes
    GenomicRDD
  62. def sortReadsByReferencePosition(): AlignmentRecordRDD

    Sorts our read data by reference positions, with contigs ordered by name.

    Sorts reads by the location where they are aligned. Unaligned reads are put at the end and sorted by read name. Contigs are ordered lexicographically.

    returns

    Returns a new RDD containing sorted reads.

    Definition Classes
    AlignmentRecordRDD
    See also

    sortReadsByReferencePositionAndIndex

  63. def sortReadsByReferencePositionAndIndex(): AlignmentRecordRDD

    Sorts our read data by reference positions, with contigs ordered by index.

    Sorts reads by the location where they are aligned. Unaligned reads are put at the end and sorted by read name. Contigs are ordered by their index in the SequenceDictionary.

    returns

    Returns a new RDD containing sorted reads.

    Definition Classes
    AlignmentRecordRDD
    See also

    sortReadsByReferencePosition
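
The two orderings described for sortReadsByReferencePosition and sortReadsByReferencePositionAndIndex differ only in how contigs are compared. The sketch below illustrates that difference on plain Scala collections. `Read`, `byName`, and `byIndex` are hypothetical names, treating a read with no contig or start as unaligned is an assumption, and `contigOrder` stands in for the SequenceDictionary.

```scala
object SortSketch {
  // Hypothetical stand-in for AlignmentRecord: contig and start are None for unaligned reads.
  final case class Read(name: String, contig: Option[String], start: Option[Long])

  // Aligned reads come first, ordered by contig key and then start position;
  // unaligned reads go last, ordered by read name.
  private def sortReads[K: Ordering](reads: Seq[Read])(contigKey: String => K): Seq[Read] = {
    val (aligned, unaligned) = reads.partition(r => r.contig.isDefined && r.start.isDefined)
    aligned.sortBy(r => (contigKey(r.contig.get), r.start.get)) ++ unaligned.sortBy(_.name)
  }

  // Contigs compared lexicographically by name.
  def byName(reads: Seq[Read]): Seq[Read] =
    sortReads(reads)((c: String) => c)

  // Contigs compared by their position in contigOrder; unknown contigs sort last.
  def byIndex(reads: Seq[Read], contigOrder: Seq[String]): Seq[Read] = {
    val index = contigOrder.zipWithIndex.toMap
    sortReads(reads)((c: String) => index.getOrElse(c, Int.MaxValue))
  }
}
```

With contigs named `chr2` and `chr10`, `byName` places `chr10` first because it sorts lexicographically, while `byIndex` follows whatever order is passed in `contigOrder`.
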

  64. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  65. def toCoverage(collapse: Boolean = true): CoverageRDD

    Converts this set of reads into a corresponding CoverageRDD.

    collapse

    Determines whether to merge adjacent coverage elements with the same score into a single coverage element.

    returns

    CoverageRDD containing mapped RDD of Coverage.

    Definition Classes
    AlignmentRecordRDD
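
What the collapse parameter of toCoverage does can be sketched as a fold over sorted coverage elements. This is an illustration rather than ADAM's code: `Cov` is a hypothetical stand-in for a coverage element, intervals are assumed to be half-open, and the input is assumed to be sorted by contig and start.

```scala
object CoverageCollapseSketch {
  // Hypothetical stand-in for a coverage element: half-open interval [start, end) with a count.
  final case class Cov(contig: String, start: Long, end: Long, count: Double)

  // Merges runs of adjacent elements (same contig, touching ends, equal count).
  // Assumes the input is already sorted by contig and start.
  def collapse(sorted: Seq[Cov]): Seq[Cov] =
    sorted.foldLeft(Vector.empty[Cov]) { (acc, next) =>
      acc.lastOption match {
        case Some(prev)
            if prev.contig == next.contig && prev.end == next.start && prev.count == next.count =>
          // Extend the previous element instead of appending a new one.
          acc.init :+ prev.copy(end = next.end)
        case _ =>
          acc :+ next
      }
    }
}
```
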
  66. def toFragments: FragmentRDD

    Convert this set of reads into fragments.

    returns

    Returns a FragmentRDD where all reads have been grouped together by the original sequence fragment they come from.

    Definition Classes
    AlignmentRecordRDD
  67. def transform(tFn: (RDD[AlignmentRecord]) ⇒ RDD[AlignmentRecord]): AlignmentRecordRDD

    Definition Classes
    GenomicRDD
  68. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  69. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  70. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
