org.bdgenomics.adam.rdd

ADAMContext

class ADAMContext extends Serializable with Logging

The ADAMContext provides functions on top of a SparkContext for loading genomic data.

Linear Supertypes
Logging, Serializable, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  15. def loadAlignments(filePath: String, projection: Option[Schema] = None, filePath2Opt: Option[String] = None, recordGroupOpt: Option[String] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads alignments from a given path, and infers the input type.

    This method can load:

    * AlignmentRecords via Parquet (default)
    * SAM/BAM/CRAM (.sam, .bam, .cram)
    * FASTQ (interleaved, single end, paired end) (.ifq, .fq/.fastq)
    * FASTA (.fa, .fasta)
    * NucleotideContigFragments via Parquet (.contig.adam)

    As hinted above, the input type is inferred from the file path extension.

    filePath

    Path to load data from.

    projection

    The fields to project; ignored if not Parquet.

    filePath2Opt

    The path to load a second end of FASTQ data from. Ignored if not FASTQ.

    recordGroupOpt

    Optional record group name to set if loading FASTQ.

    stringency

    Validation stringency used on FASTQ import/merging.

    returns

    Returns an AlignmentRecordRDD which wraps the RDD of reads, the sequence dictionary representing the contigs these reads are aligned to (if the reads are aligned), and the record group dictionary for the reads (if one is available).

    See also

    loadFasta

    loadFastq

    loadInterleavedFastq

    loadParquetAlignments

    loadBam
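
    For example (a minimal sketch; the paths are hypothetical, and we assume the implicit conversion in org.bdgenomics.adam.rdd.ADAMContext._ is in scope so that these methods are callable on a SparkContext named sc):

      import org.bdgenomics.adam.rdd.ADAMContext._

      // The input format (here, BAM) is inferred from the file extension.
      val reads = sc.loadAlignments("/data/sample.bam")

      // Paired FASTQ: pass the second-of-pair file via filePath2Opt.
      val pairedReads = sc.loadAlignments(
        "/data/sample_1.fq",
        filePath2Opt = Some("/data/sample_2.fq"),
        recordGroupOpt = Some("sample"))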

  16. def loadBam(filePath: String, validationStringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads a SAM/BAM file.

    This reads the sequence and record group dictionaries from the SAM/BAM file header. SAMRecords are read from the file and converted to the AlignmentRecord schema.

    filePath

    Path to the file on disk.

    returns

    Returns an AlignmentRecordRDD which wraps the RDD of reads, the sequence dictionary representing the contigs these reads are aligned to (if the reads are aligned), and the record group dictionary for the reads (if one is available).

    See also

    loadAlignments
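
    A sketch of loading a BAM with lenient validation (hypothetical path; ValidationStringency is htsjdk's):

      import htsjdk.samtools.ValidationStringency
      import org.bdgenomics.adam.rdd.ADAMContext._

      // LENIENT logs, rather than throws on, malformed records.
      val reads = sc.loadBam("/data/sample.bam", ValidationStringency.LENIENT)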

  17. def loadBed(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in BED6/12 format.

    filePath

    The path to the file to load.

    minPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.

    stringency

    Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.

    returns

    Returns a FeatureRDD.

  18. def loadCoverage(filePath: String): CoverageRDD

    Loads a file of Features to a CoverageRDD. Coverage is stored in the score attribute of Feature.

    filePath

    File path to load coverage from.

    returns

    Returns a CoverageRDD containing an RDD of Coverage.

  19. def loadFasta(filePath: String, fragmentLength: Long): NucleotideContigFragmentRDD

    Loads a FASTA file.

    filePath

    The path to load from.

    fragmentLength

    The length to split contigs into. This sets the achievable parallelism.

    returns

    Returns a NucleotideContigFragmentRDD containing the contigs.
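
    A sketch (hypothetical path; the 10 kbp fragment length mirrors the default used by loadSequences):

      import org.bdgenomics.adam.rdd.ADAMContext._

      // Contigs are split into 10 kbp fragments for parallelism.
      val contigs = sc.loadFasta("/data/reference.fa", fragmentLength = 10000L)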

  20. def loadFastq(filePath1: String, filePath2Opt: Option[String], recordGroupOpt: Option[String] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads (possibly paired) FASTQ data.

    filePath1

    The path where the first set of reads are.

    filePath2Opt

    The path where the second set of reads are, if provided.

    recordGroupOpt

    The optional record group name to associate to the reads.

    stringency

    The validation stringency to use when validating the reads.

    returns

    Returns the reads as an unaligned AlignmentRecordRDD.

    See also

    loadUnpairedFastq

    loadPairedFastq
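
    A sketch for paired-end reads split across two files (hypothetical paths):

      import org.bdgenomics.adam.rdd.ADAMContext._

      val reads = sc.loadFastq(
        "/data/sample_1.fq",
        Some("/data/sample_2.fq"),
        recordGroupOpt = Some("sample"))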

  21. def loadFeatures(filePath: String, projection: Option[Schema] = None, minPartitions: Option[Int] = None): FeatureRDD

    Loads Features from a file, autodetecting the file type.

    Loads files ending in .bed as BED6/12, .gff3 as GFF3, .gtf/.gff as GTF/GFF2, .narrow[pP]eak as NarrowPeak, and .interval_list as IntervalList. If none of these match, we fall back to Parquet.

    filePath

    The path to the file to load.

    projection

    An optional projection to push down.

    minPartitions

    An optional minimum number of partitions to use. For textual formats, if this is None, we fall back to the Spark default parallelism.

    returns

    Returns a FeatureRDD.

    See also

    loadParquetFeatures

    loadIntervalList

    loadNarrowPeak

    loadGff3

    loadGtf

    loadBed
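
    A sketch (hypothetical path):

      import org.bdgenomics.adam.rdd.ADAMContext._

      // The .bed extension routes this call to loadBed.
      val features = sc.loadFeatures("/data/peaks.bed")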

  22. def loadFragments(filePath: String): FragmentRDD

    Auto-detects the file type and loads a FragmentRDD.

    This method can load:

    * Fragments via Parquet (default)
    * SAM/BAM/CRAM (.sam, .bam, .cram)
    * FASTQ (interleaved only, .ifq)
    * AlignmentRecords via Parquet (autodetected via the .reads.adam extension)

    filePath

    Path to load data from.

    returns

    Returns the loaded data as a FragmentRDD.

  23. def loadGenotypes(filePath: String, projection: Option[Schema] = None): GenotypeRDD

    Auto-detects the file type and loads a GenotypeRDD.

    If the file has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, loads as VCF. Else, falls back to Parquet.

    filePath

    The path to load.

    projection

    An optional subset of fields to load.

    returns

    Returns a GenotypeRDD.

    See also

    loadParquetGenotypes

    loadVcf
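
    A sketch (hypothetical path):

      import org.bdgenomics.adam.rdd.ADAMContext._

      // A .vcf extension selects the VCF path; anything else is read as Parquet.
      val genotypes = sc.loadGenotypes("/data/sample.vcf")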

  24. def loadGff3(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in GFF3 format.

    filePath

    The path to the file to load.

    minPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.

    stringency

    Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.

    returns

    Returns a FeatureRDD.

  25. def loadGtf(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in GFF2/GTF format.

    filePath

    The path to the file to load.

    minPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.

    stringency

    Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.

    returns

    Returns a FeatureRDD.

  26. def loadIndexedBam(filePath: String, viewRegion: ReferenceRegion): AlignmentRecordRDD

    Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within a specified ReferenceRegion. A BAM index file is required.

    filePath

    The path to the input data. Currently this path must correspond to a single BAM file. The associated BAM index file must have the same name.

    viewRegion

    The ReferenceRegion we are filtering on.
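
    A sketch (hypothetical paths; assumes an index such as sample.bam.bai sits next to the BAM):

      import org.bdgenomics.adam.models.ReferenceRegion
      import org.bdgenomics.adam.rdd.ADAMContext._

      // Only reads overlapping chr1:100,000-200,000 are returned.
      val region = ReferenceRegion("chr1", 100000L, 200000L)
      val reads = sc.loadIndexedBam("/data/sample.bam", region)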

  27. def loadIndexedBam(filePath: String, viewRegions: Iterable[ReferenceRegion])(implicit s: DummyImplicit): AlignmentRecordRDD

  28. def loadIndexedBam(filePath: String, parsedLoci: ParsedLoci, includeUnmappedMates: Boolean = false)(implicit s: DummyImplicit): AlignmentRecordRDD

    Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within the specified ReferenceRegions. A BAM index file is required.

    filePath

    The path to the input data. Currently this path must correspond to a single BAM file. The associated BAM index file must have the same name.

    parsedLoci

    The parsed loci we are filtering on.

  29. def loadIndexedVcf(filePath: String, viewRegions: Iterable[ReferenceRegion], stringency: ValidationStringency = ValidationStringency.STRICT)(implicit s: DummyImplicit): VariantContextRDD

    Loads a VCF file indexed by a tabix (tbi) file into an RDD.

    filePath

    The file to load.

    viewRegions

    Iterable of ReferenceRegions we are filtering on.

    stringency

    The validation stringency to use when validating the VCF.

    returns

    Returns a VariantContextRDD.

  30. def loadIndexedVcf(filePath: String, viewRegion: ReferenceRegion): VariantContextRDD

    Loads a VCF file indexed by a tabix (tbi) file into an RDD.

    filePath

    The file to load.

    viewRegion

    The ReferenceRegion we are filtering on.

    returns

    Returns a VariantContextRDD.

  31. def loadInterleavedFastq(filePath: String): AlignmentRecordRDD

    Loads reads from interleaved FASTQ.

    In interleaved FASTQ, the two reads from a paired sequencing protocol are interleaved in a single file. This is a zipped representation of the typical paired FASTQ.

    filePath

    Path to load.

    returns

    Returns the file as an unaligned AlignmentRecordRDD.

  32. def loadInterleavedFastqAsFragments(filePath: String): FragmentRDD

    Loads interleaved FASTQ data as Fragments.

    Fragments represent all of the reads from a single sequenced fragment as a single object, which is a useful representation for some tasks.

    filePath

    The path to load.

    returns

    Returns a FragmentRDD containing the paired reads grouped by sequencing fragment.

  33. def loadIntervalList(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in IntervalList format.

    filePath

    The path to the file to load.

    minPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.

    stringency

    Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.

    returns

    Returns a FeatureRDD.

  34. def loadNarrowPeak(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in NarrowPeak format.

    filePath

    The path to the file to load.

    minPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.

    stringency

    Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.

    returns

    Returns a FeatureRDD.

  35. def loadPairedFastq(filePath1: String, filePath2: String, recordGroupOpt: Option[String], stringency: ValidationStringency): AlignmentRecordRDD

    Loads paired FASTQ data from two files.

    filePath1

    The path where the first set of reads are.

    filePath2

    The path where the second set of reads are.

    recordGroupOpt

    The optional record group name to associate to the reads.

    stringency

    The validation stringency to use when validating the reads.

    returns

    Returns the reads as an unaligned AlignmentRecordRDD.

    See also

    loadFastq

  36. def loadParquet[T](filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None)(implicit ev1: (T) ⇒ SpecificRecord, ev2: Manifest[T]): RDD[T]

    Creates a new RDD by loading Parquet data for an arbitrary Avro record type.

    T

    The type of records to return.

    filePath

    The path to the input data.

    predicate

    An optional pushdown predicate to use when reading the data.

    projection

    An optional projection schema to use when reading the data.

    returns

    An RDD with records of the specified type.
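
    A sketch (hypothetical path; AlignmentRecord is the Avro record type from org.bdgenomics.formats.avro):

      import org.apache.spark.rdd.RDD
      import org.bdgenomics.adam.rdd.ADAMContext._
      import org.bdgenomics.formats.avro.AlignmentRecord

      // Loads raw Avro records, without the dictionary metadata that
      // loadParquetAlignments attaches.
      val records: RDD[AlignmentRecord] = sc.loadParquet[AlignmentRecord]("/data/reads.adam")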

  37. def loadParquetAlignments(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): AlignmentRecordRDD

    Loads alignment data from a Parquet file.

    filePath

    The path of the file to load.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional schema designating the fields to project.

    returns

    Returns an AlignmentRecordRDD which wraps the RDD of reads, the sequence dictionary representing the contigs these reads are aligned to (if the reads are aligned), and the record group dictionary for the reads (if one is available).

    Note

    The sequence dictionary is read from an avro file stored at filePath/_seqdict.avro and the record group dictionary is read from an avro file stored at filePath/_rgdict.avro. These files are pure avro, not Parquet.

    See also

    loadAlignments
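
    A sketch of projection pushdown (hypothetical path; the field names come from org.bdgenomics.adam.projections and are illustrative):

      import org.bdgenomics.adam.projections.{ AlignmentRecordField, Projection }
      import org.bdgenomics.adam.rdd.ADAMContext._

      // Only the projected columns are read from the Parquet files.
      val proj = Projection(AlignmentRecordField.readName, AlignmentRecordField.sequence)
      val reads = sc.loadParquetAlignments("/data/reads.adam", projection = Some(proj))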

  38. def loadParquetContigFragments(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): NucleotideContigFragmentRDD

    Loads NucleotideContigFragments stored in Parquet, with metadata.

    filePath

    The path to load files from.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional projection to use for reading.

    returns

    Returns a NucleotideContigFragmentRDD.

  39. def loadParquetCoverage(filePath: String, predicate: Option[FilterPredicate] = None): CoverageRDD

    Loads a Parquet file of Features to a CoverageRDD. Coverage is stored in the score attribute of Feature.

    filePath

    File path to load coverage from.

    predicate

    An optional predicate to push down into the file.

    returns

    Returns a CoverageRDD containing an RDD of Coverage.

  40. def loadParquetFeatures(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): FeatureRDD

    Loads Features stored in Parquet, with accompanying metadata.

    filePath

    The path to load files from.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional projection to use for reading.

    returns

    Returns a FeatureRDD.

  41. def loadParquetFragments(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): FragmentRDD

    Loads Fragments stored in Parquet, with accompanying metadata.

    filePath

    The path to load files from.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional projection to use for reading.

    returns

    Returns a FragmentRDD.

  42. def loadParquetGenotypes(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): GenotypeRDD

    Loads Genotypes stored in Parquet with accompanying metadata.

    filePath

    The path to load files from.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional projection to use for reading.

    returns

    Returns a GenotypeRDD.

  43. def loadParquetVariantAnnotations(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): VariantAnnotationRDD

    Loads VariantAnnotations stored in Parquet, with metadata.

    filePath

    The path to load files from.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional projection to use for reading.

    returns

    Returns VariantAnnotationRDD.

  44. def loadParquetVariants(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): VariantRDD

    Loads Variants stored in Parquet with accompanying metadata.

    filePath

    The path to load files from.

    predicate

    An optional predicate to push down into the file.

    projection

    An optional projection to use for reading.

    returns

    Returns a VariantRDD.

  45. def loadReferenceFile(filePath: String, fragmentLength: Long): ReferenceFile

    Auto-detects the file type and loads a broadcastable ReferenceFile.

    If the file type is 2bit, loads a 2bit file. Else, uses loadSequences to load the reference as an RDD, which is then collected to the driver.

    filePath

    The path to load.

    fragmentLength

    The length of fragment to use for splitting.

    returns

    Returns a broadcastable ReferenceFile.

    See also

    loadSequences
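
    A sketch (hypothetical path):

      import org.bdgenomics.adam.rdd.ADAMContext._

      // A .2bit path is loaded directly; any other path goes through
      // loadSequences and is collected to the driver.
      val ref = sc.loadReferenceFile("/data/reference.2bit", fragmentLength = 10000L)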

  46. def loadSequences(filePath: String, projection: Option[Schema] = None, fragmentLength: Long = 10000): NucleotideContigFragmentRDD

    Auto-detects the file type and loads contigs as a NucleotideContigFragmentRDD.

    Loads files ending in .fa/.fasta/.fa.gz/.fasta.gz as FASTA; otherwise, falls back to Parquet.

    filePath

    The path to load.

    projection

    An optional subset of fields to load.

    fragmentLength

    The length of fragment to use for splitting.

    returns

    Returns a NucleotideContigFragmentRDD.

    See also

    loadReferenceFile

    loadParquetContigFragments

    loadFasta

  47. def loadUnpairedFastq(filePath: String, recordGroupOpt: Option[String] = None, setFirstOfPair: Boolean = false, setSecondOfPair: Boolean = false, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads unpaired FASTQ data from a single file.

    filePath

    The path where the first set of reads are.

    recordGroupOpt

    The optional record group name to associate to the reads.

    setFirstOfPair

    If true, sets the read as first from the fragment.

    setSecondOfPair

    If true, sets the read as second from the fragment.

    stringency

    The validation stringency to use when validating the reads.

    returns

    Returns the reads as an unaligned AlignmentRecordRDD.

    See also

    loadFastq

  48. def loadVariantAnnotations(filePath: String, projection: Option[Schema] = None): VariantAnnotationRDD

    Loads VariantAnnotations into an RDD, and automatically detects the underlying storage format.

    Can load variant annotations from either Parquet or VCF.

    filePath

    The path to load files from.

    projection

    An optional projection to use for reading.

    returns

    Returns VariantAnnotationRDD.

    See also

    loadParquetVariantAnnotations

    loadVcfAnnotations

  49. def loadVariants(filePath: String, projection: Option[Schema] = None): VariantRDD

    Auto-detects the file type and loads a VariantRDD.

    If the file has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, loads as VCF. Else, falls back to Parquet.

    filePath

    The path to load.

    projection

    An optional subset of fields to load.

    returns

    Returns a VariantRDD.

    See also

    loadParquetVariants

    loadVcf

  50. def loadVcf(filePath: String, stringency: ValidationStringency = ValidationStringency.STRICT): VariantContextRDD

    Loads a VCF file into an RDD.

    filePath

    The file to load.

    stringency

    The validation stringency to use when validating the VCF.

    returns

    Returns a VariantContextRDD.

    See also

    loadVcfAnnotations
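
    A sketch (hypothetical path):

      import htsjdk.samtools.ValidationStringency
      import org.bdgenomics.adam.rdd.ADAMContext._

      // Each VCF line becomes a VariantContext pairing the variant with
      // its genotype calls.
      val variantContexts = sc.loadVcf("/data/sample.vcf", ValidationStringency.LENIENT)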

  51. def loadVcfAnnotations(filePath: String): VariantAnnotationRDD

    Loads variant annotations stored in VCF format.

    filePath

    The path to the VCF file(s) to load annotations from.

    returns

    Returns VariantAnnotationRDD.

  52. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  53. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  54. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  55. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  56. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  57. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  58. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  59. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  60. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  65. final def notify(): Unit

    Definition Classes
    AnyRef
  66. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  67. val sc: SparkContext

    The SparkContext to wrap.

  68. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  69. def toString(): String

    Definition Classes
    AnyRef → Any
  70. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  71. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  72. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
