package com.johnsnowlabs.nlp.pretrained

Type Members

  1. case class PretrainedPipeline(downloadName: String, lang: String = "en", source: String = ResourceDownloader.publicLoc, parseEmbeddingsVectors: Boolean = false, diskLocation: Option[String] = None) extends Product with Serializable

    Represents a fully constructed and trained Spark NLP pipeline, ready to be used. This way, a whole pipeline can be defined in one line. Additionally, the LightPipeline version of the model can be retrieved through the lightModel member.

    For more extended examples, see the Pipelines page, and see our GitHub Model Repository for the available pipeline models.

    Example

    import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
    import com.johnsnowlabs.nlp.SparkNLP

    // Start (or reuse) a SparkSession configured for Spark NLP
    val spark = SparkNLP.start()

    val testData = spark.createDataFrame(Seq(
    (1, "Google has announced the release of a beta version of the popular TensorFlow machine learning library"),
    (2, "Donald John Trump (born June 14, 1946) is the 45th and current president of the United States")
    )).toDF("id", "text")

    // Download the pretrained pipeline and apply it to the DataFrame
    val pipeline = PretrainedPipeline("explain_document_dl", lang = "en")

    val annotation = pipeline.transform(testData)

    // Show the named entities found in each row
    annotation.select("entities.result").show(false)

    /*
    +----------------------------------+
    |result                            |
    +----------------------------------+
    |[Google, TensorFlow]              |
    |[Donald John Trump, United States]|
    +----------------------------------+
    */

    downloadName: name of the pipeline model to download

    lang: language of the pipeline (default: "en")

    source: location from which the pipeline model is downloaded (default: ResourceDownloader.publicLoc)

  2. case class RepositoryMetadata(metadataFile: String, repoFolder: String, version: String, lastMetadataDownloaded: Timestamp, metadata: List[ResourceMetadata]) extends Product with Serializable

    Describes the state of a repository. A repository can be any S3 folder that contains a metadata.json file listing the resources stored inside it.

  3. trait ResourceDownloader extends AnyRef

  4. case class ResourceMetadata(name: String, language: Option[String], libVersion: Option[Version], sparkVersion: Option[Version], readyToUse: Boolean, time: Timestamp, isZipped: Boolean = false, category: Option[ResourceType] = Some(ResourceType.NOT_DEFINED), checksum: String = "") extends Product with Serializable

  5. case class ResourceRequest(name: String, language: Option[String] = None, folder: String = ResourceDownloader.publicLoc, libVersion: Version = ResourceDownloader.libVersion, sparkVersion: Version = ResourceDownloader.sparkVersion) extends Product with Serializable

  6. class S3ResourceDownloader extends ResourceDownloader

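ResourceRequest and ResourceMetadata carry the same keys (name, language, library version, Spark version), which suggests that a downloader resolves a request by filtering the repository's metadata. The sketch below is a hypothetical, simplified illustration of such a lookup, not the actual selection rule of S3ResourceDownloader. The types SimpleVersion, Meta and Request and the function resolve are invented stand-ins, timestamps are plain epoch milliseconds, and the compatibility policy (library version no newer than the requested one, same Spark major.minor, most recent resource wins) is an assumption.

```scala
// Hypothetical, simplified stand-ins for Version, ResourceMetadata and
// ResourceRequest; the real classes have more fields and different types.
final case class SimpleVersion(parts: List[Int]) {
  // Component-wise comparison, padding the shorter version with zeros.
  def compare(other: SimpleVersion): Int = {
    val n = math.max(parts.length, other.parts.length)
    val a = parts.padTo(n, 0)
    val b = other.parts.padTo(n, 0)
    a.zip(b).collectFirst { case (x, y) if x != y => x.compare(y) }.getOrElse(0)
  }
  def <=(other: SimpleVersion): Boolean = compare(other) <= 0
}

object SimpleVersion {
  def parse(s: String): SimpleVersion = SimpleVersion(s.split('.').toList.map(_.toInt))
}

final case class Meta(
  name: String,
  language: Option[String],
  libVersion: Option[SimpleVersion],
  sparkVersion: Option[SimpleVersion],
  readyToUse: Boolean,
  time: Long // epoch millis, standing in for java.sql.Timestamp
)

final case class Request(
  name: String,
  language: Option[String],
  libVersion: SimpleVersion,
  sparkVersion: SimpleVersion
)

// Assumed policy: same name and language, ready to use, built for a library
// version no newer than requested and for the same Spark major.minor line.
// Among the remaining candidates, the most recently published one wins.
def resolve(req: Request, all: List[Meta]): Option[Meta] = {
  def sameSparkLine(v: SimpleVersion): Boolean =
    v.parts.take(2) == req.sparkVersion.parts.take(2)
  val candidates = all.filter { m =>
    m.name == req.name &&
    m.language == req.language &&
    m.readyToUse &&
    m.libVersion.forall(_ <= req.libVersion) &&
    m.sparkVersion.forall(sameSparkLine)
  }
  if (candidates.isEmpty) None else Some(candidates.maxBy(_.time))
}

// Hypothetical catalog entries for illustration only.
def v(s: String): SimpleVersion = SimpleVersion.parse(s)
val catalog = List(
  Meta("explain_document_dl", Some("en"), Some(v("2.4.0")), Some(v("2.4")), readyToUse = true, time = 100L),
  Meta("explain_document_dl", Some("en"), Some(v("3.0.0")), Some(v("3.0")), readyToUse = true, time = 200L),
  Meta("explain_document_dl", Some("en"), Some(v("3.1.0")), Some(v("3.0")), readyToUse = true, time = 300L),
  Meta("explain_document_dl", Some("en"), Some(v("3.0.2")), Some(v("3.0")), readyToUse = false, time = 400L)
)
val req = Request("explain_document_dl", Some("en"), v("3.0.3"), v("3.0.1"))
val chosen = resolve(req, catalog) // the entry published at time 200
```

Here the 2.4 entry fails the Spark check, the 3.1.0 entry is newer than the requested library version, and the 3.0.2 entry is not ready to use, so only the entry from time 200 survives the filter.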

Value Members

  1. object PretrainedPipeline extends Serializable

  2. object PythonResourceDownloader

  3. object ResourceDownloader

  4. object ResourceMetadata extends Serializable

  5. object ResourceType extends Enumeration

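ResourceMetadata stores its category as an Option[ResourceType] that defaults to Some(ResourceType.NOT_DEFINED), and ResourceType is a scala.Enumeration. The sketch below shows how a category label might be mapped onto such an enumeration with a NOT_DEFINED fallback. Category and parseCategory are hypothetical names, not part of the package. Only NOT_DEFINED is confirmed by the signatures on this page; the MODEL and PIPELINE values are assumptions made for illustration.

```scala
// Hypothetical stand-in for ResourceType. Giving each value an explicit name
// makes toString and withName independent of reflection.
object Category extends Enumeration {
  val MODEL: Value = Value("MODEL")             // assumed value
  val PIPELINE: Value = Value("PIPELINE")       // assumed value
  val NOT_DEFINED: Value = Value("NOT_DEFINED") // matches the documented default
}

// Map a raw category label to a Category value, ignoring case and surrounding
// whitespace, and fall back to NOT_DEFINED for missing or unknown labels.
def parseCategory(raw: Option[String]): Category.Value =
  raw
    .map(_.trim.toUpperCase)
    .flatMap(label => Category.values.find(_.toString == label))
    .getOrElse(Category.NOT_DEFINED)
```

Falling back to NOT_DEFINED instead of throwing (as Enumeration.withName would for an unknown name) keeps metadata entries with unexpected categories usable.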
