embeddings

Type Members

class AlbertEmbeddings extends AnnotatorModel[AlbertEmbeddings] with HasSimpleAnnotate[AlbertEmbeddings] with WriteTensorflowModel with WriteSentencePieceModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS - Google Research, Toyota Technological Institute at Chicago This these embeddings represent the outputs generated by the Albert model.
ALBERT: A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS - Google Research, Toyota Technological Institute at Chicago This these embeddings represent the outputs generated by the Albert model. All official Albert releases by google in TF-HUB are supported with this Albert Wrapper:
TF-hub Models :
albert_base = https://tfhub.dev/google/albert_base/3 | 768-embed-dim, 12-layer, 12-heads, 12M parameters
albert_large = https://tfhub.dev/google/albert_large/3 | 1024-embed-dim, 24-layer, 16-heads, 18M parameters
albert_xlarge = https://tfhub.dev/google/albert_xlarge/3 | 2048-embed-dim, 24-layer, 32-heads, 60M parameters
albert_xxlarge = https://tfhub.dev/google/albert_xxlarge/3 | 4096-embed-dim, 12-layer, 64-heads, 235M parameters
This model requires input tokenization with SentencePiece model, which is provided by Spark-NLP (See tokenizers package)
Sources :
https://arxiv.org/pdf/1909.11942.pdf
https://github.com/google-research/ALBERT
https://tfhub.dev/s?q=albert
Paper abstract :
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameterreduction techniques to lower memory consumption and increase the training speed of BERT (Devlin et al., 2019). Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.
Tips : ALBERT uses repeating layers which results in a small memory footprint, however the computational cost remains similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same number of (repeating) layers.
class BertEmbeddings extends AnnotatorModel[BertEmbeddings] with HasBatchedAnnotate[BertEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture
BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/BertEmbeddingsTestSpec.scala for further reference on how to use this API. Sources:
Sources :
https://arxiv.org/abs/1810.04805
https://github.com/google-research/bert
Paper abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
class BertSentenceEmbeddings extends AnnotatorModel[BertSentenceEmbeddings] with HasBatchedAnnotate[BertSentenceEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture
BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/BertSentenceEmbeddingsTestSpec.scala for further reference on how to use this API. Sources:
Sources :
https://arxiv.org/abs/1810.04805
https://github.com/google-research/bert
Paper abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
class ChunkEmbeddings extends AnnotatorModel[ChunkEmbeddings] with HasSimpleAnnotate[ChunkEmbeddings]

This annotator utilizes WordEmbeddings or BertEmbeddings to generate chunk embeddings from either Chunker, NGramGenerator, or NerConverter outputs.
This annotator utilizes WordEmbeddings or BertEmbeddings to generate chunk embeddings from either Chunker, NGramGenerator, or NerConverter outputs.
TIP:
How to explode and convert these embeddings into Vectors or what’s known as Feature column so it can be used in Spark ML regression or clustering functions:
```
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Let's create a UDF to take array of embeddings and output Vectors
val convertToVectorUDF = udf((matrix : Seq[Float]) => {
    Vectors.dense(matrix.toArray.map(_.toDouble))
})

// Now let's explode the sentence_embeddings column and have a new feature column for Spark ML
pipelineDF.select(explode($"chunk_embeddings.embeddings").as("chunk_embeddings_exploded"))
.withColumn("features", convertToVectorUDF($"chunk_embeddings_exploded"))
```
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ChunkEmbeddingsTestSpec.scala for further reference on how to use this API.
class DistilBertEmbeddings extends AnnotatorModel[DistilBertEmbeddings] with HasBatchedAnnotate[DistilBertEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

The DistilBERT model was proposed in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter https://arxiv.org/abs/1910.01108.
The DistilBERT model was proposed in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter https://arxiv.org/abs/1910.01108. DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than bert-base-uncased, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark.
The abstract from the paper is the following:
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pretraining, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.
Tips:
- DistilBERT doesn't have :obj:token_type_ids, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token :obj:tokenizer.sep_token (or :obj:[SEP]).
- DistilBERT doesn't have options to select the input positions (:obj:position_ids input). This could be added if necessary though, just let us know if you need this option.
class ElmoEmbeddings extends AnnotatorModel[ElmoEmbeddings] with HasSimpleAnnotate[ElmoEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

Embeddings from a language model trained on the 1 Billion Word Benchmark.
Embeddings from a language model trained on the 1 Billion Word Benchmark.
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
word_emb: the character-based word representations with shape [batch_size, max_length, 512]. == word_emb
lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024]. === lstm_outputs1
lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024]. === lstm_outputs2
elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024] == elmo
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ElmoEmbeddingsTestSpec.scala for further reference on how to use this API.
Sources :
https://tfhub.dev/google/elmo/3
https://arxiv.org/abs/1802.05365
Paper abstract :
We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
trait EmbeddingsCoverage extends AnyRef
trait HasEmbeddingsProperties extends Params
trait ReadAlbertTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
trait ReadBertSentenceTensorflowModel extends ReadTensorflowModel
trait ReadBertTensorflowModel extends ReadTensorflowModel
trait ReadDistilBertTensorflowModel extends ReadTensorflowModel
trait ReadElmoTensorflowModel extends ReadTensorflowModel
trait ReadRobertaTensorflowModel extends ReadTensorflowModel
trait ReadUSETensorflowModel extends ReadTensorflowModel
trait ReadXlmRobertaTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
trait ReadXlnetTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel
trait ReadablePretrainedAlbertModel extends ParamsAndFeaturesReadable[AlbertEmbeddings] with HasPretrained[AlbertEmbeddings]
trait ReadablePretrainedBertModel extends ParamsAndFeaturesReadable[BertEmbeddings] with HasPretrained[BertEmbeddings]
trait ReadablePretrainedBertSentenceModel extends ParamsAndFeaturesReadable[BertSentenceEmbeddings] with HasPretrained[BertSentenceEmbeddings]
trait ReadablePretrainedDistilBertModel extends ParamsAndFeaturesReadable[DistilBertEmbeddings] with HasPretrained[DistilBertEmbeddings]
trait ReadablePretrainedElmoModel extends ParamsAndFeaturesReadable[ElmoEmbeddings] with HasPretrained[ElmoEmbeddings]
trait ReadablePretrainedRobertaModel extends ParamsAndFeaturesReadable[RoBertaEmbeddings] with HasPretrained[RoBertaEmbeddings]
trait ReadablePretrainedUSEModel extends ParamsAndFeaturesReadable[UniversalSentenceEncoder] with HasPretrained[UniversalSentenceEncoder]
trait ReadablePretrainedWordEmbeddings extends StorageReadable[WordEmbeddingsModel] with HasPretrained[WordEmbeddingsModel]
trait ReadablePretrainedXlmRobertaModel extends ParamsAndFeaturesReadable[XlmRoBertaEmbeddings] with HasPretrained[XlmRoBertaEmbeddings]
trait ReadablePretrainedXlnetModel extends ParamsAndFeaturesReadable[XlnetEmbeddings] with HasPretrained[XlnetEmbeddings]
trait ReadsFromBytes extends AnyRef
class RoBertaEmbeddings extends AnnotatorModel[RoBertaEmbeddings] with HasBatchedAnnotate[RoBertaEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

The RoBERTa model was proposed in RoBERTa: A Robustly Optimized BERT Pretraining Approach https://arxiv.org/abs/1907.11692> by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
The RoBERTa model was proposed in RoBERTa: A Robustly Optimized BERT Pretraining Approach https://arxiv.org/abs/1907.11692> by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates. The abstract from the paper is the following:
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.*
Tips:
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a different pretraining scheme.
- RoBERTa doesn't have :obj:token_type_ids, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token :obj:tokenizer.sep_token (or :obj:</s>)
The original code can be found here https://github.com/pytorch/fairseq/tree/master/examples/roberta.
class SentenceEmbeddings extends AnnotatorModel[SentenceEmbeddings] with HasSimpleAnnotate[SentenceEmbeddings] with HasEmbeddingsProperties with HasStorageRef

This annotator converts the results from WordEmbeddings, BertEmbeddings, or ElmoEmbeddings into sentence or document embeddings by either summing up or averaging all the word embeddings in a sentence or a document (depending on the inputCols).
This annotator converts the results from WordEmbeddings, BertEmbeddings, or ElmoEmbeddings into sentence or document embeddings by either summing up or averaging all the word embeddings in a sentence or a document (depending on the inputCols).
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/SentenceEmbeddingsTestSpec.scala for further reference on how to use this API.
class UniversalSentenceEncoder extends AnnotatorModel[UniversalSentenceEncoder] with HasSimpleAnnotate[UniversalSentenceEncoder] with HasEmbeddingsProperties with HasStorageRef with WriteTensorflowModel

The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.
The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/UniversalSentenceEncoderTestSpec.scala for further reference on how to use this API.
Sources :
https://arxiv.org/abs/1803.11175
https://tfhub.dev/google/universal-sentence-encoder/2
Paper abstract: We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
class WordEmbeddings extends AnnotatorApproach[WordEmbeddingsModel] with HasStorage with HasEmbeddingsProperties

Word Embeddings lookup annotator that maps tokens to vectors.
Word Embeddings lookup annotator that maps tokens to vectors. See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/WordEmbeddingsTestSpec.scala for further reference on how to use this API.
There are also two convenient functions to retrieve the embeddings coverage with respect to the transformed dataset:
withCoverageColumn(dataset, embeddingsCol, outputCol): Adds a custom column with word coverage stats for the embedded field: (coveredWords, totalWords, coveragePercentage). This creates a new column with statistics for each row.
overallCoverage(dataset, embeddingsCol): Calculates overall word coverage for the whole data in the embedded field. This returns a single coverage object considering all rows in the field.
class WordEmbeddingsModel extends AnnotatorModel[WordEmbeddingsModel] with HasSimpleAnnotate[WordEmbeddingsModel] with HasEmbeddingsProperties with HasStorageModel with ParamsAndFeaturesWritable

Word Embeddings lookup annotator that maps tokens to vectors
Word Embeddings lookup annotator that maps tokens to vectors
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/ner/NerConverterTest.scala for example usage of this API.
class WordEmbeddingsReader extends StorageReader[Array[Float]] with ReadsFromBytes
class WordEmbeddingsWriter extends StorageBatchWriter[Array[Float]] with ReadsFromBytes
class XlmRoBertaEmbeddings extends AnnotatorModel[XlmRoBertaEmbeddings] with HasBatchedAnnotate[XlmRoBertaEmbeddings] with WriteTensorflowModel with WriteSentencePieceModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale https://arxiv.org/abs/1911.02116 by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco GuzmÃ¡n, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale https://arxiv.org/abs/1911.02116 by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco GuzmÃ¡n, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.
The abstract from the paper is the following:
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-Ris very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make XLM-R code, data, and models publicly available.
Tips:
- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does not require lang parameter to understand which language is used, and should be able to determine the correct language from the input ids. - This implementation is the same as RoBERTa. Refer to the com.johnsnowlabs.nlp.embeddings.RoBertaEmbeddings for usage examples as well as the information relative to the inputs and outputs.
class XlnetEmbeddings extends AnnotatorModel[XlnetEmbeddings] with HasSimpleAnnotate[XlnetEmbeddings] with WriteTensorflowModel with WriteSentencePieceModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

XlnetEmbeddings (XLNet): Generalized Autoregressive Pretraining for Language Understanding
XlnetEmbeddings (XLNet): Generalized Autoregressive Pretraining for Language Understanding
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking.
XLNet-Large = https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip | 24-layer, 1024-hidden, 16-heads XLNet-Base = https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip | 12-layer, 768-hidden, 12-heads. This model is trained on full data (different from the one in the paper).

Value Members

object AlbertEmbeddings extends ReadablePretrainedAlbertModel with ReadAlbertTensorflowModel with ReadSentencePieceModel with Serializable
object BertEmbeddings extends ReadablePretrainedBertModel with ReadBertTensorflowModel with Serializable
object BertSentenceEmbeddings extends ReadablePretrainedBertSentenceModel with ReadBertSentenceTensorflowModel with Serializable
object ChunkEmbeddings extends DefaultParamsReadable[ChunkEmbeddings] with Serializable
object DistilBertEmbeddings extends ReadablePretrainedDistilBertModel with ReadDistilBertTensorflowModel with Serializable
object ElmoEmbeddings extends ReadablePretrainedElmoModel with ReadElmoTensorflowModel with Serializable
object PoolingStrategy
object RoBertaEmbeddings extends ReadablePretrainedRobertaModel with ReadRobertaTensorflowModel with Serializable
object SentenceEmbeddings extends DefaultParamsReadable[SentenceEmbeddings] with Serializable
object UniversalSentenceEncoder extends ReadablePretrainedUSEModel with ReadUSETensorflowModel with Serializable
object WordEmbeddings extends DefaultParamsReadable[WordEmbeddings] with Serializable
object WordEmbeddingsBinaryIndexer
object WordEmbeddingsModel extends ReadablePretrainedWordEmbeddings with EmbeddingsCoverage with Serializable
object WordEmbeddingsTextIndexer
object XlmRoBertaEmbeddings extends ReadablePretrainedXlmRobertaModel with ReadXlmRobertaTensorflowModel with Serializable
object XlnetEmbeddings extends ReadablePretrainedXlnetModel with ReadXlnetTensorflowModel with ReadSentencePieceModel with Serializable

package embeddings

Type Members

class AlbertEmbeddings extends AnnotatorModel[AlbertEmbeddings] with HasSimpleAnnotate[AlbertEmbeddings] with WriteTensorflowModel with WriteSentencePieceModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

class BertEmbeddings extends AnnotatorModel[BertEmbeddings] with HasBatchedAnnotate[BertEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

class BertSentenceEmbeddings extends AnnotatorModel[BertSentenceEmbeddings] with HasBatchedAnnotate[BertSentenceEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

class ChunkEmbeddings extends AnnotatorModel[ChunkEmbeddings] with HasSimpleAnnotate[ChunkEmbeddings]

class DistilBertEmbeddings extends AnnotatorModel[DistilBertEmbeddings] with HasBatchedAnnotate[DistilBertEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

class ElmoEmbeddings extends AnnotatorModel[ElmoEmbeddings] with HasSimpleAnnotate[ElmoEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

trait EmbeddingsCoverage extends AnyRef

trait HasEmbeddingsProperties extends Params

trait ReadAlbertTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel

trait ReadBertSentenceTensorflowModel extends ReadTensorflowModel

trait ReadBertTensorflowModel extends ReadTensorflowModel

trait ReadDistilBertTensorflowModel extends ReadTensorflowModel

trait ReadElmoTensorflowModel extends ReadTensorflowModel

trait ReadRobertaTensorflowModel extends ReadTensorflowModel

trait ReadUSETensorflowModel extends ReadTensorflowModel

trait ReadXlmRobertaTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel

trait ReadXlnetTensorflowModel extends ReadTensorflowModel with ReadSentencePieceModel

trait ReadablePretrainedAlbertModel extends ParamsAndFeaturesReadable[AlbertEmbeddings] with HasPretrained[AlbertEmbeddings]

trait ReadablePretrainedBertModel extends ParamsAndFeaturesReadable[BertEmbeddings] with HasPretrained[BertEmbeddings]

trait ReadablePretrainedBertSentenceModel extends ParamsAndFeaturesReadable[BertSentenceEmbeddings] with HasPretrained[BertSentenceEmbeddings]

trait ReadablePretrainedDistilBertModel extends ParamsAndFeaturesReadable[DistilBertEmbeddings] with HasPretrained[DistilBertEmbeddings]

trait ReadablePretrainedElmoModel extends ParamsAndFeaturesReadable[ElmoEmbeddings] with HasPretrained[ElmoEmbeddings]

trait ReadablePretrainedRobertaModel extends ParamsAndFeaturesReadable[RoBertaEmbeddings] with HasPretrained[RoBertaEmbeddings]

trait ReadablePretrainedUSEModel extends ParamsAndFeaturesReadable[UniversalSentenceEncoder] with HasPretrained[UniversalSentenceEncoder]

trait ReadablePretrainedWordEmbeddings extends StorageReadable[WordEmbeddingsModel] with HasPretrained[WordEmbeddingsModel]

trait ReadablePretrainedXlmRobertaModel extends ParamsAndFeaturesReadable[XlmRoBertaEmbeddings] with HasPretrained[XlmRoBertaEmbeddings]

trait ReadablePretrainedXlnetModel extends ParamsAndFeaturesReadable[XlnetEmbeddings] with HasPretrained[XlnetEmbeddings]

trait ReadsFromBytes extends AnyRef

class RoBertaEmbeddings extends AnnotatorModel[RoBertaEmbeddings] with HasBatchedAnnotate[RoBertaEmbeddings] with WriteTensorflowModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

class SentenceEmbeddings extends AnnotatorModel[SentenceEmbeddings] with HasSimpleAnnotate[SentenceEmbeddings] with HasEmbeddingsProperties with HasStorageRef

class UniversalSentenceEncoder extends AnnotatorModel[UniversalSentenceEncoder] with HasSimpleAnnotate[UniversalSentenceEncoder] with HasEmbeddingsProperties with HasStorageRef with WriteTensorflowModel

class WordEmbeddings extends AnnotatorApproach[WordEmbeddingsModel] with HasStorage with HasEmbeddingsProperties

class WordEmbeddingsModel extends AnnotatorModel[WordEmbeddingsModel] with HasSimpleAnnotate[WordEmbeddingsModel] with HasEmbeddingsProperties with HasStorageModel with ParamsAndFeaturesWritable

class WordEmbeddingsReader extends StorageReader[Array[Float]] with ReadsFromBytes

class WordEmbeddingsWriter extends StorageBatchWriter[Array[Float]] with ReadsFromBytes

class XlmRoBertaEmbeddings extends AnnotatorModel[XlmRoBertaEmbeddings] with HasBatchedAnnotate[XlmRoBertaEmbeddings] with WriteTensorflowModel with WriteSentencePieceModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

class XlnetEmbeddings extends AnnotatorModel[XlnetEmbeddings] with HasSimpleAnnotate[XlnetEmbeddings] with WriteTensorflowModel with WriteSentencePieceModel with HasEmbeddingsProperties with HasStorageRef with HasCaseSensitiveProperties

Value Members

object AlbertEmbeddings extends ReadablePretrainedAlbertModel with ReadAlbertTensorflowModel with ReadSentencePieceModel with Serializable

object BertEmbeddings extends ReadablePretrainedBertModel with ReadBertTensorflowModel with Serializable

object BertSentenceEmbeddings extends ReadablePretrainedBertSentenceModel with ReadBertSentenceTensorflowModel with Serializable

object ChunkEmbeddings extends DefaultParamsReadable[ChunkEmbeddings] with Serializable

object DistilBertEmbeddings extends ReadablePretrainedDistilBertModel with ReadDistilBertTensorflowModel with Serializable

object ElmoEmbeddings extends ReadablePretrainedElmoModel with ReadElmoTensorflowModel with Serializable

object PoolingStrategy

object RoBertaEmbeddings extends ReadablePretrainedRobertaModel with ReadRobertaTensorflowModel with Serializable

object SentenceEmbeddings extends DefaultParamsReadable[SentenceEmbeddings] with Serializable

object UniversalSentenceEncoder extends ReadablePretrainedUSEModel with ReadUSETensorflowModel with Serializable

object WordEmbeddings extends DefaultParamsReadable[WordEmbeddings] with Serializable

object WordEmbeddingsBinaryIndexer

object WordEmbeddingsModel extends ReadablePretrainedWordEmbeddings with EmbeddingsCoverage with Serializable

object WordEmbeddingsTextIndexer

object XlmRoBertaEmbeddings extends ReadablePretrainedXlmRobertaModel with ReadXlmRobertaTensorflowModel with Serializable

object XlnetEmbeddings extends ReadablePretrainedXlnetModel with ReadXlnetTensorflowModel with ReadSentencePieceModel with Serializable

Ungrouped