org.clulab.processors.clu

CluProcessor

Related Docs: object CluProcessor | package clu

class CluProcessor extends Processor with Configured

Processor that uses only tools released under the Apache License. Currently supports: tokenization (in-house); lemmatization (Morpha, copied into our repo to minimize dependencies); and POS tagging, NER, chunking, and dependency parsing, all using our MTL architecture (dep parsing coming soon).
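
As a rough illustration of the description above, the following sketch annotates a short text and prints the layers the processor is said to produce. It assumes the processors library and its models are on the classpath. The `Document.sentences` field and the `Sentence` fields `words`, `lemmas`, `tags`, `entities`, and `chunks` come from the wider library and are not documented on this page, so treat them as assumptions. Depending on the release, the neural backend may also need to be initialized before the first call; that step is omitted here.

```scala
import org.clulab.processors.Document
import org.clulab.processors.clu.CluProcessor

object CluProcessorUsageSketch {
  // Renders one line per sentence and annotation layer.
  // Assumption: Sentence exposes words plus Option-wrapped lemmas/tags/entities/chunks.
  def describe(doc: Document): Seq[String] =
    doc.sentences.toSeq.flatMap { sentence =>
      // A layer is None when the corresponding pipeline step did not run.
      def layer(name: String, values: Option[Array[String]]): Option[String] =
        values.map(v => s"$name: ${v.mkString(" ")}")

      Seq(s"words: ${sentence.words.mkString(" ")}") ++
        layer("lemmas", sentence.lemmas) ++
        layer("tags", sentence.tags) ++
        layer("entities", sentence.entities) ++
        layer("chunks", sentence.chunks)
    }

  def main(args: Array[String]): Unit = {
    // Public constructor with its default arguments (configuration "cluprocessor").
    val processor = new CluProcessor()
    // keepText = true asks the processor to retain the original text in the Document.
    val doc = processor.annotate("John Smith went to China.", keepText = true)
    describe(doc).foreach(println)
  }
}
```

The first construction and annotation are likely to be slow, since the models have to be loaded before any text is processed.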

Linear Supertypes
Configured, Processor, AnyRef, Any

Instance Constructors

  1. new CluProcessor(config: Config = ConfigFactory.load("cluprocessor"), optionalNER: Option[LexiconNER] = None, seasonPathOpt: Option[String] = None)
  2. new CluProcessor(config: Config, optionalNER: Option[LexiconNER], numericEntityRecognizerOpt: Option[NumericEntityRecognizer], internStringsOpt: Option[Boolean], localTokenizerOpt: Option[Tokenizer], lemmatizerOpt: Option[Lemmatizer], mtlPosChunkSrlpOpt: Option[Metal], mtlNerOpt: Option[Metal], mtlSrlaOpt: Option[Metal], mtlDepsHeadOpt: Option[Metal], mtlDepsLabelOpt: Option[Metal])

    Attributes
    protected

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def annotate(doc: Document): Document

    Annotate the given document, returning an annotated document. The default implementation is an NLP pipeline of side-effecting calls.

    Definition Classes
    CluProcessor → Processor
  5. def annotate(text: String, keepText: Boolean = false): Document

    Annotate the given text string; keepText specifies whether to retain the text in the resultant Document.

    Definition Classes
    CluProcessor → Processor
  6. def annotateFromSentences(sentences: Iterable[String], keepText: Boolean = false): Document

    Annotate the given sentences; keepText specifies whether to retain the text in the resultant Document.

    Definition Classes
    Processor
  7. def annotateFromTokens(sentences: Iterable[Iterable[String]], keepText: Boolean = false): Document

    Annotate the given tokens; keepText specifies whether to retain the text in the resultant Document.

    Definition Classes
    Processor
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def basicSanityCheck(doc: Document): Unit
  10. def cheapLemmatize(doc: Document): Unit

    Generates cheap lemmas with the word in lower case, for languages where a lemmatizer is not available
  11. def chunking(doc: Document): Unit

    Shallow parsing; modifies the document in place

    Definition Classes
    CluProcessor → Processor
  12. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  13. def combineDocuments(documents: IndexedSeq[Document], combinedTextOpt: Option[String]): Document

    Attributes
    protected
    Definition Classes
    Processor
  14. val config: Config
  15. def contains(argPath: String): Boolean

    Definition Classes
    Configured
  16. def copy(configOpt: Option[Config] = None, optionalNEROpt: Option[Option[LexiconNER]] = None, numericEntityRecognizerOptOpt: Option[Option[NumericEntityRecognizer]] = None, internStringsOptOpt: Option[Option[Boolean]] = None, localTokenizerOptOpt: Option[Option[Tokenizer]] = None, lemmatizerOptOpt: Option[Option[Lemmatizer]] = None, mtlPosChunkSrlpOptOpt: Option[Option[Metal]] = None, mtlNerOptOpt: Option[Option[Metal]] = None, mtlSrlaOptOpt: Option[Option[Metal]] = None, mtlDepsHeadOptOpt: Option[Option[Metal]] = None, mtlDepsLabelOptOpt: Option[Option[Metal]] = None): CluProcessor
  17. def discourse(doc: Document): Unit

    Discourse parsing; modifies the document in place

    Definition Classes
    CluProcessor → Processor
  18. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  20. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  21. def getArgBoolean(argPath: String, defaultValue: Option[Boolean]): Boolean

    Definition Classes
    Configured
  22. def getArgFloat(argPath: String, defaultValue: Option[Float]): Float

    Definition Classes
    Configured
  23. def getArgInt(argPath: String, defaultValue: Option[Int]): Int

    Definition Classes
    Configured
  24. def getArgString(argPath: String, defaultValue: Option[String]): String

    Definition Classes
    Configured
  25. def getArgStrings(argPath: String, defaultValue: Option[Seq[String]]): Seq[String]

    Definition Classes
    Configured
  26. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  27. def getConf: Config

    Definition Classes
    CluProcessor → Configured
  28. def getEmbeddings(doc: Document): ConstEmbeddingParameters
  29. def getPredicateIndexes(preds: IndexedSeq[String]): IndexedSeq[Int]

    Gets the index of all predicates in this sentence
  30. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  31. val internStrings: Boolean
  32. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  33. lazy val isPreparedToAnnotate: Boolean

    Attributes
    protected
  34. val lazyLemmatizer: Lazy[Lemmatizer]

    Attributes
    protected
  35. val lazyMtlDepsHead: Lazy[Metal]

    Attributes
    protected
  36. val lazyMtlDepsLabel: Lazy[Metal]

    Attributes
    protected
  37. val lazyMtlNer: Lazy[Metal]

    Attributes
    protected
  38. val lazyMtlPosChunkSrlp: Lazy[Metal]

    Attributes
    protected
  39. val lazyMtlSrla: Lazy[Metal]

    Attributes
    protected
  40. val lazyNumericEntityRecognizer: Lazy[NumericEntityRecognizer]

    Attributes
    protected
  41. val lazyTokenizer: Lazy[Tokenizer]

    Attributes
    protected
  42. def lemmatize(doc: Document): Unit

    Lemmatization; modifies the document in place

    Definition Classes
    CluProcessor → Processor
  43. def lemmatizer: Lemmatizer
  44. def mkCombinedDocument(texts: IndexedSeq[String], trailers: IndexedSeq[String], keepText: Boolean = false): Document

    Definition Classes
    Processor
  45. def mkConstEmbeddings(doc: Document): Unit
  46. def mkDocument(text: String, keepText: Boolean = false): Document

    Constructs a document of tokens from free text; includes sentence splitting and tokenization

    Definition Classes
    CluProcessor → Processor
  47. def mkDocumentFromSentences(sentences: Iterable[String], keepText: Boolean = false, charactersBetweenSentences: Int = 1): Document

    Constructs a document of tokens from an array of untokenized sentences

    Definition Classes
    CluProcessor → Processor
  48. def mkDocumentFromTokens(sentences: Iterable[Iterable[String]], keepText: Boolean = false, charactersBetweenSentences: Int = 1, charactersBetweenTokens: Int = 1): Document

    Constructs a document of tokens from an array of tokenized sentences

    Definition Classes
    CluProcessor → Processor
  49. lazy val mtlCase: Metal
  50. def mtlDepsHead: Metal
  51. def mtlDepsLabel: Metal
  52. def mtlNer: Metal
  53. def mtlPosChunkSrlp: Metal
  54. def mtlSrla: Metal
  55. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  56. def nerSentence(words: Array[String], lemmas: Option[Array[String]], tags: Array[String], startCharOffsets: Array[Int], endCharOffsets: Array[Int], docDateOpt: Option[String], embeddings: ConstEmbeddingParameters): (IndexedSeq[String], Option[IndexedSeq[String]])

    Produces NE labels for one sentence
  57. final def notify(): Unit

    Definition Classes
    AnyRef
  58. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  59. def numericEntityRecognizer: NumericEntityRecognizer
  60. val optionalNER: Option[LexiconNER]
  61. def parse(doc: Document): Unit

    Syntactic parsing; modifies the document in place

    Definition Classes
    CluProcessor → Processor
  62. def parseSentenceWithEisner(words: IndexedSeq[String], posTags: IndexedSeq[String], nerLabels: IndexedSeq[String], embeddings: ConstEmbeddingParameters): Array[(Int, String)]

    Dependency parsing with the Eisner algorithm
  63. def parserPostProcessing(sentence: Sentence, headsWithLabels: Array[(Int, String)]): Unit

    Deterministic corrections for dependency parsing
  64. def recognizeNamedEntities(doc: Document): Unit

    NER; modifies the document in place

    Definition Classes
    CluProcessor → Processor
  65. def relationExtraction(doc: Document): Unit

    Relation extraction; modifies the document in place.

    Definition Classes
    CluProcessor → Processor
  66. def removeNumericLabels(allLabels: Array[String]): Array[String]
  67. def resolveCoreference(doc: Document): Unit

    Coreference resolution; modifies the document in place

    Definition Classes
    CluProcessor → Processor
  68. def restoreCase(doc: Document): Unit

    Restores the correct case for all words in a given document
  69. def srl(doc: Document): Unit

    Semantic role labeling

    Definition Classes
    CluProcessor → Processor
  70. def srlSentence(words: IndexedSeq[String], posTags: IndexedSeq[String], nerLabels: IndexedSeq[String], predicateIndexes: IndexedSeq[Int], embeddings: ConstEmbeddingParameters): DirectedGraph[String]

    Produces semantic role frames for one sentence
  71. def srlSentence(sent: Sentence, predicateIndexes: IndexedSeq[Int], embeddings: ConstEmbeddingParameters): DirectedGraph[String]

    Produces semantic role frames for one sentence, given as a Sentence
  72. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  73. def tagPartsOfSpeech(doc: Document): Unit

    Part of speech tagging + chunking + SRL (predicates), jointly

    Definition Classes
    CluProcessor → Processor
  74. def tagSentence(words: IndexedSeq[String], embeddings: ConstEmbeddingParameters): (IndexedSeq[String], IndexedSeq[String], IndexedSeq[String])

    Produces POS tags, chunks, and semantic role predicates for one sentence
  75. def toString(): String

    Definition Classes
    AnyRef → Any
  76. def tokenizer: Tokenizer
  77. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  78. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  79. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
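
The `copy` member in the list above takes doubly wrapped parameters such as `Option[Option[LexiconNER]]`, and this page does not explain the convention. One plausible reading, shown here as a self-contained sketch, is that an outer `None` means "keep the current value" while `Some(inner)` means "replace it with `inner`", where `inner` may itself be `None`. The `Settings` class and its `copyWith` method are hypothetical stand-ins for illustration and are not part of the library.

```scala
// Hypothetical stand-in for illustration; this is not the real CluProcessor.
final case class Settings(name: String, lexicon: Option[String]) {
  // Outer None  -> keep the current value.
  // Some(inner) -> replace the value with inner, which may itself be None.
  def copyWith(
    nameOpt: Option[String] = None,
    lexiconOpt: Option[Option[String]] = None
  ): Settings = Settings(
    nameOpt.getOrElse(name),
    lexiconOpt.getOrElse(lexicon)
  )
}

object CopySketch {
  def main(args: Array[String]): Unit = {
    val base = Settings("default", Some("lexicon-a"))
    // No arguments: every field is carried over unchanged.
    println(base.copyWith())
    // Some(None): explicitly clears the optional lexicon.
    println(base.copyWith(lexiconOpt = Some(None)))
    // Only the name changes; the lexicon is kept.
    println(base.copyWith(nameOpt = Some("custom")))
  }
}
```

The double wrapping is what lets a caller tell "leave this optional component alone" apart from "remove this optional component", which a single `Option` parameter cannot express.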
