Class DocumentProcessor
- java.lang.Object
- com.yahoo.component.AbstractComponent
- com.yahoo.component.chain.ChainedComponent
- com.yahoo.docproc.DocumentProcessor
- All Implemented Interfaces: com.yahoo.component.Component, com.yahoo.component.Deconstructable, Comparable<com.yahoo.component.Component>
- Direct Known Subclasses: JoinerDocumentProcessor, SimpleDocumentProcessor, SplitterDocumentProcessor
public abstract class DocumentProcessor extends com.yahoo.component.chain.ChainedComponent
A document processor is a component that performs some operation on a document or document update. Document processors are asynchronous: they may request some data and then return. The processing framework is responsible for calling processors again at unspecified times until they are done processing the document or document update.
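The call-again contract described above can be illustrated with a minimal, self-contained sketch. It does not use the Vespa classes: `SketchProgress`, `LookupProcessorSketch` and `FrameworkLoopSketch` are hypothetical stand-ins, and the per-processing state is modelled as a plain list purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DocumentProcessor.Progress; not the real class.
enum SketchProgress { DONE, LATER }

// Hypothetical processor that needs external data before it can finish.
class LookupProcessorSketch {
    private final int callsUntilDataArrives;

    LookupProcessorSketch(int callsUntilDataArrives) {
        this.callsUntilDataArrives = callsUntilDataArrives;
    }

    // The state list holds per-processing context, so the processor keeps no mutable fields.
    SketchProgress process(List<String> state) {
        if (state.size() < callsUntilDataArrives) {
            state.add("waiting");         // pretend a request is still outstanding
            return SketchProgress.LATER;  // return immediately instead of blocking
        }
        return SketchProgress.DONE;
    }
}

// Hypothetical stand-in for the framework loop that re-invokes the processor.
class FrameworkLoopSketch {
    // Calls the processor until it reports DONE and returns the number of calls made.
    static int runUntilDone(LookupProcessorSketch processor) {
        List<String> state = new ArrayList<>();
        int calls = 1;
        while (processor.process(state) == SketchProgress.LATER) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(runUntilDone(new LookupProcessorSketch(2)));
    }
}
```

The real framework decides when to call again; the tight loop here only shows that a processor returns control rather than waiting.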
Document processor instances are chained together by the framework to realize a complete processing pipeline. The processing chain is represented by the processor instances themselves (see getNext/setNext). Document processors may optionally control the routing through the chain by setting the next processor on ongoing processings.
A processing may contain one or more documents or document updates. Document processors may optionally handle collections of documents or document updates in some other way than just processing each one in order.
A document processor must have an empty constructor. When instantiated from Vespa config (as opposed to being instantiated programmatically in a stand-alone Docproc system), the framework is responsible for configuring the processor using setConfig(). If a document processor wants to do some initial setup after configuration has been set, but before it has begun processing documents or document updates, it should override initialize().
Document processors must be thread safe. To ensure this, make sure that access to any mutable, thread-unsafe state held in a field by the processor is synchronized.
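One way to follow the thread-safety rule above is sketched below, assuming a processor that keeps a per-document-type counter in a field. `CountingProcessorSketch` and its methods are hypothetical and do not extend the real DocumentProcessor; the point is only that every access to the shared map goes through the same lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical processor-like class holding mutable, thread-unsafe state in a field.
class CountingProcessorSketch {
    private final Map<String, Integer> seenPerType = new HashMap<>();

    // Every read and write of the HashMap is guarded by the same monitor.
    void record(String docType) {
        synchronized (seenPerType) {
            seenPerType.merge(docType, 1, Integer::sum);
        }
    }

    int seen(String docType) {
        synchronized (seenPerType) {
            return seenPerType.getOrDefault(docType, 0);
        }
    }

    // Calls record() from several threads at once and returns the final count.
    static int countFromThreads(int threads, int callsPerThread) {
        CountingProcessorSketch processor = new CountingProcessorSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    processor.record("music");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processor.seen("music");
    }

    public static void main(String[] args) {
        System.out.println(countFromThreads(4, 1000));
    }
}
```

Without the synchronized blocks, concurrent updates to the HashMap could be lost or corrupt the map. A concurrent collection would be an alternative design.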
- Author: bratseth
Nested Class Summary
Modifier and Type | Class | Description
static class | DocumentProcessor.LaterProgress |
static class | DocumentProcessor.Progress | An enumeration of possible results of calling a process method
Constructor Summary
Constructor | Description
DocumentProcessor() |
Method Summary
Modifier and Type | Method | Description
Map<String,String> | getDocMap(String docType) |
Map<com.yahoo.collections.Pair<String,String>,String> | getFieldMap() | Schema map for field names (doctype,from)→to
abstract DocumentProcessor.Progress | process(Processing processing) | Processes a processing, which can contain zero or more document bases.
void | setFieldMap(Map<com.yahoo.collections.Pair<String,String>,String> fieldMap) | Sets the schema map for field names
String | toString() |
Methods inherited from class com.yahoo.component.chain.ChainedComponent
getAnnotatedDependencies, getDefaultAnnotatedDependencies, getDependencies, initDependencies
Method Detail
process
public abstract DocumentProcessor.Progress process(Processing processing)
Processes a processing, which can contain zero or more document bases. The implementing document processor is free to modify, replace or delete elements in the list inside processing.
- Parameters: processing - the processing to process
- Returns: the outcome of this processing
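The freedom to modify, replace or delete list elements can be sketched as follows. This is a self-contained illustration rather than a real process implementation: plain strings stand in for document bases, and the class name, the `legacy:` prefix and the drop-empty rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical sketch: strings stand in for the document bases inside a processing.
class ListEditingSketch {
    // Edits the list in place: drops empty entries and replaces "legacy:" entries.
    static List<String> process(List<String> documentBases) {
        ListIterator<String> it = documentBases.listIterator();
        while (it.hasNext()) {
            String doc = it.next();
            if (doc.isEmpty()) {
                it.remove();                                          // delete an element
            } else if (doc.startsWith("legacy:")) {
                it.set("doc:" + doc.substring("legacy:".length()));  // replace an element
            }
        }
        return documentBases;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>(List.of("doc:a", "", "legacy:b"));
        System.out.println(process(docs));
    }
}
```

A ListIterator is used so that removal and replacement are safe while iterating. An empty list is handled naturally, which matches the "zero or more" wording.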
toString
public String toString()
- Overrides: toString in class com.yahoo.component.AbstractComponent
setFieldMap
public void setFieldMap(Map<com.yahoo.collections.Pair<String,String>,String> fieldMap)
Sets the schema map for field names
getFieldMap
public Map<com.yahoo.collections.Pair<String,String>,String> getFieldMap()
Schema map for field names (doctype,from)→to
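The (doctype,from)→to shape of the field map can be illustrated with a small lookup sketch. It is an assumption-laden stand-in: `Map.Entry` replaces com.yahoo.collections.Pair, the `resolve` helper is hypothetical, and falling back to the original field name when no mapping exists is a choice made for this example, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a (doctype, from) -> to field-name lookup.
class FieldMapSketch {
    // Map.entry gives value-based equals/hashCode, standing in for Pair<String,String>.
    static String resolve(Map<Map.Entry<String, String>, String> fieldMap,
                          String docType, String field) {
        // Assumption: unmapped fields keep their original name.
        return fieldMap.getOrDefault(Map.entry(docType, field), field);
    }

    public static void main(String[] args) {
        Map<Map.Entry<String, String>, String> fieldMap = new HashMap<>();
        fieldMap.put(Map.entry("music", "title"), "name");
        System.out.println(resolve(fieldMap, "music", "title"));
        System.out.println(resolve(fieldMap, "music", "artist"));
    }
}
```

Keying on the document type together with the source field name lets the same field name map differently per document type.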