java.lang.Object
  org.apache.lucene.index.LiveIndexWriterConfig
    org.apache.lucene.index.IndexWriterConfig
public final class IndexWriterConfig
Holds all the configuration that is used to create an IndexWriter. Once IndexWriter has been created with this object, changes to this object will not affect the IndexWriter instance. For that, use LiveIndexWriterConfig that is returned from IndexWriter.getConfig().

All setter methods return IndexWriterConfig to allow chaining settings conveniently, for example:

    IndexWriterConfig conf = new IndexWriterConfig(matchVersion, analyzer);
    conf.setter1().setter2();

See Also: IndexWriter.getConfig()
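The chaining described above works because every setter returns the config object itself. The following is a minimal, self-contained sketch of that pattern and does not use Lucene: `SketchLiveConfig`, `SketchConfig`, their setters, and the string used for the open mode are hypothetical stand-ins assumed only for illustration. It shows how a subclass can override an inherited setter with a narrower return type so that a chain keeps the subclass type.

```java
// Hypothetical stand-ins for illustration only; these are not Lucene classes.
final class ChainingSketch {
    static SketchConfig build() {
        // Each call returns SketchConfig, so subclass-only setters stay reachable in the chain.
        return new SketchConfig()
                .setMaxBufferedDocs(1000)
                .setRAMBufferSizeMB(32.0)
                .setOpenMode("CREATE");
    }

    public static void main(String[] args) {
        SketchConfig conf = build();
        System.out.println(conf.getMaxBufferedDocs() + " "
                + conf.getRAMBufferSizeMB() + " " + conf.getOpenMode());
    }
}

class SketchLiveConfig {
    private int maxBufferedDocs = -1;
    private double ramBufferSizeMB = 16.0;

    SketchLiveConfig setMaxBufferedDocs(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
        return this;
    }

    SketchLiveConfig setRAMBufferSizeMB(double ramBufferSizeMB) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        return this;
    }

    int getMaxBufferedDocs() { return maxBufferedDocs; }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }
}

final class SketchConfig extends SketchLiveConfig {
    private String openMode = "CREATE_OR_APPEND"; // assumed placeholder value

    // Covariant return type: same setter, but the chain keeps the SketchConfig type.
    @Override
    SketchConfig setMaxBufferedDocs(int maxBufferedDocs) {
        super.setMaxBufferedDocs(maxBufferedDocs);
        return this;
    }

    @Override
    SketchConfig setRAMBufferSizeMB(double ramBufferSizeMB) {
        super.setRAMBufferSizeMB(ramBufferSizeMB);
        return this;
    }

    SketchConfig setOpenMode(String openMode) {
        this.openMode = openMode;
        return this;
    }

    String getOpenMode() { return openMode; }
}
```

Without the overrides in `SketchConfig`, the chain would fall back to the parent type after the first inherited setter and `setOpenMode` would no longer compile at the end of the chain.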
Nested Class Summary

Modifier and Type | Class | Description |
---|---|---|
static class | IndexWriterConfig.OpenMode | Specifies the open mode for IndexWriter. |
Field Summary

Modifier and Type | Field | Description |
---|---|---|
static int | DEFAULT_MAX_BUFFERED_DELETE_TERMS | Disabled by default (because IndexWriter flushes by RAM usage by default). |
static int | DEFAULT_MAX_BUFFERED_DOCS | Disabled by default (because IndexWriter flushes by RAM usage by default). |
static int | DEFAULT_MAX_THREAD_STATES | The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish. |
static double | DEFAULT_RAM_BUFFER_SIZE_MB | Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM). |
static int | DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB | Default value is 1945. |
static boolean | DEFAULT_READER_POOLING | Default setting for setReaderPooling(boolean). |
static int | DEFAULT_READER_TERMS_INDEX_DIVISOR | Default value is 1. |
static int | DEFAULT_TERM_INDEX_INTERVAL | Default value is 32. |
static int | DISABLE_AUTO_FLUSH | Denotes a flush trigger is disabled. |
static long | WRITE_LOCK_TIMEOUT | Default value for the write lock timeout (1,000 ms). |
Fields inherited from class org.apache.lucene.index.LiveIndexWriterConfig |
---|
codec, commit, delPolicy, flushPolicy, indexerThreadPool, indexingChain, infoStream, matchVersion, mergePolicy, mergeScheduler, openMode, perThreadHardLimitMB, readerPooling, similarity, writeLockTimeout |
Constructor Summary

Constructor | Description |
---|---|
IndexWriterConfig(Version matchVersion, Analyzer analyzer) | Creates a new config with defaults that match the specified Version as well as the default Analyzer. |
Method Summary

Modifier and Type | Method | Description |
---|---|---|
IndexWriterConfig | clone() | |
Analyzer | getAnalyzer() | Returns the default analyzer to use for indexing documents. |
Codec | getCodec() | Returns the current Codec. |
static long | getDefaultWriteLockTimeout() | Returns the default write lock timeout for newly instantiated IndexWriterConfigs. |
IndexCommit | getIndexCommit() | Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point. |
IndexDeletionPolicy | getIndexDeletionPolicy() | Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy. |
InfoStream | getInfoStream() | Returns InfoStream used for debugging. |
int | getMaxBufferedDeleteTerms() | Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled. |
int | getMaxBufferedDocs() | Returns the number of buffered added documents that will trigger a flush if enabled. |
int | getMaxThreadStates() | Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter. |
IndexWriter.IndexReaderWarmer | getMergedSegmentWarmer() | Returns the current merged segment warmer. |
MergePolicy | getMergePolicy() | Returns the current MergePolicy in use by this writer. |
MergeScheduler | getMergeScheduler() | Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler). |
IndexWriterConfig.OpenMode | getOpenMode() | Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode). |
double | getRAMBufferSizeMB() | Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled. |
int | getRAMPerThreadHardLimitMB() | Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed. |
boolean | getReaderPooling() | Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called. |
int | getReaderTermsIndexDivisor() | Returns the termInfosIndexDivisor. |
Similarity | getSimilarity() | Expert: returns the Similarity implementation used by this IndexWriter. |
int | getTermIndexInterval() | Returns the interval between indexed terms. |
long | getWriteLockTimeout() | Returns allowed timeout when acquiring the write lock. |
IndexWriterConfig | setCodec(Codec codec) | Set the Codec. |
static void | setDefaultWriteLockTimeout(long writeLockTimeout) | Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds). |
IndexWriterConfig | setIndexCommit(IndexCommit commit) | Expert: allows opening a certain commit point. |
IndexWriterConfig | setIndexDeletionPolicy(IndexDeletionPolicy delPolicy) | Expert: allows an optional IndexDeletionPolicy implementation to be specified. |
IndexWriterConfig | setInfoStream(InfoStream infoStream) | If non-null, information about merges, deletes and a message when maxFieldLength is reached will be printed to this. |
IndexWriterConfig | setInfoStream(PrintStream printStream) | Convenience method that uses PrintStreamInfoStream. |
IndexWriterConfig | setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms) | Determines the minimal number of delete terms required before the buffered in-memory delete terms and queries are applied and flushed. |
IndexWriterConfig | setMaxBufferedDocs(int maxBufferedDocs) | Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. |
IndexWriterConfig | setMaxThreadStates(int maxThreadStates) | Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. |
IndexWriterConfig | setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer) | Set the merged segment warmer. |
IndexWriterConfig | setMergePolicy(MergePolicy mergePolicy) | Expert: MergePolicy is invoked whenever there are changes to the segments in the index. |
IndexWriterConfig | setMergeScheduler(MergeScheduler mergeScheduler) | Expert: sets the merge scheduler used by this writer. |
IndexWriterConfig | setOpenMode(IndexWriterConfig.OpenMode openMode) | Specifies IndexWriterConfig.OpenMode of the index. |
IndexWriterConfig | setRAMBufferSizeMB(double ramBufferSizeMB) | Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. |
IndexWriterConfig | setRAMPerThreadHardLimitMB(int perThreadHardLimitMB) | Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. |
IndexWriterConfig | setReaderPooling(boolean readerPooling) | By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter, boolean). |
IndexWriterConfig | setReaderTermsIndexDivisor(int divisor) | Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean). |
IndexWriterConfig | setSimilarity(Similarity similarity) | Expert: set the Similarity implementation used by this IndexWriter. |
IndexWriterConfig | setTermIndexInterval(int interval) | Expert: set the interval between indexed terms. |
IndexWriterConfig | setWriteLockTimeout(long writeLockTimeout) | Sets the maximum time to wait for a write lock (in milliseconds) for this instance. |
Methods inherited from class org.apache.lucene.index.LiveIndexWriterConfig |
---|
toString |
Methods inherited from class java.lang.Object |
---|
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
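public static final int DEFAULT_TERM_INDEX_INTERVAL
Default value is 32. Change using setTermIndexInterval(int).

public static final int DISABLE_AUTO_FLUSH
Denotes a flush trigger is disabled.

public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS
Disabled by default (because IndexWriter flushes by RAM usage by default).

public static final int DEFAULT_MAX_BUFFERED_DOCS
Disabled by default (because IndexWriter flushes by RAM usage by default).

public static final double DEFAULT_RAM_BUFFER_SIZE_MB
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).

public static long WRITE_LOCK_TIMEOUT
Default value for the write lock timeout (1,000 ms).
See Also: setDefaultWriteLockTimeout(long)

public static final boolean DEFAULT_READER_POOLING
Default setting for setReaderPooling(boolean).

public static final int DEFAULT_READER_TERMS_INDEX_DIVISOR
Default value is 1. Change using setReaderTermsIndexDivisor(int).

public static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
Default value is 1945. Change using setRAMPerThreadHardLimitMB(int).

public static final int DEFAULT_MAX_THREAD_STATES
The maximum number of simultaneous threads that may be indexing documents at once in IndexWriter; if more than this many threads arrive they will wait for others to finish.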
Constructor Detail |
---|
public IndexWriterConfig(Version matchVersion, Analyzer analyzer)
Creates a new config with defaults that match the specified Version as well as the default Analyzer. If matchVersion is >= Version.LUCENE_32, TieredMergePolicy is used for merging; else LogByteSizeMergePolicy.
Note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.
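The version-dependent default described for the constructor can be modelled as a simple comparison. The sketch below is self-contained and does not use Lucene: the `SketchVersion` enum is a hypothetical, abbreviated list of version constants, and the method returns policy names as strings rather than real policy objects.

```java
// Illustrative model only; SketchVersion is a hypothetical subset of version constants.
final class DefaultMergePolicySketch {
    // Ordered oldest to newest so that compareTo reflects release order.
    enum SketchVersion { LUCENE_31, LUCENE_32, LUCENE_40 }

    // Returns the name of the merge policy the constructor is documented to pick.
    static String defaultMergePolicyName(SketchVersion matchVersion) {
        return matchVersion.compareTo(SketchVersion.LUCENE_32) >= 0
                ? "TieredMergePolicy"
                : "LogByteSizeMergePolicy";
    }

    public static void main(String[] args) {
        for (SketchVersion v : SketchVersion.values()) {
            System.out.println(v + " -> " + defaultMergePolicyName(v));
        }
    }
}
```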
Method Detail |
---|
public static void setDefaultWriteLockTimeout(long writeLockTimeout)
Sets the default (for any instance) maximum time to wait for a write lock (in milliseconds).

public static long getDefaultWriteLockTimeout()
Returns the default write lock timeout for newly instantiated IndexWriterConfigs.
See Also: setDefaultWriteLockTimeout(long)

public IndexWriterConfig clone()
Overrides: clone in class Object

public IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode)
Specifies IndexWriterConfig.OpenMode of the index.
Only takes effect when IndexWriter is first created.

public IndexWriterConfig.OpenMode getOpenMode()
Description copied from class: LiveIndexWriterConfig
Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
Overrides: getOpenMode in class LiveIndexWriterConfig
public IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy)
Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
NOTE: the deletion policy cannot be null. If null is passed, the deletion policy will be set to the default.
Only takes effect when IndexWriter is first created.
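The difference between the default behaviour and a custom policy that keeps earlier "point in time" commits alive can be sketched as a plain list operation. This is a simplified model and not the IndexDeletionPolicy API: commits are represented by hypothetical generation numbers, and `keepLastN` is an assumed example of a custom policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model; commits are hypothetical generation numbers, oldest first.
final class DeletionPolicySketch {
    // Models the documented default: every prior commit is removed once a new one exists.
    static List<Long> keepOnlyLast(List<Long> commitsOldestFirst) {
        return keepLastN(commitsOldestFirst, 1);
    }

    // Hypothetical custom policy: the newest n commits survive so that
    // readers still using a slightly older commit are not cut off.
    static List<Long> keepLastN(List<Long> commitsOldestFirst, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int from = Math.max(0, commitsOldestFirst.size() - n);
        return new ArrayList<>(commitsOldestFirst.subList(from, commitsOldestFirst.size()));
    }

    public static void main(String[] args) {
        List<Long> commits = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println("keep only last: " + keepOnlyLast(commits));
        System.out.println("keep last 3:    " + keepLastN(commits, 3));
    }
}
```

A real policy would decide per commit whether to delete it. The list returned here only stands for the commits that would remain.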
public IndexDeletionPolicy getIndexDeletionPolicy()
Description copied from class: LiveIndexWriterConfig
Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
Overrides: getIndexDeletionPolicy in class LiveIndexWriterConfig

public IndexWriterConfig setIndexCommit(IndexCommit commit)
Expert: allows opening a certain commit point.
Only takes effect when IndexWriter is first created.

public IndexCommit getIndexCommit()
Description copied from class: LiveIndexWriterConfig
Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
Overrides: getIndexCommit in class LiveIndexWriterConfig

public IndexWriterConfig setSimilarity(Similarity similarity)
Expert: set the Similarity implementation used by this IndexWriter.
NOTE: the similarity cannot be null. If null is passed, the similarity will be set to the default implementation (unspecified).
Only takes effect when IndexWriter is first created.

public Similarity getSimilarity()
Description copied from class: LiveIndexWriterConfig
Expert: returns the Similarity implementation used by this IndexWriter.
Overrides: getSimilarity in class LiveIndexWriterConfig

public IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler)
Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.
NOTE: the merge scheduler cannot be null. If null is passed, the merge scheduler will be set to the default.
Only takes effect when IndexWriter is first created.

public MergeScheduler getMergeScheduler()
Description copied from class: LiveIndexWriterConfig
Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
Overrides: getMergeScheduler in class LiveIndexWriterConfig
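public IndexWriterConfig setWriteLockTimeout(long writeLockTimeout)
Sets the maximum time to wait for a write lock (in milliseconds) for this instance. You can change the default value for all instances by calling setDefaultWriteLockTimeout(long).
Only takes effect when IndexWriter is first created.

public long getWriteLockTimeout()
Description copied from class: LiveIndexWriterConfig
Returns allowed timeout when acquiring the write lock.
Overrides: getWriteLockTimeout in class LiveIndexWriterConfig
See Also: setWriteLockTimeout(long)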
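The relationship between the static default and the per-instance timeout can be illustrated with a small stand-alone model. The class and method names below are hypothetical and the sketch does not use Lucene. It assumes, in line with the description of getDefaultWriteLockTimeout(), that the default is read once when a config is instantiated.

```java
// Hypothetical model of a static default combined with a per-instance override.
final class WriteLockTimeoutSketch {
    // Shared default in milliseconds; 1000 matches the documented 1,000 ms.
    private static long defaultTimeoutMs = 1000;

    // Captured when the instance is created; later default changes do not affect it.
    private long timeoutMs = defaultTimeoutMs;

    static void setDefaultTimeoutMs(long ms) { defaultTimeoutMs = ms; }

    static long getDefaultTimeoutMs() { return defaultTimeoutMs; }

    WriteLockTimeoutSketch setTimeoutMs(long ms) {
        this.timeoutMs = ms;
        return this;
    }

    long getTimeoutMs() { return timeoutMs; }

    public static void main(String[] args) {
        WriteLockTimeoutSketch first = new WriteLockTimeoutSketch();
        setDefaultTimeoutMs(5000);
        WriteLockTimeoutSketch second = new WriteLockTimeoutSketch();
        second.setTimeoutMs(250);
        System.out.println(first.getTimeoutMs() + " " + second.getTimeoutMs()
                + " " + getDefaultTimeoutMs());
    }
}
```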
public IndexWriterConfig setMergePolicy(MergePolicy mergePolicy)
Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge. The default depends on the matchVersion passed to the constructor: TieredMergePolicy if it is >= Version.LUCENE_32, otherwise LogByteSizeMergePolicy.
Only takes effect when IndexWriter is first created.
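public IndexWriterConfig setCodec(Codec codec)
Set the Codec.
Only takes effect when IndexWriter is first created.

public Codec getCodec()
Description copied from class: LiveIndexWriterConfig
Returns the current Codec.
Overrides: getCodec in class LiveIndexWriterConfig

public MergePolicy getMergePolicy()
Description copied from class: LiveIndexWriterConfig
Returns the current MergePolicy in use by this writer.
Overrides: getMergePolicy in class LiveIndexWriterConfig
See Also: setMergePolicy(MergePolicy)

public IndexWriterConfig setMaxThreadStates(int maxThreadStates)
Sets the max number of simultaneous threads that may be indexing documents at once in IndexWriter. Values < 1 are invalid and if passed maxThreadStates will be set to DEFAULT_MAX_THREAD_STATES.
Only takes effect when IndexWriter is first created.

public int getMaxThreadStates()
Description copied from class: LiveIndexWriterConfig
Returns the max number of simultaneous threads that may be indexing documents at once in IndexWriter.
Overrides: getMaxThreadStates in class LiveIndexWriterConfig

public IndexWriterConfig setReaderPooling(boolean readerPooling)
By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter, boolean). This method lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, IndexWriter will still pool readers once DirectoryReader.open(IndexWriter, boolean) is called.
Only takes effect when IndexWriter is first created.

public boolean getReaderPooling()
Description copied from class: LiveIndexWriterConfig
Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter, boolean) has not been called.
Overrides: getReaderPooling in class LiveIndexWriterConfig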
public IndexWriterConfig setRAMPerThreadHardLimitMB(int perThreadHardLimitMB)
Expert: Sets the maximum memory consumption per thread triggering a forced flush if exceeded. A DocumentsWriterPerThread is forcefully flushed once it exceeds this limit even if the getRAMBufferSizeMB() has not been exceeded. This is a safety limit to prevent a DocumentsWriterPerThread from address space exhaustion due to its internal 32 bit signed integer based memory addressing. The given value must be less than 2GB (2048MB).
See Also: DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
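The 2048 MB bound follows from the range of a 32 bit signed integer, which a short calculation makes concrete. The sketch below is stand-alone arithmetic and not Lucene code. `checkHardLimitMB` is a hypothetical validation helper, and its lower bound of 1 MB is an assumption, since the text above only states the upper bound.

```java
// Stand-alone arithmetic; checkHardLimitMB is a hypothetical helper, not Lucene's validation.
final class HardLimitSketch {
    static final int DEFAULT_HARD_LIMIT_MB = 1945; // value listed in the field summary

    static long toBytes(int megabytes) {
        return megabytes * 1024L * 1024L;
    }

    // True if a buffer of this size can be addressed with a signed 32 bit int.
    static boolean fitsSignedInt(int megabytes) {
        return toBytes(megabytes) <= Integer.MAX_VALUE;
    }

    // Assumed rule: accept 1..2047 MB, reject everything else.
    static int checkHardLimitMB(int megabytes) {
        if (megabytes <= 0 || megabytes >= 2048) {
            throw new IllegalArgumentException(
                    "hard limit must be greater than 0 and less than 2048 MB: " + megabytes);
        }
        return megabytes;
    }

    public static void main(String[] args) {
        System.out.println("1945 MB = " + toBytes(DEFAULT_HARD_LIMIT_MB) + " bytes, fits: "
                + fitsSignedInt(DEFAULT_HARD_LIMIT_MB));
        System.out.println("2048 MB = " + toBytes(2048) + " bytes, fits: " + fitsSignedInt(2048));
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
    }
}
```

Since 2048 MB is exactly one byte more than Integer.MAX_VALUE, the largest whole-megabyte value that stays addressable is 2047 MB.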
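public int getRAMPerThreadHardLimitMB()
Description copied from class: LiveIndexWriterConfig
Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
Overrides: getRAMPerThreadHardLimitMB in class LiveIndexWriterConfig
See Also: setRAMPerThreadHardLimitMB(int)

public InfoStream getInfoStream()
Description copied from class: LiveIndexWriterConfig
Returns InfoStream used for debugging.
Overrides: getInfoStream in class LiveIndexWriterConfig
See Also: setInfoStream(InfoStream)

public Analyzer getAnalyzer()
Description copied from class: LiveIndexWriterConfig
Returns the default analyzer to use for indexing documents.
Overrides: getAnalyzer in class LiveIndexWriterConfig

public int getMaxBufferedDeleteTerms()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered deleted terms that will trigger a flush of all buffered deletes if enabled.
Overrides: getMaxBufferedDeleteTerms in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int)

public int getMaxBufferedDocs()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered added documents that will trigger a flush if enabled.
Overrides: getMaxBufferedDocs in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setMaxBufferedDocs(int)

public IndexWriter.IndexReaderWarmer getMergedSegmentWarmer()
Description copied from class: LiveIndexWriterConfig
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.
Overrides: getMergedSegmentWarmer in class LiveIndexWriterConfig

public double getRAMBufferSizeMB()
Description copied from class: LiveIndexWriterConfig
Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
Overrides: getRAMBufferSizeMB in class LiveIndexWriterConfig

public int getReaderTermsIndexDivisor()
Description copied from class: LiveIndexWriterConfig
Returns the termInfosIndexDivisor.
Overrides: getReaderTermsIndexDivisor in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setReaderTermsIndexDivisor(int)

public int getTermIndexInterval()
Description copied from class: LiveIndexWriterConfig
Returns the interval between indexed terms.
Overrides: getTermIndexInterval in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setTermIndexInterval(int)

public IndexWriterConfig setInfoStream(InfoStream infoStream)
If non-null, information about merges, deletes and a message when maxFieldLength is reached will be printed to this.

public IndexWriterConfig setInfoStream(PrintStream printStream)
Convenience method that uses PrintStreamInfoStream.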
public IndexWriterConfig setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
Description copied from class: LiveIndexWriterConfig
Determines the minimal number of delete terms required before the buffered in-memory delete terms and queries are applied and flushed.
Disabled by default (writer flushes by RAM usage).
NOTE: This setting won't trigger a segment flush.
Takes effect immediately, but only the next time a document is added, updated or deleted.
Overrides: setMaxBufferedDeleteTerms in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setRAMBufferSizeMB(double)
public IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs)
Description copied from class: LiveIndexWriterConfig
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
Overrides: setMaxBufferedDocs in class LiveIndexWriterConfig
See Also: LiveIndexWriterConfig.setRAMBufferSizeMB(double)
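The "whichever comes first" rule for the document-count and RAM triggers can be written as a small predicate. The following stand-alone sketch does not reproduce IndexWriter's actual flush logic. The sentinel value -1 for DISABLE_AUTO_FLUSH and the use of >= comparisons are assumptions made for this illustration.

```java
// Illustrative predicate only; the sentinel value and the comparisons are assumptions.
final class FlushTriggerSketch {
    static final int DISABLE_AUTO_FLUSH = -1; // assumed sentinel meaning "trigger disabled"

    static boolean shouldFlush(int bufferedDocs, double bufferedMB,
                               int maxBufferedDocs, double ramBufferSizeMB) {
        // A trigger takes part only if it has not been disabled.
        boolean byDocCount = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && bufferedMB >= ramBufferSizeMB;
        // Whichever enabled trigger is reached first causes the flush.
        return byDocCount || byRam;
    }

    public static void main(String[] args) {
        // RAM trigger only, 16 MB buffer: 500 docs using 4 MB do not flush yet.
        System.out.println(shouldFlush(500, 4.0, DISABLE_AUTO_FLUSH, 16.0));
        // Both triggers enabled: the document count is reached before the RAM limit.
        System.out.println(shouldFlush(1000, 4.0, 1000, 16.0));
    }
}
```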
public IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer)
Description copied from class: LiveIndexWriterConfig
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.
Takes effect on the next merge.
Overrides: setMergedSegmentWarmer in class LiveIndexWriterConfig
public IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB)
Description copied from class: LiveIndexWriterConfig
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The maximum RAM limit is inherently determined by the JVM's available memory. Yet, an IndexWriter session can consume a significantly larger amount of memory than the given RAM limit, since this limit is just an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.
NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() periodically yourself, or by using LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int) to flush and apply buffered deletes by count instead of RAM usage (for each buffered delete Query a constant number of bytes is used to estimate RAM usage). Note that enabling LiveIndexWriterConfig.setMaxBufferedDeleteTerms(int) will not trigger any segment flushes.
NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents is flushed and therefore only part of the RAM buffer is released.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB.
Takes effect immediately, but only the next time a document is added, updated or deleted.
Overrides: setRAMBufferSizeMB in class LiveIndexWriterConfig
See Also: setRAMPerThreadHardLimitMB(int)
public IndexWriterConfig setReaderTermsIndexDivisor(int divisor)
Description copied from class: LiveIndexWriterConfig
Sets the termsIndexDivisor passed to any readers that IndexWriter opens, for example when applying deletes or creating a near-real-time reader in DirectoryReader.open(IndexWriter, boolean). If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .next() through all terms; attempts to seek will hit an exception.
Takes effect immediately, but only applies to readers opened after this call.
NOTE: divisor settings > 1 do not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time.
Overrides: setReaderTermsIndexDivisor in class LiveIndexWriterConfig
public IndexWriterConfig setTermIndexInterval(int interval)
Description copied from class: LiveIndexWriterConfig
Expert: set the interval between indexed terms.
This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.
In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access.
Takes effect immediately, but only applies to newly flushed/merged segments.
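The two quantities named above, numUniqueTerms/interval and interval/2, can be tabulated to see the trade-off between memory and scan cost. This is plain arithmetic in a stand-alone sketch. The term count of 1,000,000 and the second interval of 128 are arbitrary example values and are not taken from the documentation.

```java
// Plain arithmetic on the formulas quoted above; input values are arbitrary examples.
final class TermIndexIntervalSketch {
    // Number of index terms an IndexReader holds in memory: numUniqueTerms / interval.
    static long termsHeldInMemory(long numUniqueTerms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        return numUniqueTerms / interval;
    }

    // Average number of terms scanned per random term access: interval / 2.
    static double averageTermsScanned(int interval) {
        return interval / 2.0;
    }

    public static void main(String[] args) {
        long numUniqueTerms = 1_000_000L;
        for (int interval : new int[] {32, 128}) {
            System.out.println("interval=" + interval
                    + " inMemory=" + termsHeldInMemory(numUniqueTerms, interval)
                    + " avgScanned=" + averageTermsScanned(interval));
        }
    }
}
```

A larger interval lowers the number of terms held in memory and raises the average scan length by the same factor.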
NOTE: This parameter does not apply to all PostingsFormat implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, Lucene41PostingsFormat implements the term index instead based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int), which can also be configured on a per-field basis:

    // customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100
    final PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100);
    iwc.setCodec(new Lucene42Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        if (field.equals("fieldWithTonsOfTerms"))
          return tweakedPostings;
        else
          return super.getPostingsFormatForField(field);
      }
    });

Note that other implementations may have their own parameters, or no parameters at all.
Overrides: setTermIndexInterval in class LiveIndexWriterConfig
See Also: DEFAULT_TERM_INDEX_INTERVAL