Package com.arcadedb.index.lsm
Class LSMTreeFullTextIndex
- java.lang.Object
- com.arcadedb.index.lsm.LSMTreeFullTextIndex
- All Implemented Interfaces:
Index, IndexInternal
public class LSMTreeFullTextIndex extends Object implements Index, IndexInternal
Full-text index implementation based on the LSM-Tree index. To support full-text indexing, it leverages the Lucene ecosystem for analyzers, tokenizers, and stemmers, while retaining the existing efficient LSM-Tree implementation and its management of ACID transactions, background compaction, the WAL (write-ahead log), replication, high availability (HA), etc.
Indexing a text works as follows:
- Parse the text with the configured analyzer. The analyzer uses a tokenizer that splits the text into words, then the stemmer extracts the stem of each word. Finally, the stop words are removed. The output of this phase is an array of strings to be indexed.
- Put all the strings from the resulting array into the underlying LSM index, with the RID as the value (as with the default LSM-Tree index implementation).
Searching follows a similar process, with the addition of score computation:
- Parse the text with the configured analyzer and extract the array of strings (see above).
- Search for all the strings in the array, storing the multiple results in a Map<String,List<RID>> (as Map<keyword,results>).
- Browse all the results in the map, adding each of them to a final TreeMap<RID, AtomicInteger> that represents the score, where the key is the record id and the value is a counter that stores the score. The score starts at 1. Every time a RID is already present in the score TreeMap, its value is incremented. In this way, the records that match a higher number of keywords have a higher score. The score ranges from 1 to Integer.MAX_VALUE.
- The query result is the TreeMap ordered by score, so if the query has a limit, only the first X items are returned, ordered by score descending.
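The indexing and scoring flow described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the actual implementation: the class FullTextScoringSketch and its methods are hypothetical, a HashMap stands in for the underlying LSM-Tree index, plain strings stand in for RIDs, and analyze() is a naive substitute for a Lucene Analyzer (it lower-cases, splits, and drops a few stop words, with no stemming). The sketch also assumes that duplicate keywords are counted once per text and that a limit of 0 or less means "no limit".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the flow above; every name here is hypothetical. */
class FullTextScoringSketch {
  // Hypothetical stop-word list; a real analyzer ships its own.
  private static final Set<String> STOP_WORDS =
      new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the"));

  // keyword -> record ids; stands in for the underlying LSM-Tree index.
  private final Map<String, List<String>> index = new HashMap<>();

  /** Naive analysis: lower-case, split on non-alphanumerics, drop stop words. */
  static List<String> analyze(String text) {
    List<String> tokens = new ArrayList<>();
    for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!token.isEmpty() && !STOP_WORDS.contains(token))
        tokens.add(token);
    }
    return tokens;
  }

  /** Indexing: store every distinct keyword with the record id as value. */
  void put(String text, String rid) {
    for (String keyword : new LinkedHashSet<>(analyze(text)))
      index.computeIfAbsent(keyword, k -> new ArrayList<>()).add(rid);
  }

  /** Search: the score of a record is the number of query keywords it matches. */
  List<String> search(String query, int limit) {
    // Map<keyword, results>
    Map<String, List<String>> resultsByKeyword = new LinkedHashMap<>();
    for (String keyword : new LinkedHashSet<>(analyze(query)))
      resultsByKeyword.put(keyword, index.getOrDefault(keyword, Collections.emptyList()));

    // Map<rid, score>: starts at 1, incremented each time the rid shows up again.
    TreeMap<String, AtomicInteger> scores = new TreeMap<>();
    for (List<String> rids : resultsByKeyword.values()) {
      for (String rid : rids) {
        AtomicInteger score = scores.get(rid);
        if (score == null)
          scores.put(rid, new AtomicInteger(1));
        else
          score.incrementAndGet();
      }
    }

    // A TreeMap sorts by key, so ordering by score needs an explicit (stable) sort.
    List<Map.Entry<String, AtomicInteger>> ordered = new ArrayList<>(scores.entrySet());
    ordered.sort((a, b) -> Integer.compare(b.getValue().get(), a.getValue().get()));

    // Apply the limit after ordering by score descending.
    List<String> top = new ArrayList<>();
    for (Map.Entry<String, AtomicInteger> entry : ordered) {
      if (limit > 0 && top.size() >= limit)
        break;
      top.add(entry.getKey());
    }
    return top;
  }

  public static void main(String[] args) {
    FullTextScoringSketch idx = new FullTextScoringSketch();
    idx.put("The quick brown fox", "#1:0");
    idx.put("A quick brown dog and a lazy fox", "#1:1");
    idx.put("Lazy afternoon in the park", "#1:2");
    System.out.println(idx.search("quick lazy fox", 2)); // prints [#1:1, #1:0]
  }
}
```

In this example, record #1:1 matches all three query keywords (score 3), #1:0 matches two (score 2), and #1:2 matches one (score 1), so a limit of 2 returns #1:1 followed by #1:0. Note that a TreeMap keyed by record id is sorted by id, not by score, which is why the sketch sorts the entries explicitly before applying the limit.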
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
static class
LSMTreeFullTextIndex.IndexFactoryHandler
static class
LSMTreeFullTextIndex.PaginatedComponentFactoryHandlerNotUnique
-
Nested classes/interfaces inherited from interface com.arcadedb.index.Index
Index.BuildIndexCallback
-
-
Constructor Summary
Constructors
Constructor Description
LSMTreeFullTextIndex(DatabaseInternal database, String name, String filePath, int fileId, PaginatedFile.MODE mode, int pageSize, int version)
Loading time.
LSMTreeFullTextIndex(DatabaseInternal database, String name, String filePath, PaginatedFile.MODE mode, int pageSize, Index.BuildIndexCallback callback)
Creation time.
-
Method Summary
Modifier and Type Method Description
List<String>
analyzeText(org.apache.lucene.analysis.Analyzer analyzer, Object[] text)
long
build(Index.BuildIndexCallback callback)
void
close()
boolean
compact()
long
countEntries()
void
drop()
IndexCursor
get(Object[] keys)
Retrieves the set of RIDs associated with a key.
IndexCursor
get(Object[] keys, int limit)
Retrieves the set of RIDs associated with a key, with a limit on the number of results.
org.apache.lucene.analysis.Analyzer
getAnalyzer()
int
getAssociatedBucketId()
byte[]
getBinaryKeyTypes()
int
getFileId()
List<Integer>
getFileIds()
Type[]
getKeyTypes()
String
getName()
LSMTreeIndexAbstract.NULL_STRATEGY
getNullStrategy()
int
getPageSize()
PaginatedComponent
getPaginatedComponent()
List<String>
getPropertyNames()
Map<String,Long>
getStats()
Schema.INDEX_TYPE
getType()
TypeIndex
getTypeIndex()
String
getTypeName()
boolean
isAutomatic()
boolean
isCompacting()
boolean
isUnique()
void
put(Object[] keys, RID[] rids)
Adds multiple values for one key to the index.
void
remove(Object[] keys)
Removes the keys from the index.
void
remove(Object[] keys, Identifiable rid)
Removes a key/record entry from the index.
boolean
scheduleCompaction()
void
setMetadata(String name, String[] propertyNames, int associatedBucketId)
void
setNullStrategy(LSMTreeIndexAbstract.NULL_STRATEGY nullStrategy)
void
setTypeIndex(TypeIndex typeIndex)
boolean
supportsOrderedIterations()
-
-
-
Constructor Detail
-
LSMTreeFullTextIndex
public LSMTreeFullTextIndex(DatabaseInternal database, String name, String filePath, PaginatedFile.MODE mode, int pageSize, Index.BuildIndexCallback callback)
Creation time.
-
LSMTreeFullTextIndex
public LSMTreeFullTextIndex(DatabaseInternal database, String name, String filePath, int fileId, PaginatedFile.MODE mode, int pageSize, int version)
Loading time.
-
-
Method Detail
-
get
public IndexCursor get(Object[] keys)
Description copied from interface: Index
Retrieves the set of RIDs associated with a key.
-
get
public IndexCursor get(Object[] keys, int limit)
Description copied from interface: Index
Retrieves the set of RIDs associated with a key, with a limit on the number of results.
-
put
public void put(Object[] keys, RID[] rids)
Description copied from interface: Index
Adds multiple values for one key to the index.
-
remove
public void remove(Object[] keys)
Description copied from interface: Index
Removes the keys from the index.
-
remove
public void remove(Object[] keys, Identifiable rid)
Description copied from interface: Index
Removes a key/record entry from the index.
-
countEntries
public long countEntries()
- Specified by:
countEntries in interface Index
-
compact
public boolean compact() throws IOException, InterruptedException
- Specified by:
compact in interface IndexInternal
- Throws:
IOException
InterruptedException
-
isCompacting
public boolean isCompacting()
- Specified by:
isCompacting in interface Index
-
scheduleCompaction
public boolean scheduleCompaction()
- Specified by:
scheduleCompaction in interface Index
-
setMetadata
public void setMetadata(String name, String[] propertyNames, int associatedBucketId)
- Specified by:
setMetadata in interface IndexInternal
-
getTypeName
public String getTypeName()
- Specified by:
getTypeName in interface Index
-
getPropertyNames
public List<String> getPropertyNames()
- Specified by:
getPropertyNames in interface Index
-
close
public void close()
- Specified by:
close in interface IndexInternal
-
drop
public void drop()
- Specified by:
drop in interface IndexInternal
-
getStats
public Map<String,Long> getStats()
- Specified by:
getStats in interface IndexInternal
-
getNullStrategy
public LSMTreeIndexAbstract.NULL_STRATEGY getNullStrategy()
- Specified by:
getNullStrategy in interface Index
-
setNullStrategy
public void setNullStrategy(LSMTreeIndexAbstract.NULL_STRATEGY nullStrategy)
- Specified by:
setNullStrategy in interface Index
-
getFileId
public int getFileId()
- Specified by:
getFileId in interface IndexInternal
-
getPaginatedComponent
public PaginatedComponent getPaginatedComponent()
- Specified by:
getPaginatedComponent in interface IndexInternal
-
getKeyTypes
public Type[] getKeyTypes()
- Specified by:
getKeyTypes in interface IndexInternal
-
getBinaryKeyTypes
public byte[] getBinaryKeyTypes()
- Specified by:
getBinaryKeyTypes in interface IndexInternal
-
getAssociatedBucketId
public int getAssociatedBucketId()
- Specified by:
getAssociatedBucketId in interface Index
-
supportsOrderedIterations
public boolean supportsOrderedIterations()
- Specified by:
supportsOrderedIterations in interface Index
-
isAutomatic
public boolean isAutomatic()
- Specified by:
isAutomatic in interface Index
-
getPageSize
public int getPageSize()
- Specified by:
getPageSize in interface Index
-
getFileIds
public List<Integer> getFileIds()
- Specified by:
getFileIds in interface IndexInternal
-
setTypeIndex
public void setTypeIndex(TypeIndex typeIndex)
- Specified by:
setTypeIndex in interface IndexInternal
-
getTypeIndex
public TypeIndex getTypeIndex()
- Specified by:
getTypeIndex in interface IndexInternal
-
build
public long build(Index.BuildIndexCallback callback)
- Specified by:
build in interface IndexInternal
-
getType
public Schema.INDEX_TYPE getType()
-
getAnalyzer
public org.apache.lucene.analysis.Analyzer getAnalyzer()
-
-