Class Lucene40DocValuesFormat
- java.lang.Object
  - org.apache.lucene.codecs.DocValuesFormat
    - org.apache.lucene.codecs.lucene40.Lucene40DocValuesFormat
All Implemented Interfaces: NamedSPILoader.NamedSPI
@Deprecated public class Lucene40DocValuesFormat extends DocValuesFormat
Deprecated. Only for reading old 4.0 and 4.1 segments.
Lucene 4.0 DocValues format.
Files:
- .dv.cfs: compound container
- .dv.cfe: compound entries
- <segment>_<fieldNumber>.dat: data values
- <segment>_<fieldNumber>.idx: index into the .dat for DEREF types
There are many types of DocValues with different encodings. From the perspective of filenames, all types store their values in .dat entries within the compound file. In the case of dereferenced/sorted types, the .dat actually contains only the unique values, and an additional .idx file contains pointers to these unique values.
Formats (a trailing ^N means the preceding element is repeated N times):
- VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
- FIXED_INTS_8 .dat --> Header, ValueSize, Byte^maxdoc
- FIXED_INTS_16 .dat --> Header, ValueSize, Short^maxdoc
- FIXED_INTS_32 .dat --> Header, ValueSize, Int32^maxdoc
- FIXED_INTS_64 .dat --> Header, ValueSize, Int64^maxdoc
- FLOAT_32 .dat --> Header, ValueSize, Float32^maxdoc
- FLOAT_64 .dat --> Header, ValueSize, Float64^maxdoc
- BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte * ValueSize)^maxdoc
- BYTES_VAR_STRAIGHT .idx --> Header, TotalBytes, Addresses
- BYTES_VAR_STRAIGHT .dat --> Header, (Byte * variable ValueSize)^maxdoc
- BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
- BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
- BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
- BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte * variable ValueSize)^NumValues
- BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
- BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
- BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
- BYTES_VAR_SORTED .dat --> Header, (Byte * variable ValueSize)^NumValues
- Header --> CodecHeader
- PackedType --> Byte
- MaxAddress, MinValue, DefaultValue --> Int64
- PackedStream, Addresses, Ordinals --> PackedInts
- ValueSize, NumValues --> Int32
- Float32 --> 32-bit float encoded with Float.floatToRawIntBits(float) then written as Int32
- Float64 --> 64-bit float encoded with Double.doubleToRawLongBits(double) then written as Int64
- TotalBytes --> VLong
- TotalVarBytes --> Int64
- LengthPrefix --> Length of the data value as VInt (maximum of 2 bytes)
- PackedType is 0 when compressed, 1 when the stream is written as 64-bit integers.
- Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT case, each entry can have a different length, so to determine the length, the address for docid+1 is retrieved as well. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses stream contains maxdoc+1 indices. For the deduplicated VAR_DEREF case, each length is encoded as a prefix to the data itself as a VInt (maximum of 2 bytes).
- Ordinals stores the term ID in sorted order (indexed by docid). In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize) because the byte length is fixed. In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, ord+1's address is looked up as well.
- BYTES_VAR_STRAIGHT, in contrast to other straight variants, uses a .idx file to improve lookup performance. In contrast to BYTES_VAR_DEREF, it doesn't apply deduplication of the document values.
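The lookups and the float encoding described above can be sketched in plain Java. This is an illustration only, not the Lucene implementation: the class and method names are hypothetical, plain arrays stand in for the PackedInts streams and the .dat payload, addresses are assumed to be relative to the start of the value data, and headerLength is an assumed parameter for the byte length of the CodecHeader. The FIXED_SORTED formula is read here as "header length, plus the 4 bytes of the Int32 ValueSize field, plus ordinal*ValueSize".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; none of these names come from Lucene.
class DocValuesLayoutSketch {

    // FIXED_SORTED: Header+ValueSize+(ordinal*ValueSize). Assumes headerLength is
    // the header's byte length (hypothetical parameter) and that the ValueSize
    // field itself occupies 4 bytes because it is written as an Int32.
    static long fixedSortedAddress(long headerLength, int valueSize, long ordinal) {
        return headerLength + Integer.BYTES + ordinal * valueSize;
    }

    // VAR_STRAIGHT: addresses holds maxdoc+1 entries; the sentinel at the end
    // lets the last document's length be computed like any other.
    static byte[] varStraightValue(byte[] data, long[] addresses, int docId) {
        return Arrays.copyOfRange(data, (int) addresses[docId], (int) addresses[docId + 1]);
    }

    // VAR_SORTED: docid -> ordinal -> address, with the length taken from the
    // address stored for ord+1.
    static byte[] varSortedValue(byte[] data, long[] addresses, int[] ordinals, int docId) {
        int ord = ordinals[docId];
        return Arrays.copyOfRange(data, (int) addresses[ord], (int) addresses[ord + 1]);
    }

    // Float32: the raw IEEE 754 bits, which would then be written as an Int32.
    static int encodeFloat32(float value) {
        return Float.floatToRawIntBits(value);
    }

    static float decodeFloat32(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Float64: the raw IEEE 754 bits, which would then be written as an Int64.
    static long encodeFloat64(double value) {
        return Double.doubleToRawLongBits(value);
    }

    static String ascii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Three documents holding "ab", "" and "cde"; the last address is the sentinel.
        byte[] straight = "abcde".getBytes(StandardCharsets.US_ASCII);
        long[] straightAddresses = {0, 2, 2, 5};
        System.out.println(ascii(varStraightValue(straight, straightAddresses, 2)));

        // Two unique sorted values, "apple" and "fig", plus a sentinel address.
        byte[] sorted = "applefig".getBytes(StandardCharsets.US_ASCII);
        long[] sortedAddresses = {0, 5, 8};
        int[] ordinals = {1, 0, 1}; // one ordinal per document
        System.out.println(ascii(varSortedValue(sorted, sortedAddresses, ordinals, 0)));

        System.out.println(fixedSortedAddress(10, 4, 3));
        System.out.println(decodeFloat32(encodeFloat32(1.5f)));
    }
}
```

The empty value for the second document in the VAR_STRAIGHT example shows why lengths are derived from adjacent addresses: two equal addresses simply yield a zero-length value.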
Limitations:
- Binary doc values can be at most MAX_BINARY_FIELD_LENGTH in length.
Field Summary
Fields
Modifier and Type | Field | Description
static int | MAX_BINARY_FIELD_LENGTH | Deprecated. Maximum length for each binary doc values field.
Constructor Summary
Constructors
Constructor | Description
Lucene40DocValuesFormat() | Deprecated. Sole constructor.
Method Summary
Modifier and Type | Method | Description
DocValuesConsumer | fieldsConsumer(SegmentWriteState state) | Deprecated. Returns a DocValuesConsumer to write docvalues to the index.
DocValuesProducer | fieldsProducer(SegmentReadState state) | Deprecated. Returns a DocValuesProducer to read docvalues from the index.
Methods inherited from class org.apache.lucene.codecs.DocValuesFormat
availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
Field Detail
MAX_BINARY_FIELD_LENGTH
public static final int MAX_BINARY_FIELD_LENGTH
Deprecated. Maximum length for each binary doc values field.
See Also: Constant Field Values
Method Detail
fieldsConsumer
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException
Deprecated.
Description copied from class: DocValuesFormat
Returns a DocValuesConsumer to write docvalues to the index.
Specified by: fieldsConsumer in class DocValuesFormat
Throws: IOException
fieldsProducer
public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOException
Deprecated.
Description copied from class: DocValuesFormat
Returns a DocValuesProducer to read docvalues from the index.
NOTE: by the time this call returns, it must hold open any files it will need to use; otherwise, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
Specified by: fieldsProducer in class DocValuesFormat
Throws: IOException