Class ElasticsearchAsyncClient
- All Implemented Interfaces:
Closeable, AutoCloseable
-
Field Summary
Fields inherited from class co.elastic.clients.ApiClient
transport, transportOptions
-
Constructor Summary
Constructors:
ElasticsearchAsyncClient(ElasticsearchTransport transport)
ElasticsearchAsyncClient(ElasticsearchTransport transport, TransportOptions transportOptions)
-
Method Summary
bulk(), bulk(BulkRequest request), bulk(Function<BulkRequest.Builder, ObjectBuilder<BulkRequest>> fn): Bulk index or delete documents.
cat()
ccr()
clearScroll(), clearScroll(ClearScrollRequest request), clearScroll(Function<ClearScrollRequest.Builder, ObjectBuilder<ClearScrollRequest>> fn): Clear a scrolling search.
closePointInTime(ClosePointInTimeRequest request), closePointInTime(Function<ClosePointInTimeRequest.Builder, ObjectBuilder<ClosePointInTimeRequest>> fn): Close a point in time.
cluster()
count(), count(CountRequest request), count(Function<CountRequest.Builder, ObjectBuilder<CountRequest>> fn): Count search results.
create(CreateRequest<TDocument> request), create(Function<CreateRequest.Builder<TDocument>, ObjectBuilder<CreateRequest<TDocument>>> fn): Create a new document in the index.
delete(DeleteRequest request), delete(Function<DeleteRequest.Builder, ObjectBuilder<DeleteRequest>> fn): Delete a document.
deleteByQuery(DeleteByQueryRequest request), deleteByQuery(Function<DeleteByQueryRequest.Builder, ObjectBuilder<DeleteByQueryRequest>> fn): Delete documents.
deleteByQueryRethrottle(DeleteByQueryRethrottleRequest request), deleteByQueryRethrottle(Function<DeleteByQueryRethrottleRequest.Builder, ObjectBuilder<DeleteByQueryRethrottleRequest>> fn): Throttle a delete by query operation.
deleteScript(DeleteScriptRequest request), deleteScript(Function<DeleteScriptRequest.Builder, ObjectBuilder<DeleteScriptRequest>> fn): Delete a script or search template.
enrich()
eql()
esql()
exists(ExistsRequest request), exists(Function<ExistsRequest.Builder, ObjectBuilder<ExistsRequest>> fn): Check a document.
existsSource(ExistsSourceRequest request), existsSource(Function<ExistsSourceRequest.Builder, ObjectBuilder<ExistsSourceRequest>> fn): Check for a document source.
explain(ExplainRequest request), explain(Function<ExplainRequest.Builder, ObjectBuilder<ExplainRequest>> fn): Overloads of explain(request, Class) and explain(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
explain(ExplainRequest request, Class<TDocument> tDocumentClass), explain(ExplainRequest request, Type tDocumentType), and the corresponding Function overloads: Explain a document match result.
features()
fieldCaps(), fieldCaps(FieldCapsRequest request), fieldCaps(Function<FieldCapsRequest.Builder, ObjectBuilder<FieldCapsRequest>> fn): Get the field capabilities.
fleet()
get(GetRequest request), get(Function<GetRequest.Builder, ObjectBuilder<GetRequest>> fn): Overloads of get(request, Class) and get(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
get(GetRequest request, Class<TDocument> tDocumentClass), get(GetRequest request, Type tDocumentType), and the corresponding Function overloads: Get a document by its ID.
getScript(GetScriptRequest request), getScript(Function<GetScriptRequest.Builder, ObjectBuilder<GetScriptRequest>> fn): Get a script or search template.
getScriptContext(): Get script contexts.
getScriptLanguages(): Get script languages.
getSource(GetSourceRequest request), getSource(Function<GetSourceRequest.Builder, ObjectBuilder<GetSourceRequest>> fn): Overloads of getSource(request, Class) and getSource(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
getSource(GetSourceRequest request, Class<TDocument> tDocumentClass), getSource(GetSourceRequest request, Type tDocumentType), and the corresponding Function overloads: Get a document's source.
graph()
healthReport(), healthReport(HealthReportRequest request), healthReport(Function<HealthReportRequest.Builder, ObjectBuilder<HealthReportRequest>> fn): Get the cluster health.
ilm()
index(IndexRequest<TDocument> request), index(Function<IndexRequest.Builder<TDocument>, ObjectBuilder<IndexRequest<TDocument>>> fn): Create or update a document in an index.
indices()
info(): Get cluster info.
ingest()
knnSearch(KnnSearchRequest request), knnSearch(Function<KnnSearchRequest.Builder, ObjectBuilder<KnnSearchRequest>> fn): Overloads of knnSearch(request, Class) and knnSearch(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
knnSearch(KnnSearchRequest request, Class<TDocument> tDocumentClass), knnSearch(KnnSearchRequest request, Type tDocumentType), and the corresponding Function overloads: Run a knn search.
license()
logstash()
mget(MgetRequest request), mget(Function<MgetRequest.Builder, ObjectBuilder<MgetRequest>> fn): Overloads of mget(request, Class) and mget(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
mget(MgetRequest request, Class<TDocument> tDocumentClass), mget(MgetRequest request, Type tDocumentType), and the corresponding Function overloads: Get multiple documents.
ml()
msearch(MsearchRequest request), msearch(Function<MsearchRequest.Builder, ObjectBuilder<MsearchRequest>> fn): Overloads of msearch(request, Class) and msearch(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
msearch(MsearchRequest request, Class<TDocument> tDocumentClass), msearch(MsearchRequest request, Type tDocumentType), and the corresponding Function overloads: Run multiple searches.
msearchTemplate(MsearchTemplateRequest request), msearchTemplate(Function<MsearchTemplateRequest.Builder, ObjectBuilder<MsearchTemplateRequest>> fn): Overloads of msearchTemplate(request, Class) and msearchTemplate(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
msearchTemplate(MsearchTemplateRequest request, Class<TDocument> tDocumentClass), msearchTemplate(MsearchTemplateRequest request, Type tDocumentType), and the corresponding Function overloads: Run multiple templated searches.
mtermvectors(), mtermvectors(MtermvectorsRequest request), mtermvectors(Function<MtermvectorsRequest.Builder, ObjectBuilder<MtermvectorsRequest>> fn): Get multiple term vectors.
nodes()
openPointInTime(OpenPointInTimeRequest request), openPointInTime(Function<OpenPointInTimeRequest.Builder, ObjectBuilder<OpenPointInTimeRequest>> fn): Open a point in time.
ping(): Ping the cluster.
putScript(PutScriptRequest request), putScript(Function<PutScriptRequest.Builder, ObjectBuilder<PutScriptRequest>> fn): Create or update a script or search template.
rankEval(RankEvalRequest request), rankEval(Function<RankEvalRequest.Builder, ObjectBuilder<RankEvalRequest>> fn): Evaluate ranked search results.
reindex(ReindexRequest request), reindex(Function<ReindexRequest.Builder, ObjectBuilder<ReindexRequest>> fn): Reindex documents.
reindexRethrottle(ReindexRethrottleRequest request), reindexRethrottle(Function<ReindexRethrottleRequest.Builder, ObjectBuilder<ReindexRethrottleRequest>> fn): Throttle a reindex operation.
renderSearchTemplate(), renderSearchTemplate(RenderSearchTemplateRequest request), renderSearchTemplate(Function<RenderSearchTemplateRequest.Builder, ObjectBuilder<RenderSearchTemplateRequest>> fn): Render a search template.
rollup()
scriptsPainlessExecute(ScriptsPainlessExecuteRequest request), scriptsPainlessExecute(Function<ScriptsPainlessExecuteRequest.Builder, ObjectBuilder<ScriptsPainlessExecuteRequest>> fn): Overloads of scriptsPainlessExecute(request, Class) and scriptsPainlessExecute(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
scriptsPainlessExecute(ScriptsPainlessExecuteRequest request, Class<TResult> tResultClass), scriptsPainlessExecute(ScriptsPainlessExecuteRequest request, Type tResultType), and the corresponding Function overloads: Run a script.
scroll(ScrollRequest request), scroll(Function<ScrollRequest.Builder, ObjectBuilder<ScrollRequest>> fn): Overloads of scroll(request, Class) and scroll(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
scroll(ScrollRequest request, Class<TDocument> tDocumentClass), scroll(ScrollRequest request, Type tDocumentType), and the corresponding Function overloads: Run a scrolling search.
search(SearchRequest request), search(Function<SearchRequest.Builder, ObjectBuilder<SearchRequest>> fn): Overloads of search(request, Class) and search(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
search(SearchRequest request, Class<TDocument> tDocumentClass), search(SearchRequest request, Type tDocumentType), and the corresponding Function overloads: Run a search.
searchMvt(SearchMvtRequest request), searchMvt(Function<SearchMvtRequest.Builder, ObjectBuilder<SearchMvtRequest>> fn): Search a vector tile.
searchShards(), searchShards(SearchShardsRequest request), searchShards(Function<SearchShardsRequest.Builder, ObjectBuilder<SearchShardsRequest>> fn): Get the search shards.
searchTemplate(SearchTemplateRequest request), searchTemplate(Function<SearchTemplateRequest.Builder, ObjectBuilder<SearchTemplateRequest>> fn): Overloads of searchTemplate(request, Class) and searchTemplate(fn, Class), where Class is defined as Void, meaning the documents will not be deserialized.
searchTemplate(SearchTemplateRequest request, Class<TDocument> tDocumentClass), searchTemplate(SearchTemplateRequest request, Type tDocumentType), and the corresponding Function overloads: Run a search with a search template.
security()
shutdown()
simulate()
slm()
snapshot()
sql()
ssl()
synonyms()
tasks()
termsEnum(TermsEnumRequest request), termsEnum(Function<TermsEnumRequest.Builder, ObjectBuilder<TermsEnumRequest>> fn): Get terms in an index.
termvectors(TermvectorsRequest<TDocument> request), termvectors(Function<TermvectorsRequest.Builder<TDocument>, ObjectBuilder<TermvectorsRequest<TDocument>>> fn): Get term vector information.
update(UpdateRequest<TDocument, TPartialDocument> request, Class<TDocument> tDocumentClass), update(UpdateRequest<TDocument, TPartialDocument> request, Type tDocumentType), and the corresponding Function overloads: Update a document.
updateByQuery(UpdateByQueryRequest request), updateByQuery(Function<UpdateByQueryRequest.Builder, ObjectBuilder<UpdateByQueryRequest>> fn): Update documents.
updateByQueryRethrottle(UpdateByQueryRethrottleRequest request), updateByQueryRethrottle(Function<UpdateByQueryRethrottleRequest.Builder, ObjectBuilder<UpdateByQueryRethrottleRequest>> fn): Throttle an update by query operation.
watcher()
withTransportOptions(TransportOptions transportOptions): Creates a new client with some request options.
xpack()
Methods inherited from class co.elastic.clients.ApiClient
_jsonpMapper, _transport, _transportOptions, close, getDeserializer, withTransportOptions
-
Constructor Details
-
ElasticsearchAsyncClient
public ElasticsearchAsyncClient(ElasticsearchTransport transport)
-
ElasticsearchAsyncClient
public ElasticsearchAsyncClient(ElasticsearchTransport transport, @Nullable TransportOptions transportOptions)
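For context, here is a minimal construction sketch. It is not copied from this Javadoc: the host, port, and the choice of the Jackson mapper are illustrative, and RestClientTransport is just one possible ElasticsearchTransport implementation.

import co.elastic.clients.elasticsearch.ElasticsearchAsyncClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

// Low-level REST client pointing at a node (address is illustrative).
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200)).build();

// Transport layer that handles JSON (de)serialization via Jackson.
ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());

// The async client wraps the transport; every API call returns a CompletableFuture.
ElasticsearchAsyncClient client = new ElasticsearchAsyncClient(transport);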
-
-
Method Details
-
withTransportOptions
Description copied from class: ApiClient
Creates a new client with some request options.
Specified by: withTransportOptions in class ApiClient<ElasticsearchTransport, ElasticsearchAsyncClient>
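As a sketch of how the derived client might be used, assuming client is an existing ElasticsearchAsyncClient and that TransportOptions exposes toBuilder() and addHeader as in the Java client's transport API (the header name and value are illustrative):

// Derive a client that sends an extra header on every request; the original
// client instance keeps its own options and is not affected.
ElasticsearchAsyncClient tracedClient = client.withTransportOptions(
    client._transportOptions().toBuilder()
        .addHeader("X-Opaque-Id", "my-application")
        .build());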
-
asyncSearch
-
autoscaling
-
cat
-
ccr
-
cluster
-
connector
-
danglingIndices
-
enrich
-
eql
-
esql
-
features
-
fleet
-
graph
-
ilm
-
indices
-
inference
-
ingest
-
license
-
logstash
-
migration
-
ml
-
monitoring
-
nodes
-
queryRules
-
rollup
-
searchApplication
-
searchableSnapshots
-
security
-
shutdown
-
simulate
-
slm
-
snapshot
-
sql
-
ssl
-
synonyms
-
tasks
-
textStructure
-
transform
-
watcher
-
xpack
-
bulk
Bulk index or delete documents. Perform multiple index, create, delete, and update actions in a single request. This reduces overhead and can greatly increase indexing speed.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
- To use the index action, you must have the create, index, or write index privilege.
- To use the delete action, you must have the delete or write index privilege.
- To use the update action, you must have the index or write index privilege.
- To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
- To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
action_and_meta_data\n optional_source\n action_and_meta_data\n optional_source\n .... action_and_meta_data\n optional_source\n
The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API. A create action fails if a document with the same ID already exists in the target; an index action adds or replaces a document as necessary.
NOTE: Data streams support only the create action. To update or delete a document in a data stream, you must target the backing index containing the document.
An update action expects that the partial doc, upsert, and script and its options are specified on the next line.
A delete action does not expect a source on the next line and has the same semantics as the standard delete API.
NOTE: The final line of data must end with a newline character (\n). Each newline character may be preceded by a carriage return (\r). When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson. Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty printed.
If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.
A note on the format: the idea here is to make processing as fast as possible. As some of the actions are redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.
Client libraries using this protocol should strive to do something similar on the client side, and reduce buffering as much as possible.
There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of an HTTP request to 100MB by default, so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
Client support for bulk requests
Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:
- Go: Check out esutil.BulkIndexer
- Perl: Check out Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
- Python: Check out elasticsearch.helpers.*
- JavaScript: Check out client.helpers.*
- .NET: Check out BulkAllObservable
- PHP: Check out bulk indexing.
Submitting bulk requests with cURL
If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines. For example:
$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
Optimistic concurrency control
Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines. The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
Versioning
Each bulk item can include the version value using the version field. It automatically follows the behavior of the index or delete operation based on the _version mapping. It also supports the version_type.
Routing
Each bulk item can include the routing value using the routing field. It automatically follows the behavior of the index or delete operation based on the _routing mapping.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Wait for active shards
When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before starting to process the bulk request.
Refresh
Control when the changes made by this request are visible to search.
NOTE: Only the shards that receive the bulk request will be affected by refresh. Imagine a _bulk?refresh=wait_for request with three documents in it that happen to be routed to different shards in an index with five shards. The request will only wait for those three shards to refresh. The other two shards that make up the index do not participate in the _bulk request at all.
- See Also:
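A minimal sketch of issuing a bulk request through this client, assuming client is an ElasticsearchAsyncClient; the index name, IDs, and documents are illustrative:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import co.elastic.clients.elasticsearch.core.BulkResponse;

// One create and one delete action sent in a single _bulk round trip.
CompletableFuture<BulkResponse> future = client.bulk(b -> b
    .index("products")                       // default target for the actions below
    .operations(op -> op
        .create(c -> c.id("1").document(Map.of("name", "wrench", "price", 9.90))))
    .operations(op -> op
        .delete(d -> d.id("42"))));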
-
bulk
public final CompletableFuture<BulkResponse> bulk(Function<BulkRequest.Builder, ObjectBuilder<BulkRequest>> fn)
Bulk index or delete documents. Perform multiple index, create, delete, and update actions in a single request. This reduces overhead and can greatly increase indexing speed.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
- To use the index action, you must have the create, index, or write index privilege.
- To use the delete action, you must have the delete or write index privilege.
- To use the update action, you must have the index or write index privilege.
- To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
- To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
action_and_meta_data\n optional_source\n action_and_meta_data\n optional_source\n .... action_and_meta_data\n optional_source\n
The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API. A create action fails if a document with the same ID already exists in the target; an index action adds or replaces a document as necessary.
NOTE: Data streams support only the create action. To update or delete a document in a data stream, you must target the backing index containing the document.
An update action expects that the partial doc, upsert, and script and its options are specified on the next line.
A delete action does not expect a source on the next line and has the same semantics as the standard delete API.
NOTE: The final line of data must end with a newline character (\n). Each newline character may be preceded by a carriage return (\r). When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson. Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty printed.
If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.
A note on the format: the idea here is to make processing as fast as possible. As some of the actions are redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.
Client libraries using this protocol should strive to do something similar on the client side, and reduce buffering as much as possible.
There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of an HTTP request to 100MB by default, so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
Client support for bulk requests
Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:
- Go: Check out esutil.BulkIndexer
- Perl: Check out Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
- Python: Check out elasticsearch.helpers.*
- JavaScript: Check out client.helpers.*
- .NET: Check out BulkAllObservable
- PHP: Check out bulk indexing.
Submitting bulk requests with cURL
If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines. For example:
$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
Optimistic concurrency control
Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines. The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
Versioning
Each bulk item can include the version value using the version field. It automatically follows the behavior of the index or delete operation based on the _version mapping. It also supports the version_type.
Routing
Each bulk item can include the routing value using the routing field. It automatically follows the behavior of the index or delete operation based on the _routing mapping.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Wait for active shards
When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before starting to process the bulk request.
Refresh
Control when the changes made by this request are visible to search.
NOTE: Only the shards that receive the bulk request will be affected by refresh. Imagine a _bulk?refresh=wait_for request with three documents in it that happen to be routed to different shards in an index with five shards. The request will only wait for those three shards to refresh. The other two shards that make up the index do not participate in the _bulk request at all.
- Parameters:
fn - a function that initializes a builder to create the BulkRequest
- See Also:
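Because the call is asynchronous, the response is consumed from the returned CompletableFuture. A sketch of error handling, assuming client is an ElasticsearchAsyncClient (index name, document, and logging style are illustrative):

client.bulk(b -> b
        .index("products")
        .operations(op -> op
            .index(i -> i.id("1").document(Map.of("name", "wrench")))))
    .whenComplete((response, exception) -> {
        if (exception != null) {
            // Transport-level failure (connection refused, timeout, ...).
            exception.printStackTrace();
        } else if (response.errors()) {
            // The request itself succeeded, but individual items may have failed.
            response.items().forEach(item -> {
                if (item.error() != null) {
                    System.err.println(item.id() + ": " + item.error().reason());
                }
            });
        }
    });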
-
bulk
Bulk index or delete documents. Perform multiple index, create, delete, and update actions in a single request. This reduces overhead and can greatly increase indexing speed.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
- To use the index action, you must have the create, index, or write index privilege.
- To use the delete action, you must have the delete or write index privilege.
- To use the update action, you must have the index or write index privilege.
- To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
- To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
action_and_meta_data\n optional_source\n action_and_meta_data\n optional_source\n .... action_and_meta_data\n optional_source\n
The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API. A create action fails if a document with the same ID already exists in the target; an index action adds or replaces a document as necessary.
NOTE: Data streams support only the create action. To update or delete a document in a data stream, you must target the backing index containing the document.
An update action expects that the partial doc, upsert, and script and its options are specified on the next line.
A delete action does not expect a source on the next line and has the same semantics as the standard delete API.
NOTE: The final line of data must end with a newline character (\n). Each newline character may be preceded by a carriage return (\r). When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson. Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty printed.
If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.
A note on the format: the idea here is to make processing as fast as possible. As some of the actions are redirected to other shards on other nodes, only action_meta_data is parsed on the receiving node side.
Client libraries using this protocol should strive to do something similar on the client side, and reduce buffering as much as possible.
There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of an HTTP request to 100MB by default, so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
Client support for bulk requests
Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:
- Go: Check out esutil.BulkIndexer
- Perl: Check out Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
- Python: Check out elasticsearch.helpers.*
- JavaScript: Check out client.helpers.*
- .NET: Check out BulkAllObservable
- PHP: Check out bulk indexing.
Submitting bulk requests with cURL
If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines. For example:
$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
Optimistic concurrency control
Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines. The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
Versioning
Each bulk item can include the version value using the version field. It automatically follows the behavior of the index or delete operation based on the _version mapping. It also supports the version_type.
Routing
Each bulk item can include the routing value using the routing field. It automatically follows the behavior of the index or delete operation based on the _routing mapping.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Wait for active shards
When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before starting to process the bulk request.
Refresh
Control when the changes made by this request are visible to search.
NOTE: Only the shards that receive the bulk request will be affected by refresh. Imagine a _bulk?refresh=wait_for request with three documents in it that happen to be routed to different shards in an index with five shards. The request will only wait for those three shards to refresh. The other two shards that make up the index do not participate in the _bulk request at all.
- See Also:
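As the Refresh section above describes, visibility can be controlled per request. A sketch using refresh=wait_for, assuming client is an ElasticsearchAsyncClient (index name and document are illustrative):

import co.elastic.clients.elasticsearch._types.Refresh;

// Wait for the affected shards to refresh before the returned future completes.
client.bulk(b -> b
    .index("products")
    .refresh(Refresh.WaitFor)
    .operations(op -> op
        .index(i -> i.id("2").document(Map.of("name", "hammer")))));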
-
clearScroll
Clear a scrolling search. Clear the search context and results for a scrolling search.
- See Also:
-
clearScroll
public final CompletableFuture<ClearScrollResponse> clearScroll(Function<ClearScrollRequest.Builder, ObjectBuilder<ClearScrollRequest>> fn)
Clear a scrolling search. Clear the search context and results for a scrolling search.
- Parameters:
fn - a function that initializes a builder to create the ClearScrollRequest
- See Also:
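A minimal sketch of clearing a scroll context, assuming client is an ElasticsearchAsyncClient and searchResponse is the response of an earlier scrolling search:

String scrollId = searchResponse.scrollId();   // scroll ID returned by the earlier search

client.clearScroll(c -> c.scrollId(scrollId))
    .thenAccept(resp -> System.out.println("Freed contexts: " + resp.numFreed()));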
-
clearScroll
Clear a scrolling search. Clear the search context and results for a scrolling search.
- See Also:
-
closePointInTime
public CompletableFuture<ClosePointInTimeResponse> closePointInTime(ClosePointInTimeRequest request)
Close a point in time. A point in time must be opened explicitly before being used in search requests. The keep_alive parameter tells Elasticsearch how long it should persist. A point in time is automatically closed when the keep_alive period has elapsed. However, keeping points in time has a cost; close them as soon as they are no longer required for search requests.
- See Also:
-
closePointInTime
public final CompletableFuture<ClosePointInTimeResponse> closePointInTime(Function<ClosePointInTimeRequest.Builder, ObjectBuilder<ClosePointInTimeRequest>> fn)
Close a point in time. A point in time must be opened explicitly before being used in search requests. The keep_alive parameter tells Elasticsearch how long it should persist. A point in time is automatically closed when the keep_alive period has elapsed. However, keeping points in time has a cost; close them as soon as they are no longer required for search requests.
- Parameters:
fn - a function that initializes a builder to create the ClosePointInTimeRequest
- See Also:
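A sketch of closing a point in time once it is no longer needed, assuming client is an ElasticsearchAsyncClient and openResponse is the response of an earlier openPointInTime call:

String pitId = openResponse.id();   // PIT id returned when the point in time was opened

client.closePointInTime(c -> c.id(pitId))
    .thenAccept(resp -> System.out.println("Succeeded: " + resp.succeeded()));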
-
count
Count search results. Get the number of documents matching a query.
The query can be provided either by using a simple query string as a parameter, or by defining Query DSL within the request body. The query is optional. When no query is provided, the API uses match_all to count all the documents.
The count API supports multi-target syntax. You can run a single count API search across multiple data streams and indices.
The operation is broadcast across all shards. For each shard ID group, a replica is chosen and the search is run against it. This means that replicas increase the scalability of the count.
- See Also:
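A sketch of counting documents that match a query, assuming client is an ElasticsearchAsyncClient (index name, field, and value are illustrative):

client.count(c -> c
        .index("products")
        .query(q -> q.term(t -> t.field("category").value("tools"))))
    .thenAccept(resp -> System.out.println("Matching documents: " + resp.count()));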
-
count
public final CompletableFuture<CountResponse> count(Function<CountRequest.Builder, ObjectBuilder<CountRequest>> fn)
Count search results. Get the number of documents matching a query.
The query can be provided either by using a simple query string as a parameter, or by defining Query DSL within the request body. The query is optional. When no query is provided, the API uses match_all to count all the documents.
The count API supports multi-target syntax. You can run a single count API search across multiple data streams and indices.
The operation is broadcast across all shards. For each shard ID group, a replica is chosen and the search is run against it. This means that replicas increase the scalability of the count.
- Parameters:
fn - a function that initializes a builder to create the CountRequest
- See Also:
-
count
Count search results. Get the number of documents matching a query.
The query can be provided either by using a simple query string as a parameter, or by defining Query DSL within the request body. The query is optional. When no query is provided, the API uses match_all to count all the documents.
The count API supports multi-target syntax. You can run a single count API search across multiple data streams and indices.
The operation is broadcast across all shards. For each shard ID group, a replica is chosen and the search is run against it. This means that replicas increase the scalability of the count.
- See Also:
-
create
Create a new document in the index.
You can index a new JSON document with the /<target>/_doc/ or /<target>/_create/<_id> APIs. Using _create guarantees that the document is indexed only if it does not already exist. It returns a 409 response when a document with the same ID already exists in the index. To update an existing document, you must use the /<target>/_doc/ API.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add a document using the PUT /<target>/_create/<_id> or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
Routing
By default, shard placement (or routing) is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say, wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas + 1). Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed, as you do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
- See Also:
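A sketch of creating a document with the builder-lambda form, assuming client is an ElasticsearchAsyncClient; the index name, ID, and document (here a simple Map serialized by the configured JSON mapper) are illustrative:

import java.util.Map;

client.create(c -> c
        .index("products")
        .id("bike-1")
        .document(Map.of("name", "City bike", "price", 599.0)))
    .thenAccept(resp -> System.out.println("Create result: " + resp.result()));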
-
create
public final <TDocument> CompletableFuture<CreateResponse> create(Function<CreateRequest.Builder<TDocument>, ObjectBuilder<CreateRequest<TDocument>>> fn)
Create a new document in the index.
You can index a new JSON document with the /<target>/_doc/ or /<target>/_create/<_id> APIs. Using _create guarantees that the document is indexed only if it does not already exist. It returns a 409 response when a document with the same ID already exists in the index. To update an existing document, you must use the /<target>/_doc/ API.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add a document using the PUT /<target>/_create/<_id> or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
Routing
By default, shard placement (or routing) is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say, wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas + 1). Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed, as you do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
- Parameters:
fn - a function that initializes a builder to create the CreateRequest
- See Also:
-
delete
Delete a document.
Remove a JSON document from the specified index.
NOTE: You cannot send deletion requests directly to a data stream. To delete a document in a data stream, you must target the backing index containing the document.
Optimistic concurrency control
Delete operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
Versioning
Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document you are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation run on a document, deletes included, causes its version to be incremented. The version number of a deleted document remains available for a short time after deletion to allow for control of concurrent operations. The length of time for which a deleted document's version remains available is determined by the index.gc_deletes index setting.
Routing
If routing is used during indexing, the routing value also needs to be specified to delete a document.
If the _routing mapping is set to required and no routing value is specified, the delete API throws a RoutingMissingException and rejects the request.
For example:
DELETE /my-index-000001/_doc/1?routing=shard-1
This request deletes the document with ID 1, but it is routed based on the user. The document is not deleted if the correct routing is not specified.
Distributed
The delete operation gets hashed into a specific shard ID. It then gets redirected into the primary shard within that ID group and replicated (if needed) to shard replicas within that ID group.
- See Also:
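A sketch of a conditional delete that uses the optimistic concurrency parameters described above, assuming client is an ElasticsearchAsyncClient and that previousSeqNo and previousPrimaryTerm were taken from an earlier read of the document (index name and ID are illustrative):

client.delete(d -> d
        .index("products")
        .id("bike-1")
        .ifSeqNo(previousSeqNo)              // last known sequence number
        .ifPrimaryTerm(previousPrimaryTerm)) // last known primary term
    .thenAccept(resp -> System.out.println("Delete result: " + resp.result()));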
-
delete
public final CompletableFuture<DeleteResponse> delete(Function<DeleteRequest.Builder, ObjectBuilder<DeleteRequest>> fn)
Delete a document.
Remove a JSON document from the specified index.
NOTE: You cannot send deletion requests directly to a data stream. To delete a document in a data stream, you must target the backing index containing the document.
Optimistic concurrency control
Delete operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
Versioning
Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document you are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation run on a document, deletes included, causes its version to be incremented. The version number of a deleted document remains available for a short time after deletion to allow for control of concurrent operations. The length of time for which a deleted document's version remains available is determined by the index.gc_deletes index setting.
Routing
If routing is used during indexing, the routing value also needs to be specified to delete a document.
If the _routing mapping is set to required and no routing value is specified, the delete API throws a RoutingMissingException and rejects the request.
For example:
DELETE /my-index-000001/_doc/1?routing=shard-1
This request deletes the document with ID 1, but it is routed based on the user. The document is not deleted if the correct routing is not specified.
Distributed
The delete operation gets hashed into a specific shard ID. It then gets redirected into the primary shard within that ID group and replicated (if needed) to shard replicas within that ID group.
- Parameters:
fn - a function that initializes a builder to create the DeleteRequest
- See Also:
-
deleteByQuery
Delete documents.
Deletes documents that match the specified query.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
- read
- delete or write
You can specify the query criteria in the request URI or the request body using the same syntax as the search API. When you submit a delete by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and deletes matching documents using internal versioning. If a document changes between the time that the snapshot is taken and the delete operation is processed, it results in a version conflict and the delete operation fails.
NOTE: Documents with a version equal to 0 cannot be deleted using delete by query because internal versioning does not support 0 as a valid version number.
While processing a delete by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents to delete. A bulk delete request is performed for each batch of matching documents. If a search or bulk request is rejected, the requests are retried up to 10 times, with exponential back off. If the maximum retry limit is reached, processing halts and all failed requests are returned in the response. Any delete requests that completed successfully still stick, they are not rolled back.
You can opt to count version conflicts instead of halting and returning by setting
conflicts
toproceed
. Note that if you opt to count version conflicts the operation could attempt to delete more documents from the source thanmax_docs
until it has successfully deletedmax_docs documents
, or it has gone through every document in the source query.Throttling delete requests
To control the rate at which delete by query issues batches of delete operations, you can set
requests_per_second
to any positive decimal number. This pads each batch with a wait time to throttle the rate. Setrequests_per_second
to-1
to disable throttling.Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the
requests_per_second
and the time spent writing. By default the batch size is1000
, so ifrequests_per_second
is set to500
:target_time = 1000 / 500 per second = 2 seconds wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single
_bulk
request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".Slicing
Delete by query supports sliced scroll to parallelize the delete process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting
slices
toauto
lets Elasticsearch choose the number of slices to use. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards. Adding slices to the delete by query operation creates sub-requests which means it has some quirks:- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with
slices
will rethrottle the unfinished sub-request proportionally. - Canceling the request with
slices
will cancel each sub-request. - Due to the nature of
slices
each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution. - Parameters like
requests_per_second
andmax_docs
on a request withslices
are distributed proportionally to each sub-request. Combine that with the earlier point about distribution being uneven and you should conclude that usingmax_docs
withslices
might not result in exactlymax_docs
documents being deleted. - Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to
the number of shards in the index or backing index. If that number is large
(for example, 500), choose a lower number as too many
slices
hurts performance. Settingslices
higher than the number of shards generally does not improve efficiency and adds overhead. - Delete performance scales linearly across available resources with the number of slices.
Whether query or delete performance dominates the runtime depends on the documents being reindexed and cluster resources.
Cancel a delete by query operation
Any delete by query can be canceled using the task cancel API. For example:
POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
The task ID can be found by using the get tasks API.
Cancellation should happen quickly but might take a few seconds. The get task status API will continue to list the delete by query task until this task checks that it has been cancelled and terminates itself.
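As a sketch only, such a request might be issued from the async client as follows; asyncClient, the index name, the query field, and the throttle value are assumptions and not part of this reference:
import co.elastic.clients.elasticsearch._types.Conflicts;

CompletableFuture<DeleteByQueryResponse> future = asyncClient.deleteByQuery(d -> d
    .index("my-index-000001")
    .query(q -> q.match(m -> m.field("user.id").query("elkbee")))
    .conflicts(Conflicts.Proceed)   // count version conflicts instead of failing
    .requestsPerSecond(500F)        // throttle the rate of delete batches
);

future.thenAccept(resp ->
    System.out.println("Deleted: " + resp.deleted() + ", version conflicts: " + resp.versionConflicts()));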
- See Also:
-
deleteByQuery
public final CompletableFuture<DeleteByQueryResponse> deleteByQuery(Function<DeleteByQueryRequest.Builder, ObjectBuilder<DeleteByQueryRequest>> fn) Delete documents. Deletes documents that match the specified query.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
- read
- delete or write
You can specify the query criteria in the request URI or the request body using the same syntax as the search API. When you submit a delete by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and deletes matching documents using internal versioning. If a document changes between the time that the snapshot is taken and the delete operation is processed, it results in a version conflict and the delete operation fails.
NOTE: Documents with a version equal to 0 cannot be deleted using delete by query because internal versioning does not support 0 as a valid version number.
While processing a delete by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents to delete. A bulk delete request is performed for each batch of matching documents. If a search or bulk request is rejected, the requests are retried up to 10 times, with exponential back off. If the maximum retry limit is reached, processing halts and all failed requests are returned in the response. Any delete requests that completed successfully still stick, they are not rolled back.
You can opt to count version conflicts instead of halting and returning by setting conflicts to proceed. Note that if you opt to count version conflicts the operation could attempt to delete more documents from the source than max_docs until it has successfully deleted max_docs documents, or it has gone through every document in the source query.
Throttling delete requests
To control the rate at which delete by query issues batches of delete operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
Slicing
Delete by query supports sliced scroll to parallelize the delete process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting slices to auto lets Elasticsearch choose the number of slices to use. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards. Adding slices to the delete by query operation creates sub-requests which means it has some quirks:
- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with slices will rethrottle the unfinished sub-request proportionally.
- Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the earlier point about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being deleted.
- Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Delete performance scales linearly across available resources with the number of slices.
Whether query or delete performance dominates the runtime depends on the documents being reindexed and cluster resources.
Cancel a delete by query operation
Any delete by query can be canceled using the task cancel API. For example:
POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
The task ID can be found by using the get tasks API.
Cancellation should happen quickly but might take a few seconds. The get task status API will continue to list the delete by query task until this task checks that it has been cancelled and terminates itself.
- Parameters:
fn
- a function that initializes a builder to create the DeleteByQueryRequest
- See Also:
-
deleteByQueryRethrottle
public CompletableFuture<DeleteByQueryRethrottleResponse> deleteByQueryRethrottle(DeleteByQueryRethrottleRequest request) Throttle a delete by query operation. Change the number of requests per second for a particular delete by query operation. Rethrottling that speeds up the query takes effect immediately but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.
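A hedged sketch of what a rethrottle call might look like; asyncClient, the task ID, and the new rate are placeholders:
asyncClient.deleteByQueryRethrottle(r -> r
    .taskId("r1A2WoRbTwKZ516z6NEs5A:36619")   // task ID from the tasks API
    .requestsPerSecond(100F)                  // new throttle; -1 disables throttling
).thenAccept(resp -> System.out.println("Rethrottle applied"));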
- See Also:
-
deleteByQueryRethrottle
public final CompletableFuture<DeleteByQueryRethrottleResponse> deleteByQueryRethrottle(Function<DeleteByQueryRethrottleRequest.Builder, ObjectBuilder<DeleteByQueryRethrottleRequest>> fn) Throttle a delete by query operation. Change the number of requests per second for a particular delete by query operation. Rethrottling that speeds up the query takes effect immediately but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.
- Parameters:
fn
- a function that initializes a builder to create the DeleteByQueryRethrottleRequest
- See Also:
-
deleteScript
Delete a script or search template. Deletes a stored script or search template.
- See Also:
-
deleteScript
public final CompletableFuture<DeleteScriptResponse> deleteScript(Function<DeleteScriptRequest.Builder, ObjectBuilder<DeleteScriptRequest>> fn) Delete a script or search template. Deletes a stored script or search template.
- Parameters:
fn
- a function that initializes a builder to create the DeleteScriptRequest
- See Also:
-
exists
Check a document. Verify that a document exists. For example, check to see if a document with the _id 0 exists:
HEAD my-index-000001/_doc/0
If the document exists, the API returns a status code of 200 - OK. If the document doesn't exist, the API returns 404 - Not Found.
Versioning support
You can use the version parameter to check the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
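For illustration, the HEAD request above could be issued from the async client roughly as follows; asyncClient and the index name are placeholders:
CompletableFuture<BooleanResponse> future = asyncClient.exists(e -> e
    .index("my-index-000001")
    .id("0")
);

future.thenAccept(resp ->
    System.out.println(resp.value() ? "200 - OK" : "404 - Not Found"));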
- See Also:
-
exists
public final CompletableFuture<BooleanResponse> exists(Function<ExistsRequest.Builder, ObjectBuilder<ExistsRequest>> fn) Check a document. Verify that a document exists. For example, check to see if a document with the _id 0 exists:
HEAD my-index-000001/_doc/0
If the document exists, the API returns a status code of 200 - OK. If the document doesn't exist, the API returns 404 - Not Found.
Versioning support
You can use the version parameter to check the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
- Parameters:
fn
- a function that initializes a builder to create the ExistsRequest
- See Also:
-
existsSource
Check for a document source. Check whether a document source exists in an index. For example:
HEAD my-index-000001/_source/1
A document's source is not available if it is disabled in the mapping.
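A minimal sketch of the equivalent call from the async client, assuming a client instance named asyncClient and the same index and ID as above:
asyncClient.existsSource(e -> e
    .index("my-index-000001")
    .id("1")
).thenAccept(resp ->
    System.out.println("Source present: " + resp.value()));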
- See Also:
-
existsSource
public final CompletableFuture<BooleanResponse> existsSource(Function<ExistsSourceRequest.Builder, ObjectBuilder<ExistsSourceRequest>> fn) Check for a document source. Check whether a document source exists in an index. For example:
HEAD my-index-000001/_source/1
A document's source is not available if it is disabled in the mapping.
- Parameters:
fn
- a function that initializes a builder to create the ExistsSourceRequest
- See Also:
-
explain
public <TDocument> CompletableFuture<ExplainResponse<TDocument>> explain(ExplainRequest request, Class<TDocument> tDocumentClass) Explain a document match result. Get information about why a specific document matches, or doesn't match, a query. It computes a score explanation for a query and a specific document.
- See Also:
-
explain
public final <TDocument> CompletableFuture<ExplainResponse<TDocument>> explain(Function<ExplainRequest.Builder, ObjectBuilder<ExplainRequest>> fn, Class<TDocument> tDocumentClass) Explain a document match result. Get information about why a specific document matches, or doesn't match, a query. It computes a score explanation for a query and a specific document.
- Parameters:
fn
- a function that initializes a builder to create the ExplainRequest
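A sketch of how this overload might be invoked; asyncClient, the index, the document ID, and the match query are assumptions, and Void.class simply skips deserializing the document:
asyncClient.explain(e -> e
        .index("my-index-000001")
        .id("0")
        .query(q -> q.match(m -> m.field("message").query("elasticsearch"))),
    Void.class   // the document itself is not deserialized here
).thenAccept(resp ->
    System.out.println("Matched: " + resp.matched()));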
- See Also:
-
explain
Overload of explain(ExplainRequest, Class), where Class is defined as Void, meaning the documents will not be deserialized.
-
explain
public final CompletableFuture<ExplainResponse<Void>> explain(Function<ExplainRequest.Builder, ObjectBuilder<ExplainRequest>> fn) Overload of explain(Function, Class), where Class is defined as Void, meaning the documents will not be deserialized.
-
explain
public <TDocument> CompletableFuture<ExplainResponse<TDocument>> explain(ExplainRequest request, Type tDocumentType) Explain a document match result. Get information about why a specific document matches, or doesn't match, a query. It computes a score explanation for a query and a specific document.
- See Also:
-
explain
public final <TDocument> CompletableFuture<ExplainResponse<TDocument>> explain(Function<ExplainRequest.Builder, ObjectBuilder<ExplainRequest>> fn, Type tDocumentType) Explain a document match result. Get information about why a specific document matches, or doesn't match, a query. It computes a score explanation for a query and a specific document.
- Parameters:
fn
- a function that initializes a builder to create the ExplainRequest
- See Also:
-
fieldCaps
Get the field capabilities. Get information about the capabilities of fields among multiple indices.
For data streams, the API returns field capabilities among the stream’s backing indices. It returns runtime fields like any other field. For example, a runtime field with a type of keyword is returned the same as any other field that belongs to the keyword family.
- See Also:
-
fieldCaps
public final CompletableFuture<FieldCapsResponse> fieldCaps(Function<FieldCapsRequest.Builder, ObjectBuilder<FieldCapsRequest>> fn) Get the field capabilities. Get information about the capabilities of fields among multiple indices.
For data streams, the API returns field capabilities among the stream’s backing indices. It returns runtime fields like any other field. For example, a runtime field with a type of keyword is returned the same as any other field that belongs to the keyword family.
- Parameters:
fn
- a function that initializes a builder to create the FieldCapsRequest
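For illustration, a field capabilities call might look like the following sketch; asyncClient, the index pattern, and the field name are placeholders:
asyncClient.fieldCaps(f -> f
    .index("my-index-*")
    .fields("rating")
).thenAccept(resp ->
    resp.fields().forEach((name, caps) ->
        System.out.println(name + " -> " + caps.keySet())));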
- See Also:
-
fieldCaps
Get the field capabilities. Get information about the capabilities of fields among multiple indices.
For data streams, the API returns field capabilities among the stream’s backing indices. It returns runtime fields like any other field. For example, a runtime field with a type of keyword is returned the same as any other field that belongs to the keyword family.
- See Also:
-
get
public <TDocument> CompletableFuture<GetResponse<TDocument>> get(GetRequest request, Class<TDocument> tDocumentClass) Get a document by its ID. Get a document and its source or stored fields from an index.
By default, this API is realtime and is not affected by the refresh rate of the index (when data will become visible for search). In the case where stored fields are requested with the stored_fields parameter and the document has been updated but is not yet refreshed, the API will have to parse and analyze the source to extract the stored fields. To turn off realtime behavior, set the realtime parameter to false.
Source filtering
By default, the API returns the contents of the _source field unless you have used the stored_fields parameter or the _source field is turned off. You can turn off _source retrieval by using the _source parameter:
GET my-index-000001/_doc/0?_source=false
If you only need one or two fields from the _source, use the _source_includes or _source_excludes parameters to include or filter out particular fields. This can be helpful with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list of fields or wildcard expressions. For example:
GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities
If you only want to specify includes, you can use a shorter notation:
GET my-index-000001/_doc/0?_source=*.id
Routing
If routing is used during indexing, the routing value also needs to be specified to retrieve a document. For example:
GET my-index-000001/_doc/2?routing=user1
This request gets the document with ID 2, but it is routed based on the user. The document is not fetched if the correct routing is not specified.
Distributed
The GET operation is hashed into a specific shard ID. It is then redirected to one of the replicas within that shard ID and returns the result. The replicas are the primary shard and its replicas within that shard ID group. This means that the more replicas you have, the better your GET scaling will be.
Versioning support
You can use the version parameter to retrieve the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
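A minimal sketch of a request-object call with source filtering; asyncClient and the index/ID values are placeholders, and mapping to ObjectNode assumes a Jackson-backed JsonpMapper is configured on the transport:
import com.fasterxml.jackson.databind.node.ObjectNode;

GetRequest request = GetRequest.of(g -> g
    .index("my-index-000001")
    .id("0")
    .sourceIncludes("*.id")      // equivalent of ?_source_includes=*.id
    .sourceExcludes("entities")  // equivalent of ?_source_excludes=entities
);

asyncClient.get(request, ObjectNode.class)
    .thenAccept(resp -> {
        if (resp.found()) {
            System.out.println(resp.source());
        }
    });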
- See Also:
-
get
public final <TDocument> CompletableFuture<GetResponse<TDocument>> get(Function<GetRequest.Builder, ObjectBuilder<GetRequest>> fn, Class<TDocument> tDocumentClass) Get a document by its ID. Get a document and its source or stored fields from an index.
By default, this API is realtime and is not affected by the refresh rate of the index (when data will become visible for search). In the case where stored fields are requested with the stored_fields parameter and the document has been updated but is not yet refreshed, the API will have to parse and analyze the source to extract the stored fields. To turn off realtime behavior, set the realtime parameter to false.
Source filtering
By default, the API returns the contents of the _source field unless you have used the stored_fields parameter or the _source field is turned off. You can turn off _source retrieval by using the _source parameter:
GET my-index-000001/_doc/0?_source=false
If you only need one or two fields from the _source, use the _source_includes or _source_excludes parameters to include or filter out particular fields. This can be helpful with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list of fields or wildcard expressions. For example:
GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities
If you only want to specify includes, you can use a shorter notation:
GET my-index-000001/_doc/0?_source=*.id
Routing
If routing is used during indexing, the routing value also needs to be specified to retrieve a document. For example:
GET my-index-000001/_doc/2?routing=user1
This request gets the document with ID 2, but it is routed based on the user. The document is not fetched if the correct routing is not specified.
Distributed
The GET operation is hashed into a specific shard ID. It is then redirected to one of the replicas within that shard ID and returns the result. The replicas are the primary shard and its replicas within that shard ID group. This means that the more replicas you have, the better your GET scaling will be.
Versioning support
You can use the version parameter to retrieve the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
- Parameters:
fn
- a function that initializes a builder to create the GetRequest
- See Also:
-
get
Overload of get(GetRequest, Class), where Class is defined as Void, meaning the documents will not be deserialized.
-
get
public final CompletableFuture<GetResponse<Void>> get(Function<GetRequest.Builder, ObjectBuilder<GetRequest>> fn) Overload of get(Function, Class), where Class is defined as Void, meaning the documents will not be deserialized.
-
get
public <TDocument> CompletableFuture<GetResponse<TDocument>> get(GetRequest request, Type tDocumentType) Get a document by its ID. Get a document and its source or stored fields from an index.
By default, this API is realtime and is not affected by the refresh rate of the index (when data will become visible for search). In the case where stored fields are requested with the stored_fields parameter and the document has been updated but is not yet refreshed, the API will have to parse and analyze the source to extract the stored fields. To turn off realtime behavior, set the realtime parameter to false.
Source filtering
By default, the API returns the contents of the _source field unless you have used the stored_fields parameter or the _source field is turned off. You can turn off _source retrieval by using the _source parameter:
GET my-index-000001/_doc/0?_source=false
If you only need one or two fields from the _source, use the _source_includes or _source_excludes parameters to include or filter out particular fields. This can be helpful with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list of fields or wildcard expressions. For example:
GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities
If you only want to specify includes, you can use a shorter notation:
GET my-index-000001/_doc/0?_source=*.id
Routing
If routing is used during indexing, the routing value also needs to be specified to retrieve a document. For example:
GET my-index-000001/_doc/2?routing=user1
This request gets the document with ID 2, but it is routed based on the user. The document is not fetched if the correct routing is not specified.
Distributed
The GET operation is hashed into a specific shard ID. It is then redirected to one of the replicas within that shard ID and returns the result. The replicas are the primary shard and its replicas within that shard ID group. This means that the more replicas you have, the better your GET scaling will be.
Versioning support
You can use the version parameter to retrieve the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
- See Also:
-
get
public final <TDocument> CompletableFuture<GetResponse<TDocument>> get(Function<GetRequest.Builder, ObjectBuilder<GetRequest>> fn, Type tDocumentType) Get a document by its ID. Get a document and its source or stored fields from an index.
By default, this API is realtime and is not affected by the refresh rate of the index (when data will become visible for search). In the case where stored fields are requested with the stored_fields parameter and the document has been updated but is not yet refreshed, the API will have to parse and analyze the source to extract the stored fields. To turn off realtime behavior, set the realtime parameter to false.
Source filtering
By default, the API returns the contents of the _source field unless you have used the stored_fields parameter or the _source field is turned off. You can turn off _source retrieval by using the _source parameter:
GET my-index-000001/_doc/0?_source=false
If you only need one or two fields from the _source, use the _source_includes or _source_excludes parameters to include or filter out particular fields. This can be helpful with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list of fields or wildcard expressions. For example:
GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities
If you only want to specify includes, you can use a shorter notation:
GET my-index-000001/_doc/0?_source=*.id
Routing
If routing is used during indexing, the routing value also needs to be specified to retrieve a document. For example:
GET my-index-000001/_doc/2?routing=user1
This request gets the document with ID 2, but it is routed based on the user. The document is not fetched if the correct routing is not specified.
Distributed
The GET operation is hashed into a specific shard ID. It is then redirected to one of the replicas within that shard ID and returns the result. The replicas are the primary shard and its replicas within that shard ID group. This means that the more replicas you have, the better your GET scaling will be.
Versioning support
You can use the version parameter to retrieve the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
- Parameters:
fn
- a function that initializes a builder to create the GetRequest
- See Also:
-
getScript
Get a script or search template. Retrieves a stored script or search template.
- See Also:
-
getScript
public final CompletableFuture<GetScriptResponse> getScript(Function<GetScriptRequest.Builder, ObjectBuilder<GetScriptRequest>> fn) Get a script or search template. Retrieves a stored script or search template.
- Parameters:
fn
- a function that initializes a builder to create the GetScriptRequest
- See Also:
-
getScriptContext
Get script contexts. Get a list of supported script contexts and their methods.
- See Also:
-
getScriptLanguages
Get script languages. Get a list of available script types, languages, and contexts.
- See Also:
-
getSource
public <TDocument> CompletableFuture<GetSourceResponse<TDocument>> getSource(GetSourceRequest request, Class<TDocument> tDocumentClass) Get a document's source. Get the source of a document. For example:
GET my-index-000001/_source/1
You can use the source filtering parameters to control which parts of the _source are returned:
GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
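As an illustration, the equivalent async call might look like this sketch; asyncClient is a placeholder and mapping to ObjectNode assumes a Jackson-backed JsonpMapper is configured:
import com.fasterxml.jackson.databind.node.ObjectNode;

asyncClient.getSource(g -> g
        .index("my-index-000001")
        .id("1"),
    ObjectNode.class
).thenAccept(resp -> System.out.println(resp));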
- See Also:
-
getSource
public final <TDocument> CompletableFuture<GetSourceResponse<TDocument>> getSource(Function<GetSourceRequest.Builder, ObjectBuilder<GetSourceRequest>> fn, Class<TDocument> tDocumentClass) Get a document's source. Get the source of a document. For example:
GET my-index-000001/_source/1
You can use the source filtering parameters to control which parts of the _source are returned:
GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
- Parameters:
fn
- a function that initializes a builder to create the GetSourceRequest
- See Also:
-
getSource
Overload of getSource(GetSourceRequest, Class), where Class is defined as Void, meaning the documents will not be deserialized.
-
getSource
public final CompletableFuture<GetSourceResponse<Void>> getSource(Function<GetSourceRequest.Builder, ObjectBuilder<GetSourceRequest>> fn) Overload of getSource(Function, Class), where Class is defined as Void, meaning the documents will not be deserialized.
-
getSource
public <TDocument> CompletableFuture<GetSourceResponse<TDocument>> getSource(GetSourceRequest request, Type tDocumentType) Get a document's source. Get the source of a document. For example:
GET my-index-000001/_source/1
You can use the source filtering parameters to control which parts of the _source are returned:
GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
- See Also:
-
getSource
public final <TDocument> CompletableFuture<GetSourceResponse<TDocument>> getSource(Function<GetSourceRequest.Builder, ObjectBuilder<GetSourceRequest>> fn, Type tDocumentType) Get a document's source. Get the source of a document. For example:
GET my-index-000001/_source/1
You can use the source filtering parameters to control which parts of the _source are returned:
GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
- Parameters:
fn
- a function that initializes a builder to create the GetSourceRequest
- See Also:
-
healthReport
Get the cluster health. Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.
Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.
The cluster’s status is controlled by the worst indicator status.
In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.
Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.
NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.
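A sketch of a polling-friendly call that disables the verbose analysis; asyncClient is a placeholder and the status accessor is assumed here for illustration:
asyncClient.healthReport(h -> h
    .verbose(false)   // skip the expensive root-cause analysis when polling
).thenAccept(resp ->
    System.out.println("Cluster status: " + resp.status()));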
- See Also:
-
healthReport
public final CompletableFuture<HealthReportResponse> healthReport(Function<HealthReportRequest.Builder, ObjectBuilder<HealthReportRequest>> fn) Get the cluster health. Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.
Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.
The cluster’s status is controlled by the worst indicator status.
In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.
Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.
NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.
- Parameters:
fn
- a function that initializes a builder to create the HealthReportRequest
- See Also:
-
healthReport
Get the cluster health. Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.
Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.
The cluster’s status is controlled by the worst indicator status.
In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.
Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.
NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.
- See Also:
-
index
Create or update a document in an index. Add a JSON document to the specified data stream or index and make it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.
NOTE: You cannot use this API to send update requests for existing documents in a data stream.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
- To add a document using the POST /<target>/_doc/ request format, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
NOTE: Replica shards might not all be started when an indexing operation returns successfully. By default, only the primary is required. Set wait_for_active_shards to change this default behavior.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
Optimistic concurrency control
Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
Routing
By default, shard placement — or routing — is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas + 1). Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
No operation (noop) updates
When updating a document by using this API, a new version of the document is always created even if the document hasn't changed. If this isn't acceptable use the _update API with detect_noop set to true. The detect_noop option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.
There isn't a definitive rule for when noop updates aren't acceptable. It's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
Versioning
Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.
NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, the operation runs without any version checks.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:
PUT my-index-000001/_doc/1?version=2&version_type=external
{
  "user": {
    "id": "elkbee"
  }
}
In this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1. If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code). A nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.
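A minimal sketch of the external-versioning example above, issued through the async client; asyncClient and the Map-based document are assumptions (any type the configured JSON mapper can serialize would work):
import co.elastic.clients.elasticsearch._types.VersionType;
import java.util.Map;

Map<String, Object> doc = Map.of("user", Map.of("id", "elkbee"));

asyncClient.index(i -> i
    .index("my-index-000001")
    .id("1")
    .version(2L)                        // externally maintained version
    .versionType(VersionType.External)
    .document(doc)
).thenAccept(resp ->
    System.out.println("Indexed, version " + resp.version()));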
- See Also:
-
index
public final <TDocument> CompletableFuture<IndexResponse> index(Function<IndexRequest.Builder<TDocument>, ObjectBuilder<IndexRequest<TDocument>>> fn) Create or update a document in an index. Add a JSON document to the specified data stream or index and make it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.
NOTE: You cannot use this API to send update requests for existing documents in a data stream.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add or overwrite a document using the PUT /<target>/_doc/<_id> request format, you must have the create, index, or write index privilege.
- To add a document using the POST /<target>/_doc/ request format, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
NOTE: Replica shards might not all be started when an indexing operation returns successfully. By default, only the primary is required. Set wait_for_active_shards to change this default behavior.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
Optimistic concurrency control
Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
Routing
By default, shard placement — or routing — is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas + 1). Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
No operation (noop) updates
When updating a document by using this API, a new version of the document is always created even if the document hasn't changed. If this isn't acceptable use the _update API with detect_noop set to true. The detect_noop option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.
There isn't a definitive rule for when noop updates aren't acceptable. It's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
Versioning
Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.
NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, the operation runs without any version checks.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:
PUT my-index-000001/_doc/1?version=2&version_type=external
{
  "user": {
    "id": "elkbee"
  }
}
In this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1. If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code). A nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used. Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.
- Parameters:
fn
- a function that initializes a builder to create the IndexRequest
- See Also:
-
info
Get cluster info. Get basic build, version, and cluster information.
- See Also:
-
knnSearch
public <TDocument> CompletableFuture<KnnSearchResponse<TDocument>> knnSearch(KnnSearchRequest request, Class<TDocument> tDocumentClass) Run a knn search.
NOTE: The kNN search API has been replaced by the knn option in the search API.
Perform a k-nearest neighbor (kNN) search on a dense_vector field and return the matching documents. Given a query vector, the API finds the k closest vectors and returns those documents as search hits.
Elasticsearch uses the HNSW algorithm to support efficient kNN search. Like most kNN algorithms, HNSW is an approximate method that sacrifices result accuracy for improved search speed. This means the results returned are not always the true k closest neighbors.
The kNN search API supports restricting the search using a filter. The search will return the top k documents that also match the filter query.
A kNN search response has the exact same structure as a search API response. However, certain sections have a meaning specific to kNN search:
- The document _score is determined by the similarity between the query and document vector.
- The hits.total object contains the total number of nearest neighbor candidates considered, which is num_candidates * num_shards. The hits.total.relation will always be eq, indicating an exact value.
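A heavily hedged sketch of how this deprecated endpoint might be called; asyncClient, the index, field, vector values, and the ObjectNode mapping are all assumptions, and the knn option of the regular search API is generally preferred:
import com.fasterxml.jackson.databind.node.ObjectNode;

asyncClient.knnSearch(k -> k
        .index("my-image-index")
        .knn(q -> q
            .field("image_vector")
            .queryVector(0.12f, 0.54f, 0.78f)   // must match the dense_vector dimensions
            .k(5L)
            .numCandidates(50L)),
    ObjectNode.class
).thenAccept(resp ->
    resp.hits().hits().forEach(hit ->
        System.out.println(hit.id() + " score=" + hit.score())));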
- See Also:
-
knnSearch
public final <TDocument> CompletableFuture<KnnSearchResponse<TDocument>> knnSearch(Function<KnnSearchRequest.Builder, ObjectBuilder<KnnSearchRequest>> fn, Class<TDocument> tDocumentClass) Run a knn search.
NOTE: The kNN search API has been replaced by the knn option in the search API.
Perform a k-nearest neighbor (kNN) search on a dense_vector field and return the matching documents. Given a query vector, the API finds the k closest vectors and returns those documents as search hits.
Elasticsearch uses the HNSW algorithm to support efficient kNN search. Like most kNN algorithms, HNSW is an approximate method that sacrifices result accuracy for improved search speed. This means the results returned are not always the true k closest neighbors.
The kNN search API supports restricting the search using a filter. The search will return the top k documents that also match the filter query.
A kNN search response has the exact same structure as a search API response. However, certain sections have a meaning specific to kNN search:
- The document _score is determined by the similarity between the query and document vector.
- The hits.total object contains the total number of nearest neighbor candidates considered, which is num_candidates * num_shards. The hits.total.relation will always be eq, indicating an exact value.
- Parameters:
fn
- a function that initializes a builder to create theKnnSearchRequest
- See Also:
- The document
-
knnSearch
Overload ofknnSearch(KnnSearchRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
knnSearch
public final CompletableFuture<KnnSearchResponse<Void>> knnSearch(Function<KnnSearchRequest.Builder, ObjectBuilder<KnnSearchRequest>> fn) Overload ofknnSearch(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
knnSearch
public <TDocument> CompletableFuture<KnnSearchResponse<TDocument>> knnSearch(KnnSearchRequest request, Type tDocumentType) Run a knn search.NOTE: The kNN search API has been replaced by the
knn
option in the search API.Perform a k-nearest neighbor (kNN) search on a dense_vector field and return the matching documents. Given a query vector, the API finds the k closest vectors and returns those documents as search hits.
Elasticsearch uses the HNSW algorithm to support efficient kNN search. Like most kNN algorithms, HNSW is an approximate method that sacrifices result accuracy for improved search speed. This means the results returned are not always the true k closest neighbors.
The kNN search API supports restricting the search using a filter. The search will return the top k documents that also match the filter query.
A kNN search response has the exact same structure as a search API response. However, certain sections have a meaning specific to kNN search:
- The document
_score
is determined by the similarity between the query and document vector. - The
hits.total
object contains the total number of nearest neighbor candidates considered, which isnum_candidates * num_shards
. Thehits.total.relation
will always beeq
, indicating an exact value.
- See Also:
- The document
-
knnSearch
public final <TDocument> CompletableFuture<KnnSearchResponse<TDocument>> knnSearch(Function<KnnSearchRequest.Builder, ObjectBuilder<KnnSearchRequest>> fn, Type tDocumentType) Run a knn search.NOTE: The kNN search API has been replaced by the
knn
option in the search API.Perform a k-nearest neighbor (kNN) search on a dense_vector field and return the matching documents. Given a query vector, the API finds the k closest vectors and returns those documents as search hits.
Elasticsearch uses the HNSW algorithm to support efficient kNN search. Like most kNN algorithms, HNSW is an approximate method that sacrifices result accuracy for improved search speed. This means the results returned are not always the true k closest neighbors.
The kNN search API supports restricting the search using a filter. The search will return the top k documents that also match the filter query.
A kNN search response has the exact same structure as a search API response. However, certain sections have a meaning specific to kNN search:
- The document
_score
is determined by the similarity between the query and document vector. - The
hits.total
object contains the total number of nearest neighbor candidates considered, which isnum_candidates * num_shards
. Thehits.total.relation
will always beeq
, indicating an exact value.
- Parameters:
fn
- a function that initializes a builder to create theKnnSearchRequest
- See Also:
- The document
-
mget
public <TDocument> CompletableFuture<MgetResponse<TDocument>> mget(MgetRequest request, Class<TDocument> tDocumentClass) Get multiple documents.Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the
_source
field is returned for every document (if stored). Use the_source
and_source_include
orsource_exclude
attributes to filter what fields are returned for a particular document. You can include the_source
,_source_includes
, and_source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.Get stored fields
Use the
stored_fields
attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored. You can include thestored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.- See Also:
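For illustration, a minimal sketch fetching two documents by ID from a single index (the index name, IDs, and the Product document class are placeholders):
client.mget(m -> m
        .index("products")
        .ids("p1", "p2"),
    Product.class
).thenAccept(resp -> resp.docs().forEach(item -> {
    if (item.isResult() && item.result().found()) {
        System.out.println(item.result().id() + " -> " + item.result().source());
    }
}));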
-
mget
public final <TDocument> CompletableFuture<MgetResponse<TDocument>> mget(Function<MgetRequest.Builder, ObjectBuilder<MgetRequest>> fn, Class<TDocument> tDocumentClass) Get multiple documents.Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the
_source
field is returned for every document (if stored). Use the_source
and_source_include
orsource_exclude
attributes to filter what fields are returned for a particular document. You can include the_source
,_source_includes
, and_source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.Get stored fields
Use the
stored_fields
attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored. You can include thestored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.- Parameters:
fn
- a function that initializes a builder to create theMgetRequest
- See Also:
-
mget
Overload ofmget(MgetRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
mget
public final CompletableFuture<MgetResponse<Void>> mget(Function<MgetRequest.Builder, ObjectBuilder<MgetRequest>> fn) Overload ofmget(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
mget
public <TDocument> CompletableFuture<MgetResponse<TDocument>> mget(MgetRequest request, Type tDocumentType) Get multiple documents.Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the
_source
field is returned for every document (if stored). Use the_source
and_source_include
orsource_exclude
attributes to filter what fields are returned for a particular document. You can include the_source
,_source_includes
, and_source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.Get stored fields
Use the
stored_fields
attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored. You can include thestored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.- See Also:
-
mget
public final <TDocument> CompletableFuture<MgetResponse<TDocument>> mget(Function<MgetRequest.Builder, ObjectBuilder<MgetRequest>> fn, Type tDocumentType) Get multiple documents.Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the
_source
field is returned for every document (if stored). Use the_source
and_source_include
orsource_exclude
attributes to filter what fields are returned for a particular document. You can include the_source
,_source_includes
, and_source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.Get stored fields
Use the
stored_fields
attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored. You can include thestored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.- Parameters:
fn
- a function that initializes a builder to create theMgetRequest
- See Also:
-
msearch
public <TDocument> CompletableFuture<MsearchResponse<TDocument>> msearch(MsearchRequest request, Class<TDocument> tDocumentClass) Run multiple searches.The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format. The structure is as follows:
header\n body\n header\n body\n
This structure is specifically optimized to reduce parsing if a specific search ends up redirected to another node.
IMPORTANT: The final line of data must end with a newline character
\n
. Each newline character may be preceded by a carriage return\r
. When sending requests to this endpoint theContent-Type
header should be set toapplication/x-ndjson
.- See Also:
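For illustration, a hedged sketch that bundles two searches into one msearch call; the client serializes each header/body pair into the NDJSON format described above (the index name and the Product document class are placeholders):
client.msearch(m -> m
        .searches(s -> s
            .header(h -> h.index("products"))
            .body(b -> b.query(q -> q.match(t -> t.field("name").query("laptop")))))
        .searches(s -> s
            .header(h -> h.index("products"))
            .body(b -> b.query(q -> q.matchAll(x -> x)).size(5))),
    Product.class
).thenAccept(resp -> resp.responses().forEach(item -> {
    if (item.isResult()) {
        System.out.println("hits: " + item.result().hits().total().value());
    }
}));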
-
msearch
public final <TDocument> CompletableFuture<MsearchResponse<TDocument>> msearch(Function<MsearchRequest.Builder, ObjectBuilder<MsearchRequest>> fn, Class<TDocument> tDocumentClass) Run multiple searches.The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format. The structure is as follows:
header\n body\n header\n body\n
This structure is specifically optimized to reduce parsing if a specific search ends up redirected to another node.
IMPORTANT: The final line of data must end with a newline character
\n
. Each newline character may be preceded by a carriage return\r
. When sending requests to this endpoint theContent-Type
header should be set toapplication/x-ndjson
.- Parameters:
fn
- a function that initializes a builder to create theMsearchRequest
- See Also:
-
msearch
Overload ofmsearch(MsearchRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
msearch
public final CompletableFuture<MsearchResponse<Void>> msearch(Function<MsearchRequest.Builder, ObjectBuilder<MsearchRequest>> fn) Overload ofmsearch(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
msearch
public <TDocument> CompletableFuture<MsearchResponse<TDocument>> msearch(MsearchRequest request, Type tDocumentType) Run multiple searches.The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format. The structure is as follows:
header\n body\n header\n body\n
This structure is specifically optimized to reduce parsing if a specific search ends up redirected to another node.
IMPORTANT: The final line of data must end with a newline character
\n
. Each newline character may be preceded by a carriage return\r
. When sending requests to this endpoint theContent-Type
header should be set toapplication/x-ndjson
.- See Also:
-
msearch
public final <TDocument> CompletableFuture<MsearchResponse<TDocument>> msearch(Function<MsearchRequest.Builder, ObjectBuilder<MsearchRequest>> fn, Type tDocumentType) Run multiple searches.The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format. The structure is as follows:
header\n body\n header\n body\n
This structure is specifically optimized to reduce parsing if a specific search ends up redirected to another node.
IMPORTANT: The final line of data must end with a newline character
\n
. Each newline character may be preceded by a carriage return\r
. When sending requests to this endpoint theContent-Type
header should be set toapplication/x-ndjson
.- Parameters:
fn
- a function that initializes a builder to create theMsearchRequest
- See Also:
-
msearchTemplate
public <TDocument> CompletableFuture<MsearchTemplateResponse<TDocument>> msearchTemplate(MsearchTemplateRequest request, Class<TDocument> tDocumentClass) Run multiple templated searches.Run multiple templated searches with a single request. If you are providing a text file or text input to
curl
, use the--data-binary
flag instead of-d
to preserve newlines. For example:
$ cat requests
{ "index": "my-index" }
{ "id": "my-search-template", "params": { "query_string": "hello world", "from": 0, "size": 10 }}
{ "index": "my-other-index" }
{ "id": "my-other-search-template", "params": { "query_type": "match_all" }}
$ curl -H "Content-Type: application/x-ndjson" -XGET localhost:9200/_msearch/template --data-binary "@requests"; echo
- See Also:
-
msearchTemplate
public final <TDocument> CompletableFuture<MsearchTemplateResponse<TDocument>> msearchTemplate(Function<MsearchTemplateRequest.Builder, ObjectBuilder<MsearchTemplateRequest>> fn, Class<TDocument> tDocumentClass) Run multiple templated searches.Run multiple templated searches with a single request. If you are providing a text file or text input to
curl
, use the--data-binary
flag instead of-d
to preserve newlines. For example:
$ cat requests
{ "index": "my-index" }
{ "id": "my-search-template", "params": { "query_string": "hello world", "from": 0, "size": 10 }}
{ "index": "my-other-index" }
{ "id": "my-other-search-template", "params": { "query_type": "match_all" }}
$ curl -H "Content-Type: application/x-ndjson" -XGET localhost:9200/_msearch/template --data-binary "@requests"; echo
- Parameters:
fn
- a function that initializes a builder to create theMsearchTemplateRequest
- See Also:
-
msearchTemplate
public CompletableFuture<MsearchTemplateResponse<Void>> msearchTemplate(MsearchTemplateRequest request) Overload ofmsearchTemplate(MsearchTemplateRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
msearchTemplate
public final CompletableFuture<MsearchTemplateResponse<Void>> msearchTemplate(Function<MsearchTemplateRequest.Builder, ObjectBuilder<MsearchTemplateRequest>> fn) Overload ofmsearchTemplate(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
msearchTemplate
public <TDocument> CompletableFuture<MsearchTemplateResponse<TDocument>> msearchTemplate(MsearchTemplateRequest request, Type tDocumentType) Run multiple templated searches.Run multiple templated searches with a single request. If you are providing a text file or text input to
curl
, use the--data-binary
flag instead of-d
to preserve newlines. For example:
$ cat requests
{ "index": "my-index" }
{ "id": "my-search-template", "params": { "query_string": "hello world", "from": 0, "size": 10 }}
{ "index": "my-other-index" }
{ "id": "my-other-search-template", "params": { "query_type": "match_all" }}
$ curl -H "Content-Type: application/x-ndjson" -XGET localhost:9200/_msearch/template --data-binary "@requests"; echo
- See Also:
-
msearchTemplate
public final <TDocument> CompletableFuture<MsearchTemplateResponse<TDocument>> msearchTemplate(Function<MsearchTemplateRequest.Builder, ObjectBuilder<MsearchTemplateRequest>> fn, Type tDocumentType) Run multiple templated searches.Run multiple templated searches with a single request. If you are providing a text file or text input to
curl
, use the--data-binary
flag instead of-d
to preserve newlines. For example:
$ cat requests
{ "index": "my-index" }
{ "id": "my-search-template", "params": { "query_string": "hello world", "from": 0, "size": 10 }}
{ "index": "my-other-index" }
{ "id": "my-other-search-template", "params": { "query_type": "match_all" }}
$ curl -H "Content-Type: application/x-ndjson" -XGET localhost:9200/_msearch/template --data-binary "@requests"; echo
- Parameters:
fn
- a function that initializes a builder to create theMsearchTemplateRequest
- See Also:
-
mtermvectors
Get multiple term vectors.Get multiple term vectors with a single request. You can specify existing documents by index and ID or provide artificial documents in the body of the request. You can specify the index in the request body or request URI. The response contains a
docs
array with all the fetched termvectors. Each element has the structure provided by the termvectors API.Artificial documents
You can also use
mtermvectors
to generate term vectors for artificial documents provided in the body of the request. The mapping used is determined by the specified_index
.- See Also:
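For illustration, a tentative sketch requesting term vectors for two existing documents by ID (the index name and IDs are placeholders):
client.mtermvectors(m -> m
        .index("my-index")
        .ids("1", "2")
).thenAccept(resp -> resp.docs().forEach(doc ->
    System.out.println(doc.id() + " found=" + doc.found())));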
-
mtermvectors
public final CompletableFuture<MtermvectorsResponse> mtermvectors(Function<MtermvectorsRequest.Builder, ObjectBuilder<MtermvectorsRequest>> fn) Get multiple term vectors.Get multiple term vectors with a single request. You can specify existing documents by index and ID or provide artificial documents in the body of the request. You can specify the index in the request body or request URI. The response contains a
docs
array with all the fetched termvectors. Each element has the structure provided by the termvectors API.Artificial documents
You can also use
mtermvectors
to generate term vectors for artificial documents provided in the body of the request. The mapping used is determined by the specified_index
.- Parameters:
fn
- a function that initializes a builder to create theMtermvectorsRequest
- See Also:
-
mtermvectors
Get multiple term vectors.Get multiple term vectors with a single request. You can specify existing documents by index and ID or provide artificial documents in the body of the request. You can specify the index in the request body or request URI. The response contains a
docs
array with all the fetched termvectors. Each element has the structure provided by the termvectors API.Artificial documents
You can also use
mtermvectors
to generate term vectors for artificial documents provided in the body of the request. The mapping used is determined by the specified_index
.- See Also:
-
openPointInTime
Open a point in time.A search request by default runs against the most recent visible data of the target indices, which is called point in time. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. In some cases, it’s preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between
search_after
requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time.A point in time must be opened explicitly before being used in search requests.
A subsequent search request with the
pit
parameter must not specifyindex
,routing
, orpreference
values as these parameters are copied from the point in time.Just like regular searches, you can use
from
andsize
to page through point in time search results, up to the first 10,000 hits. If you want to retrieve more hits, use PIT withsearch_after
.IMPORTANT: The open point in time request and each subsequent search request can return different identifiers; always use the most recently received ID for the next search request.
When a PIT that contains shard failures is used in a search request, the missing shards are always reported in the search response as a
NoShardAvailableActionException
exception. To get rid of these exceptions, a new PIT needs to be created so that shards missing from the previous PIT can be handled, assuming they become available in the meantime.Keeping point in time alive
The
keep_alive
parameter, which is passed to an open point in time request and search request, extends the time to live of the corresponding point in time. The value does not need to be long enough to process all data — it just needs to be long enough for the next request.Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. However, open point-in-times prevent the old segments from being deleted since they are still in use.
TIP: Keeping older segments alive means that more disk space and file handles are needed. Ensure that you have configured your nodes to have ample free file handles.
Additionally, if a segment contains deleted or updated documents then the point in time must keep track of whether each document in the segment was live at the time of the initial search request. Ensure that your nodes have sufficient heap space if you have many open point-in-times on an index that is subject to ongoing deletes or updates. Note that a point-in-time doesn't prevent its associated indices from being deleted. You can check how many point-in-times (that is, search contexts) are open with the nodes stats API.
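For illustration, a hedged sketch that opens a PIT, searches it once, and then closes it (the index name and the Product document class are placeholders; note the search request specifies the PIT instead of an index):
client.openPointInTime(p -> p
        .index("products")
        .keepAlive(t -> t.time("1m"))
).thenCompose(pit -> client.search(s -> s
        .pit(x -> x.id(pit.id()).keepAlive(t -> t.time("1m")))
        .query(q -> q.matchAll(m -> m)),
    Product.class
).whenComplete((resp, ex) ->
    client.closePointInTime(c -> c.id(pit.id()))  // release the search context when done
));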
- See Also:
-
openPointInTime
public final CompletableFuture<OpenPointInTimeResponse> openPointInTime(Function<OpenPointInTimeRequest.Builder, ObjectBuilder<OpenPointInTimeRequest>> fn) Open a point in time.A search request by default runs against the most recent visible data of the target indices, which is called point in time. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. In some cases, it’s preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between
search_after
requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time.A point in time must be opened explicitly before being used in search requests.
A subsequent search request with the
pit
parameter must not specifyindex
,routing
, orpreference
values as these parameters are copied from the point in time.Just like regular searches, you can use
from
andsize
to page through point in time search results, up to the first 10,000 hits. If you want to retrieve more hits, use PIT withsearch_after
.IMPORTANT: The open point in time request and each subsequent search request can return different identifiers; always use the most recently received ID for the next search request.
When a PIT that contains shard failures is used in a search request, the missing shards are always reported in the search response as a
NoShardAvailableActionException
exception. To get rid of these exceptions, a new PIT needs to be created so that shards missing from the previous PIT can be handled, assuming they become available in the meantime.Keeping point in time alive
The
keep_alive
parameter, which is passed to an open point in time request and search request, extends the time to live of the corresponding point in time. The value does not need to be long enough to process all data — it just needs to be long enough for the next request.Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. However, open point-in-times prevent the old segments from being deleted since they are still in use.
TIP: Keeping older segments alive means that more disk space and file handles are needed. Ensure that you have configured your nodes to have ample free file handles.
Additionally, if a segment contains deleted or updated documents then the point in time must keep track of whether each document in the segment was live at the time of the initial search request. Ensure that your nodes have sufficient heap space if you have many open point-in-times on an index that is subject to ongoing deletes or updates. Note that a point-in-time doesn't prevent its associated indices from being deleted. You can check how many point-in-times (that is, search contexts) are open with the nodes stats API.
- Parameters:
fn
- a function that initializes a builder to create theOpenPointInTimeRequest
- See Also:
-
ping
Ping the cluster. Get information about whether the cluster is running.- See Also:
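For illustration, a minimal sketch; the returned BooleanResponse exposes the outcome through value():
client.ping().thenAccept(resp ->
    System.out.println("Cluster reachable: " + resp.value()));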
-
putScript
Create or update a script or search template. Creates or updates a stored script or search template.- See Also:
-
putScript
public final CompletableFuture<PutScriptResponse> putScript(Function<PutScriptRequest.Builder, ObjectBuilder<PutScriptRequest>> fn) Create or update a script or search template. Creates or updates a stored script or search template.- Parameters:
fn
- a function that initializes a builder to create thePutScriptRequest
- See Also:
-
rankEval
Evaluate ranked search results.Evaluate the quality of ranked search results over a set of typical search queries.
- See Also:
-
rankEval
public final CompletableFuture<RankEvalResponse> rankEval(Function<RankEvalRequest.Builder, ObjectBuilder<RankEvalRequest>> fn) Evaluate ranked search results.Evaluate the quality of ranked search results over a set of typical search queries.
- Parameters:
fn
- a function that initializes a builder to create theRankEvalRequest
- See Also:
-
reindex
Reindex documents.Copy documents from a source to a destination. You can copy all documents to the destination index or reindex a subset of the documents. The source can be any existing index, alias, or data stream. The destination must differ from the source. For example, you cannot reindex a data stream into itself.
IMPORTANT: Reindex requires
_source
to be enabled for all documents in the source. The destination should be configured as wanted before calling the reindex API. Reindex does not copy the settings from the source or its associated template. Mappings, shard counts, and replicas, for example, must be configured ahead of time.If the Elasticsearch security features are enabled, you must have the following security privileges:
- The
read
index privilege for the source data stream, index, or alias. - The
write
index privilege for the destination data stream, index, or index alias. - To automatically create a data stream or index with a reindex API
request, you must have the
auto_configure
,create_index
, ormanage
index privilege for the destination data stream, index, or alias. - If reindexing from a remote cluster, the
source.remote.user
must have themonitor
cluster privilege and theread
index privilege for the source data stream, index, or alias.
If reindexing from a remote cluster, you must explicitly allow the remote host in the
reindex.remote.whitelist
setting. Automatic data stream creation requires a matching index template with data stream enabled.The
dest
element can be configured like the index API to control optimistic concurrency control. Omittingversion_type
or setting it tointernal
causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.Setting
version_type
toexternal
causes Elasticsearch to preserve theversion
from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.Setting
op_type
tocreate
causes the reindex API to create only missing documents in the destination. All existing documents will cause a version conflict.IMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an
op_type
ofcreate
. A reindex can only add new documents to a destination data stream. It cannot update existing documents in a destination data stream.By default, version conflicts abort the reindex process. To continue reindexing if there are conflicts, set the
conflicts
request body property toproceed
. In this case, the response includes a count of the version conflicts that were encountered. Note that the handling of other error types is unaffected by theconflicts
property. Additionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source thanmax_docs
until it has successfully indexedmax_docs
documents into the target or it has gone through every document in the source query.NOTE: The reindex API makes no effort to handle ID collisions. The last document written will "win" but the order isn't usually predictable so it is not a good idea to rely on this behavior. Instead, make sure that IDs are unique by using a script.
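For illustration, a hedged sketch of a local reindex that only creates missing documents and counts version conflicts instead of aborting (the index names are placeholders; Conflicts and OpType come from co.elastic.clients.elasticsearch._types):
client.reindex(r -> r
        .source(s -> s.index("my-index-000001"))
        .dest(d -> d
            .index("my-new-index-000001")
            .opType(OpType.Create)        // create only documents missing in the destination
        )
        .conflicts(Conflicts.Proceed)     // record version conflicts rather than failing the request
).thenAccept(resp ->
    System.out.println("created=" + resp.created()
        + ", version conflicts=" + resp.versionConflicts()));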
Running reindex asynchronously
If the request contains
wait_for_completion=false
, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at_tasks/<task_id>
.Reindex from multiple sources
If you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources. That way you can resume the process if there are any errors by removing the partially completed source and starting over. It also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.
For example, you can use a bash script like this:
for index in i1 i2 i3 i4 i5; do
  curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{
    "source": { "index": "'$index'" },
    "dest": { "index": "'$index'-reindexed" }
  }'
done
Throttling
Set
requests_per_second
to any positive decimal number (1.4
,6
,1000
, for example) to throttle the rate at which reindex issues batches of index operations. Requests are throttled by padding each batch with a wait time. To turn off throttling, setrequests_per_second
to-1
.The throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding. The padding time is the difference between the batch size divided by the
requests_per_second
and the time spent writing. By default the batch size is1000
, so ifrequests_per_second
is set to500
:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set. This is "bursty" instead of "smooth".
Slicing
Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.
NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
You can slice a reindex request manually by providing a slice ID and total number of slices to each request. You can also let reindex automatically parallelize by using sliced scroll to slice on
_id
. Theslices
parameter specifies the number of slices to use.Adding
slices
to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:- You can see these requests in the tasks API. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with
slices
only contains the status of completed slices. - These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with
slices
will rethrottle the unfinished sub-request proportionally. - Canceling the request with
slices
will cancel each sub-request. - Due to the nature of
slices
, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution. - Parameters like
requests_per_second
andmax_docs
on a request withslices
are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that usingmax_docs
withslices
might not result in exactlymax_docs
documents being reindexed. - Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.
If slicing automatically, setting
slices
toauto
will choose a reasonable number for most indices. If slicing manually or otherwise tuning automatic slicing, use the following guidelines.Query performance is most efficient when the number of slices is equal to the number of shards in the index. If that number is large (for example,
500
), choose a lower number as too many slices will hurt performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.Indexing performance scales linearly across available resources with the number of slices.
Whether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.
Modify documents during reindexing
Like
_update_by_query
, reindex operations support a script that modifies the document. Unlike_update_by_query
, the script is allowed to modify the document's metadata.Just as in
_update_by_query
, you can setctx.op
to change the operation that is run on the destination. For example, setctx.op
tonoop
if your script decides that the document doesn’t have to be indexed in the destination. This "no operation" will be reported in thenoop
counter in the response body. Setctx.op
todelete
if your script decides that the document must be deleted from the destination. The deletion will be reported in thedeleted
counter in the response body. Settingctx.op
to anything else will return an error, as will setting any other field inctx
.Think of the possibilities! Just be careful; you are able to change:
_id
_index
_version
_routing
Setting
_version
tonull
or clearing it from thectx
map is just like not sending the version in an indexing request. It will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.Reindex from remote
Reindex supports reindexing from a remote Elasticsearch cluster. The
host
parameter must contain a scheme, host, port, and optional path. Theusername
andpassword
parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication. Be sure to use HTTPS when using basic authentication or the password will be sent in plain text. There are a range of settings available to configure the behavior of the HTTPS connection.When using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key. Remote hosts must be explicitly allowed with the
reindex.remote.whitelist
setting. It can be set to a comma-delimited list of allowed remote host and port combinations. Scheme is ignored; only the host and port are used. For example:
reindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]
The list of allowed hosts must be configured on any nodes that will coordinate the reindex. This feature should work with remote clusters of any version of Elasticsearch. This should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.
WARNING: Elasticsearch does not support forward compatibility across major versions. For example, you cannot reindex from a 7.x cluster into a 6.x cluster.
To enable queries sent to older versions of Elasticsearch, the
query
parameter is sent directly to the remote host without validation or modification.NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you'll need to use a smaller batch size. It is also possible to set the socket read timeout on the remote connection with the
socket_timeout
field and the connection timeout with theconnect_timeout
field. Both default to 30 seconds.Configuring SSL parameters
Reindex from remote supports configurable SSL settings. These must be specified in the
elasticsearch.yml
file, with the exception of the secure settings, which you add in the Elasticsearch keystore. It is not possible to configure SSL in the body of the reindex request.- See Also:
- The
-
reindex
public final CompletableFuture<ReindexResponse> reindex(Function<ReindexRequest.Builder, ObjectBuilder<ReindexRequest>> fn) Reindex documents.Copy documents from a source to a destination. You can copy all documents to the destination index or reindex a subset of the documents. The source can be any existing index, alias, or data stream. The destination must differ from the source. For example, you cannot reindex a data stream into itself.
IMPORTANT: Reindex requires
_source
to be enabled for all documents in the source. The destination should be configured as wanted before calling the reindex API. Reindex does not copy the settings from the source or its associated template. Mappings, shard counts, and replicas, for example, must be configured ahead of time.If the Elasticsearch security features are enabled, you must have the following security privileges:
- The
read
index privilege for the source data stream, index, or alias. - The
write
index privilege for the destination data stream, index, or index alias. - To automatically create a data stream or index with a reindex API
request, you must have the
auto_configure
,create_index
, ormanage
index privilege for the destination data stream, index, or alias. - If reindexing from a remote cluster, the
source.remote.user
must have themonitor
cluster privilege and theread
index privilege for the source data stream, index, or alias.
If reindexing from a remote cluster, you must explicitly allow the remote host in the
reindex.remote.whitelist
setting. Automatic data stream creation requires a matching index template with data stream enabled.The
dest
element can be configured like the index API to control optimistic concurrency control. Omittingversion_type
or setting it tointernal
causes Elasticsearch to blindly dump documents into the destination, overwriting any that happen to have the same ID.Setting
version_type
toexternal
causes Elasticsearch to preserve theversion
from the source, create any documents that are missing, and update any documents that have an older version in the destination than they do in the source.Setting
op_type
tocreate
causes the reindex API to create only missing documents in the destination. All existing documents will cause a version conflict.IMPORTANT: Because data streams are append-only, any reindex request to a destination data stream must have an
op_type
ofcreate
. A reindex can only add new documents to a destination data stream. It cannot update existing documents in a destination data stream.By default, version conflicts abort the reindex process. To continue reindexing if there are conflicts, set the
conflicts
request body property toproceed
. In this case, the response includes a count of the version conflicts that were encountered. Note that the handling of other error types is unaffected by theconflicts
property. Additionally, if you opt to count version conflicts, the operation could attempt to reindex more documents from the source thanmax_docs
until it has successfully indexedmax_docs
documents into the target or it has gone through every document in the source query.NOTE: The reindex API makes no effort to handle ID collisions. The last document written will "win" but the order isn't usually predictable so it is not a good idea to rely on this behavior. Instead, make sure that IDs are unique by using a script.
Running reindex asynchronously
If the request contains
wait_for_completion=false
, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at_tasks/<task_id>
.Reindex from multiple sources
If you have many sources to reindex it is generally better to reindex them one at a time rather than using a glob pattern to pick up multiple sources. That way you can resume the process if there are any errors by removing the partially completed source and starting over. It also makes parallelizing the process fairly simple: split the list of sources to reindex and run each list in parallel.
For example, you can use a bash script like this:
for index in i1 i2 i3 i4 i5; do
  curl -HContent-Type:application/json -XPOST localhost:9200/_reindex?pretty -d'{
    "source": { "index": "'$index'" },
    "dest": { "index": "'$index'-reindexed" }
  }'
done
Throttling
Set
requests_per_second
to any positive decimal number (1.4
,6
,1000
, for example) to throttle the rate at which reindex issues batches of index operations. Requests are throttled by padding each batch with a wait time. To turn off throttling, setrequests_per_second
to-1
.The throttling is done by waiting between batches so that the scroll that reindex uses internally can be given a timeout that takes into account the padding. The padding time is the difference between the batch size divided by the
requests_per_second
and the time spent writing. By default the batch size is1000
, so ifrequests_per_second
is set to500
:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single bulk request, large batch sizes cause Elasticsearch to create many requests and then wait for a while before starting the next set. This is "bursty" instead of "smooth".
Slicing
Reindex supports sliced scroll to parallelize the reindexing process. This parallelization can improve efficiency and provide a convenient way to break the request down into smaller parts.
NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
You can slice a reindex request manually by providing a slice ID and total number of slices to each request. You can also let reindex automatically parallelize by using sliced scroll to slice on
_id
. Theslices
parameter specifies the number of slices to use.Adding
slices
to the reindex request just automates the manual process, creating sub-requests which means it has some quirks:- You can see these requests in the tasks API. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with
slices
only contains the status of completed slices. - These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with
slices
will rethrottle the unfinished sub-request proportionally. - Canceling the request with
slices
will cancel each sub-request. - Due to the nature of
slices
, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution. - Parameters like
requests_per_second
andmax_docs
on a request withslices
are distributed proportionally to each sub-request. Combine that with the previous point about distribution being uneven and you should conclude that usingmax_docs
withslices
might not result in exactlymax_docs
documents being reindexed. - Each sub-request gets a slightly different snapshot of the source, though these are all taken at approximately the same time.
If slicing automatically, setting
slices
toauto
will choose a reasonable number for most indices. If slicing manually or otherwise tuning automatic slicing, use the following guidelines.Query performance is most efficient when the number of slices is equal to the number of shards in the index. If that number is large (for example,
500
), choose a lower number as too many slices will hurt performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.Indexing performance scales linearly across available resources with the number of slices.
Whether query or indexing performance dominates the runtime depends on the documents being reindexed and cluster resources.
Modify documents during reindexing
Like
_update_by_query
, reindex operations support a script that modifies the document. Unlike_update_by_query
, the script is allowed to modify the document's metadata.Just as in
_update_by_query
, you can setctx.op
to change the operation that is run on the destination. For example, setctx.op
tonoop
if your script decides that the document doesn’t have to be indexed in the destination. This "no operation" will be reported in thenoop
counter in the response body. Setctx.op
todelete
if your script decides that the document must be deleted from the destination. The deletion will be reported in thedeleted
counter in the response body. Settingctx.op
to anything else will return an error, as will setting any other field inctx
.Think of the possibilities! Just be careful; you are able to change:
_id
_index
_version
_routing
Setting
_version
tonull
or clearing it from thectx
map is just like not sending the version in an indexing request. It will cause the document to be overwritten in the destination regardless of the version on the target or the version type you use in the reindex API.Reindex from remote
Reindex supports reindexing from a remote Elasticsearch cluster. The
host
parameter must contain a scheme, host, port, and optional path. Theusername
andpassword
parameters are optional and when they are present the reindex operation will connect to the remote Elasticsearch node using basic authentication. Be sure to use HTTPS when using basic authentication or the password will be sent in plain text. There are a range of settings available to configure the behavior of the HTTPS connection.When using Elastic Cloud, it is also possible to authenticate against the remote cluster through the use of a valid API key. Remote hosts must be explicitly allowed with the
reindex.remote.whitelist
setting. It can be set to a comma-delimited list of allowed remote host and port combinations. Scheme is ignored; only the host and port are used. For example:
reindex.remote.whitelist: [otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*]
The list of allowed hosts must be configured on any nodes that will coordinate the reindex. This feature should work with remote clusters of any version of Elasticsearch. This should enable you to upgrade from any version of Elasticsearch to the current version by reindexing from a cluster of the old version.
WARNING: Elasticsearch does not support forward compatibility across major versions. For example, you cannot reindex from a 7.x cluster into a 6.x cluster.
To enable queries sent to older versions of Elasticsearch, the
query
parameter is sent directly to the remote host without validation or modification.NOTE: Reindexing from remote clusters does not support manual or automatic slicing.
Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you'll need to use a smaller batch size. It is also possible to set the socket read timeout on the remote connection with the
socket_timeout
field and the connection timeout with theconnect_timeout
field. Both default to 30 seconds.Configuring SSL parameters
Reindex from remote supports configurable SSL settings. These must be specified in the
elasticsearch.yml
file, with the exception of the secure settings, which you add in the Elasticsearch keystore. It is not possible to configure SSL in the body of the reindex request.- Parameters:
fn
- a function that initializes a builder to create theReindexRequest
- See Also:
- The
-
reindexRethrottle
public CompletableFuture<ReindexRethrottleResponse> reindexRethrottle(ReindexRethrottleRequest request) Throttle a reindex operation.Change the number of requests per second for a particular reindex operation. For example:
POST _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
Rethrottling that speeds up the query takes effect immediately. Rethrottling that slows down the query will take effect after completing the current batch. This behavior prevents scroll timeouts.
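For illustration, a tentative sketch of the same rethrottle call through this client (the task ID is the placeholder value from the example above):
client.reindexRethrottle(r -> r
        .taskId("r1A2WoRbTwKZ516z6NEs5A:36619")
        .requestsPerSecond(-1F)   // -1 turns throttling off
).thenAccept(resp -> System.out.println("rethrottle applied"));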
- See Also:
-
reindexRethrottle
public final CompletableFuture<ReindexRethrottleResponse> reindexRethrottle(Function<ReindexRethrottleRequest.Builder, ObjectBuilder<ReindexRethrottleRequest>> fn) Throttle a reindex operation.Change the number of requests per second for a particular reindex operation. For example:
POST _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
Rethrottling that speeds up the query takes effect immediately. Rethrottling that slows down the query will take effect after completing the current batch. This behavior prevents scroll timeouts.
- Parameters:
fn
- a function that initializes a builder to create theReindexRethrottleRequest
- See Also:
-
renderSearchTemplate
public CompletableFuture<RenderSearchTemplateResponse> renderSearchTemplate(RenderSearchTemplateRequest request) Render a search template.Render a search template as a search request body.
- See Also:
-
renderSearchTemplate
public final CompletableFuture<RenderSearchTemplateResponse> renderSearchTemplate(Function<RenderSearchTemplateRequest.Builder, ObjectBuilder<RenderSearchTemplateRequest>> fn) Render a search template.Render a search template as a search request body.
- Parameters:
fn
- a function that initializes a builder to create theRenderSearchTemplateRequest
- See Also:
-
renderSearchTemplate
Render a search template.Render a search template as a search request body.
- See Also:
-
scriptsPainlessExecute
public <TResult> CompletableFuture<ScriptsPainlessExecuteResponse<TResult>> scriptsPainlessExecute(ScriptsPainlessExecuteRequest request, Class<TResult> tResultClass) Run a script.Runs a script and returns a result. Use this API to build and test scripts, such as when defining a script for a runtime field. This API requires very few dependencies and is especially useful if you don't have permissions to write documents on a cluster.
The API uses several contexts, which control how scripts are run, what variables are available at runtime, and what the return type is.
Each context requires a script, but additional parameters depend on the context you're using for that script.
- See Also:
-
scriptsPainlessExecute
public final <TResult> CompletableFuture<ScriptsPainlessExecuteResponse<TResult>> scriptsPainlessExecute(Function<ScriptsPainlessExecuteRequest.Builder, ObjectBuilder<ScriptsPainlessExecuteRequest>> fn, Class<TResult> tResultClass) Run a script.Runs a script and returns a result. Use this API to build and test scripts, such as when defining a script for a runtime field. This API requires very few dependencies and is especially useful if you don't have permissions to write documents on a cluster.
The API uses several contexts, which control how scripts are run, what variables are available at runtime, and what the return type is.
Each context requires a script, but additional parameters depend on the context you're using for that script.
- Parameters:
fn
- a function that initializes a builder to create theScriptsPainlessExecuteRequest
- See Also:
-
scriptsPainlessExecute
public CompletableFuture<ScriptsPainlessExecuteResponse<Void>> scriptsPainlessExecute(ScriptsPainlessExecuteRequest request) Overload ofscriptsPainlessExecute(ScriptsPainlessExecuteRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
scriptsPainlessExecute
public final CompletableFuture<ScriptsPainlessExecuteResponse<Void>> scriptsPainlessExecute(Function<ScriptsPainlessExecuteRequest.Builder, ObjectBuilder<ScriptsPainlessExecuteRequest>> fn) Overload ofscriptsPainlessExecute(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
scriptsPainlessExecute
public <TResult> CompletableFuture<ScriptsPainlessExecuteResponse<TResult>> scriptsPainlessExecute(ScriptsPainlessExecuteRequest request, Type tResultType) Run a script.Runs a script and returns a result. Use this API to build and test scripts, such as when defining a script for a runtime field. This API requires very few dependencies and is especially useful if you don't have permissions to write documents on a cluster.
The API uses several contexts, which control how scripts are run, what variables are available at runtime, and what the return type is.
Each context requires a script, but additional parameters depend on the context you're using for that script.
- See Also:
-
scriptsPainlessExecute
public final <TResult> CompletableFuture<ScriptsPainlessExecuteResponse<TResult>> scriptsPainlessExecute(Function<ScriptsPainlessExecuteRequest.Builder, ObjectBuilder<ScriptsPainlessExecuteRequest>> fn, Type tResultType) Run a script.Runs a script and returns a result. Use this API to build and test scripts, such as when defining a script for a runtime field. This API requires very few dependencies and is especially useful if you don't have permissions to write documents on a cluster.
The API uses several contexts, which control how scripts are run, what variables are available at runtime, and what the return type is.
Each context requires a script, but additional parameters depend on the context you're using for that script.
- Parameters:
fn
- a function that initializes a builder to create theScriptsPainlessExecuteRequest
- See Also:
-
scroll
public <TDocument> CompletableFuture<ScrollResponse<TDocument>> scroll(ScrollRequest request, Class<TDocument> tDocumentClass) Run a scrolling search.IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the
search_after
parameter with a point in time (PIT).The scroll API gets large sets of results from a single scrolling search request. To get the necessary scroll ID, submit a search API request that includes an argument for the
scroll
query parameter. Thescroll
parameter indicates how long Elasticsearch should retain the search context for the request. The search response returns a scroll ID in the_scroll_id
response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request. If the Elasticsearch security features are enabled, the access to the results of a specific scroll ID is restricted to the user or API key that submitted the search.You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
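For illustration, a minimal sketch that fetches the next batch for an existing scroll (scrollId was returned by the initial search request; Product is a placeholder document class):
client.scroll(s -> s
        .scrollId(scrollId)
        .scroll(t -> t.time("1m")),   // keep the search context alive for another minute
    Product.class
).thenAccept(resp -> {
    System.out.println("batch size: " + resp.hits().hits().size());
    // use resp.scrollId() for the next scroll call, and clear the scroll when finished
});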
- See Also:
-
scroll
public final <TDocument> CompletableFuture<ScrollResponse<TDocument>> scroll(Function<ScrollRequest.Builder, ObjectBuilder<ScrollRequest>> fn, Class<TDocument> tDocumentClass) Run a scrolling search.IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the
search_after
parameter with a point in time (PIT).The scroll API gets large sets of results from a single scrolling search request. To get the necessary scroll ID, submit a search API request that includes an argument for the
scroll
query parameter. Thescroll
parameter indicates how long Elasticsearch should retain the search context for the request. The search response returns a scroll ID in the_scroll_id
response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request. If the Elasticsearch security features are enabled, the access to the results of a specific scroll ID is restricted to the user or API key that submitted the search.You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
- Parameters:
fn
- a function that initializes a builder to create theScrollRequest
- See Also:
-
scroll
Overload ofscroll(ScrollRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
scroll
public final CompletableFuture<ScrollResponse<Void>> scroll(Function<ScrollRequest.Builder, ObjectBuilder<ScrollRequest>> fn) Overload ofscroll(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
scroll
public <TDocument> CompletableFuture<ScrollResponse<TDocument>> scroll(ScrollRequest request, Type tDocumentType) Run a scrolling search. IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the
search_after
parameter with a point in time (PIT).The scroll API gets large sets of results from a single scrolling search request. To get the necessary scroll ID, submit a search API request that includes an argument for the
scroll
query parameter. Thescroll
parameter indicates how long Elasticsearch should retain the search context for the request. The search response returns a scroll ID in the_scroll_id
response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request. If the Elasticsearch security features are enabled, access to the results of a specific scroll ID is restricted to the user or API key that submitted the search. You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
- See Also:
-
scroll
public final <TDocument> CompletableFuture<ScrollResponse<TDocument>> scroll(Function<ScrollRequest.Builder, ObjectBuilder<ScrollRequest>> fn, Type tDocumentType) Run a scrolling search. IMPORTANT: The scroll API is no longer recommended for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the
search_after
parameter with a point in time (PIT).The scroll API gets large sets of results from a single scrolling search request. To get the necessary scroll ID, submit a search API request that includes an argument for the
scroll
query parameter. Thescroll
parameter indicates how long Elasticsearch should retain the search context for the request. The search response returns a scroll ID in the_scroll_id
response body parameter. You can then use the scroll ID with the scroll API to retrieve the next batch of results for the request. If the Elasticsearch security features are enabled, access to the results of a specific scroll ID is restricted to the user or API key that submitted the search. You can also use the scroll API to specify a new scroll parameter that extends or shortens the retention period for the search context.
IMPORTANT: Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests.
- Parameters:
fn
- a function that initializes a builder to create theScrollRequest
- See Also:
-
search
public <TDocument> CompletableFuture<SearchResponse<TDocument>> search(SearchRequest request, Class<TDocument> tDocumentClass) Run a search. Get search hits that match the query defined in the request. You can provide search queries using the
q
query string parameter or the request body. If both are specified, only the query parameter is used.If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges. To search a point in time (PIT) for an alias, you must have the
read
index privilege for the alias's data streams or indices.Search slicing
When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the
slice
andpit
properties. By default the splitting is done first on the shards, then locally on each shard. The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.For instance if the number of shards is equal to 2 and you request 4 slices, the slices 0 and 2 are assigned to the first shard and the slices 1 and 3 are assigned to the second shard.
IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
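As an illustration, a minimal async search might look like the sketch below; the products index, the name field, and the Product document class are placeholders, and client is assumed to be an existing ElasticsearchAsyncClient.
CompletableFuture<SearchResponse<Product>> future = client.search(s -> s
        .index("products")                           // hypothetical index
        .query(q -> q
            .match(m -> m
                .field("name")                       // hypothetical field
                .query("bicycle")))
        .size(20),
    Product.class);

future.thenAccept(resp -> {
    for (Hit<Product> hit : resp.hits().hits()) {
        System.out.println(hit.id() + " -> " + hit.source());
    }
});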
- See Also:
-
search
public final <TDocument> CompletableFuture<SearchResponse<TDocument>> search(Function<SearchRequest.Builder, ObjectBuilder<SearchRequest>> fn, Class<TDocument> tDocumentClass) Run a search. Get search hits that match the query defined in the request. You can provide search queries using the
q
query string parameter or the request body. If both are specified, only the query parameter is used.If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges. To search a point in time (PIT) for an alias, you must have the
read
index privilege for the alias's data streams or indices.Search slicing
When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the
slice
andpit
properties. By default the splitting is done first on the shards, then locally on each shard. The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.For instance if the number of shards is equal to 2 and you request 4 slices, the slices 0 and 2 are assigned to the first shard and the slices 1 and 3 are assigned to the second shard.
IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
- Parameters:
fn
- a function that initializes a builder to create theSearchRequest
- See Also:
-
search
Overload ofsearch(SearchRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
search
public final CompletableFuture<SearchResponse<Void>> search(Function<SearchRequest.Builder, ObjectBuilder<SearchRequest>> fn) Overload ofsearch(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
search
public <TDocument> CompletableFuture<SearchResponse<TDocument>> search(SearchRequest request, Type tDocumentType) Run a search. Get search hits that match the query defined in the request. You can provide search queries using the
q
query string parameter or the request body. If both are specified, only the query parameter is used.If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges. To search a point in time (PIT) for an alias, you must have the
read
index privilege for the alias's data streams or indices.Search slicing
When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the
slice
andpit
properties. By default the splitting is done first on the shards, then locally on each shard. The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.For instance if the number of shards is equal to 2 and you request 4 slices, the slices 0 and 2 are assigned to the first shard and the slices 1 and 3 are assigned to the second shard.
IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
- See Also:
-
search
public final <TDocument> CompletableFuture<SearchResponse<TDocument>> search(Function<SearchRequest.Builder, ObjectBuilder<SearchRequest>> fn, Type tDocumentType) Run a search. Get search hits that match the query defined in the request. You can provide search queries using the
q
query string parameter or the request body. If both are specified, only the query parameter is used.If the Elasticsearch security features are enabled, you must have the read index privilege for the target data stream, index, or alias. For cross-cluster search, refer to the documentation about configuring CCS privileges. To search a point in time (PIT) for an alias, you must have the
read
index privilege for the alias's data streams or indices.Search slicing
When paging through a large number of documents, it can be helpful to split the search into multiple slices to consume them independently with the
slice
andpit
properties. By default the splitting is done first on the shards, then locally on each shard. The local splitting partitions the shard into contiguous ranges based on Lucene document IDs.For instance if the number of shards is equal to 2 and you request 4 slices, the slices 0 and 2 are assigned to the first shard and the slices 1 and 3 are assigned to the second shard.
IMPORTANT: The same point-in-time ID should be used for all slices. If different PIT IDs are used, slices can overlap and miss documents. This situation can occur because the splitting criterion is based on Lucene document IDs, which are not stable across changes to the index.
- Parameters:
fn
- a function that initializes a builder to create theSearchRequest
- See Also:
-
searchMvt
Search a vector tile. Search a vector tile for geospatial values. Before using this API, you should be familiar with the Mapbox vector tile specification. The API returns results as a binary Mapbox vector tile.
Internally, Elasticsearch translates a vector tile search API request into a search containing:
- A
geo_bounding_box
query on the<field>
. The query uses the<zoom>/<x>/<y>
tile as a bounding box. - A
geotile_grid
orgeohex_grid
aggregation on the<field>
. Thegrid_agg
parameter determines the aggregation type. The aggregation uses the<zoom>/<x>/<y>
tile as a bounding box. - Optionally, a
geo_bounds
aggregation on the<field>
. The search only includes this aggregation if theexact_bounds
parameter istrue
. - If the optional parameter
with_labels
istrue
, the internal search will include a dynamic runtime field that calls thegetLabelPosition
function of the geometry doc value. This enables the generation of new point features containing suggested geometry labels, so that, for example, multi-polygons will have only one label.
For example, Elasticsearch may translate a vector tile search API request with a
grid_agg
argument ofgeotile
and anexact_bounds
argument oftrue
into the following search:
GET my-index/_search { "size": 10000, "query": { "geo_bounding_box": { "my-geo-field": { "top_left": { "lat": -40.979898069620134, "lon": -45 }, "bottom_right": { "lat": -66.51326044311186, "lon": 0 } } } }, "aggregations": { "grid": { "geotile_grid": { "field": "my-geo-field", "precision": 11, "size": 65536, "bounds": { "top_left": { "lat": -40.979898069620134, "lon": -45 }, "bottom_right": { "lat": -66.51326044311186, "lon": 0 } } } }, "bounds": { "geo_bounds": { "field": "my-geo-field", "wrap_longitude": false } } } }
The API returns results as a binary Mapbox vector tile. Mapbox vector tiles are encoded as Google Protobufs (PBF). By default, the tile contains three layers:
- A
hits
layer containing a feature for each<field>
value matching thegeo_bounding_box
query. - An
aggs
layer containing a feature for each cell of thegeotile_grid
orgeohex_grid
. The layer only contains features for cells with matching data. - A meta layer containing:
- A feature containing a bounding box. By default, this is the bounding box of the tile.
- Value ranges for any sub-aggregations on the
geotile_grid
orgeohex_grid
. - Metadata for the search.
The API only returns features that can display at its zoom level. For example, if a polygon feature has no area at its zoom level, the API omits it. The API returns errors as UTF-8 encoded JSON.
IMPORTANT: You can specify several options for this API as either a query parameter or request body parameter. If you specify both parameters, the query parameter takes precedence.
Grid precision for geotile
For a
grid_agg
ofgeotile
, you can use cells in theaggs
layer as tiles for lower zoom levels.grid_precision
represents the additional zoom levels available through these cells. The final precision is computed as follows: <zoom> + grid_precision
. For example, if<zoom>
is 7 andgrid_precision
is 8, then thegeotile_grid
aggregation will use a precision of 15. The maximum final precision is 29. Thegrid_precision
also determines the number of cells for the grid as follows:(2^grid_precision) x (2^grid_precision)
. For example, a value of 8 divides the tile into a grid of 256 x 256 cells. Theaggs
layer only contains features for cells with matching data.Grid precision for geohex
For a
grid_agg
ofgeohex
, Elasticsearch uses<zoom>
andgrid_precision
to calculate a final precision as follows:<zoom> + grid_precision
.This precision determines the H3 resolution of the hexagonal cells produced by the
geohex
aggregation. The following table maps the H3 resolution for each precision. For example, if<zoom>
is 3 andgrid_precision
is 3, the precision is 6. At a precision of 6, hexagonal cells have an H3 resolution of 2. If<zoom>
is 3 andgrid_precision
is 4, the precision is 7. At a precision of 7, hexagonal cells have an H3 resolution of 3.
Precision | Unique tile bins | H3 resolution | Unique hex bins | Ratio
1 | 4 | 0 | 122 | 30.5
2 | 16 | 0 | 122 | 7.625
3 | 64 | 1 | 842 | 13.15625
4 | 256 | 1 | 842 | 3.2890625
5 | 1024 | 2 | 5882 | 5.744140625
6 | 4096 | 2 | 5882 | 1.436035156
7 | 16384 | 3 | 41162 | 2.512329102
8 | 65536 | 3 | 41162 | 0.6280822754
9 | 262144 | 4 | 288122 | 1.099098206
10 | 1048576 | 4 | 288122 | 0.2747745514
11 | 4194304 | 5 | 2016842 | 0.4808526039
12 | 16777216 | 6 | 14117882 | 0.8414913416
13 | 67108864 | 6 | 14117882 | 0.2103728354
14 | 268435456 | 7 | 98825162 | 0.3681524172
15 | 1073741824 | 8 | 691776122 | 0.644266719
16 | 4294967296 | 8 | 691776122 | 0.1610666797
17 | 17179869184 | 9 | 4842432842 | 0.2818666889
18 | 68719476736 | 10 | 33897029882 | 0.4932667053
19 | 274877906944 | 11 | 237279209162 | 0.8632167343
20 | 1099511627776 | 11 | 237279209162 | 0.2158041836
21 | 4398046511104 | 12 | 1660954464122 | 0.3776573213
22 | 17592186044416 | 13 | 11626681248842 | 0.6609003122
23 | 70368744177664 | 13 | 11626681248842 | 0.165225078
24 | 281474976710656 | 14 | 81386768741882 | 0.2891438866
25 | 1125899906842620 | 15 | 569707381193162 | 0.5060018015
26 | 4503599627370500 | 15 | 569707381193162 | 0.1265004504
27 | 18014398509482000 | 15 | 569707381193162 | 0.03162511259
28 | 72057594037927900 | 15 | 569707381193162 | 0.007906278149
29 | 288230376151712000 | 15 | 569707381193162 | 0.001976569537
Hexagonal cells don't align perfectly on a vector tile. Some cells may intersect more than one vector tile. To compute the H3 resolution for each precision, Elasticsearch compares the average density of hexagonal bins at each resolution with the average density of tile bins at each zoom level. Elasticsearch uses the H3 resolution that is closest to the corresponding geotile density.
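A minimal sketch of requesting a single tile with the async client is shown below; the index, field, and tile coordinates are placeholders, client is an existing ElasticsearchAsyncClient, and the binary tile is simply written to disk.
CompletableFuture<BinaryResponse> future = client.searchMvt(m -> m
        .index("museums")            // hypothetical index
        .field("location")           // hypothetical geo_point or geo_shape field
        .zoom(13).x(4207).y(2692));  // <zoom>/<x>/<y> tile coordinates

future.thenAccept(tile -> {
    try (InputStream pbf = tile.content()) {
        // The response is a protobuf-encoded Mapbox vector tile; here it is written to a file.
        Files.copy(pbf, Path.of("tile-13-4207-2692.pbf"), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
});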
- See Also:
- A
-
searchMvt
public final CompletableFuture<BinaryResponse> searchMvt(Function<SearchMvtRequest.Builder, ObjectBuilder<SearchMvtRequest>> fn) Search a vector tile. Search a vector tile for geospatial values. Before using this API, you should be familiar with the Mapbox vector tile specification. The API returns results as a binary Mapbox vector tile.
Internally, Elasticsearch translates a vector tile search API request into a search containing:
- A
geo_bounding_box
query on the<field>
. The query uses the<zoom>/<x>/<y>
tile as a bounding box. - A
geotile_grid
orgeohex_grid
aggregation on the<field>
. Thegrid_agg
parameter determines the aggregation type. The aggregation uses the<zoom>/<x>/<y>
tile as a bounding box. - Optionally, a
geo_bounds
aggregation on the<field>
. The search only includes this aggregation if theexact_bounds
parameter istrue
. - If the optional parameter
with_labels
istrue
, the internal search will include a dynamic runtime field that calls thegetLabelPosition
function of the geometry doc value. This enables the generation of new point features containing suggested geometry labels, so that, for example, multi-polygons will have only one label.
For example, Elasticsearch may translate a vector tile search API request with a
grid_agg
argument ofgeotile
and anexact_bounds
argument oftrue
into the following search:
GET my-index/_search { "size": 10000, "query": { "geo_bounding_box": { "my-geo-field": { "top_left": { "lat": -40.979898069620134, "lon": -45 }, "bottom_right": { "lat": -66.51326044311186, "lon": 0 } } } }, "aggregations": { "grid": { "geotile_grid": { "field": "my-geo-field", "precision": 11, "size": 65536, "bounds": { "top_left": { "lat": -40.979898069620134, "lon": -45 }, "bottom_right": { "lat": -66.51326044311186, "lon": 0 } } } }, "bounds": { "geo_bounds": { "field": "my-geo-field", "wrap_longitude": false } } } }
The API returns results as a binary Mapbox vector tile. Mapbox vector tiles are encoded as Google Protobufs (PBF). By default, the tile contains three layers:
- A
hits
layer containing a feature for each<field>
value matching thegeo_bounding_box
query. - An
aggs
layer containing a feature for each cell of thegeotile_grid
orgeohex_grid
. The layer only contains features for cells with matching data. - A meta layer containing:
- A feature containing a bounding box. By default, this is the bounding box of the tile.
- Value ranges for any sub-aggregations on the
geotile_grid
orgeohex_grid
. - Metadata for the search.
The API only returns features that can display at its zoom level. For example, if a polygon feature has no area at its zoom level, the API omits it. The API returns errors as UTF-8 encoded JSON.
IMPORTANT: You can specify several options for this API as either a query parameter or request body parameter. If you specify both parameters, the query parameter takes precedence.
Grid precision for geotile
For a
grid_agg
ofgeotile
, you can use cells in theaggs
layer as tiles for lower zoom levels.grid_precision
represents the additional zoom levels available through these cells. The final precision is computed as follows: <zoom> + grid_precision
. For example, if<zoom>
is 7 andgrid_precision
is 8, then thegeotile_grid
aggregation will use a precision of 15. The maximum final precision is 29. Thegrid_precision
also determines the number of cells for the grid as follows:(2^grid_precision) x (2^grid_precision)
. For example, a value of 8 divides the tile into a grid of 256 x 256 cells. Theaggs
layer only contains features for cells with matching data.Grid precision for geohex
For a
grid_agg
ofgeohex
, Elasticsearch uses<zoom>
andgrid_precision
to calculate a final precision as follows:<zoom> + grid_precision
.This precision determines the H3 resolution of the hexagonal cells produced by the
geohex
aggregation. The following table maps the H3 resolution for each precision. For example, if<zoom>
is 3 andgrid_precision
is 3, the precision is 6. At a precision of 6, hexagonal cells have an H3 resolution of 2. If<zoom>
is 3 andgrid_precision
is 4, the precision is 7. At a precision of 7, hexagonal cells have an H3 resolution of 3.
Precision | Unique tile bins | H3 resolution | Unique hex bins | Ratio
1 | 4 | 0 | 122 | 30.5
2 | 16 | 0 | 122 | 7.625
3 | 64 | 1 | 842 | 13.15625
4 | 256 | 1 | 842 | 3.2890625
5 | 1024 | 2 | 5882 | 5.744140625
6 | 4096 | 2 | 5882 | 1.436035156
7 | 16384 | 3 | 41162 | 2.512329102
8 | 65536 | 3 | 41162 | 0.6280822754
9 | 262144 | 4 | 288122 | 1.099098206
10 | 1048576 | 4 | 288122 | 0.2747745514
11 | 4194304 | 5 | 2016842 | 0.4808526039
12 | 16777216 | 6 | 14117882 | 0.8414913416
13 | 67108864 | 6 | 14117882 | 0.2103728354
14 | 268435456 | 7 | 98825162 | 0.3681524172
15 | 1073741824 | 8 | 691776122 | 0.644266719
16 | 4294967296 | 8 | 691776122 | 0.1610666797
17 | 17179869184 | 9 | 4842432842 | 0.2818666889
18 | 68719476736 | 10 | 33897029882 | 0.4932667053
19 | 274877906944 | 11 | 237279209162 | 0.8632167343
20 | 1099511627776 | 11 | 237279209162 | 0.2158041836
21 | 4398046511104 | 12 | 1660954464122 | 0.3776573213
22 | 17592186044416 | 13 | 11626681248842 | 0.6609003122
23 | 70368744177664 | 13 | 11626681248842 | 0.165225078
24 | 281474976710656 | 14 | 81386768741882 | 0.2891438866
25 | 1125899906842620 | 15 | 569707381193162 | 0.5060018015
26 | 4503599627370500 | 15 | 569707381193162 | 0.1265004504
27 | 18014398509482000 | 15 | 569707381193162 | 0.03162511259
28 | 72057594037927900 | 15 | 569707381193162 | 0.007906278149
29 | 288230376151712000 | 15 | 569707381193162 | 0.001976569537
Hexagonal cells don't align perfectly on a vector tile. Some cells may intersect more than one vector tile. To compute the H3 resolution for each precision, Elasticsearch compares the average density of hexagonal bins at each resolution with the average density of tile bins at each zoom level. Elasticsearch uses the H3 resolution that is closest to the corresponding geotile density.
- Parameters:
fn
- a function that initializes a builder to create theSearchMvtRequest
- See Also:
- A
-
searchShards
Get the search shards. Get the indices and shards that a search request would be run against. This information can be useful for working out issues or planning optimizations with routing and shard preferences. When filtered aliases are used, the filter is returned as part of the
indices
section.If the Elasticsearch security features are enabled, you must have the
view_index_metadata
ormanage
index privilege for the target data stream, index, or alias.- See Also:
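For instance, a small sketch (the index name and routing value are placeholders, and client is an existing ElasticsearchAsyncClient):
client.searchShards(s -> s
        .index("my-index")          // hypothetical index
        .routing("user-123"))       // optional: resolve the shards a routed request would hit
    .thenAccept(resp -> System.out.println(resp));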
-
searchShards
public final CompletableFuture<SearchShardsResponse> searchShards(Function<SearchShardsRequest.Builder, ObjectBuilder<SearchShardsRequest>> fn) Get the search shards. Get the indices and shards that a search request would be run against. This information can be useful for working out issues or planning optimizations with routing and shard preferences. When filtered aliases are used, the filter is returned as part of the
indices
section.If the Elasticsearch security features are enabled, you must have the
view_index_metadata
ormanage
index privilege for the target data stream, index, or alias.- Parameters:
fn
- a function that initializes a builder to create theSearchShardsRequest
- See Also:
-
searchShards
Get the search shards. Get the indices and shards that a search request would be run against. This information can be useful for working out issues or planning optimizations with routing and shard preferences. When filtered aliases are used, the filter is returned as part of the
indices
section.If the Elasticsearch security features are enabled, you must have the
view_index_metadata
ormanage
index privilege for the target data stream, index, or alias.- See Also:
-
searchTemplate
public <TDocument> CompletableFuture<SearchTemplateResponse<TDocument>> searchTemplate(SearchTemplateRequest request, Class<TDocument> tDocumentClass) Run a search with a search template.- See Also:
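For example, running a stored search template could look like the sketch below; the template id "my-search-template", its query_string parameter, the products index, and the Product class are all placeholders.
CompletableFuture<SearchTemplateResponse<Product>> future = client.searchTemplate(st -> st
        .index("products")
        .id("my-search-template")                          // id of a stored search template
        .params("query_string", JsonData.of("bicycle")),   // template parameters
    Product.class);

future.thenAccept(resp ->
    resp.hits().hits().forEach(hit -> System.out.println(hit.source())));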
-
searchTemplate
public final <TDocument> CompletableFuture<SearchTemplateResponse<TDocument>> searchTemplate(Function<SearchTemplateRequest.Builder, ObjectBuilder<SearchTemplateRequest>> fn, Class<TDocument> tDocumentClass) Run a search with a search template.- Parameters:
fn
- a function that initializes a builder to create theSearchTemplateRequest
- See Also:
-
searchTemplate
public CompletableFuture<SearchTemplateResponse<Void>> searchTemplate(SearchTemplateRequest request) Overload ofsearchTemplate(SearchTemplateRequest, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
searchTemplate
public final CompletableFuture<SearchTemplateResponse<Void>> searchTemplate(Function<SearchTemplateRequest.Builder, ObjectBuilder<SearchTemplateRequest>> fn) Overload ofsearchTemplate(Function, Class)
, where Class is defined as Void, meaning the documents will not be deserialized. -
searchTemplate
public <TDocument> CompletableFuture<SearchTemplateResponse<TDocument>> searchTemplate(SearchTemplateRequest request, Type tDocumentType) Run a search with a search template.- See Also:
-
searchTemplate
public final <TDocument> CompletableFuture<SearchTemplateResponse<TDocument>> searchTemplate(Function<SearchTemplateRequest.Builder, ObjectBuilder<SearchTemplateRequest>> fn, Type tDocumentType) Run a search with a search template.- Parameters:
fn
- a function that initializes a builder to create theSearchTemplateRequest
- See Also:
-
termsEnum
Get terms in an index. Discover terms that match a partial string in an index. This API is designed for low-latency look-ups used in auto-complete scenarios.
NOTE: The terms enum API may return terms from deleted documents. Deleted documents are initially only marked as deleted. It is not until their segments are merged that documents are actually deleted. Until that happens, the terms enum API will return terms from these documents.
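A small auto-complete style sketch (the stackoverflow index and tags field are placeholders, and client is an existing ElasticsearchAsyncClient):
client.termsEnum(t -> t
        .index("stackoverflow")     // hypothetical index
        .field("tags")              // hypothetical keyword field
        .string("kiba"))            // partial string to complete
    .thenAccept(resp -> resp.terms().forEach(System.out::println));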
- See Also:
-
termsEnum
public final CompletableFuture<TermsEnumResponse> termsEnum(Function<TermsEnumRequest.Builder, ObjectBuilder<TermsEnumRequest>> fn) Get terms in an index. Discover terms that match a partial string in an index. This API is designed for low-latency look-ups used in auto-complete scenarios.
NOTE: The terms enum API may return terms from deleted documents. Deleted documents are initially only marked as deleted. It is not until their segments are merged that documents are actually deleted. Until that happens, the terms enum API will return terms from these documents.
- Parameters:
fn
- a function that initializes a builder to create theTermsEnumRequest
- See Also:
-
termvectors
public <TDocument> CompletableFuture<TermvectorsResponse> termvectors(TermvectorsRequest<TDocument> request) Get term vector information. Get information and statistics about terms in the fields of a particular document.
You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request. You can specify the fields you are interested in through the
fields
parameter or by adding the fields to the request body. For example:GET /my-index-000001/_termvectors/1?fields=message
Fields can be specified using wildcards, similar to the multi match query.
Term vectors are real-time by default, not near real-time. This can be changed by setting the
realtime
parameter tofalse
.You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.
Term information
- term frequency in the field (always returned)
- term positions (
positions: true
) - start and end offsets (
offsets: true
) - term payloads (
payloads: true
), as base64 encoded bytes
If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.
WARNING: Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.
Behaviour
The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context. By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected. Use
routing
only to hit a particular shard.- See Also:
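A minimal sketch of fetching term vectors for a stored document (the index, document id, and field name are placeholders, and client is an existing ElasticsearchAsyncClient):
client.termvectors(t -> t
        .index("my-index-000001")
        .id("1")
        .fields("message")
        .positions(true)
        .offsets(true)
        .termStatistics(true))
    .thenAccept(resp ->
        resp.termVectors().forEach((field, vectors) ->
            System.out.println(field + ": " + vectors.terms().keySet())));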
-
termvectors
public final <TDocument> CompletableFuture<TermvectorsResponse> termvectors(Function<TermvectorsRequest.Builder<TDocument>, ObjectBuilder<TermvectorsRequest<TDocument>>> fn) Get term vector information. Get information and statistics about terms in the fields of a particular document.
You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request. You can specify the fields you are interested in through the
fields
parameter or by adding the fields to the request body. For example:GET /my-index-000001/_termvectors/1?fields=message
Fields can be specified using wildcards, similar to the multi match query.
Term vectors are real-time by default, not near real-time. This can be changed by setting the
realtime
parameter tofalse
.You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.
Term information
- term frequency in the field (always returned)
- term positions (
positions: true
) - start and end offsets (
offsets: true
) - term payloads (
payloads: true
), as base64 encoded bytes
If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.
WARNING: Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.
Behaviour
The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context. By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected. Use
routing
only to hit a particular shard.- Parameters:
fn
- a function that initializes a builder to create theTermvectorsRequest
- See Also:
-
update
public <TDocument,TPartialDocument> CompletableFuture<UpdateResponse<TDocument>> update(UpdateRequest<TDocument, TPartialDocument> request, Class<TDocument> tDocumentClass) Update a document. Update a document by running a script or passing a partial document.
If the Elasticsearch security features are enabled, you must have the
index
orwrite
index privilege for the target index or index alias.The script can update, delete, or skip modifying the document. The API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API. This operation:
- Gets the document (collocated with the shard) from the index.
- Runs the specified script.
- Indexes the result.
The document must still be reindexed, but using this API removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.
The
_source
field must be enabled to use this API. In addition to_source
, you can access the following variables through thectx
map:_index
,_type
,_id
,_version
,_routing
, and_now
(the current timestamp).- See Also:
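For example, a partial-document update might look like the following sketch; the products index, the document id, and the Product class (with its setPrice accessor) are placeholders, and client is an existing ElasticsearchAsyncClient.
Product partial = new Product();   // only the fields to change need to be set
partial.setPrice(9.99);

CompletableFuture<UpdateResponse<Product>> future = client.update(u -> u
        .index("products")
        .id("bk-1")
        .doc(partial)              // merged into the existing document
        .docAsUpsert(true),        // index the partial document if it does not exist yet
    Product.class);

future.thenAccept(resp -> System.out.println(resp.result()));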
-
update
public final <TDocument,TPartialDocument> CompletableFuture<UpdateResponse<TDocument>> update(Function<UpdateRequest.Builder<TDocument, TPartialDocument>, ObjectBuilder<UpdateRequest<TDocument, TPartialDocument>>> fn, Class<TDocument> tDocumentClass) Update a document. Update a document by running a script or passing a partial document.
If the Elasticsearch security features are enabled, you must have the
index
orwrite
index privilege for the target index or index alias.The script can update, delete, or skip modifying the document. The API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API. This operation:
- Gets the document (collocated with the shard) from the index.
- Runs the specified script.
- Indexes the result.
The document must still be reindexed, but using this API removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.
The
_source
field must be enabled to use this API. In addition to_source
, you can access the following variables through thectx
map:_index
,_type
,_id
,_version
,_routing
, and_now
(the current timestamp).- Parameters:
fn
- a function that initializes a builder to create theUpdateRequest
- See Also:
-
update
public <TDocument,TPartialDocument> CompletableFuture<UpdateResponse<TDocument>> update(UpdateRequest<TDocument, TPartialDocument> request, Type tDocumentType) Update a document. Update a document by running a script or passing a partial document.
If the Elasticsearch security features are enabled, you must have the
index
orwrite
index privilege for the target index or index alias.The script can update, delete, or skip modifying the document. The API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API. This operation:
- Gets the document (collocated with the shard) from the index.
- Runs the specified script.
- Indexes the result.
The document must still be reindexed, but using this API removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.
The
_source
field must be enabled to use this API. In addition to_source
, you can access the following variables through thectx
map:_index
,_type
,_id
,_version
,_routing
, and_now
(the current timestamp).- See Also:
-
update
public final <TDocument,TPartialDocument> CompletableFuture<UpdateResponse<TDocument>> update(Function<UpdateRequest.Builder<TDocument, TPartialDocument>, ObjectBuilder<UpdateRequest<TDocument, TPartialDocument>>> fn, Type tDocumentType) Update a document. Update a document by running a script or passing a partial document.
If the Elasticsearch security features are enabled, you must have the
index
orwrite
index privilege for the target index or index alias.The script can update, delete, or skip modifying the document. The API also supports passing a partial document, which is merged into the existing document. To fully replace an existing document, use the index API. This operation:
- Gets the document (collocated with the shard) from the index.
- Runs the specified script.
- Indexes the result.
The document must still be reindexed, but using this API removes some network roundtrips and reduces chances of version conflicts between the GET and the index operation.
The
_source
field must be enabled to use this API. In addition to_source
, you can access the following variables through thectx
map:_index
,_type
,_id
,_version
,_routing
, and_now
(the current timestamp).- Parameters:
fn
- a function that initializes a builder to create theUpdateRequest
- See Also:
-
updateByQuery
Update documents. Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes. If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
read
index
orwrite
You can specify the query criteria in the request URI or the request body using the same syntax as the search API.
When you submit an update by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and updates matching documents using internal versioning. When the versions match, the document is updated and the version number is incremented. If a document changes between the time that the snapshot is taken and the update operation is processed, it results in a version conflict and the operation fails. You can opt to count version conflicts instead of halting and returning by setting
conflicts
toproceed
. Note that if you opt to count version conflicts, the operation could attempt to update more documents from the source thanmax_docs
until it has successfully updatedmax_docs
documents or it has gone through every document in the source query.NOTE: Documents with a version equal to 0 cannot be updated using update by query because internal versioning does not support 0 as a valid version number.
While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.
Throttling update requests
To control the rate at which update by query issues batches of update operations, you can set
requests_per_second
to any positive decimal number. This pads each batch with a wait time to throttle the rate. Setrequests_per_second
to-1
to turn off throttling.Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the
requests_per_second
and the time spent writing. By default the batch size is 1000, so ifrequests_per_second
is set to500
:target_time = 1000 / 500 per second = 2 seconds wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
Slicing
Update by query supports sliced scroll to parallelize the update process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting
slices
toauto
chooses a reasonable number for most data streams and indices. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.Adding
slices
to_update_by_query
just automates the manual process of creating sub-requests, which means it has some quirks:- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with
slices
only contains the status of completed slices. - These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with
slices
will rethrottle the unfinished sub-request proportionally. - Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like
requests_per_second
andmax_docs
on a request with slices are distributed proportionally to each sub-request. Combine that with the point above about distribution being uneven and you should conclude that usingmax_docs
withslices
might not result in exactlymax_docs
documents being updated. - Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Update performance scales linearly across available resources with the number of slices.
Whether query or update performance dominates the runtime depends on the documents being reindexed and cluster resources.
Update the document source
Update by query supports scripts to update the document source. As with the update API, you can set
ctx.op
to change the operation that is performed.Set
ctx.op = "noop"
if your script decides that it doesn't have to make any changes. The update by query operation skips updating the document and increments thenoop
counter.Set
ctx.op = "delete"
if your script decides that the document should be deleted. The update by query operation deletes the document and increments thedeleted
counter.Update by query supports only
index
,noop
, anddelete
. Settingctx.op
to anything else is an error. Setting any other field inctx
is an error. This API enables you to only modify the source of matching documents; you cannot move them.- See Also:
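For instance, the sketch below updates every document matching a term query and counts version conflicts instead of aborting; the index and field names are placeholders, and client is an existing ElasticsearchAsyncClient.
client.updateByQuery(u -> u
        .index("my-index")
        .query(q -> q.term(t -> t.field("user.id").value("kimchy")))
        .conflicts(Conflicts.Proceed)    // count version conflicts instead of failing the request
        .refresh(true))
    .thenAccept(resp ->
        System.out.println("updated=" + resp.updated()
            + ", version conflicts=" + resp.versionConflicts()));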
-
updateByQuery
public final CompletableFuture<UpdateByQueryResponse> updateByQuery(Function<UpdateByQueryRequest.Builder, ObjectBuilder<UpdateByQueryRequest>> fn) Update documents. Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes. If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
read
index
orwrite
You can specify the query criteria in the request URI or the request body using the same syntax as the search API.
When you submit an update by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and updates matching documents using internal versioning. When the versions match, the document is updated and the version number is incremented. If a document changes between the time that the snapshot is taken and the update operation is processed, it results in a version conflict and the operation fails. You can opt to count version conflicts instead of halting and returning by setting
conflicts
toproceed
. Note that if you opt to count version conflicts, the operation could attempt to update more documents from the source thanmax_docs
until it has successfully updatedmax_docs
documents or it has gone through every document in the source query.NOTE: Documents with a version equal to 0 cannot be updated using update by query because internal versioning does not support 0 as a valid version number.
While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.
Throttling update requests
To control the rate at which update by query issues batches of update operations, you can set
requests_per_second
to any positive decimal number. This pads each batch with a wait time to throttle the rate. Setrequests_per_second
to-1
to turn off throttling.Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the
requests_per_second
and the time spent writing. By default the batch size is 1000, so ifrequests_per_second
is set to500
:target_time = 1000 / 500 per second = 2 seconds wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
Slicing
Update by query supports sliced scroll to parallelize the update process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting
slices
toauto
chooses a reasonable number for most data streams and indices. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.Adding
slices
to_update_by_query
just automates the manual process of creating sub-requests, which means it has some quirks:- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with
slices
only contains the status of completed slices. - These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with
slices
will rethrottle the unfinished sub-request proportionally. - Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like
requests_per_second
andmax_docs
on a request with slices are distributed proportionally to each sub-request. Combine that with the point above about distribution being uneven and you should conclude that usingmax_docs
withslices
might not result in exactlymax_docs
documents being updated. - Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Update performance scales linearly across available resources with the number of slices.
Whether query or update performance dominates the runtime depends on the documents being reindexed and cluster resources.
Update the document source
Update by query supports scripts to update the document source. As with the update API, you can set
ctx.op
to change the operation that is performed.Set
ctx.op = "noop"
if your script decides that it doesn't have to make any changes. The update by query operation skips updating the document and increments thenoop
counter.Set
ctx.op = "delete"
if your script decides that the document should be deleted. The update by query operation deletes the document and increments thedeleted
counter.Update by query supports only
index
,noop
, anddelete
. Settingctx.op
to anything else is an error. Setting any other field inctx
is an error. This API enables you to only modify the source of matching documents; you cannot move them.- Parameters:
fn
- a function that initializes a builder to create theUpdateByQueryRequest
- See Also:
-
updateByQueryRethrottle
public CompletableFuture<UpdateByQueryRethrottleResponse> updateByQueryRethrottle(UpdateByQueryRethrottleRequest request) Throttle an update by query operation. Change the number of requests per second for a particular update by query operation. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.
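For example (the task id is a placeholder returned by an update by query request submitted with wait_for_completion=false, and client is an existing ElasticsearchAsyncClient):
client.updateByQueryRethrottle(r -> r
        .taskId("r1A2WoRbTwKZ516z6NEs5A:36619")   // placeholder task id
        .requestsPerSecond(0.5F))                 // slow the operation down; -1 disables throttling
    .thenAccept(resp -> System.out.println(resp));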
- See Also:
-
updateByQueryRethrottle
public final CompletableFuture<UpdateByQueryRethrottleResponse> updateByQueryRethrottle(Function<UpdateByQueryRethrottleRequest.Builder, ObjectBuilder<UpdateByQueryRethrottleRequest>> fn) Throttle an update by query operation. Change the number of requests per second for a particular update by query operation. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.
- Parameters:
fn
- a function that initializes a builder to create theUpdateByQueryRethrottleRequest
- See Also:
-