org.elasticsearch.hadoop.mr
Class ESInputFormat<K,V>
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.elasticsearch.hadoop.mr.ESInputFormat<K,V>
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<K,V>, ConfigurationOptions
- Direct Known Subclasses:
- ESHiveInputFormat, ESPigInputFormat
public class ESInputFormat<K,V>
- extends org.apache.hadoop.mapreduce.InputFormat<K,V>
- implements org.apache.hadoop.mapred.InputFormat<K,V>, ConfigurationOptions
ElasticSearch InputFormat for streaming data (typically based on a query) from ElasticSearch.
Returns the document ID as the key and the document content as the value.
This class implements both the "old" (org.apache.hadoop.mapred) and the "new" (org.apache.hadoop.mapreduce) API.
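The key/value contract can be illustrated without a cluster. The following is a minimal, Hadoop-free sketch, assuming only what the description states: each record arrives as a (document ID, content) pair. The class `KeyValueContractSketch`, its methods, and the sample documents are hypothetical stand-ins and not part of this library; a real job would receive whatever key and value types the record reader produces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free illustration of the contract: key = document ID, value = document content.
// Every name and sample value here is hypothetical.
class KeyValueContractSketch {

    // Plays the role of a map step that receives one (ID, content) pair per document.
    static String describe(String docId, Map<String, Object> content) {
        return docId + " has fields " + content.keySet();
    }

    // Walks the pairs in order, as a job would walk the records of one split.
    static List<String> describeAll(Map<String, Map<String, Object>> hits) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> hit : hits.entrySet()) {
            lines.add(describe(hit.getKey(), hit.getValue()));
        }
        return lines;
    }

    // Two made-up documents keyed by their IDs.
    static Map<String, Map<String, Object>> sampleHits() {
        Map<String, Map<String, Object>> hits = new LinkedHashMap<>();
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("name", "alpha");
        first.put("year", 2001);
        hits.put("doc-1", first);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("name", "beta");
        hits.put("doc-2", second);
        return hits;
    }

    public static void main(String[] args) {
        for (String line : describeAll(sampleHits())) {
            System.out.println(line);
        }
    }
}
```

In an actual job the same shape would appear in the mapper: the key parameter carries the document ID and the value parameter carries the document body.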
Fields inherited from interface org.elasticsearch.hadoop.cfg.ConfigurationOptions
ES_BATCH_SIZE_BYTES, ES_BATCH_SIZE_BYTES_DEFAULT, ES_BATCH_SIZE_ENTRIES, ES_BATCH_SIZE_ENTRIES_DEFAULT, ES_BATCH_WRITE_REFRESH, ES_BATCH_WRITE_REFRESH_DEFAULT, ES_HOST, ES_HOST_DEFAULT, ES_HTTP_TIMEOUT, ES_HTTP_TIMEOUT_DEFAULT, ES_INDEX_AUTO_CREATE, ES_INDEX_AUTO_CREATE_DEFAULT, ES_INDEX_READ_MISSING_AS_EMPTY, ES_INDEX_READ_MISSING_AS_EMPTY_DEFAULT, ES_PORT, ES_PORT_DEFAULT, ES_RESOURCE, ES_SCROLL_KEEPALIVE, ES_SCROLL_KEEPALIVE_DEFAULT, ES_SCROLL_SIZE, ES_SCROLL_SIZE_DEFAULT, ES_SERIALIZATION_READER_CLASS, ES_SERIALIZATION_WRITER_CLASS
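As a rough illustration of how a few of these options might be gathered for a read job, the sketch below collects them in a plain `java.util.Properties` object so that it runs without Hadoop on the classpath. The string values `es.resource`, `es.host` and `es.port` are assumptions about what `ES_RESOURCE`, `ES_HOST` and `ES_PORT` resolve to and should be checked against `ConfigurationOptions`; the class `EsReadSettingsSketch`, the method `readSettings`, and the sample index, host and port are hypothetical.

```java
import java.util.Properties;

// Hadoop-free sketch: gathers the settings a read job would need in one place.
class EsReadSettingsSketch {

    // ASSUMED string values of ES_RESOURCE, ES_HOST and ES_PORT;
    // verify them against ConfigurationOptions before relying on them.
    static final String ES_RESOURCE = "es.resource";
    static final String ES_HOST = "es.host";
    static final String ES_PORT = "es.port";

    // Builds the key/value settings for one read job.
    static Properties readSettings(String resource, String host, int port) {
        Properties settings = new Properties();
        settings.setProperty(ES_RESOURCE, resource);
        settings.setProperty(ES_HOST, host);
        settings.setProperty(ES_PORT, Integer.toString(port));
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical target; replace with a real index and node address.
        Properties settings = readSettings("my-index/my-type", "localhost", 9200);
        for (String key : settings.stringPropertyNames()) {
            System.out.println(key + " = " + settings.getProperty(key));
        }
    }
}
```

In a real job the same pairs would presumably be applied to the Hadoop `Configuration` or `JobConf` with `set(String, String)` before the input format is registered.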
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ESInputFormat
public ESInputFormat()
getSplits
public List<InputSplit> getSplits(JobContext context) throws IOException
- Specified by:
- getSplits in class InputFormat<K,V>
- Throws:
- IOException
createRecordReader
public ESInputFormat.ShardRecordReader createRecordReader(InputSplit split, TaskAttemptContext context)
- Specified by:
- createRecordReader in class InputFormat<K,V>
getSplits
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
- Specified by:
- getSplits in interface InputFormat<K,V>
- Throws:
- IOException
getRecordReader
public ESInputFormat.ShardRecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter)
- Specified by:
- getRecordReader in interface InputFormat<K,V>