org.elasticsearch.hadoop.mr
Class ESInputFormat<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.elasticsearch.hadoop.mr.ESInputFormat<K,V>
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<K,V>, ConfigurationOptions
Direct Known Subclasses:
ESHiveInputFormat, ESPigInputFormat

public class ESInputFormat<K,V>
extends org.apache.hadoop.mapreduce.InputFormat<K,V>
implements org.apache.hadoop.mapred.InputFormat<K,V>, ConfigurationOptions

InputFormat that streams data from Elasticsearch, typically the results of a query. Each record uses the document ID as its key and the document content as its value.

This class implements both the "old" (org.apache.hadoop.mapred) and the "new" (org.apache.hadoop.mapreduce) API.
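
The dual-API arrangement described above can be pictured with a small stand-alone sketch. The types below (NewApiInputFormat, OldApiInputFormat, BothApisInputFormat) are hypothetical stand-ins that use plain strings instead of the real Hadoop classes. The one-split-per-shard logic is also an assumption, suggested by the ShardInputSplit name rather than stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DualApiSketch {

    // Stand-in for the "new" API contract, which Hadoop models as an abstract class.
    public abstract static class NewApiInputFormat {
        public abstract List<String> getSplits(String jobContext);
    }

    // Stand-in for the "old" API contract, which Hadoop models as an interface.
    public interface OldApiInputFormat {
        String[] getSplits(String jobConf, int numSplits);
    }

    // One class satisfies both contracts; the two getSplits overloads differ by parameter list.
    public static class BothApisInputFormat extends NewApiInputFormat implements OldApiInputFormat {

        // Shared logic used by both entry points (assumption: one split per shard, three shards).
        private List<String> computeSplits(String source) {
            List<String> splits = new ArrayList<>();
            for (int shard = 0; shard < 3; shard++) {
                splits.add(source + "/shard-" + shard);
            }
            return splits;
        }

        @Override
        public List<String> getSplits(String jobContext) {
            return computeSplits(jobContext);
        }

        @Override
        public String[] getSplits(String jobConf, int numSplits) {
            // In the old API numSplits is only a hint; this sketch ignores it.
            return computeSplits(jobConf).toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        BothApisInputFormat format = new BothApisInputFormat();
        System.out.println(format.getSplits("index"));
        System.out.println(Arrays.toString(format.getSplits("index", 10)));
    }
}
```

Routing both entry points through one private helper keeps the two APIs consistent, which is one plausible reason to implement them in a single class.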


Nested Class Summary
protected static class ESInputFormat.ShardInputSplit
           
protected static class ESInputFormat.ShardRecordReader<K,V>
           
protected static class ESInputFormat.WritableShardRecordReader
           
 
Field Summary
 
Fields inherited from interface org.elasticsearch.hadoop.cfg.ConfigurationOptions
ES_BATCH_SIZE_BYTES, ES_BATCH_SIZE_BYTES_DEFAULT, ES_BATCH_SIZE_ENTRIES, ES_BATCH_SIZE_ENTRIES_DEFAULT, ES_BATCH_WRITE_REFRESH, ES_BATCH_WRITE_REFRESH_DEFAULT, ES_HOST, ES_HOST_DEFAULT, ES_HTTP_TIMEOUT, ES_HTTP_TIMEOUT_DEFAULT, ES_INDEX_AUTO_CREATE, ES_INDEX_AUTO_CREATE_DEFAULT, ES_INDEX_READ_MISSING_AS_EMPTY, ES_INDEX_READ_MISSING_AS_EMPTY_DEFAULT, ES_PORT, ES_PORT_DEFAULT, ES_RESOURCE, ES_SCROLL_KEEPALIVE, ES_SCROLL_KEEPALIVE_DEFAULT, ES_SCROLL_SIZE, ES_SCROLL_SIZE_DEFAULT, ES_SERIALIZATION_READER_CLASS, ES_SERIALIZATION_WRITER_CLASS
 
Constructor Summary
ESInputFormat()
           
 
Method Summary
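 ESInputFormat.ShardRecordReader createRecordReader(InputSplit split, TaskAttemptContext context)
           Creates the record reader for a split ("new" org.apache.hadoop.mapreduce API).
 ESInputFormat.ShardRecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter)
           Returns the record reader for a split ("old" org.apache.hadoop.mapred API).
 InputSplit[] getSplits(JobConf job, int numSplits)
           Returns the input splits for the job ("old" org.apache.hadoop.mapred API).
 List<InputSplit> getSplits(JobContext context)
           Returns the input splits for the job ("new" org.apache.hadoop.mapreduce API).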
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ESInputFormat

public ESInputFormat()

Method Detail

getSplits

public List<InputSplit> getSplits(JobContext context)
                           throws IOException
Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<K,V>
Throws:
IOException

createRecordReader

public ESInputFormat.ShardRecordReader createRecordReader(InputSplit split,
                                                          TaskAttemptContext context)
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<K,V>
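
As a rough illustration of the key/value contract in the class description (document ID as key, content as value), the sketch below models a reader over an in-memory map. SimpleShardReader and drain are hypothetical names, and plain strings stand in for the concrete key and value types, which this page does not show. Only the method names nextKeyValue, getCurrentKey and getCurrentValue mirror the "new" API's org.apache.hadoop.mapreduce.RecordReader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValueReaderSketch {

    // Hypothetical reader: walks (document ID, content) pairs one record at a time.
    public static class SimpleShardReader {
        private final Iterator<Map.Entry<String, String>> hits;
        private Map.Entry<String, String> current;

        public SimpleShardReader(Map<String, String> documents) {
            this.hits = documents.entrySet().iterator();
        }

        // Advances to the next record; returns false once the input is exhausted.
        public boolean nextKeyValue() {
            current = hits.hasNext() ? hits.next() : null;
            return current != null;
        }

        // Key of the current record: the document ID.
        public String getCurrentKey() {
            return current == null ? null : current.getKey();
        }

        // Value of the current record: the document content.
        public String getCurrentValue() {
            return current == null ? null : current.getValue();
        }
    }

    // Reads every record the way a map task would, collecting "id -> content" strings.
    public static List<String> drain(SimpleShardReader reader) {
        List<String> records = new ArrayList<>();
        while (reader.nextKeyValue()) {
            records.add(reader.getCurrentKey() + " -> " + reader.getCurrentValue());
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("doc-1", "{\"user\":\"alice\"}");
        documents.put("doc-2", "{\"user\":\"bob\"}");
        for (String record : drain(new SimpleShardReader(documents))) {
            System.out.println(record);
        }
    }
}
```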

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<K,V>
Throws:
IOException

getRecordReader

public ESInputFormat.ShardRecordReader getRecordReader(InputSplit split,
                                                       JobConf job,
                                                       Reporter reporter)
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<K,V>