org.elasticsearch.hadoop.hive
Class ESHiveInputFormat
java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.elasticsearch.hadoop.mr.ESInputFormat<Text,MapWritable>
      org.elasticsearch.hadoop.hive.ESHiveInputFormat
- All Implemented Interfaces: InputFormat<Text,MapWritable>, ConfigurationOptions
public class ESHiveInputFormat
extends ESInputFormat<Text,MapWritable>
Hive-specific InputFormat. Since the Hive code base makes a lot of assumptions about tables being actual files in HDFS (using instanceof checks without a proper else branch), this class tries to 'fix' this by adding a dummy FileInputFormat to ESInputFormat.
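The workaround is easiest to see as code. The following is a minimal sketch, not the actual elasticsearch-hadoop implementation: the class names, the placeholder path, and the wrap helper are invented for illustration. It shows how splits from a non-file source can be presented as FileSplit instances so that Hive's instanceof checks succeed.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;

// Hypothetical wrapper: extends FileSplit purely so that Hive's
// 'split instanceof FileSplit' checks pass; no HDFS file is ever opened.
class DummyFileSplit extends FileSplit {

    final InputSplit actualSplit; // the real, non-file split being wrapped

    DummyFileSplit(InputSplit actualSplit, Path placeholder) {
        // zero-length placeholder; Hive inspects the path but never reads it
        super(placeholder, 0, 0, (String[]) null);
        this.actualSplit = actualSplit;
    }
}

// Hypothetical helper presenting arbitrary splits as FileSplits
class SplitWrapping {
    static FileSplit[] wrap(InputSplit[] real, Path placeholder) {
        FileSplit[] out = new FileSplit[real.length];
        for (int i = 0; i < real.length; i++) {
            out[i] = new DummyFileSplit(real[i], placeholder);
        }
        return out;
    }
}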
Fields inherited from interface org.elasticsearch.hadoop.cfg.ConfigurationOptions:
ES_BATCH_SIZE_BYTES, ES_BATCH_SIZE_BYTES_DEFAULT, ES_BATCH_SIZE_ENTRIES, ES_BATCH_SIZE_ENTRIES_DEFAULT, ES_BATCH_WRITE_REFRESH, ES_BATCH_WRITE_REFRESH_DEFAULT, ES_HOST, ES_HOST_DEFAULT, ES_HTTP_TIMEOUT, ES_HTTP_TIMEOUT_DEFAULT, ES_INDEX_AUTO_CREATE, ES_INDEX_AUTO_CREATE_DEFAULT, ES_INDEX_READ_MISSING_AS_EMPTY, ES_INDEX_READ_MISSING_AS_EMPTY_DEFAULT, ES_PORT, ES_PORT_DEFAULT, ES_RESOURCE, ES_SCROLL_KEEPALIVE, ES_SCROLL_KEEPALIVE_DEFAULT, ES_SCROLL_SIZE, ES_SCROLL_SIZE_DEFAULT, ES_SERIALIZATION_READER_CLASS, ES_SERIALIZATION_WRITER_CLASS
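These inherited constants can be used to configure a job before handing it to the input format. A hedged example follows: it assumes (as the *_DEFAULT twins suggest) that each non-default constant holds a property key, and the resource value is purely illustrative.

import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class ConfigExample {
    public static void main(String[] args) {
        JobConf job = new JobConf();
        // assumed: each constant holds the property key; the values are examples
        job.set(ConfigurationOptions.ES_RESOURCE, "radio/artists"); // hypothetical index/type
        job.set(ConfigurationOptions.ES_HOST, "localhost");
        job.set(ConfigurationOptions.ES_PORT, "9200");
        job.set(ConfigurationOptions.ES_SCROLL_SIZE, ConfigurationOptions.ES_SCROLL_SIZE_DEFAULT);
    }
}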
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ESHiveInputFormat
public ESHiveInputFormat()
getSplits
public FileSplit[] getSplits(JobConf job,
                             int numSplits)
                      throws IOException
- Specified by: getSplits in interface InputFormat<Text,MapWritable>
- Overrides: getSplits in class ESInputFormat<Text,MapWritable>
- Throws: IOException
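Note that the override narrows the return type to FileSplit[], which is the point of the class: every split Hive sees already satisfies its file-oriented checks. A small usage sketch, with a hypothetical es.resource value:

import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.hive.ESHiveInputFormat;

public class SplitsExample {
    public static void main(String[] args) throws java.io.IOException {
        JobConf job = new JobConf();
        job.set(ConfigurationOptions.ES_RESOURCE, "radio/artists"); // hypothetical
        // every element is a FileSplit, so Hive's instanceof checks succeed
        FileSplit[] splits = new ESHiveInputFormat().getSplits(job, 1);
        for (FileSplit split : splits) {
            System.out.println(split.getPath() + " -> " + split.getLength());
        }
    }
}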
getRecordReader
public ESInputFormat.ShardRecordReader getRecordReader(InputSplit split,
                                                       JobConf job,
                                                       Reporter reporter)
- Specified by: getRecordReader in interface InputFormat<Text,MapWritable>
- Overrides: getRecordReader in class ESInputFormat<Text,MapWritable>
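Putting the two methods together, here is a hedged end-to-end read loop using the old mapred API directly (normally Hive drives this). It assumes ShardRecordReader implements RecordReader<Text,MapWritable>, as this mapred-style signature suggests, and uses an illustrative resource value.

import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.hive.ESHiveInputFormat;

public class ReadLoopExample {
    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf();
        job.set(ConfigurationOptions.ES_RESOURCE, "radio/artists"); // hypothetical
        ESHiveInputFormat format = new ESHiveInputFormat();

        for (FileSplit split : format.getSplits(job, 1)) {
            // assumed: ShardRecordReader implements RecordReader<Text, MapWritable>
            RecordReader<Text, MapWritable> reader = format.getRecordReader(split, job, Reporter.NULL);
            Text key = reader.createKey();
            MapWritable value = reader.createValue();
            while (reader.next(key, value)) {
                // each 'value' is one Elasticsearch document as field -> writable
            }
            reader.close();
        }
    }
}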