org.elasticsearch.hadoop.hive
Class ESHiveInputFormat
java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.elasticsearch.hadoop.mr.ESInputFormat<Text,MapWritable>
      org.elasticsearch.hadoop.hive.ESHiveInputFormat
- All Implemented Interfaces: InputFormat<Text,MapWritable>, ConfigurationOptions
public class ESHiveInputFormat
extends ESInputFormat<Text,MapWritable>
Hive-specific InputFormat. Since the Hive code base makes a lot of assumptions about tables being actual files in HDFS (using instanceof checks without a proper else branch), this class tries to 'fix' this by adding a dummy FileInputFormat to ESInputFormat.
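The workaround is easiest to see as code. The following is a minimal sketch, not the actual elasticsearch-hadoop implementation: the class names, the placeholder path, and the wrap helper are invented for illustration. It shows how splits from a non-file source can be presented as FileSplit instances so that Hive's instanceof checks succeed.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;

// Hypothetical wrapper: extends FileSplit purely so that Hive's
// 'split instanceof FileSplit' checks pass; no HDFS file is ever opened.
class DummyFileSplit extends FileSplit {

    final InputSplit actualSplit; // the real, non-file split being wrapped

    DummyFileSplit(InputSplit actualSplit, Path placeholder) {
        // zero-length placeholder; Hive inspects the path but never reads it
        super(placeholder, 0, 0, (String[]) null);
        this.actualSplit = actualSplit;
    }
}

// Hypothetical helper presenting arbitrary splits as FileSplits
class SplitWrapping {
    static FileSplit[] wrap(InputSplit[] real, Path placeholder) {
        FileSplit[] out = new FileSplit[real.length];
        for (int i = 0; i < real.length; i++) {
            out[i] = new DummyFileSplit(real[i], placeholder);
        }
        return out;
    }
}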
Fields inherited from interface org.elasticsearch.hadoop.cfg.ConfigurationOptions:
ES_BATCH_SIZE_BYTES, ES_BATCH_SIZE_BYTES_DEFAULT, ES_BATCH_SIZE_ENTRIES, ES_BATCH_SIZE_ENTRIES_DEFAULT, ES_BATCH_WRITE_REFRESH, ES_BATCH_WRITE_REFRESH_DEFAULT, ES_HOST, ES_HOST_DEFAULT, ES_HTTP_TIMEOUT, ES_HTTP_TIMEOUT_DEFAULT, ES_INDEX_AUTO_CREATE, ES_INDEX_AUTO_CREATE_DEFAULT, ES_INDEX_READ_MISSING_AS_EMPTY, ES_INDEX_READ_MISSING_AS_EMPTY_DEFAULT, ES_PORT, ES_PORT_DEFAULT, ES_RESOURCE, ES_SCROLL_KEEPALIVE, ES_SCROLL_KEEPALIVE_DEFAULT, ES_SCROLL_SIZE, ES_SCROLL_SIZE_DEFAULT, ES_SERIALIZATION_READER_CLASS, ES_SERIALIZATION_WRITER_CLASS
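These inherited constants can be used to configure a job before handing it to the input format. A hedged example follows: it assumes (as the *_DEFAULT twins suggest) that each non-default constant holds a property key, and the resource value is purely illustrative.

import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class ConfigExample {
    public static void main(String[] args) {
        JobConf job = new JobConf();
        // assumed: each constant holds the property key; the values are examples
        job.set(ConfigurationOptions.ES_RESOURCE, "radio/artists"); // hypothetical index/type
        job.set(ConfigurationOptions.ES_HOST, "localhost");
        job.set(ConfigurationOptions.ES_PORT, "9200");
        job.set(ConfigurationOptions.ES_SCROLL_SIZE, ConfigurationOptions.ES_SCROLL_SIZE_DEFAULT);
    }
}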
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ESHiveInputFormat
public ESHiveInputFormat()
getSplits
public FileSplit[] getSplits(JobConf job,
                             int numSplits)
                      throws IOException
- Specified by: getSplits in interface InputFormat<Text,MapWritable>
- Overrides: getSplits in class ESInputFormat<Text,MapWritable>
- Throws: IOException
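Note that the override narrows the return type to FileSplit[], which is the point of the class: every split Hive sees already satisfies its file-oriented checks. A small usage sketch, with a hypothetical es.resource value:

import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.hive.ESHiveInputFormat;

public class SplitsExample {
    public static void main(String[] args) throws java.io.IOException {
        JobConf job = new JobConf();
        job.set(ConfigurationOptions.ES_RESOURCE, "radio/artists"); // hypothetical
        // every element is a FileSplit, so Hive's instanceof checks succeed
        FileSplit[] splits = new ESHiveInputFormat().getSplits(job, 1);
        for (FileSplit split : splits) {
            System.out.println(split.getPath() + " -> " + split.getLength());
        }
    }
}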
getRecordReader
public ESInputFormat.ShardRecordReader getRecordReader(InputSplit split,
                                                       JobConf job,
                                                       Reporter reporter)
- Specified by: getRecordReader in interface InputFormat<Text,MapWritable>
- Overrides: getRecordReader in class ESInputFormat<Text,MapWritable>
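Putting the two methods together, here is a hedged end-to-end read loop using the old mapred API directly (normally Hive drives this). It assumes ShardRecordReader implements RecordReader<Text,MapWritable>, as this mapred-style signature suggests, and uses an illustrative resource value.

import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.hive.ESHiveInputFormat;

public class ReadLoopExample {
    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf();
        job.set(ConfigurationOptions.ES_RESOURCE, "radio/artists"); // hypothetical
        ESHiveInputFormat format = new ESHiveInputFormat();

        for (FileSplit split : format.getSplits(job, 1)) {
            // assumed: ShardRecordReader implements RecordReader<Text, MapWritable>
            RecordReader<Text, MapWritable> reader = format.getRecordReader(split, job, Reporter.NULL);
            Text key = reader.createKey();
            MapWritable value = reader.createValue();
            while (reader.next(key, value)) {
                // each 'value' is one Elasticsearch document as field -> writable
            }
            reader.close();
        }
    }
}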