MultiTableInputFormatBase (Apache HBase - Server 0.98.18-hadoop2 API)

org.apache.hadoop.hbase.mapreduce
Class MultiTableInputFormatBase

java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
      org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase

Direct Known Subclasses:: MultiTableInputFormat

@InterfaceAudience.Public @InterfaceStability.Evolving public abstract class MultiTableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

A base class for multi-table input formats such as MultiTableInputFormat. It receives a list of Scan instances that define the input tables, filters, and so on. Subclasses may use other TableRecordReader implementations.

Constructor Summary
`MultiTableInputFormatBase()`

Method Summary
`org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result>`	`createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)` Builds a TableRecordReader.
`protected List<Scan>`	`getScans()` Allows subclasses to get the list of `Scan` objects.
`List<org.apache.hadoop.mapreduce.InputSplit>`	`getSplits(org.apache.hadoop.mapreduce.JobContext context)` Calculates the splits that will serve as input for the map tasks.
`protected boolean`	`includeRegionInSplit(byte[] startKey, byte[] endKey)` Test if the given region is to be included in the InputSplit while splitting the regions of a table.
`protected void`	`setScans(List<Scan> scans)` Allows subclasses to set the list of `Scan` objects.
`protected void`	`setTableRecordReader(TableRecordReader tableRecordReader)` Allows subclasses to set the `TableRecordReader`.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

MultiTableInputFormatBase

public MultiTableInputFormatBase()

Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                  org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                           throws IOException,
                                                                                                  InterruptedException

Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.

Specified by:: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

Parameters:: split - The split to work with.; context - The current context.
Returns:: The newly created record reader.
Throws:: IOException - When creating the reader fails.; InterruptedException - When record reader initialization fails.
See Also:: InputFormat.createRecordReader( org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException

Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table.

Specified by:: getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

Parameters:: context - The current job context.
Returns:: The list of input splits.
Throws:: IOException - When creating the list of splits fails.
See Also:: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
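
The relationship between regions and splits described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the HBase implementation: `SplitPlanSketch`, `Region`, `RegionFilter`, and `planSplits` are hypothetical names, and the model ignores anything else the real `getSplits` may take into account. It shows one planned split per region, with a filter hook playing the role of `includeRegionInSplit`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of split planning; it does not use or reproduce HBase code. */
final class SplitPlanSketch {

    /** Hypothetical stand-in for a table region covering [startKey, endKey). */
    static final class Region {
        final String table;
        final byte[] startKey;
        final byte[] endKey;

        Region(String table, byte[] startKey, byte[] endKey) {
            this.table = table;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Hypothetical hook with the same parameters as includeRegionInSplit. */
    interface RegionFilter {
        boolean include(byte[] startKey, byte[] endKey);
    }

    /** Keeps one entry per region that the filter accepts, in input order. */
    static List<Region> planSplits(List<Region> regions, RegionFilter filter) {
        List<Region> planned = new ArrayList<Region>();
        for (Region region : regions) {
            if (filter.include(region.startKey, region.endKey)) {
                planned.add(region);
            }
        }
        return planned;
    }
}
```

With a filter that accepts everything, the number of planned entries equals the number of regions, which mirrors the default behavior described for this method.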

includeRegionInSplit

protected boolean includeRegionInSplit(byte[] startKey,
                                       byte[] endKey)

Test if the given region is to be included in the InputSplit while splitting the regions of a table.

This optimization helps when there is a specific reason to exclude an entire region from the M-R job, so that it contributes no InputSplit, based on the region's start and end keys.
It is useful when a job needs to remember the last-processed top record and repeatedly revisit the [last, current) interval for M-R processing. Besides reducing the number of InputSplits, it also reduces the load on the region server, due to the ordering of the keys.

Note: endKey.length == 0 is possible for the last (most recent) region.
Override this method to exclude whole regions from M-R in bulk. By default, no region is excluded (i.e. all regions are included).

Parameters:: startKey - Start key of the region; endKey - End key of the region
Returns:: true if this region should be included as part of the input (the default).
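
As an illustration of the "last-processed record" use case, the decision logic of an override might look like the following. This is a minimal stand-alone sketch, not HBase code: `RegionFilterSketch` and `lastProcessedKey` are hypothetical, and the `compare` helper is a hand-written unsigned lexicographic comparison standing in for whatever byte comparison the real subclass would use. A real override would live in a subclass of MultiTableInputFormatBase.

```java
/** Hypothetical model of an includeRegionInSplit override; not part of HBase. */
final class RegionFilterSketch {

    // Assumption: the last row key a previous run finished processing.
    private final byte[] lastProcessedKey;

    RegionFilterSketch(byte[] lastProcessedKey) {
        this.lastProcessedKey = lastProcessedKey.clone();
    }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal on the shared prefix: the shorter key sorts first.
        return a.length - b.length;
    }

    /** Same parameters as the protected hook: true keeps the region. */
    boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
        // An empty end key marks the last region, which has no upper bound.
        if (endKey.length == 0) {
            return true;
        }
        // The region holds rows in [startKey, endKey). If endKey sorts at or
        // before the last processed key, every row in it is older; skip it.
        return compare(endKey, lastProcessedKey) > 0;
    }
}
```

For example, with a last processed key of `m`, a region `[a, g)` would be skipped, while `[g, p)` and the open-ended last region `[p, )` would be kept.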

getScans

protected List<Scan> getScans()

Allows subclasses to get the list of Scan objects.

setScans

protected void setScans(List<Scan> scans)

Allows subclasses to set the list of Scan objects.

Parameters:: scans - The list of Scan objects that define the input.

setTableRecordReader

protected void setTableRecordReader(TableRecordReader tableRecordReader)

Allows subclasses to set the TableRecordReader.

Parameters:: tableRecordReader - A different TableRecordReader implementation.