org.apache.hadoop.hbase.mapreduce
Class MultiTableInputFormatBase

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
      extended by org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase
Direct Known Subclasses:
MultiTableInputFormat

@InterfaceAudience.Public
@InterfaceStability.Evolving
public abstract class MultiTableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

Base class for multi-table input formats. It receives a list of Scan instances that define the input tables, filters, and other scan settings. Subclasses may supply a different TableRecordReader implementation.


Constructor Summary
MultiTableInputFormatBase()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Builds a TableRecordReader.
protected  List<Scan> getScans()
          Allows subclasses to get the list of Scan objects.
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
          Calculates the splits that will serve as input for the map tasks.
protected  boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
          Test if the given region is to be included in the InputSplit while splitting the regions of a table.
protected  void setScans(List<Scan> scans)
          Allows subclasses to set the list of Scan objects.
protected  void setTableRecordReader(TableRecordReader tableRecordReader)
          Allows subclasses to set the TableRecordReader.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiTableInputFormatBase

public MultiTableInputFormatBase()
Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                  org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                           throws IOException,
                                                                                                  InterruptedException
Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
split - The split to work with.
context - The current context.
Returns:
The newly created record reader.
Throws:
IOException - When creating the reader fails.
InterruptedException - When record reader initialization fails.
See Also:
InputFormat.createRecordReader( org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException
Calculates the splits that will serve as input for the map tasks. For each Scan, at most one split is created per region of the scanned table; a region rejected by includeRegionInSplit(byte[], byte[]) contributes no split.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
context - The current job context.
Returns:
The list of input splits.
Throws:
IOException - When creating the list of splits fails.
See Also:
InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
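
The relationship between regions and splits can be illustrated with a small model. This is only a sketch under stated assumptions, not the HBase implementation: SplitModelSketch, KeyRange, and splitsFor are hypothetical names, keys are compared as unsigned bytes, and an empty key is assumed to mean "unbounded". Each region that overlaps the scan range yields one split, clipped to the scan's boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of per-region split calculation; every type here is hypothetical. */
public final class SplitModelSketch {
    private SplitModelSketch() {}

    /** A half-open key range [start, end); an empty array means "unbounded". */
    public static final class KeyRange {
        public final byte[] start;
        public final byte[] end;

        public KeyRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one clipped range per region that overlaps the scan range. */
    public static List<KeyRange> splitsFor(KeyRange scan, List<KeyRange> regions) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            // The scan must start before the region ends ...
            boolean startsBeforeRegionEnd = region.end.length == 0
                    || Arrays.compareUnsigned(scan.start, region.end) < 0;
            // ... and end after the region starts.
            boolean endsAfterRegionStart = scan.end.length == 0
                    || Arrays.compareUnsigned(scan.end, region.start) > 0;
            if (startsBeforeRegionEnd && endsAfterRegionStart) {
                splits.add(new KeyRange(later(scan.start, region.start),
                                        earlier(scan.end, region.end)));
            }
        }
        return splits;
    }

    // The later of two start keys; an empty start key sorts first.
    private static byte[] later(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // The earlier of two end keys; an empty end key is treated as unbounded.
    private static byte[] earlier(byte[] a, byte[] b) {
        if (a.length == 0) {
            return b;
        }
        if (b.length == 0) {
            return a;
        }
        return Arrays.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

In this model, a scan over [c, k) against regions [, g), [g, p), and [p, ) yields two ranges, [c, g) and [g, k), while an unbounded scan yields one range per region.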

includeRegionInSplit

protected boolean includeRegionInSplit(byte[] startKey,
                                       byte[] endKey)
Test if the given region is to be included in the InputSplit while splitting the regions of a table.

This hook helps when there is a specific reason to exclude an entire region from the MapReduce job, so that it contributes no InputSplit, based on the region's start and end keys.
It is useful when a job remembers the last-processed top record and repeatedly revisits the [last, current) interval. Besides reducing the number of InputSplits, it also reduces the load on the region server, due to the ordering of the keys.

Note: endKey.length == 0 is possible for the last (most recent) region.
Override this method to exclude whole regions from the MapReduce job. By default, no region is excluded (i.e. all regions are included).

Parameters:
startKey - Start key of the region
endKey - End key of the region
Returns:
true if this region should be included as part of the input (the default).
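
The [last, current) use case described above could be sketched as the following predicate. It is an illustration only: RegionFilterSketch, includeRegion, and the lastProcessed checkpoint parameter are hypothetical and not part of HBase. The sketch uses the JDK's Arrays.compareUnsigned (Java 9+) for unsigned lexicographic ordering; inside an actual override, Bytes.compareTo(byte[], byte[]) from org.apache.hadoop.hbase.util would be the natural choice.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: can a region hold rows at or after a checkpoint? */
public final class RegionFilterSketch {
    private RegionFilterSketch() {}

    /**
     * @param startKey      inclusive start key of the region (unused here, kept to mirror the hook's signature)
     * @param endKey        exclusive end key of the region; empty for the last region
     * @param lastProcessed assumed checkpoint: the last row key already processed
     * @return true if the region may contain rows at or after the checkpoint
     */
    public static boolean includeRegion(byte[] startKey, byte[] endKey, byte[] lastProcessed) {
        // An empty end key marks the last region, which is unbounded above.
        if (endKey.length == 0) {
            return true;
        }
        // Keep the region only if its exclusive end key lies strictly
        // after the checkpoint in unsigned byte order.
        return Arrays.compareUnsigned(endKey, lastProcessed) > 0;
    }
}
```

A subclass could delegate to such a helper from its includeRegionInSplit override, so that regions lying entirely before the checkpoint produce no split.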

getScans

protected List<Scan> getScans()
Allows subclasses to get the list of Scan objects.


setScans

protected void setScans(List<Scan> scans)
Allows subclasses to set the list of Scan objects.

Parameters:
scans - The list of Scan used to define the input

setTableRecordReader

protected void setTableRecordReader(TableRecordReader tableRecordReader)
Allows subclasses to set the TableRecordReader.

Parameters:
tableRecordReader - A different TableRecordReader implementation.


Copyright © 2007-2016 The Apache Software Foundation. All Rights Reserved.