java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
    org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class TableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
A base for TableInputFormats. Receives an HTable and a Scan instance that defines the input columns, etc. Subclasses may use
other TableRecordReader implementations.
An example of a subclass is sketched below.
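The following is a minimal, illustrative sketch of such a subclass; the table name "exampleTable", the column families "cf1"/"cf2", the qualifier "attr1", and the caching value are assumptions chosen for the example, not defaults of this API.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;

public class ExampleTIF extends TableInputFormatBase implements JobConfigurable {

  @Override
  public void configure(JobConf job) {
    try {
      // Connect to the table that this InputFormat reads from.
      HTable exampleTable =
          new HTable(HBaseConfiguration.create(job), Bytes.toBytes("exampleTable"));
      // Mandatory: the base class needs an HTable to compute splits and read records.
      setHTable(exampleTable);

      // Define the input columns via a Scan instance.
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("cf1"));
      scan.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("attr1"));
      scan.setCaching(500);
      // Mandatory: the base class needs a Scan that defines what to read.
      setScan(scan);
    } catch (IOException e) {
      throw new RuntimeException("Failed to configure ExampleTIF", e);
    }
  }
}
```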
| Field Summary | |
|---|---|
| static String | INPUT_AUTOBALANCE_MAXSKEWRATIO: The maximum data-skew ratio for M/R jobs; used together with the hbase.mapreduce.input.autobalance property. |
| static String | MAPREDUCE_INPUT_AUTOBALANCE: Specify whether to enable auto-balance for input in M/R jobs. |
| static String | TABLE_ROW_TEXTKEY: Specify whether the row keys in the table are text (ASCII between 32 and 126); default is true. |
| Constructor Summary |
|---|
| TableInputFormatBase() |
| Method Summary | |
|---|---|
| List<org.apache.hadoop.mapreduce.InputSplit> | calculateRebalancedSplits(List<org.apache.hadoop.mapreduce.InputSplit> list, org.apache.hadoop.mapreduce.JobContext context, long average): Calculates the number of MapReduce input splits for the map tasks. |
| org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context): Builds a TableRecordReader. |
| protected HTable | getHTable(): Allows subclasses to get the HTable. |
| Scan | getScan(): Gets the scan defining the actual details like columns etc. |
| static byte[] | getSplitKey(byte[] start, byte[] end, boolean isText): Selects a split point in the region. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context): Calculates the splits that will serve as input for the map tasks. |
| protected Pair<byte[][],byte[][]> | getStartEndKeys() |
| protected boolean | includeRegionInSplit(byte[] startKey, byte[] endKey): Tests whether the given region is to be included in the InputSplit while splitting the regions of a table. |
| String | reverseDNS(InetAddress ipAddress) |
| protected void | setHTable(HTable table): Allows subclasses to set the HTable. |
| void | setScan(Scan scan): Sets the scan defining the actual details like columns etc. |
| protected void | setTableRecordReader(TableRecordReader tableRecordReader): Allows subclasses to set the TableRecordReader. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String MAPREDUCE_INPUT_AUTOBALANCE
public static final String INPUT_AUTOBALANCE_MAXSKEWRATIO
public static final String TABLE_ROW_TEXTKEY
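One way these constants might be wired into a job configuration is sketched below; the helper class name and the skew-ratio value "3" are illustrative assumptions, not defaults documented here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;

public class AutoBalanceConfigSketch {
  public static Configuration newJobConfiguration() {
    Configuration conf = HBaseConfiguration.create();
    // Enable auto-balancing of input splits for M/R jobs.
    conf.setBoolean(TableInputFormatBase.MAPREDUCE_INPUT_AUTOBALANCE, true);
    // Skew-ratio threshold used together with auto-balance; "3" is an example value.
    conf.set(TableInputFormatBase.INPUT_AUTOBALANCE_MAXSKEWRATIO, "3");
    // Row keys in this (hypothetical) table are printable ASCII (32-126).
    conf.setBoolean(TableInputFormatBase.TABLE_ROW_TEXTKEY, true);
    return conf;
  }
}
```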
| Constructor Detail |
|---|
public TableInputFormatBase()
| Method Detail |
|---|
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                   org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                            throws IOException

Builds a TableRecordReader.

Specified by:
    createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
    split - The split to work with.
    context - The current context.
Throws:
    IOException - When creating the reader fails.
See Also:
    InputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)
protected Pair<byte[][],byte[][]> getStartEndKeys()
                                           throws IOException

Throws:
    IOException
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException

Calculates the splits that will serve as input for the map tasks.

Specified by:
    getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
    context - The current job context.
Throws:
    IOException - When creating the list of splits fails.
See Also:
    InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
public String reverseDNS(InetAddress ipAddress)
                  throws NamingException,
                         UnknownHostException

Throws:
    NamingException
    UnknownHostException
public List<org.apache.hadoop.mapreduce.InputSplit> calculateRebalancedSplits(List<org.apache.hadoop.mapreduce.InputSplit> list,
                                                                              org.apache.hadoop.mapreduce.JobContext context,
                                                                              long average)
                                                                       throws IOException

Calculates the number of MapReduce input splits for the map tasks.

Parameters:
    list - The list of input splits before balancing.
    context - The current job context.
    average - The average size of all regions.
Throws:
    IOException - When creating the list of splits fails.
See Also:
    InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
public static byte[] getSplitKey(byte[] start,
                                 byte[] end,
                                 boolean isText)

Selects a split point in the region.

Parameters:
    start - Start key of the region
    end - End key of the region
    isText - Whether to use text key mode or binary key mode
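Since getSplitKey is public and static, it can also be called directly; the keys below are made up for illustration.

```java
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.util.Bytes;

public class GetSplitKeySketch {
  public static void main(String[] args) {
    byte[] start = Bytes.toBytes("row-aaa");
    byte[] end = Bytes.toBytes("row-zzz");
    // isText = true: keys are treated as printable ASCII when choosing the split point.
    byte[] splitPoint = TableInputFormatBase.getSplitKey(start, end, true);
    System.out.println("split point: " + Bytes.toStringBinary(splitPoint));
  }
}
```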
protected boolean includeRegionInSplit(byte[] startKey,
                                       byte[] endKey)

Tests whether the given region is to be included in the InputSplit while splitting the regions of a table.

This optimization is effective when there is a specific reason to exclude an entire region from the M-R job (and hence it contributes no InputSplit), given the start and end keys of that region. It is useful when we need to remember the last-processed top record and continuously revisit the [last, current) interval for M-R processing. In addition to reducing the number of InputSplits, this also reduces the load on the region server, due to the ordering of the keys.

Note: it is possible that endKey.length == 0 for the last (most recent) region.

Override this method if you want to exclude regions from the M-R job altogether; a sketch follows the parameter list below. By default, no region is excluded (i.e. all regions are included).

Parameters:
    startKey - Start key of the region
    endKey - End key of the region
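A minimal sketch of such an override; the subclass name and the remembered key are hypothetical, and the class is left abstract because a real subclass would also have to supply the HTable and Scan (as in the ExampleTIF sketch above).

```java
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical subclass that resumes processing after a remembered row key.
public abstract class ResumingTIF extends TableInputFormatBase {

  // Assumed to be provided by the job setup; not part of the TableInputFormatBase API.
  private final byte[] lastProcessedKey = Bytes.toBytes("row-12345");

  @Override
  protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
    // endKey.length == 0 marks the last region of the table; always keep it.
    if (endKey.length == 0) {
      return true;
    }
    // Exclude regions whose entire key range lies at or before the last-processed key.
    return Bytes.compareTo(endKey, lastProcessedKey) > 0;
  }
}
```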
protected HTable getHTable()

Allows subclasses to get the HTable.

protected void setHTable(HTable table)

Allows subclasses to set the HTable.

Parameters:
    table - The table to get the data from.

public Scan getScan()

Gets the scan defining the actual details like columns etc.

public void setScan(Scan scan)

Sets the scan defining the actual details like columns etc.

Parameters:
    scan - The scan to set.

protected void setTableRecordReader(TableRecordReader tableRecordReader)

Allows subclasses to set the TableRecordReader.

Parameters:
    tableRecordReader - A different TableRecordReader implementation.