org.apache.hadoop.hbase.mapreduce
Class GroupingTableMapper

java.lang.Object
  extended by org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,KEYOUT,VALUEOUT>
      extended by org.apache.hadoop.hbase.mapreduce.TableMapper<ImmutableBytesWritable,Result>
          extended by org.apache.hadoop.hbase.mapreduce.GroupingTableMapper
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

@InterfaceAudience.Public
@InterfaceStability.Stable
public class GroupingTableMapper
extends TableMapper<ImmutableBytesWritable,Result>
implements org.apache.hadoop.conf.Configurable

Extracts the grouping columns from each input record.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
 
Field Summary
protected  byte[][] columns
          The grouping columns.
static String GROUP_COLUMNS
          Job configuration parameter that specifies the columns used to produce the key emitted by the map phase.
 
Constructor Summary
GroupingTableMapper()
           
 
Method Summary
protected  ImmutableBytesWritable createGroupKey(byte[][] vals)
          Create a key by concatenating multiple column values.
protected  byte[][] extractKeyValues(Result r)
          Extract column values from the current record.
 org.apache.hadoop.conf.Configuration getConf()
          Returns the current configuration.
static void initJob(String table, Scan scan, String groupColumns, Class<? extends TableMapper> mapper, org.apache.hadoop.mapreduce.Job job)
          Use this before submitting a TableMap job.
 void map(ImmutableBytesWritable key, Result value, org.apache.hadoop.mapreduce.Mapper.Context context)
          Extract the grouping columns from value to construct a new key.
 void setConf(org.apache.hadoop.conf.Configuration configuration)
          Sets the configuration.
 
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, run, setup
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GROUP_COLUMNS

public static final String GROUP_COLUMNS
Job configuration parameter that specifies the columns used to produce the key emitted by the map phase.

See Also:
Constant Field Values

columns

protected byte[][] columns
The grouping columns.

Constructor Detail

GroupingTableMapper

public GroupingTableMapper()
Method Detail

initJob

public static void initJob(String table,
                           Scan scan,
                           String groupColumns,
                           Class<? extends TableMapper> mapper,
                           org.apache.hadoop.mapreduce.Job job)
                    throws IOException
Use this before submitting a TableMap job. It configures the job with the given table, scan, grouping columns, and mapper class.

Parameters:
table - The table to be processed.
scan - The scan with the columns etc.
groupColumns - A space-separated list of columns used to form the key emitted by the map phase.
mapper - The mapper class.
job - The current job.
Throws:
IOException - When setting up the job fails.
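
As an illustration of the groupColumns format, a call might look like GroupingTableMapper.initJob("mytable", scan, "cf:a cf:b", GroupingTableMapper.class, job), where the table name and column names are hypothetical. The sketch below is not the HBase implementation. It only shows, in plain Java, how such a space-separated string could be split into family and qualifier parts, assuming each entry uses the family:qualifier notation. The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class GroupColumnsSketch {
    // Splits a space-separated column list such as "cf:a cf:b" into
    // {family, qualifier} byte-array pairs. The family:qualifier form
    // is an assumption made for this sketch.
    static byte[][][] parse(String groupColumns) {
        String[] specs = groupColumns.trim().split("\\s+");
        byte[][][] parsed = new byte[specs.length][][];
        for (int i = 0; i < specs.length; i++) {
            int sep = specs[i].indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException(
                    "Expected family:qualifier, got: " + specs[i]);
            }
            parsed[i] = new byte[][] {
                specs[i].substring(0, sep).getBytes(StandardCharsets.UTF_8),
                specs[i].substring(sep + 1).getBytes(StandardCharsets.UTF_8)
            };
        }
        return parsed;
    }

    public static void main(String[] args) {
        for (byte[][] column : parse("cf:a cf:b")) {
            System.out.println(
                new String(column[0], StandardCharsets.UTF_8) + " / "
                + new String(column[1], StandardCharsets.UTF_8));
        }
    }
}
```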

map

public void map(ImmutableBytesWritable key,
                Result value,
                org.apache.hadoop.mapreduce.Mapper.Context context)
         throws IOException,
                InterruptedException
Extract the grouping columns from value to construct a new key. Pass the new key and value to reduce. If any of the grouping columns are not found in the value, the record is skipped.

Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>
Parameters:
key - The current key.
value - The current value.
context - The current context.
Throws:
IOException - When writing the record fails.
InterruptedException - When the job is aborted.
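
The skip-on-missing behavior described for map can be sketched without any HBase dependency. In the example below a Map<String, String> stands in for a Result, and a List<String> collects the keys that would be emitted. Both stand-ins, the space-joined key format, and all names are assumptions made for illustration, not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupingMapSketch {
    // Returns the values of the grouping columns, or null if any is missing.
    static String[] extract(Map<String, String> row, List<String> groupColumns) {
        String[] vals = new String[groupColumns.size()];
        for (int i = 0; i < vals.length; i++) {
            vals[i] = row.get(groupColumns.get(i));
            if (vals[i] == null) {
                return null;
            }
        }
        return vals;
    }

    // Builds one new key per row; rows lacking a grouping column are skipped.
    static List<String> mapAll(List<Map<String, String>> rows,
                               List<String> groupColumns) {
        List<String> emittedKeys = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String[] vals = extract(row, groupColumns);
            if (vals == null) {
                continue; // record skipped, nothing is emitted for it
            }
            emittedKeys.add(String.join(" ", vals));
        }
        return emittedKeys;
    }

    public static void main(String[] args) {
        Map<String, String> complete = new HashMap<>();
        complete.put("cf:a", "x");
        complete.put("cf:b", "y");
        Map<String, String> incomplete = new HashMap<>();
        incomplete.put("cf:a", "z"); // cf:b is missing
        List<String> keys = mapAll(Arrays.asList(complete, incomplete),
                                   Arrays.asList("cf:a", "cf:b"));
        System.out.println(keys); // prints [x y]
    }
}
```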

extractKeyValues

protected byte[][] extractKeyValues(Result r)
Extract column values from the current record. This method returns null if any of the columns are not found.

Override this method if you want to deal with nulls differently.

Parameters:
r - The current values.
Returns:
Array of byte values.
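
One way to "deal with nulls differently" is to substitute an empty value for a missing column instead of returning null, so that the record is kept. The sketch below shows that idea in plain Java, with a Map<String, byte[]> standing in for a Result. It is a hypothetical alternative policy, not the default behavior, and the names are invented for the example. A real subclass would apply the same logic inside an overridden extractKeyValues(Result r).

```java
import java.util.HashMap;
import java.util.Map;

final class LenientExtractSketch {
    private static final byte[] EMPTY = new byte[0];

    // Never returns null: a missing column yields an empty byte array,
    // so the caller has no reason to skip the record.
    static byte[][] extractLenient(Map<String, byte[]> row, String[] columns) {
        byte[][] vals = new byte[columns.length][];
        for (int i = 0; i < columns.length; i++) {
            byte[] value = row.get(columns[i]);
            vals[i] = (value != null) ? value : EMPTY;
        }
        return vals;
    }

    public static void main(String[] args) {
        Map<String, byte[]> row = new HashMap<>();
        row.put("cf:a", new byte[] {1});
        byte[][] vals = extractLenient(row, new String[] {"cf:a", "cf:b"});
        System.out.println(vals[0].length + " " + vals[1].length); // prints 1 0
    }
}
```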

createGroupKey

protected ImmutableBytesWritable createGroupKey(byte[][] vals)
Create a key by concatenating multiple column values.

Override this function in order to produce different types of keys.

Parameters:
vals - The current key/values.
Returns:
A key generated by concatenating multiple column values.
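
The concatenation step can be sketched as follows. The single-space separator, the UTF-8 decoding, and the null pass-through are assumptions made for this example. The description above only states that the values are concatenated. The class name is hypothetical and a plain byte[] stands in for ImmutableBytesWritable.

```java
import java.nio.charset.StandardCharsets;

final class GroupKeySketch {
    // Joins the column values into one key, separated by single spaces.
    // Returns null when no values were extracted.
    static byte[] createGroupKey(byte[][] vals) {
        if (vals == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < vals.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(new String(vals[i], StandardCharsets.UTF_8));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] vals = {
            "us".getBytes(StandardCharsets.UTF_8),
            "2016".getBytes(StandardCharsets.UTF_8)
        };
        System.out.println(
            new String(createGroupKey(vals), StandardCharsets.UTF_8)); // prints us 2016
    }
}
```

A separator keeps value pairs such as ("ab", "c") and ("a", "bc") from collapsing into the same key, although values that themselves contain the separator can still collide. An override that needs unambiguous keys could use length-prefixed values instead.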

getConf

public org.apache.hadoop.conf.Configuration getConf()
Returns the current configuration.

Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
Returns:
The current configuration.
See Also:
Configurable.getConf()

setConf

public void setConf(org.apache.hadoop.conf.Configuration configuration)
Sets the configuration. This is used to set up the grouping details.

Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
Parameters:
configuration - The configuration to set.
See Also:
Configurable.setConf(org.apache.hadoop.conf.Configuration)


Copyright © 2007-2016 The Apache Software Foundation. All Rights Reserved.