org.apache.accumulo.core.client.mapreduce
Class AbstractInputFormat<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.accumulo.core.client.mapreduce.AbstractInputFormat<K,V>
Direct Known Subclasses:
AccumuloMultiTableInputFormat, InputFormatBase

public abstract class AbstractInputFormat<K,V>
extends org.apache.hadoop.mapreduce.InputFormat<K,V>

An abstract input format that provides the methods shared by all other input format classes. At a minimum, any class inheriting from this class must define its own AbstractInputFormat.AbstractRecordReader.


Nested Class Summary
protected static class AbstractInputFormat.AbstractRecordReader<K,V>
          An abstract base class to be used to create RecordReader instances that convert from Accumulo Key/Value pairs to the user's K/V types.
 
Field Summary
protected static Class<?> CLASS
           
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
AbstractInputFormat()
           
 
Method Summary
protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapreduce.JobContext context)
          Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
protected static InputTableConfig getInputTableConfig(org.apache.hadoop.mapreduce.JobContext context, String tableName)
          Fetches an InputTableConfig that has been set on the configuration for a specific table.
protected static Map<String,InputTableConfig> getInputTableConfigs(org.apache.hadoop.mapreduce.JobContext context)
          Fetches all InputTableConfigs that have been set on the given job.
protected static Instance getInstance(org.apache.hadoop.mapreduce.JobContext context)
          Initializes an Accumulo Instance based on the configuration.
protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapreduce.JobContext context)
          Gets the log level from this configuration.
protected static String getPrincipal(org.apache.hadoop.mapreduce.JobContext context)
          Gets the user name from the configuration.
protected static Authorizations getScanAuthorizations(org.apache.hadoop.mapreduce.JobContext context)
          Gets the authorizations to set for the scans from the configuration.
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
          Gets the splits of the tables that have been set on the job.
protected static TabletLocator getTabletLocator(org.apache.hadoop.mapreduce.JobContext context, String table)
          Initializes an Accumulo TabletLocator based on the configuration.
protected static byte[] getToken(org.apache.hadoop.mapreduce.JobContext context)
          Deprecated. since 1.6.0; Use getAuthenticationToken(JobContext) instead.
protected static String getTokenClass(org.apache.hadoop.mapreduce.JobContext context)
          Deprecated. since 1.6.0; Use getAuthenticationToken(JobContext) instead.
protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapreduce.JobContext context)
          Determines if the connector has been configured.
static void setConnectorInfo(org.apache.hadoop.mapreduce.Job job, String principal, AuthenticationToken token)
          Sets the connector information needed to communicate with Accumulo in this job.
static void setConnectorInfo(org.apache.hadoop.mapreduce.Job job, String principal, String tokenFile)
          Sets the connector information needed to communicate with Accumulo in this job.
static void setLogLevel(org.apache.hadoop.mapreduce.Job job, org.apache.log4j.Level level)
          Sets the log level for this job.
static void setMockInstance(org.apache.hadoop.mapreduce.Job job, String instanceName)
          Configures a MockInstance for this job.
static void setScanAuthorizations(org.apache.hadoop.mapreduce.Job job, Authorizations auths)
          Sets the Authorizations used to scan.
static void setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job, ClientConfiguration clientConfig)
          Configures a ZooKeeperInstance for this job.
static void setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job, String instanceName, String zooKeepers)
          Deprecated. since 1.6.0; Use setZooKeeperInstance(Job, ClientConfiguration) instead.
protected static void validateOptions(org.apache.hadoop.mapreduce.JobContext context)
          Checks whether the job is fully configured for use with an Accumulo InputFormat.
 
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CLASS

protected static final Class<?> CLASS

log

protected static final org.apache.log4j.Logger log
Constructor Detail

AbstractInputFormat

public AbstractInputFormat()
Method Detail

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapreduce.Job job,
                                    String principal,
                                    AuthenticationToken token)
                             throws AccumuloSecurityException
Sets the connector information needed to communicate with Accumulo in this job.

WARNING: The serialized token is stored in the configuration and shared with all MapReduce tasks. It is BASE64 encoded to provide a charset safe conversion to a string, and is not intended to be secure.

Parameters:
job - the Hadoop job instance to be configured
principal - a valid Accumulo user name (user must have Table.CREATE permission)
token - the user's authentication token (for example, a PasswordToken wrapping the user's password)
Throws:
AccumuloSecurityException
Since:
1.5.0
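
The warning above can be illustrated with a minimal, JDK-only sketch. It does not reproduce Accumulo's actual serialization code; the class name Base64TokenDemo and the sample password are hypothetical. It only shows that a Base64-encoded value can be decoded by anyone who can read the job configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: this is not Accumulo's token serialization code. */
final class Base64TokenDemo {

    /** Encodes raw bytes as a charset-safe Base64 string. */
    static String encode(byte[] tokenBytes) {
        return Base64.getEncoder().encodeToString(tokenBytes);
    }

    /** Reverses the encoding; no key or secret is required. */
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Hypothetical password, used only for illustration.
        byte[] secret = "s3cret".getBytes(StandardCharsets.UTF_8);
        String stored = encode(secret);
        System.out.println("stored in configuration: " + stored);
        System.out.println("recovered by any reader: "
                + new String(decode(stored), StandardCharsets.UTF_8));
    }
}
```

Because the encoding is reversible without a key, the token-file overload of setConnectorInfo may be preferable when the job configuration is readable by other users.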

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapreduce.Job job,
                                    String principal,
                                    String tokenFile)
                             throws AccumuloSecurityException
Sets the connector information needed to communicate with Accumulo in this job.

Stores the password in a file in HDFS and pulls that into the Distributed Cache in an attempt to be more secure than storing it in the Configuration.

Parameters:
job - the Hadoop job instance to be configured
principal - a valid Accumulo user name (user must have Table.CREATE permission)
tokenFile - the path to the token file
Throws:
AccumuloSecurityException
Since:
1.6.0

isConnectorInfoSet

protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapreduce.JobContext context)
Determines if the connector has been configured.

Parameters:
context - the Hadoop context for the configured job
Returns:
true if the connector has been configured, false otherwise
Since:
1.5.0
See Also:
setConnectorInfo(Job, String, AuthenticationToken)

getPrincipal

protected static String getPrincipal(org.apache.hadoop.mapreduce.JobContext context)
Gets the user name from the configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
the user name
Since:
1.5.0
See Also:
setConnectorInfo(Job, String, AuthenticationToken)

getTokenClass

@Deprecated
protected static String getTokenClass(org.apache.hadoop.mapreduce.JobContext context)
Deprecated. since 1.6.0; Use getAuthenticationToken(JobContext) instead.

Gets the serialized token class from either the configuration or the token file.

Since:
1.5.0

getToken

@Deprecated
protected static byte[] getToken(org.apache.hadoop.mapreduce.JobContext context)
Deprecated. since 1.6.0; Use getAuthenticationToken(JobContext) instead.

Gets the serialized token from either the configuration or the token file.

Since:
1.5.0

getAuthenticationToken

protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapreduce.JobContext context)
Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.

Parameters:
context - the Hadoop context for the configured job
Returns:
the principal's authentication token
Since:
1.6.0
See Also:
setConnectorInfo(Job, String, AuthenticationToken), setConnectorInfo(Job, String, String)

setZooKeeperInstance

@Deprecated
public static void setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job,
                                                   String instanceName,
                                                   String zooKeepers)
Deprecated. since 1.6.0; Use setZooKeeperInstance(Job, ClientConfiguration) instead.

Configures a ZooKeeperInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
instanceName - the Accumulo instance name
zooKeepers - a comma-separated list of zookeeper servers
Since:
1.5.0
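
As a small illustration of the zooKeepers format, the sketch below joins host:port entries into a comma-separated string. The class name ZooKeeperListDemo and the host names are hypothetical, and the code uses only the JDK; it does not call any Accumulo API.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: builds a comma-separated server list. */
final class ZooKeeperListDemo {

    /** Joins host:port entries with commas, without surrounding spaces. */
    static String toZooKeepers(List<String> servers) {
        return String.join(",", servers);
    }

    public static void main(String[] args) {
        // Hypothetical host names, used only for illustration.
        List<String> servers =
                Arrays.asList("zk1.example.com:2181", "zk2.example.com:2181");
        System.out.println(toZooKeepers(servers));
    }
}
```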

setZooKeeperInstance

public static void setZooKeeperInstance(org.apache.hadoop.mapreduce.Job job,
                                        ClientConfiguration clientConfig)
Configures a ZooKeeperInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
clientConfig - client configuration containing connection options
Since:
1.6.0

setMockInstance

public static void setMockInstance(org.apache.hadoop.mapreduce.Job job,
                                   String instanceName)
Configures a MockInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
instanceName - the Accumulo instance name
Since:
1.5.0

getInstance

protected static Instance getInstance(org.apache.hadoop.mapreduce.JobContext context)
Initializes an Accumulo Instance based on the configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
an Accumulo instance
Since:
1.5.0
See Also:
setZooKeeperInstance(Job, String, String), setMockInstance(Job, String)

setLogLevel

public static void setLogLevel(org.apache.hadoop.mapreduce.Job job,
                               org.apache.log4j.Level level)
Sets the log level for this job.

Parameters:
job - the Hadoop job instance to be configured
level - the logging level
Since:
1.5.0

getLogLevel

protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapreduce.JobContext context)
Gets the log level from this configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
the log level
Since:
1.5.0
See Also:
setLogLevel(Job, Level)

setScanAuthorizations

public static void setScanAuthorizations(org.apache.hadoop.mapreduce.Job job,
                                         Authorizations auths)
Sets the Authorizations used to scan. Must be a subset of the user's authorizations. Defaults to the empty set.

Parameters:
job - the Hadoop job instance to be configured
auths - the user's authorizations
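
The subset requirement can be sketched with plain Java sets. This is a conceptual illustration only: it models authorizations as sets of strings rather than using the Authorizations class, and the name ScanAuthSubsetCheck and the labels are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: models authorizations as plain string sets. */
final class ScanAuthSubsetCheck {

    /** Returns true when every scan authorization is also held by the user. */
    static boolean isSubset(Set<String> scanAuths, Set<String> userAuths) {
        return userAuths.containsAll(scanAuths);
    }

    public static void main(String[] args) {
        // Hypothetical authorization labels.
        Set<String> userAuths = new HashSet<>(Arrays.asList("public", "internal"));

        System.out.println(isSubset(Collections.singleton("public"), userAuths));
        System.out.println(isSubset(Collections.singleton("secret"), userAuths));
        // The default, the empty set, is a subset of any user's authorizations.
        System.out.println(isSubset(Collections.<String>emptySet(), userAuths));
    }
}
```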

getScanAuthorizations

protected static Authorizations getScanAuthorizations(org.apache.hadoop.mapreduce.JobContext context)
Gets the authorizations to set for the scans from the configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
the Accumulo scan authorizations
Since:
1.5.0
See Also:
setScanAuthorizations(Job, Authorizations)

getInputTableConfigs

protected static Map<String,InputTableConfig> getInputTableConfigs(org.apache.hadoop.mapreduce.JobContext context)
Fetches all InputTableConfigs that have been set on the given job.

Parameters:
context - the Hadoop context for the configured job
Returns:
the InputTableConfig objects for the job
Since:
1.6.0

getInputTableConfig

protected static InputTableConfig getInputTableConfig(org.apache.hadoop.mapreduce.JobContext context,
                                                      String tableName)
Fetches an InputTableConfig that has been set on the configuration for a specific table.

null is returned in the event that the table doesn't exist.

Parameters:
context - the Hadoop context for the configured job
tableName - the table name for which to grab the config object
Returns:
the InputTableConfig for the given table
Since:
1.6.0
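
Because this method returns null for an unknown table, callers need a null check before using the result. The sketch below shows one way to make that explicit. It stands in a plain Map of strings for the per-table configurations, so TableConfigLookupDemo, the table names, and the string values are all hypothetical and not part of the Accumulo API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: a plain map stands in for the per-table configs. */
final class TableConfigLookupDemo {

    /** Wraps a lookup that may yield null so callers must handle absence. */
    static Optional<String> lookup(Map<String, String> configs, String tableName) {
        return Optional.ofNullable(configs.get(tableName));
    }

    public static void main(String[] args) {
        // Hypothetical table names and placeholder config values.
        Map<String, String> configs = new HashMap<>();
        configs.put("events", "config-for-events");

        System.out.println(lookup(configs, "events").isPresent());
        System.out.println(lookup(configs, "missing").isPresent());
    }
}
```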

getTabletLocator

protected static TabletLocator getTabletLocator(org.apache.hadoop.mapreduce.JobContext context,
                                                String table)
                                         throws TableNotFoundException
Initializes an Accumulo TabletLocator based on the configuration.

Parameters:
context - the Hadoop context for the configured job
table - the table for which to initialize the locator
Returns:
an Accumulo tablet locator
Throws:
TableNotFoundException - if the table name set on the configuration doesn't exist
Since:
1.6.0

validateOptions

protected static void validateOptions(org.apache.hadoop.mapreduce.JobContext context)
                               throws IOException
Checks whether the job is fully configured for use with an Accumulo InputFormat.

Parameters:
context - the Hadoop context for the configured job
Throws:
IOException - if the context is improperly configured
Since:
1.5.0

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException
Gets the splits of the tables that have been set on the job.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<K,V>
Parameters:
context - the configuration of the job
Returns:
the splits from the tables based on the ranges.
Throws:
IOException - if a table set on the job doesn't exist or an error occurs initializing the tablet locator


Copyright © 2015 Apache Accumulo Project. All rights reserved.