org.apache.accumulo.core.client.mapred
Class AbstractInputFormat<K,V>

java.lang.Object
  extended by org.apache.accumulo.core.client.mapred.AbstractInputFormat<K,V>
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<K,V>
Direct Known Subclasses:
AccumuloMultiTableInputFormat, InputFormatBase

public abstract class AbstractInputFormat<K,V>
extends Object
implements org.apache.hadoop.mapred.InputFormat<K,V>

An abstract input format that provides the methods shared by all other input format classes. At a minimum, any class inheriting from this class must define its own AbstractInputFormat.AbstractRecordReader.


Nested Class Summary
protected static class AbstractInputFormat.AbstractRecordReader<K,V>
          An abstract base class to be used to create RecordReader instances that convert from Accumulo Key/Value pairs to the user's K/V types.
 
Field Summary
protected static Class<?> CLASS
           
protected static org.apache.log4j.Logger log
           
 
Constructor Summary
AbstractInputFormat()
           
 
Method Summary
protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)
          Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.
static InputTableConfig getInputTableConfig(org.apache.hadoop.mapred.JobConf job, String tableName)
          Fetches an InputTableConfig that has been set on the configuration for a specific table.
static Map<String,InputTableConfig> getInputTableConfigs(org.apache.hadoop.mapred.JobConf job)
          Fetches all InputTableConfigs that have been set on the given Hadoop job.
protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
          Initializes an Accumulo Instance based on the configuration.
protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
          Gets the log level from this configuration.
protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
          Gets the user name from the configuration.
protected static Authorizations getScanAuthorizations(org.apache.hadoop.mapred.JobConf job)
          Gets the authorizations to set for the scans from the configuration.
 org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
          Reads the metadata table to get tablets and matches ranges to them.
protected static TabletLocator getTabletLocator(org.apache.hadoop.mapred.JobConf job, String tableId)
          Initializes an Accumulo TabletLocator based on the configuration.
protected static byte[] getToken(org.apache.hadoop.mapred.JobConf job)
          Deprecated. since 1.6.0; Use getAuthenticationToken(JobConf) instead.
protected static String getTokenClass(org.apache.hadoop.mapred.JobConf job)
          Deprecated. since 1.6.0; Use getAuthenticationToken(JobConf) instead.
protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
          Determines if the connector has been configured.
static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, AuthenticationToken token)
          Sets the connector information needed to communicate with Accumulo in this job.
static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job, String principal, String tokenFile)
          Sets the connector information needed to communicate with Accumulo in this job.
static void setLogLevel(org.apache.hadoop.mapred.JobConf job, org.apache.log4j.Level level)
          Sets the log level for this job.
static void setMockInstance(org.apache.hadoop.mapred.JobConf job, String instanceName)
          Configures a MockInstance for this job.
static void setScanAuthorizations(org.apache.hadoop.mapred.JobConf job, Authorizations auths)
          Sets the Authorizations used to scan.
static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, ClientConfiguration clientConfig)
          Configures a ZooKeeperInstance for this job.
static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job, String instanceName, String zooKeepers)
          Deprecated. since 1.6.0; Use setZooKeeperInstance(JobConf, ClientConfiguration) instead.
protected static void validateOptions(org.apache.hadoop.mapred.JobConf job)
          Checks whether a job is fully configured for use with an Accumulo InputFormat.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapred.InputFormat
getRecordReader
 

Field Detail

CLASS

protected static final Class<?> CLASS

log

protected static final org.apache.log4j.Logger log
Constructor Detail

AbstractInputFormat

public AbstractInputFormat()
Method Detail

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job,
                                    String principal,
                                    AuthenticationToken token)
                             throws AccumuloSecurityException
Sets the connector information needed to communicate with Accumulo in this job.

WARNING: The serialized token is stored in the configuration and shared with all MapReduce tasks. It is BASE64 encoded to provide a charset safe conversion to a string, and is not intended to be secure.

Parameters:
job - the Hadoop job instance to be configured
principal - a valid Accumulo user name (user must have Table.CREATE permission)
token - the user's authentication token (for example, a PasswordToken wrapping the user's password)
Throws:
AccumuloSecurityException
Since:
1.5.0
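
The warning above can be illustrated with a minimal sketch using only java.util.Base64. It does not reproduce Accumulo's actual serialization; the class name Base64IsNotEncryption and the sample value are hypothetical. It only shows that a Base64-encoded value can be reversed by anyone who can read the job configuration, with no key involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of Accumulo.
class Base64IsNotEncryption {

    // Encodes raw bytes as a charset-safe string, similar in spirit to
    // how a serialized token might be placed into a configuration value.
    static String encode(byte[] tokenBytes) {
        return Base64.getEncoder().encodeToString(tokenBytes);
    }

    // Reverses the encoding. No secret is needed, which is why the
    // encoded form offers no confidentiality.
    static byte[] decode(String stored) {
        return Base64.getDecoder().decode(stored);
    }

    public static void main(String[] args) {
        // Illustrative value only; assumed for this sketch.
        byte[] secret = "s3cret".getBytes(StandardCharsets.UTF_8);
        String stored = encode(secret);
        String recovered = new String(decode(stored), StandardCharsets.UTF_8);
        System.out.println("stored in configuration: " + stored);
        System.out.println("recovered by any reader: " + recovered);
    }
}
```

If the job configuration is readable by other users, the token-file variant of setConnectorInfo may be the better fit.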

setConnectorInfo

public static void setConnectorInfo(org.apache.hadoop.mapred.JobConf job,
                                    String principal,
                                    String tokenFile)
                             throws AccumuloSecurityException
Sets the connector information needed to communicate with Accumulo in this job.

Stores the password in a file in HDFS and pulls that into the Distributed Cache in an attempt to be more secure than storing it in the Configuration.

Parameters:
job - the Hadoop job instance to be configured
principal - a valid Accumulo user name (user must have Table.CREATE permission)
tokenFile - the path to the token file
Throws:
AccumuloSecurityException
Since:
1.6.0

isConnectorInfoSet

protected static Boolean isConnectorInfoSet(org.apache.hadoop.mapred.JobConf job)
Determines if the connector has been configured.

Parameters:
job - the Hadoop context for the configured job
Returns:
true if the connector has been configured, false otherwise
Since:
1.5.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken)

getPrincipal

protected static String getPrincipal(org.apache.hadoop.mapred.JobConf job)
Gets the user name from the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the user name
Since:
1.5.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken)

getTokenClass

@Deprecated
protected static String getTokenClass(org.apache.hadoop.mapred.JobConf job)
Deprecated. since 1.6.0; Use getAuthenticationToken(JobConf) instead.

Gets the serialized token class from either the configuration or the token file.

Since:
1.5.0

getToken

@Deprecated
protected static byte[] getToken(org.apache.hadoop.mapred.JobConf job)
Deprecated. since 1.6.0; Use getAuthenticationToken(JobConf) instead.

Gets the serialized token from either the configuration or the token file.

Since:
1.5.0

getAuthenticationToken

protected static AuthenticationToken getAuthenticationToken(org.apache.hadoop.mapred.JobConf job)
Gets the authentication token from either the specified token file or directly from the configuration, whichever was used when the job was configured.

Parameters:
job - the Hadoop context for the configured job
Returns:
the principal's authentication token
Since:
1.6.0
See Also:
setConnectorInfo(JobConf, String, AuthenticationToken), setConnectorInfo(JobConf, String, String)

setZooKeeperInstance

@Deprecated
public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job,
                                        String instanceName,
                                        String zooKeepers)
Deprecated. since 1.6.0; Use setZooKeeperInstance(JobConf, ClientConfiguration) instead.

Configures a ZooKeeperInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
instanceName - the Accumulo instance name
zooKeepers - a comma-separated list of zookeeper servers
Since:
1.5.0

setZooKeeperInstance

public static void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf job,
                                        ClientConfiguration clientConfig)
Configures a ZooKeeperInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
clientConfig - client configuration containing connection options
Since:
1.6.0

setMockInstance

public static void setMockInstance(org.apache.hadoop.mapred.JobConf job,
                                   String instanceName)
Configures a MockInstance for this job.

Parameters:
job - the Hadoop job instance to be configured
instanceName - the Accumulo instance name
Since:
1.5.0

getInstance

protected static Instance getInstance(org.apache.hadoop.mapred.JobConf job)
Initializes an Accumulo Instance based on the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
an Accumulo instance
Since:
1.5.0
See Also:
setZooKeeperInstance(JobConf, String, String), setZooKeeperInstance(JobConf, ClientConfiguration), setMockInstance(JobConf, String)

setLogLevel

public static void setLogLevel(org.apache.hadoop.mapred.JobConf job,
                               org.apache.log4j.Level level)
Sets the log level for this job.

Parameters:
job - the Hadoop job instance to be configured
level - the logging level
Since:
1.5.0

getLogLevel

protected static org.apache.log4j.Level getLogLevel(org.apache.hadoop.mapred.JobConf job)
Gets the log level from this configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the log level
Since:
1.5.0
See Also:
setLogLevel(JobConf, Level)

setScanAuthorizations

public static void setScanAuthorizations(org.apache.hadoop.mapred.JobConf job,
                                         Authorizations auths)
Sets the Authorizations used to scan. Must be a subset of the user's authorizations. Defaults to the empty set.

Parameters:
job - the Hadoop job instance to be configured
auths - the authorizations to use for scans (a subset of the user's authorizations)
Since:
1.5.0
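
The subset requirement can be checked before configuring a job. The sketch below is an assumption-laden illustration: it uses plain strings in place of Accumulo's Authorizations type, and the class and method names (ScanAuthCheck, isSubset) are hypothetical. Note that the empty set, the documented default, is a subset of any granted set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; strings stand in for Accumulo authorization labels.
class ScanAuthCheck {

    // Returns true when every requested label is also granted to the user.
    static boolean isSubset(Set<String> requested, Set<String> granted) {
        return granted.containsAll(requested);
    }

    public static void main(String[] args) {
        // Labels assumed for illustration only.
        Set<String> granted = new HashSet<>(Arrays.asList("public", "secret"));

        Set<String> ok = new HashSet<>(Collections.singletonList("public"));
        Set<String> tooBroad = new HashSet<>(Arrays.asList("public", "admin"));
        Set<String> empty = Collections.emptySet();

        System.out.println(isSubset(ok, granted));
        System.out.println(isSubset(tooBroad, granted));
        System.out.println(isSubset(empty, granted));
    }
}
```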

getScanAuthorizations

protected static Authorizations getScanAuthorizations(org.apache.hadoop.mapred.JobConf job)
Gets the authorizations to set for the scans from the configuration.

Parameters:
job - the Hadoop context for the configured job
Returns:
the Accumulo scan authorizations
Since:
1.5.0
See Also:
setScanAuthorizations(JobConf, Authorizations)

getTabletLocator

protected static TabletLocator getTabletLocator(org.apache.hadoop.mapred.JobConf job,
                                                String tableId)
                                         throws TableNotFoundException
Initializes an Accumulo TabletLocator based on the configuration.

Parameters:
job - the Hadoop context for the configured job
tableId - the ID of the table for which to initialize the locator
Returns:
an Accumulo tablet locator
Throws:
TableNotFoundException - if the table name set on the configuration doesn't exist
Since:
1.6.0

validateOptions

protected static void validateOptions(org.apache.hadoop.mapred.JobConf job)
                               throws IOException
Checks whether a job is fully configured for use with an Accumulo InputFormat.

Parameters:
job - the Hadoop context for the configured job
Throws:
IOException - if the context is improperly configured
Since:
1.5.0

getInputTableConfigs

public static Map<String,InputTableConfig> getInputTableConfigs(org.apache.hadoop.mapred.JobConf job)
Fetches all InputTableConfigs that have been set on the given Hadoop job.

Parameters:
job - the Hadoop job instance from which to read the configuration
Returns:
the InputTableConfig objects set on the job
Since:
1.6.0

getInputTableConfig

public static InputTableConfig getInputTableConfig(org.apache.hadoop.mapred.JobConf job,
                                                   String tableName)
Fetches an InputTableConfig that has been set on the configuration for a specific table.

null is returned in the event that the table doesn't exist.

Parameters:
job - the Hadoop job instance from which to read the configuration
tableName - the table name for which to grab the config object
Returns:
the InputTableConfig for the given table
Since:
1.6.0

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws IOException
Reads the metadata table to get tablets and matches ranges to them.

Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<K,V>
Throws:
IOException
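
The idea of matching ranges to tablets can be sketched with a sorted set of split points. This is not Accumulo's implementation, which works with its own locator and range types; it is a simplified model under stated assumptions: rows are plain strings, each tablet is identified by its end row and covers the rows after the previous end row up to and including its own, and the final tablet has no end row. The names RangeToTabletSketch, tabletsForRange, and LAST_TABLET are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration; not Accumulo's implementation.
class RangeToTabletSketch {

    // Label for the final tablet, which has no end row.
    static final String LAST_TABLET = "<last>";

    // Returns the tablets (named by end row) that overlap the inclusive
    // row range [startRow, endRow].
    static List<String> tabletsForRange(NavigableSet<String> splitPoints,
                                        String startRow, String endRow) {
        List<String> tablets = new ArrayList<>();
        // Visit tablets whose end row is at or after startRow, in order.
        for (String tabletEnd : splitPoints.tailSet(startRow, true)) {
            tablets.add(tabletEnd);
            if (tabletEnd.compareTo(endRow) >= 0) {
                return tablets; // this tablet contains endRow, so stop
            }
        }
        // The range extends past the last split point.
        tablets.add(LAST_TABLET);
        return tablets;
    }

    public static void main(String[] args) {
        // Split points assumed for illustration only.
        NavigableSet<String> splits = new TreeSet<>(Arrays.asList("g", "m", "t"));
        System.out.println(tabletsForRange(splits, "a", "h"));
        System.out.println(tabletsForRange(splits, "n", "z"));
    }
}
```

In this model, a range that spans several tablets would yield one piece of work per overlapping tablet, which is roughly how a range can end up divided across multiple input splits.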


Copyright © 2015 Apache Accumulo Project. All rights reserved.