Class InputConfigurator
- java.lang.Object
-
- org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase
-
- org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator
-
public class InputConfigurator extends ConfiguratorBase
- Since:
- 1.6.0
-
-
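Every configurator method below takes an implementingClass and a Hadoop Configuration; the class name is used to namespace the property keys so that several input formats can share one configuration without collisions. As a rough stand-alone sketch of that pattern (the key layout and the plain Map standing in for org.apache.hadoop.conf.Configuration are illustrative assumptions, not Accumulo's actual enumToConfKey format):

```java
import java.util.Map;

// Minimal sketch of the configurator pattern: static helpers that read and
// write keys namespaced by the implementing class. The key layout below is
// an illustrative assumption, not Accumulo's exact format, and a plain Map
// stands in for org.apache.hadoop.conf.Configuration.
class ConfiguratorSketch {
  enum ScanOpts { TABLE_NAME }

  // Prefix the option with the implementing class's simple name so that
  // two InputFormats can coexist in one configuration without collisions.
  static String enumToConfKey(Class<?> implementingClass, Enum<?> opt) {
    return implementingClass.getSimpleName() + "."
        + opt.getDeclaringClass().getSimpleName() + "." + opt.name();
  }

  static void setInputTableName(Class<?> implementingClass, Map<String, String> conf,
      String tableName) {
    conf.put(enumToConfKey(implementingClass, ScanOpts.TABLE_NAME), tableName);
  }

  static String getInputTableName(Class<?> implementingClass, Map<String, String> conf) {
    return conf.get(enumToConfKey(implementingClass, ScanOpts.TABLE_NAME));
  }
}
```

The set/get method pairs throughout this class follow this shape: the setter encodes a value under a class-prefixed key, and the matching getter decodes it from the same key.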
Nested Class Summary
Nested Classes
static class InputConfigurator.Features
Configuration keys for various features.
static class InputConfigurator.ScanOpts
Configuration keys for Scanner.
-
Nested classes/interfaces inherited from class org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase
ConfiguratorBase.ConnectorInfo, ConfiguratorBase.GeneralOpts, ConfiguratorBase.InstanceOpts, ConfiguratorBase.TokenSource
-
-
Constructor Summary
Constructors
InputConfigurator()
-
Method Summary
static void addIterator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, IteratorSetting cfg)
Encode an iterator on the input for the single input table associated with this job.

static Map<String,Map<KeyExtent,List<Range>>> binOffline(String tableId, List<Range> ranges, Instance instance, Connector conn)

static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> deserializeFetchedColumns(Collection<String> serialized)

static void fetchColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)
Restricts the columns that will be mapped over for the single input table on this job.

static Boolean getAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has auto-adjust ranges enabled.

static String getClassLoaderContext(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the name of the context classloader to use for scans.

protected static Map.Entry<String,InputTableConfig> getDefaultInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
Returns the InputTableConfig for the configuration based on the properties set using the single-table input methods.

static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the columns to be mapped over from this job.

static InputTableConfig getInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
Returns the InputTableConfig for the given table.

static Map<String,InputTableConfig> getInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Returns all InputTableConfig objects associated with this job.

static String getInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the name of the input table, over which this job will scan.

static List<IteratorSetting> getIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.

static List<Range> getRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the ranges to scan over from a job.

static SamplerConfiguration getSamplerConfiguration(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)

static Authorizations getScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the authorizations to set for the scans from the configuration.

static TabletLocator getTabletLocator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableId)
Initializes an Accumulo TabletLocator based on the configuration.

static Boolean isBatchScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has the BatchScanner feature enabled.

static Boolean isIsolated(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has isolation enabled.

static Boolean isOfflineScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has the offline table scan feature enabled.

static String[] serializeColumns(Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)

static void setAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the automatic adjustment of ranges for this job.

static void setBatchScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the use of the BatchScanner in this job.

static void setClassLoaderContext(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String context)
Sets the name of the context classloader to use for scans.

static void setInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Map<String,InputTableConfig> configs)
Sets configurations for multiple tables at a time.

static void setInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
Sets the name of the input table, over which this job will scan.

static void setLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the use of the ClientSideIteratorScanner in this job.

static void setOfflineTableScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Enable reading offline tables.

static void setRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Range> ranges)
Sets the input ranges to scan on all input tables for this job.

static void setSamplerConfiguration(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, SamplerConfiguration samplerConfig)

static void setScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Authorizations auths)
Sets the Authorizations used to scan.

static void setScanIsolation(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the use of the IsolatedScanner in this job.

static Boolean usesLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration uses local iterators.

static Instance validateInstance(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Validates and extracts an Instance from the configuration.

static void validateOptions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Deprecated.

static void validatePermissions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Connector conn)
Validates that the user has permissions on the requested tables.
-
Methods inherited from class org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase
enumToConfKey, enumToConfKey, getAuthenticationToken, getClientConfiguration, getInstance, getLogLevel, getPrincipal, getTokenFromFile, getVisibilityCacheSize, isConnectorInfoSet, setConnectorInfo, setConnectorInfo, setLogLevel, setMockInstance, setVisibilityCacheSize, setZooKeeperInstance, unwrapAuthenticationToken, unwrapAuthenticationToken
-
-
-
-
Method Detail
-
setClassLoaderContext
public static void setClassLoaderContext(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String context)
Sets the name of the context classloader to use for scans.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
context - the name of the context classloader
- Since:
- 1.8.0
-
getClassLoaderContext
public static String getClassLoaderContext(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the name of the context classloader to use for scans.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- the classloader context name
- Since:
- 1.8.0
-
setInputTableName
public static void setInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
Sets the name of the input table, over which this job will scan.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
tableName - the name of the table over which this job will scan
- Since:
- 1.6.0
-
getInputTableName
public static String getInputTableName(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the name of the input table, over which this job will scan.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- the name of the input table
- Since:
- 1.6.0
-
setScanAuthorizations
public static void setScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Authorizations auths)
Sets the Authorizations used to scan. Must be a subset of the user's authorizations. Defaults to the empty set.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
auths - the user's authorizations
- Since:
- 1.6.0
-
getScanAuthorizations
public static Authorizations getScanAuthorizations(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the authorizations to set for the scans from the configuration.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- the Accumulo scan authorizations
- Since:
- 1.6.0
- See Also:
setScanAuthorizations(Class, Configuration, Authorizations)
-
setRanges
public static void setRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Range> ranges)
Sets the input ranges to scan on all input tables for this job. If not set, the entire table will be scanned.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
ranges - the ranges that will be mapped over
- Throws:
IllegalArgumentException - if the ranges cannot be encoded into base 64
- Since:
- 1.6.0
-
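Since setRanges documents an IllegalArgumentException when ranges cannot be encoded into base 64, the stored form is evidently serialized bytes wrapped in base 64 so they fit a string-valued configuration entry. A hedged sketch of that round trip, with a hypothetical start/end row pair standing in for a real Range:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Base64;

// Sketch of the encoding setRanges implies: a range is serialized to bytes
// (here, a hypothetical start/end row pair, not Accumulo's real Range
// serialization) and base-64 encoded for storage in a string-valued config.
class RangeEncodingSketch {
  static String encodeRange(String startRow, String endRow) {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(baos);
      out.writeUTF(startRow);
      out.writeUTF(endRow);
      return Base64.getEncoder().encodeToString(baos.toByteArray());
    } catch (IOException e) {
      // mirrors the documented behavior: an encoding failure surfaces
      // as IllegalArgumentException rather than a checked exception
      throw new IllegalArgumentException("unable to encode range", e);
    }
  }

  static String[] decodeRange(String encoded) {
    try {
      DataInputStream in = new DataInputStream(
          new ByteArrayInputStream(Base64.getDecoder().decode(encoded)));
      return new String[] { in.readUTF(), in.readUTF() };
    } catch (IOException e) {
      throw new IllegalArgumentException("unable to decode range", e);
    }
  }
}
```

getRanges (below) performs the inverse of this encoding, which is why it declares IOException for improperly encoded values.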
getRanges
public static List<Range> getRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf) throws IOException
Gets the ranges to scan over from a job.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- the ranges
- Throws:
IOException - if the ranges have been encoded improperly
- Since:
- 1.6.0
- See Also:
setRanges(Class, Configuration, Collection)
-
getIterators
public static List<IteratorSetting> getIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- a list of iterators
- Since:
- 1.6.0
- See Also:
addIterator(Class, Configuration, IteratorSetting)
-
fetchColumns
public static void fetchColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)
Restricts the columns that will be mapped over for the single input table on this job.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
columnFamilyColumnQualifierPairs - a collection of pairs of Text objects corresponding to column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.
- Throws:
IllegalArgumentException - if the column family is null
- Since:
- 1.6.0
-
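serializeColumns and deserializeFetchedColumns (below) convert these family/qualifier pairs to and from strings. One plausible encoding is sketched here, under the assumption (not Accumulo's documented wire format) that family and qualifier are individually base-64 encoded and joined with ':' when a qualifier is present; a null qualifier selects the whole family, and a null family is rejected, matching fetchColumns' documented IllegalArgumentException:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative column-pair encoding: base-64(family), optionally followed by
// ":" and base-64(qualifier). The base-64 alphabet contains no ':', so the
// first ':' unambiguously separates the two halves. This format is an
// assumption for the sketch, not Accumulo's actual one.
class ColumnSerializationSketch {
  static String serializeColumn(String family, String qualifier) {
    if (family == null)
      throw new IllegalArgumentException("Column family can not be null");
    Base64.Encoder enc = Base64.getEncoder();
    String s = enc.encodeToString(family.getBytes(StandardCharsets.UTF_8));
    if (qualifier != null)
      s += ":" + enc.encodeToString(qualifier.getBytes(StandardCharsets.UTF_8));
    return s;
  }

  // Returns {family, qualifier}; qualifier is null when the whole family was selected.
  static String[] deserializeColumn(String serialized) {
    Base64.Decoder dec = Base64.getDecoder();
    int i = serialized.indexOf(':');
    String family = new String(
        dec.decode(i < 0 ? serialized : serialized.substring(0, i)), StandardCharsets.UTF_8);
    String qualifier = i < 0 ? null
        : new String(dec.decode(serialized.substring(i + 1)), StandardCharsets.UTF_8);
    return new String[] { family, qualifier };
  }
}
```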
serializeColumns
public static String[] serializeColumns(Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)
-
getFetchedColumns
public static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Gets the columns to be mapped over from this job.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- a set of columns
- Since:
- 1.6.0
- See Also:
fetchColumns(Class, Configuration, Collection)
-
deserializeFetchedColumns
public static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> deserializeFetchedColumns(Collection<String> serialized)
-
addIterator
public static void addIterator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, IteratorSetting cfg)
Encode an iterator on the input for the single input table associated with this job.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
cfg - the configuration of the iterator
- Throws:
IllegalArgumentException - if the iterator can't be serialized into the configuration
- Since:
- 1.6.0
-
setAutoAdjustRanges
public static void setAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the automatic adjustment of ranges for this job. This feature merges overlapping ranges, then splits them to align with tablet boundaries. Disabling this feature will cause exactly one Map task to be created for each specified range.
By default, this feature is enabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
- Since:
- 1.6.0
- See Also:
setRanges(Class, Configuration, Collection)
-
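The merge half of the auto-adjust behavior can be sketched with integer intervals standing in for Range objects; the real implementation additionally needs tablet metadata to split the merged ranges at tablet boundaries, which is omitted here:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of "merge overlapping ranges": sort by start, then fold each range
// into the previous one when they overlap. int[]{start, end} pairs stand in
// for Accumulo Range objects.
class RangeMergeSketch {
  static List<int[]> mergeOverlapping(List<int[]> ranges) {
    List<int[]> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingInt(r -> r[0]));
    List<int[]> merged = new ArrayList<>();
    for (int[] r : sorted) {
      if (!merged.isEmpty() && merged.get(merged.size() - 1)[1] >= r[0]) {
        int[] last = merged.get(merged.size() - 1);
        last[1] = Math.max(last[1], r[1]);    // overlap: extend the previous range
      } else {
        merged.add(new int[] { r[0], r[1] }); // disjoint: start a new range
      }
    }
    return merged;
  }
}
```

With the feature disabled, each input range maps to exactly one Map task, so heavily overlapping ranges would be scanned redundantly; the merge step above is what prevents that.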
getAutoAdjustRanges
public static Boolean getAutoAdjustRanges(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has auto-adjust ranges enabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- false if the feature is disabled, true otherwise
- Since:
- 1.6.0
- See Also:
setAutoAdjustRanges(Class, Configuration, boolean)
-
setScanIsolation
public static void setScanIsolation(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the use of the IsolatedScanner in this job.
By default, this feature is disabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
- Since:
- 1.6.0
-
isIsolated
public static Boolean isIsolated(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has isolation enabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- true if the feature is enabled, false otherwise
- Since:
- 1.6.0
- See Also:
setScanIsolation(Class, Configuration, boolean)
-
setLocalIterators
public static void setLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the use of the ClientSideIteratorScanner in this job. Enabling this feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task.
By default, this feature is disabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
- Since:
- 1.6.0
-
usesLocalIterators
public static Boolean usesLocalIterators(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration uses local iterators.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- true if the feature is enabled, false otherwise
- Since:
- 1.6.0
- See Also:
setLocalIterators(Class, Configuration, boolean)
-
setOfflineTableScan
public static void setOfflineTableScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Enable reading offline tables. By default, this feature is disabled and only online tables are scanned. This will make the MapReduce job directly read the table's files. If the table is not offline, then the job will fail. If the table comes online during the MapReduce job, it is likely that the job will fail.
To use this option, the MapReduce user will need access to read the Accumulo directory in HDFS.
Reading the offline table will create the scan-time iterator stack in the map process, so any iterators that are configured for the table will need to be on the mapper's classpath.
One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a MapReduce job. If you plan to MapReduce over the data many times, it may be better to compact the table, clone it, take it offline, and use the clone for all MapReduce jobs. The reason to do this is that compaction will reduce each tablet in the table to one file, and it is faster to read from one file.
There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it will support speculative execution better. When reading an online table, speculative execution can put more load on an already slow tablet server.
By default, this feature is disabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
- Since:
- 1.6.0
-
isOfflineScan
public static Boolean isOfflineScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has the offline table scan feature enabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- true if the feature is enabled, false otherwise
- Since:
- 1.6.0
- See Also:
setOfflineTableScan(Class, Configuration, boolean)
-
setBatchScan
public static void setBatchScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, boolean enableFeature)
Controls the use of the BatchScanner in this job. Using this feature will group ranges by their source tablet per InputSplit and use BatchScanner to read them.
By default, this feature is disabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
enableFeature - the feature is enabled if true, disabled otherwise
- Since:
- 1.7.0
-
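The grouping that setBatchScan describes (ranges collected per source tablet so that one BatchScanner can serve them within a single InputSplit) can be sketched as follows; locateTablet is a hypothetical lookup standing in for the TabletLocator, and plain strings stand in for Range objects:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Sketch of per-tablet range grouping: ranges (represented by their start
// rows here) that resolve to the same tablet land in the same batch, so a
// single BatchScanner can serve them in one InputSplit.
class BatchGroupingSketch {
  static Map<String, List<String>> groupByTablet(List<String> startRows,
      Function<String, String> locateTablet) {
    Map<String, List<String>> byTablet = new TreeMap<>();
    for (String row : startRows)
      byTablet.computeIfAbsent(locateTablet.apply(row), t -> new ArrayList<>()).add(row);
    return byTablet;
  }
}
```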
isBatchScan
public static Boolean isBatchScan(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Determines whether a configuration has the BatchScanner feature enabled.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- true if the feature is enabled, false otherwise
- Since:
- 1.7.0
- See Also:
setBatchScan(Class, Configuration, boolean)
-
setInputTableConfigs
public static void setInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Map<String,InputTableConfig> configs)
Sets configurations for multiple tables at a time.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
configs - the InputTableConfig objects, keyed by table name, to associate with the job
- Since:
- 1.6.0
-
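One way per-table configurations like these can live in a flat, string-valued configuration is to namespace each table's entry by both the implementing class and the table name; the key layout and string-valued configs in this sketch are illustrative assumptions, not Accumulo's actual serialization:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of multi-table config storage: each table's (string-valued) config
// is stored under "<ClassName>.table.<tableName>", so getInputTableConfigs
// can recover the whole map by prefix scan. Illustrative only.
class MultiTableConfigSketch {
  static void setInputTableConfigs(Class<?> implementingClass, Map<String, String> conf,
      Map<String, String> configs) {
    for (Map.Entry<String, String> e : configs.entrySet())
      conf.put(implementingClass.getSimpleName() + ".table." + e.getKey(), e.getValue());
  }

  static Map<String, String> getInputTableConfigs(Class<?> implementingClass,
      Map<String, String> conf) {
    String prefix = implementingClass.getSimpleName() + ".table.";
    Map<String, String> result = new TreeMap<>();
    for (Map.Entry<String, String> e : conf.entrySet())
      if (e.getKey().startsWith(prefix))
        result.put(e.getKey().substring(prefix.length()), e.getValue());
    return result;
  }
}
```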
getInputTableConfigs
public static Map<String,InputTableConfig> getInputTableConfigs(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
Returns all InputTableConfig objects associated with this job.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Returns:
- all of the table query configs for the job
- Since:
- 1.6.0
-
getInputTableConfig
public static InputTableConfig getInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
Returns the InputTableConfig for the given table.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
tableName - the table name for which to fetch the table query config
- Returns:
- the table query config for the given table name (if it exists) and null if it does not
- Since:
- 1.6.0
-
getTabletLocator
public static TabletLocator getTabletLocator(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableId) throws TableNotFoundException
Initializes an Accumulo TabletLocator based on the configuration.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
tableId - the table id for which to initialize the TabletLocator
- Returns:
- an Accumulo tablet locator
- Throws:
TableNotFoundException - if the table name set on the configuration doesn't exist
- Since:
- 1.6.0
-
validateInstance
public static Instance validateInstance(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf) throws IOException
Validates and extracts an Instance from the configuration.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Throws:
IOException
- Since:
- 1.7.0
-
validatePermissions
public static void validatePermissions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, Connector conn) throws IOException
Validates that the user has permissions on the requested tables.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
conn - the Connector
- Throws:
IOException
- Since:
- 1.7.0
-
validateOptions
@Deprecated public static void validateOptions(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated.
Check whether a configuration is fully configured to be used with an Accumulo InputFormat. The implementation (JobContext or JobConf which created the Configuration) needs to be used to extract the proper AuthenticationToken for DelegationTokenImpl support.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration object to configure
- Throws:
IOException - if the context is improperly configured
- Since:
- 1.6.0
- See Also:
validateInstance(Class, Configuration), validatePermissions(Class, Configuration, Connector)
-
getDefaultInputTableConfig
protected static Map.Entry<String,InputTableConfig> getDefaultInputTableConfig(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, String tableName)
Returns the InputTableConfig for the configuration based on the properties set using the single-table input methods.
- Parameters:
implementingClass - the class whose name will be used as a prefix for the property configuration key
conf - the Hadoop configuration from which to retrieve the table configuration
tableName - the table name for which to retrieve the configuration
- Returns:
- the config object built from the single input table properties set on the job
- Since:
- 1.6.0
-
binOffline
public static Map<String,Map<KeyExtent,List<Range>>> binOffline(String tableId, List<Range> ranges, Instance instance, Connector conn) throws AccumuloException, TableNotFoundException
-
setSamplerConfiguration
public static void setSamplerConfiguration(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf, SamplerConfiguration samplerConfig)
-
getSamplerConfiguration
public static SamplerConfiguration getSamplerConfiguration(Class<?> implementingClass, org.apache.hadoop.conf.Configuration conf)
-
-