Class InputFormatBuilderImpl<T>
- java.lang.Object
  - org.apache.accumulo.hadoopImpl.mapreduce.InputFormatBuilderImpl<T>
- All Implemented Interfaces: InputFormatBuilder, InputFormatBuilder.ClientParams<T>, InputFormatBuilder.InputFormatOptions<T>, InputFormatBuilder.TableParams<T>
public class InputFormatBuilderImpl<T> extends Object implements InputFormatBuilder, InputFormatBuilder.ClientParams<T>, InputFormatBuilder.TableParams<T>, InputFormatBuilder.InputFormatOptions<T>
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.accumulo.hadoop.mapreduce.InputFormatBuilder
InputFormatBuilder.ClientParams<T>, InputFormatBuilder.InputFormatOptions<T>, InputFormatBuilder.TableParams<T>
-
-
Constructor Summary
InputFormatBuilderImpl(Class<?> callingClass)
-
Method Summary
InputFormatBuilder.InputFormatOptions<T> addIterator(IteratorSetting cfg)
Encode an iterator on the single input table for this job.
InputFormatBuilder.InputFormatOptions<T> auths(Authorizations auths)
Sets the Authorizations used to scan.
InputFormatBuilder.InputFormatOptions<T> autoAdjustRanges(boolean value)
Disables the automatic adjustment of ranges for this job.
InputFormatBuilder.InputFormatOptions<T> batchScan(boolean value)
Enables the use of the BatchScanner in this job.
InputFormatBuilder.InputFormatOptions<T> classLoaderContext(String context)
Sets the name of the classloader context on this scanner.
InputFormatBuilder.TableParams<T> clientProperties(Properties clientProperties)
Set client properties needed to communicate with Accumulo for this job.
InputFormatBuilder.TableParams<T> clientPropertiesPath(String clientPropsPath)
Set path to DFS location containing accumulo-client.properties file.
InputFormatBuilder.InputFormatOptions<T> consistencyLevel(ScannerBase.ConsistencyLevel level)
Enables the user to set the consistency level.
InputFormatBuilder.InputFormatOptions<T> executionHints(Map<String,String> hints)
Set these execution hints on scanners created for input splits.
InputFormatBuilder.InputFormatOptions<T> fetchColumns(Collection<IteratorSetting.Column> fetchColumns)
Restricts the columns that will be mapped over for this job for the default input table.
InputFormatBuilder.InputFormatOptions<T> localIterators(boolean value)
Enables the use of the ClientSideIteratorScanner in this job.
InputFormatBuilder.InputFormatOptions<T> offlineScan(boolean value)
Enable reading offline tables.
InputFormatBuilder.InputFormatOptions<T> ranges(Collection<Range> ranges)
Sets the input ranges to scan for the single input table associated with this job.
InputFormatBuilder.InputFormatOptions<T> samplerConfiguration(SamplerConfiguration samplerConfig)
Causes input format to read sample data.
InputFormatBuilder.InputFormatOptions<T> scanIsolation(boolean value)
Enables the use of the IsolatedScanner in this job.
void store(T j)
Finish configuring, verify and serialize options into the JobConf or Job.
InputFormatBuilder.InputFormatOptions<T> table(String tableName)
Sets the name of the input table, over which this job will scan.
-
-
-
Constructor Detail
-
InputFormatBuilderImpl
public InputFormatBuilderImpl(Class<?> callingClass)
-
-
Method Detail
-
clientProperties
public InputFormatBuilder.TableParams<T> clientProperties(Properties clientProperties)
Description copied from interface: InputFormatBuilder.ClientParams
Set client properties needed to communicate with Accumulo for this job. This information will be serialized into the configuration. Therefore, it is more secure to use InputFormatBuilder.ClientParams.clientPropertiesPath(String). Client properties can be created using Accumulo.newClientProperties().
- Specified by: clientProperties in interface InputFormatBuilder.ClientParams<T>
- Parameters: clientProperties - Accumulo connection information
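The security note above can be made concrete with a small sketch that uses only java.util.Properties. The key names are assumptions modeled on a typical accumulo-client.properties file, and every value is a placeholder. The snippet only illustrates that anything placed in the Properties object, including the token, appears as plain text once the object is serialized.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: shows why passing Properties directly can expose secrets once serialized.
class ClientPropsSketch {

  // Builds client properties by hand. Key names are assumptions modeled on
  // accumulo-client.properties; all values are placeholders.
  static Properties exampleProps() {
    Properties props = new Properties();
    props.setProperty("instance.name", "myinstance");
    props.setProperty("instance.zookeepers", "zk1:2181,zk2:2181");
    props.setProperty("auth.type", "password");
    props.setProperty("auth.principal", "mruser");
    props.setProperty("auth.token", "not-a-real-secret");
    return props;
  }

  // Serializes the properties to text, similar in spirit to storing them in a job configuration.
  static String serialize(Properties props) {
    StringWriter out = new StringWriter();
    try {
      props.store(out, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The printed text includes the auth.token line in readable form.
    System.out.println(serialize(exampleProps()));
  }
}
```

Keeping the same keys in a file on the DFS and passing only its path avoids copying the token into the job configuration.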
-
clientPropertiesPath
public InputFormatBuilder.TableParams<T> clientPropertiesPath(String clientPropsPath)
Description copied from interface: InputFormatBuilder.ClientParams
Set path to DFS location containing accumulo-client.properties file. This setting is more secure than InputFormatBuilder.ClientParams.clientProperties(Properties).
- Specified by: clientPropertiesPath in interface InputFormatBuilder.ClientParams<T>
- Parameters: clientPropsPath - DFS path to accumulo-client.properties
-
table
public InputFormatBuilder.InputFormatOptions<T> table(String tableName)
Description copied from interface: InputFormatBuilder.TableParams
Sets the name of the input table, over which this job will scan. At least one table is required before calling store(Job).
- Specified by: table in interface InputFormatBuilder.TableParams<T>
- Parameters: tableName - the name of the input table to scan
-
auths
public InputFormatBuilder.InputFormatOptions<T> auths(Authorizations auths)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Sets the Authorizations used to scan. Must be a subset of the user's authorizations. By default, all of the user's authorizations are used.
- Specified by: auths in interface InputFormatBuilder.InputFormatOptions<T>
- Parameters: auths - the user's authorizations
-
classLoaderContext
public InputFormatBuilder.InputFormatOptions<T> classLoaderContext(String context)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Sets the name of the classloader context on this scanner.
- Specified by: classLoaderContext in interface InputFormatBuilder.InputFormatOptions<T>
- Parameters: context - name of the classloader context
-
ranges
public InputFormatBuilder.InputFormatOptions<T> ranges(Collection<Range> ranges)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Sets the input ranges to scan for the single input table associated with this job.
- Specified by: ranges in interface InputFormatBuilder.InputFormatOptions<T>
- Parameters: ranges - the ranges that will be mapped over
- See Also: TableOperations.splitRangeByTablets(String, Range, int)
-
fetchColumns
public InputFormatBuilder.InputFormatOptions<T> fetchColumns(Collection<IteratorSetting.Column> fetchColumns)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Restricts the columns that will be mapped over for this job for the default input table.
- Specified by: fetchColumns in interface InputFormatBuilder.InputFormatOptions<T>
- Parameters: fetchColumns - a collection of IteratorSetting.Column objects corresponding to column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.
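The selection rule in the parameter description can be illustrated with a toy model. This is not Accumulo code: Col is a hypothetical stand-in for IteratorSetting.Column that uses plain Strings, and isSelected is an illustrative helper that encodes only the three rules stated above (a null qualifier selects the whole family, a non-null qualifier must match, and an empty collection selects everything).

```java
import java.util.List;

// Toy model of the column selection rule described above.
// Col is a hypothetical stand-in for IteratorSetting.Column, using Strings for brevity.
class FetchColumnsSketch {

  static final class Col {
    final String family;
    final String qualifier; // null selects the entire column family

    Col(String family, String qualifier) {
      this.family = family;
      this.qualifier = qualifier;
    }
  }

  // Returns true if an entry with the given family and qualifier would be selected.
  static boolean isSelected(String family, String qualifier, List<Col> fetchColumns) {
    if (fetchColumns.isEmpty()) {
      return true; // empty collection: all columns are scanned
    }
    for (Col c : fetchColumns) {
      boolean familyMatches = c.family.equals(family);
      boolean qualifierMatches = c.qualifier == null || c.qualifier.equals(qualifier);
      if (familyMatches && qualifierMatches) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Col> fetch = List.of(new Col("attr", null), new Col("meta", "size"));
    System.out.println(isSelected("attr", "color", fetch)); // true: whole family selected
    System.out.println(isSelected("meta", "owner", fetch)); // false: qualifier does not match
    System.out.println(isSelected("other", "x", List.of())); // true: nothing restricted
  }
}
```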
-
addIterator
public InputFormatBuilder.InputFormatOptions<T> addIterator(IteratorSetting cfg)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Encode an iterator on the single input table for this job. It is safe to call this method multiple times. If an iterator is added with the same name, it will be overridden.
- Specified by: addIterator in interface InputFormatBuilder.InputFormatOptions<T>
- Parameters: cfg - the configuration of the iterator
-
executionHints
public InputFormatBuilder.InputFormatOptions<T> executionHints(Map<String,String> hints)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Set these execution hints on scanners created for input splits. See ScannerBase.setExecutionHints(java.util.Map).
- Specified by: executionHints in interface InputFormatBuilder.InputFormatOptions<T>
-
samplerConfiguration
public InputFormatBuilder.InputFormatOptions<T> samplerConfiguration(SamplerConfiguration samplerConfig)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Causes input format to read sample data. If sample data was created using a different configuration, or a table's sampler configuration changes while reading data, then the input format will throw an error.
- Specified by: samplerConfiguration in interface InputFormatBuilder.InputFormatOptions<T>
- Parameters: samplerConfig - The sampler configuration that the sample must have been created with in order for reading sample data to succeed.
- See Also: ScannerBase.setSamplerConfiguration(SamplerConfiguration)
-
autoAdjustRanges
public InputFormatBuilder.InputFormatOptions<T> autoAdjustRanges(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Disables the automatic adjustment of ranges for this job. This feature merges overlapping ranges, then splits them to align with tablet boundaries. Disabling this feature will cause exactly one Map task to be created for each specified range. Disabling has no effect for batch scans, as they always automatically adjust ranges.
By default, this feature is enabled.
- Specified by: autoAdjustRanges in interface InputFormatBuilder.InputFormatOptions<T>
- See Also: InputFormatBuilder.InputFormatOptions.ranges(Collection)
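The two adjustment steps described above (merge overlapping ranges, then split at tablet boundaries) can be sketched on plain integer intervals. This is a toy model and not Accumulo's implementation: ranges are half-open [start, end) pairs, tablet boundaries are a sorted int array, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of range adjustment on integer intervals [start, end).
// This is NOT Accumulo's implementation; it only mirrors the two steps described above.
class AdjustRangesSketch {

  // Merges overlapping or touching intervals after sorting by start.
  static List<int[]> mergeOverlapping(List<int[]> ranges) {
    List<int[]> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingInt(r -> r[0]));
    List<int[]> merged = new ArrayList<>();
    for (int[] r : sorted) {
      if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1]) {
        int[] last = merged.get(merged.size() - 1);
        last[1] = Math.max(last[1], r[1]);
      } else {
        merged.add(new int[] {r[0], r[1]}); // copy so the input is never mutated
      }
    }
    return merged;
  }

  // Splits each merged interval at every tablet boundary that falls strictly inside it.
  static List<int[]> splitAtBoundaries(List<int[]> merged, int[] sortedBoundaries) {
    List<int[]> out = new ArrayList<>();
    for (int[] r : merged) {
      int start = r[0];
      for (int b : sortedBoundaries) {
        if (b > start && b < r[1]) {
          out.add(new int[] {start, b});
          start = b;
        }
      }
      out.add(new int[] {start, r[1]});
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> ranges = List.of(new int[] {0, 5}, new int[] {3, 12}, new int[] {20, 25});
    List<int[]> adjusted = splitAtBoundaries(mergeOverlapping(ranges), new int[] {10, 22});
    for (int[] r : adjusted) {
      System.out.println("[" + r[0] + ", " + r[1] + ")");
    }
  }
}
```

In this model the three input ranges become four adjusted ranges, whereas with adjustment disabled the job would see the three ranges exactly as given, one Map task each.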
-
scanIsolation
public InputFormatBuilder.InputFormatOptions<T> scanIsolation(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the use of the IsolatedScanner in this job.
By default, this feature is disabled.
- Specified by: scanIsolation in interface InputFormatBuilder.InputFormatOptions<T>
-
localIterators
public InputFormatBuilder.InputFormatOptions<T> localIterators(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the use of the ClientSideIteratorScanner in this job. This feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task.
By default, this feature is disabled.
- Specified by: localIterators in interface InputFormatBuilder.InputFormatOptions<T>
-
offlineScan
public InputFormatBuilder.InputFormatOptions<T> offlineScan(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enable reading offline tables. By default, this feature is disabled and only online tables are scanned. This will make the map reduce job directly read the table's files. If the table is not offline, then the job will fail. If the table comes online during the map reduce job, it is likely that the job will fail.
To use this option, the map reduce user will need access to read the Accumulo directory in HDFS.
Reading the offline table will create the scan time iterator stack in the map process, so any iterators that are configured for the table will need to be on the mapper's classpath.
One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a map reduce job. If you plan to map reduce over the data many times, it may be better to compact the table, clone it, take the clone offline, and use the clone for all map reduce jobs. The reason to do this is that compaction will reduce each tablet in the table to one file, and it is faster to read from one file.
There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it will support speculative execution better. When reading an online table, speculative execution can put more load on an already slow tablet server.
- Specified by: offlineScan in interface InputFormatBuilder.InputFormatOptions<T>
-
batchScan
public InputFormatBuilder.InputFormatOptions<T> batchScan(boolean value)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the use of the BatchScanner in this job. Using this feature will group Ranges by their source tablet, producing an InputSplit per tablet rather than per Range. This batching helps to reduce overhead when querying a large number of small ranges (for example, when doing quad-tree decomposition for spatial queries).
In order to achieve good locality of InputSplits, this option always clips the input Ranges to tablet boundaries. This may result in one input Range contributing to several InputSplits.
Note: calls to InputFormatBuilder.InputFormatOptions.autoAdjustRanges(boolean) are ignored when BatchScan is enabled.
This configuration is incompatible with:
- InputFormatBuilder.InputFormatOptions.offlineScan(boolean)
- InputFormatBuilder.InputFormatOptions.localIterators(boolean)
- InputFormatBuilder.InputFormatOptions.scanIsolation(boolean)
By default, this feature is disabled.
- Specified by: batchScan in interface InputFormatBuilder.InputFormatOptions<T>
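The grouping and clipping behavior described for batch scans can be sketched with a toy model on integer keys. This is not Accumulo's implementation: tablets are modeled as the gaps between sorted split points, ranges are half-open [start, end) pairs, and all names are hypothetical. Each map entry in the result stands for one input split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of batch-scan split planning on integer intervals [start, end).
// NOT Accumulo's implementation: tablets are modeled as the gaps between sorted split points.
class BatchSplitSketch {

  // Returns the index of the tablet containing the given key.
  static int tabletIndex(int key, int[] sortedSplitPoints) {
    int idx = 0;
    while (idx < sortedSplitPoints.length && key >= sortedSplitPoints[idx]) {
      idx++;
    }
    return idx;
  }

  // Clips every range to tablet boundaries and groups the pieces by tablet,
  // so each map entry corresponds to one input split.
  static Map<Integer, List<int[]>> groupByTablet(List<int[]> ranges, int[] sortedSplitPoints) {
    Map<Integer, List<int[]>> splits = new TreeMap<>();
    for (int[] r : ranges) {
      int start = r[0];
      while (start < r[1]) {
        int tablet = tabletIndex(start, sortedSplitPoints);
        int tabletEnd =
            tablet < sortedSplitPoints.length ? sortedSplitPoints[tablet] : Integer.MAX_VALUE;
        int end = Math.min(r[1], tabletEnd);
        splits.computeIfAbsent(tablet, k -> new ArrayList<>()).add(new int[] {start, end});
        start = end;
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    List<int[]> ranges =
        List.of(new int[] {1, 2}, new int[] {4, 5}, new int[] {8, 12}, new int[] {15, 16});
    Map<Integer, List<int[]>> splits = groupByTablet(ranges, new int[] {10});
    splits.forEach(
        (tablet, pieces) ->
            System.out.println("tablet " + tablet + ": " + pieces.size() + " range(s)"));
  }
}
```

With one split point at 10, the four input ranges collapse into two splits, and the range [8, 12) is clipped into pieces that land in both tablets, matching the note that one input Range may contribute to several InputSplits.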
-
consistencyLevel
public InputFormatBuilder.InputFormatOptions<T> consistencyLevel(ScannerBase.ConsistencyLevel level)
Description copied from interface: InputFormatBuilder.InputFormatOptions
Enables the user to set the consistency level.
- Specified by: consistencyLevel in interface InputFormatBuilder.InputFormatOptions<T>
-
store
public void store(T j) throws AccumuloException, AccumuloSecurityException
Description copied from interface: InputFormatBuilder.TableParams
Finish configuring, verify and serialize options into the JobConf or Job.
- Specified by: store in interface InputFormatBuilder.TableParams<T>
- Throws: AccumuloException, AccumuloSecurityException
-
-