org.apache.accumulo.core.client.mapreduce
Class InputFormatBase<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.accumulo.core.client.mapreduce.AbstractInputFormat<K,V>
          extended by org.apache.accumulo.core.client.mapreduce.InputFormatBase<K,V>
Direct Known Subclasses:
AccumuloInputFormat, AccumuloRowInputFormat

public abstract class InputFormatBase<K,V>
extends AbstractInputFormat<K,V>

This abstract InputFormat class allows MapReduce jobs to use Accumulo as the source of K,V pairs.

Subclasses must implement InputFormat.createRecordReader(InputSplit, TaskAttemptContext) to provide a RecordReader for K,V.

A static base class, RecordReaderBase, is provided to retrieve Accumulo Key/Value pairs, but one must implement its RecordReader.nextKeyValue() to transform them to the desired generic types K,V.

See AccumuloInputFormat for an example implementation.
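
The following is a minimal configuration sketch, assuming the concrete AccumuloInputFormat subclass and Hadoop's new-API Job class; the instance name, ZooKeeper hosts, principal, token, table name, and range boundaries are illustrative placeholders, not values defined by this API.

import java.util.Collections;

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ExampleJobSetup {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "accumulo-scan-example");

    // Connection information (placeholder values).
    AccumuloInputFormat.setConnectorInfo(job, "reader", new PasswordToken("secret"));
    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.loadDefault().withInstance("myInstance").withZkHosts("zk1:2181"));
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations("public"));

    // Input table and ranges (see setInputTableName and setRanges below).
    AccumuloInputFormat.setInputTableName(job, "mytable");
    AccumuloInputFormat.setRanges(job, Collections.singleton(new Range("a", "z")));

    job.setInputFormatClass(AccumuloInputFormat.class);
    // ... configure the mapper, reducer, and output format, then submit the job.
  }
}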


Nested Class Summary
static class InputFormatBase.RangeInputSplit
          Deprecated. since 1.5.2; Use RangeInputSplit instead.
protected static class InputFormatBase.RecordReaderBase<K,V>
           
 
Nested classes/interfaces inherited from class org.apache.accumulo.core.client.mapreduce.AbstractInputFormat
AbstractInputFormat.AbstractRecordReader<K,V>
 
Field Summary
 
Fields inherited from class org.apache.accumulo.core.client.mapreduce.AbstractInputFormat
CLASS, log
 
Constructor Summary
InputFormatBase()
           
 
Method Summary
static void addIterator(org.apache.hadoop.mapreduce.Job job, IteratorSetting cfg)
          Encode an iterator on the single input table for this job.
static void fetchColumns(org.apache.hadoop.mapreduce.Job job, Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)
          Restricts the columns that will be mapped over for this job for the default input table.
protected static boolean getAutoAdjustRanges(org.apache.hadoop.mapreduce.JobContext context)
          Determines whether a configuration has auto-adjust ranges enabled.
protected static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns(org.apache.hadoop.mapreduce.JobContext context)
          Gets the columns to be mapped over from this job.
protected static String getInputTableName(org.apache.hadoop.mapreduce.JobContext context)
          Gets the table name from the configuration.
protected static List<IteratorSetting> getIterators(org.apache.hadoop.mapreduce.JobContext context)
          Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.
protected static List<Range> getRanges(org.apache.hadoop.mapreduce.JobContext context)
          Gets the ranges to scan over from a job.
protected static TabletLocator getTabletLocator(org.apache.hadoop.mapreduce.JobContext context)
          Deprecated. since 1.6.0
protected static boolean isIsolated(org.apache.hadoop.mapreduce.JobContext context)
          Determines whether a configuration has isolation enabled.
protected static boolean isOfflineScan(org.apache.hadoop.mapreduce.JobContext context)
          Determines whether a configuration has the offline table scan feature enabled.
static void setAutoAdjustRanges(org.apache.hadoop.mapreduce.Job job, boolean enableFeature)
          Controls the automatic adjustment of ranges for this job.
static void setInputTableName(org.apache.hadoop.mapreduce.Job job, String tableName)
          Sets the name of the input table, over which this job will scan.
static void setLocalIterators(org.apache.hadoop.mapreduce.Job job, boolean enableFeature)
          Controls the use of the ClientSideIteratorScanner in this job.
static void setOfflineTableScan(org.apache.hadoop.mapreduce.Job job, boolean enableFeature)
           Enable reading offline tables.
static void setRanges(org.apache.hadoop.mapreduce.Job job, Collection<Range> ranges)
          Sets the input ranges to scan for the single input table associated with this job.
static void setScanIsolation(org.apache.hadoop.mapreduce.Job job, boolean enableFeature)
          Controls the use of the IsolatedScanner in this job.
protected static boolean usesLocalIterators(org.apache.hadoop.mapreduce.JobContext context)
          Determines whether a configuration uses local iterators.
 
Methods inherited from class org.apache.accumulo.core.client.mapreduce.AbstractInputFormat
getAuthenticationToken, getInputTableConfig, getInputTableConfigs, getInstance, getLogLevel, getPrincipal, getScanAuthorizations, getSplits, getTabletLocator, getToken, getTokenClass, isConnectorInfoSet, setConnectorInfo, setConnectorInfo, setLogLevel, setMockInstance, setScanAuthorizations, setZooKeeperInstance, setZooKeeperInstance, validateOptions
 
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputFormatBase

public InputFormatBase()
Method Detail

getInputTableName

protected static String getInputTableName(org.apache.hadoop.mapreduce.JobContext context)
Gets the table name from the configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
the table name
Since:
1.5.0
See Also:
setInputTableName(Job, String)

setInputTableName

public static void setInputTableName(org.apache.hadoop.mapreduce.Job job,
                                     String tableName)
Sets the name of the input table, over which this job will scan.

Parameters:
job - the Hadoop job instance to be configured
tableName - the name of the input table over which this job will scan
Since:
1.5.0
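
A one-line usage sketch, assuming a Job named job has already been created and configured with connector information (the table name is a placeholder):

AccumuloInputFormat.setInputTableName(job, "mytable");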

setRanges

public static void setRanges(org.apache.hadoop.mapreduce.Job job,
                             Collection<Range> ranges)
Sets the input ranges to scan for the single input table associated with this job.

Parameters:
job - the Hadoop job instance to be configured
ranges - the ranges that will be mapped over
Since:
1.5.0
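
A brief sketch, continuing the job setup shown in the class description (the row boundaries are placeholders):

// Requires java.util.ArrayList, java.util.List, and org.apache.accumulo.core.data.Range.
List<Range> ranges = new ArrayList<Range>();
ranges.add(new Range("row_000", "row_499"));
ranges.add(new Range("row_500", "row_999"));
AccumuloInputFormat.setRanges(job, ranges);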

getRanges

protected static List<Range> getRanges(org.apache.hadoop.mapreduce.JobContext context)
                                throws IOException
Gets the ranges to scan over from a job.

Parameters:
context - the Hadoop context for the configured job
Returns:
the ranges
Throws:
IOException
Since:
1.5.0
See Also:
setRanges(Job, Collection)

fetchColumns

public static void fetchColumns(org.apache.hadoop.mapreduce.Job job,
                                Collection<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> columnFamilyColumnQualifierPairs)
Restricts the columns that will be mapped over for this job for the default input table.

Parameters:
job - the Hadoop job instance to be configured
columnFamilyColumnQualifierPairs - pairs of Text objects corresponding to column family and column qualifier. If the column qualifier is null, the entire column family is selected. An empty set is the default and is equivalent to scanning all columns.
Since:
1.5.0
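
A sketch using the same job variable as above; the family and qualifier names are placeholders, and a null qualifier selects the entire column family:

// Requires org.apache.accumulo.core.util.Pair and org.apache.hadoop.io.Text.
Collection<Pair<Text,Text>> columns = new ArrayList<Pair<Text,Text>>();
columns.add(new Pair<Text,Text>(new Text("attributes"), new Text("name"))); // one column
columns.add(new Pair<Text,Text>(new Text("metadata"), null));               // whole column family
AccumuloInputFormat.fetchColumns(job, columns);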

getFetchedColumns

protected static Set<Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getFetchedColumns(org.apache.hadoop.mapreduce.JobContext context)
Gets the columns to be mapped over from this job.

Parameters:
context - the Hadoop context for the configured job
Returns:
a set of columns
Since:
1.5.0
See Also:
fetchColumns(Job, Collection)

addIterator

public static void addIterator(org.apache.hadoop.mapreduce.Job job,
                               IteratorSetting cfg)
Encode an iterator on the single input table for this job.

Parameters:
job - the Hadoop job instance to be configured
cfg - the configuration of the iterator
Since:
1.5.0
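
A sketch that attaches a server-side filtering iterator to the job's scans; the priority, iterator name, and regular expression are illustrative:

// Requires org.apache.accumulo.core.client.IteratorSetting and
// org.apache.accumulo.core.iterators.user.RegExFilter.
IteratorSetting cfg = new IteratorSetting(50, "rowFilter", RegExFilter.class);
RegExFilter.setRegexs(cfg, "row_0.*", null, null, null, false); // filter on row only
AccumuloInputFormat.addIterator(job, cfg);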

getIterators

protected static List<IteratorSetting> getIterators(org.apache.hadoop.mapreduce.JobContext context)
Gets a list of the iterator settings (for iterators to apply to a scanner) from this configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
a list of iterators
Since:
1.5.0
See Also:
addIterator(Job, IteratorSetting)

setAutoAdjustRanges

public static void setAutoAdjustRanges(org.apache.hadoop.mapreduce.Job job,
                                       boolean enableFeature)
Controls the automatic adjustment of ranges for this job. This feature merges overlapping ranges, then splits them to align with tablet boundaries. Disabling this feature will cause exactly one Map task to be created for each specified range.

By default, this feature is enabled.

Parameters:
job - the Hadoop job instance to be configured
enableFeature - the feature is enabled if true, disabled otherwise
Since:
1.5.0
See Also:
setRanges(Job, Collection)
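
A sketch that disables auto-adjustment so that each range passed to setRanges(Job, Collection) becomes exactly one map task:

AccumuloInputFormat.setAutoAdjustRanges(job, false);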

getAutoAdjustRanges

protected static boolean getAutoAdjustRanges(org.apache.hadoop.mapreduce.JobContext context)
Determines whether a configuration has auto-adjust ranges enabled.

Parameters:
context - the Hadoop context for the configured job
Returns:
false if the feature is disabled, true otherwise
Since:
1.5.0
See Also:
setAutoAdjustRanges(Job, boolean)

setScanIsolation

public static void setScanIsolation(org.apache.hadoop.mapreduce.Job job,
                                    boolean enableFeature)
Controls the use of the IsolatedScanner in this job.

By default, this feature is disabled.

Parameters:
job - the Hadoop job instance to be configured
enableFeature - the feature is enabled if true, disabled otherwise
Since:
1.5.0
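
A sketch enabling isolated scans for the job configured above:

AccumuloInputFormat.setScanIsolation(job, true);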

isIsolated

protected static boolean isIsolated(org.apache.hadoop.mapreduce.JobContext context)
Determines whether a configuration has isolation enabled.

Parameters:
context - the Hadoop context for the configured job
Returns:
true if the feature is enabled, false otherwise
Since:
1.5.0
See Also:
setScanIsolation(Job, boolean)

setLocalIterators

public static void setLocalIterators(org.apache.hadoop.mapreduce.Job job,
                                     boolean enableFeature)
Controls the use of the ClientSideIteratorScanner in this job. Enabling this feature will cause the iterator stack to be constructed within the Map task, rather than within the Accumulo TServer. To use this feature, all classes needed for those iterators must be available on the classpath for the task.

By default, this feature is disabled.

Parameters:
job - the Hadoop job instance to be configured
enableFeature - the feature is enabled if true, disabled otherwise
Since:
1.5.0
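
A sketch enabling client-side iterator evaluation; any iterator classes added via addIterator(Job, IteratorSetting) must then be available on the task's classpath:

AccumuloInputFormat.setLocalIterators(job, true);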

usesLocalIterators

protected static boolean usesLocalIterators(org.apache.hadoop.mapreduce.JobContext context)
Determines whether a configuration uses local iterators.

Parameters:
context - the Hadoop context for the configured job
Returns:
true if the feature is enabled, false otherwise
Since:
1.5.0
See Also:
setLocalIterators(Job, boolean)

setOfflineTableScan

public static void setOfflineTableScan(org.apache.hadoop.mapreduce.Job job,
                                       boolean enableFeature)

Enable reading offline tables. By default, this feature is disabled and only online tables are scanned. This will make the map reduce job directly read the table's files. If the table is not offline, then the job will fail. If the table comes online during the map reduce job, it is likely that the job will fail.

To use this option, the map reduce user will need access to read the Accumulo directory in HDFS.

Reading the offline table will create the scan-time iterator stack in the map process, so any iterators that are configured for the table will need to be on the mapper's classpath.

One way to use this feature is to clone a table, take the clone offline, and use the clone as the input table for a map reduce job. If you plan to map reduce over the data many times, it may be better to compact the table, clone it, take it offline, and use the clone for all map reduce jobs. The reason to do this is that compaction will reduce each tablet in the table to one file, and it is faster to read from one file.

There are two possible advantages to reading a table's files directly out of HDFS. First, you may see better read performance. Second, it will support speculative execution better. When reading an online table, speculative execution can put more load on an already slow tablet server.

By default, this feature is disabled.

Parameters:
job - the Hadoop job instance to be configured
enableFeature - the feature is enabled if true, disabled otherwise
Since:
1.5.0
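
An illustrative clone-and-offline workflow, assuming an existing Connector named conn and the job variable from the class-level sketch; the table names are placeholders:

// Requires java.util.Collections and an org.apache.accumulo.core.client.Connector.
conn.tableOperations().clone("mytable", "mytable_clone", true,
    Collections.<String,String> emptyMap(), Collections.<String> emptySet());
conn.tableOperations().offline("mytable_clone");
AccumuloInputFormat.setInputTableName(job, "mytable_clone");
AccumuloInputFormat.setOfflineTableScan(job, true);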

isOfflineScan

protected static boolean isOfflineScan(org.apache.hadoop.mapreduce.JobContext context)
Determines whether a configuration has the offline table scan feature enabled.

Parameters:
context - the Hadoop context for the configured job
Returns:
true if the feature is enabled, false otherwise
Since:
1.5.0
See Also:
setOfflineTableScan(Job, boolean)

getTabletLocator

@Deprecated
protected static TabletLocator getTabletLocator(org.apache.hadoop.mapreduce.JobContext context)
                                         throws TableNotFoundException
Deprecated. since 1.6.0

Initializes an Accumulo TabletLocator based on the configuration.

Parameters:
context - the Hadoop context for the configured job
Returns:
an Accumulo tablet locator
Throws:
TableNotFoundException - if the table name set on the configuration doesn't exist
Since:
1.5.0


Copyright © 2015 Apache Accumulo Project. All rights reserved.