Class HadoopInputData<K,V,X>
java.lang.Object
net.sansa_stack.spark.io.rdf.input.api.HadoopInputData<K,V,X>
A class that captures the arguments of JavaSparkContext.newAPIHadoopFile(String, Class, Class, Class, Configuration).
It also captures a mapping from the initial JavaPairRDD to a custom target type (typically another JavaRDD).
To create RDDs from objects of this class, use InputFormatUtils.createRdd(JavaSparkContext, HadoopInputData).
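The idea of capturing load arguments together with a result mapper, and only creating the data set later, can be sketched with plain JDK types. This is a minimal stand-in and not the SANSA class: `CapturedInputDemo`, `CapturedInput`, `create` and the `loader` function are hypothetical names introduced for illustration, and a list of key/value entries takes the place of the JavaPairRDD so the sketch runs without Spark or Hadoop.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CapturedInputDemo {

    /** Hypothetical stand-in: holds the load arguments plus a mapper for the loaded pairs. */
    static final class CapturedInput<K, V, X> {
        final String path;
        final Class<K> keyClass;
        final Class<V> valueClass;
        final Function<List<Map.Entry<K, V>>, X> mapper;

        CapturedInput(String path, Class<K> keyClass, Class<V> valueClass,
                      Function<List<Map.Entry<K, V>>, X> mapper) {
            this.path = path;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
            this.mapper = mapper;
        }
    }

    /** Stand-in for the deferred creation step: load the pairs, then apply the captured mapper. */
    static <K, V, X> X create(Function<String, List<Map.Entry<K, V>>> loader,
                              CapturedInput<K, V, X> input) {
        return input.mapper.apply(loader.apply(input.path));
    }

    public static void main(String[] args) {
        // Describe the input once; nothing is loaded at this point.
        CapturedInput<Long, String, List<String>> input = new CapturedInput<>(
                "data.nt", Long.class, String.class,
                pairs -> pairs.stream().map(Map.Entry::getValue).collect(Collectors.toList()));

        // Fake loader standing in for the actual file read (assumption for this sketch).
        Function<String, List<Map.Entry<Long, String>>> loader =
                path -> List.of(Map.entry(0L, "first record"), Map.entry(13L, "second record"));

        System.out.println(create(loader, input)); // [first record, second record]
    }
}
```

Keeping the description separate from the creation step means the same object can be passed around, inspected, or wrapped before any data is read.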
A HadoopInputData based on RecordReaderGenericBase can be wrapped using InputFormatUtils.wrapWithAnalyzer(HadoopInputData), which returns another HadoopInputData object.
RDDs created from the wrapper compute information about the detected records, such as their byte offsets and the time it took to parse them.
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
org.apache.hadoop.conf.Configuration    getConfiguration()
getPath()
<Y> HadoopInputData<K,V,Y>    map    Returns a fresh HadoopInputData instance in which "nextMapper" is applied to the result of the current mapper
-
Field Details
-
path
-
inputFormatClass
-
keyClass
-
valueClass
-
configuration
protected org.apache.hadoop.conf.Configuration configuration
-
mapper
-
-
Constructor Details
-
HadoopInputData
-
-
Method Details
-
getPath
-
getKeyClass
-
getValueClass
-
getInputFormatClass
-
getConfiguration
public org.apache.hadoop.conf.Configuration getConfiguration()
-
getMapper
-
map
Returns a fresh HadoopInputData instance in which "nextMapper" is applied to the result of the current mapper.
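The chaining behaviour described for map can be illustrated with a small JDK-only sketch. The parameter type of the real method is not shown on this page, so the signature below is an assumption; `MapChainingDemo`, `Input` and `apply` are hypothetical names, and only the mapper is modelled.

```java
import java.util.function.Function;

final class MapChainingDemo {

    /** Hypothetical stand-in for the documented class; R is the raw input, X the mapped result. */
    static final class Input<R, X> {
        private final Function<R, X> mapper;

        Input(Function<R, X> mapper) {
            this.mapper = mapper;
        }

        /** Returns a new instance whose mapper runs the current mapper, then nextMapper. */
        <Y> Input<R, Y> map(Function<? super X, ? extends Y> nextMapper) {
            return new Input<R, Y>(mapper.andThen(nextMapper));
        }

        /** Applies the composed mapper to a raw value. */
        X apply(R raw) {
            return mapper.apply(raw);
        }
    }

    public static void main(String[] args) {
        Input<String, String> trimmed = new Input<>(String::trim);
        Input<String, Integer> length = trimmed.map(String::length);

        System.out.println(length.apply("  abc  "));  // 3
        System.out.println(trimmed.apply("  abc  ")); // abc (the original instance is unchanged)
    }
}
```

Because map returns a new object instead of mutating the receiver, an existing instance can safely be reused as the starting point for several different target types.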
-