Class HadoopInputData<K,V,X>

java.lang.Object
net.sansa_stack.spark.io.rdf.input.api.HadoopInputData<K,V,X>

public class HadoopInputData<K,V,X> extends Object
Captures the arguments of JavaSparkContext.newAPIHadoopFile(String, Class, Class, Class, Configuration), together with a mapper that converts the initial JavaPairRDD into a custom target type (typically another JavaRDD). To create an RDD from an instance of this class, use InputFormatUtils.createRdd(JavaSparkContext, HadoopInputData). A HadoopInputData based on RecordReaderGenericBase can be wrapped with InputFormatUtils.wrapWithAnalyzer(HadoopInputData), which returns another HadoopInputData object. RDDs created from the wrapper compute information about the detected records, such as their byte offsets and the time it took to parse them.
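The capture-then-create pattern described above can be sketched without Spark or Hadoop on the classpath. The following is an illustration only, not the library's code: DeferredInput and its create helper are hypothetical stand-ins, a Function<String, List<String>> loader plays the role of JavaSparkContext.newAPIHadoopFile, and a List<String> plays the role of the initial JavaPairRDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for HadoopInputData: captures the read arguments
// (here only the path) together with a mapping from the raw records to a
// target type X.
final class DeferredInput<X> {
    final String path;
    final Function<List<String>, X> mapper;

    DeferredInput(String path, Function<List<String>, X> mapper) {
        this.path = path;
        this.mapper = mapper;
    }

    // Plays the role of InputFormatUtils.createRdd in this sketch: nothing is
    // read until this method is called. The loader stands in for
    // newAPIHadoopFile, and the captured mapper is applied to its result.
    static <X> X create(Function<String, List<String>> loader, DeferredInput<X> input) {
        List<String> raw = loader.apply(input.path);
        return input.mapper.apply(raw);
    }
}

class DeferredInputDemo {
    // Fake loader that returns two fixed records for any path (demo assumption).
    static List<String> fakeLoad(String path) {
        return Arrays.asList("<s> <p> <o1> .", "<s> <p> <o2> .");
    }

    public static void main(String[] args) {
        // Describe the input first; the mapper counts the raw records.
        DeferredInput<Integer> input =
                new DeferredInput<>("data.nt", records -> records.size());

        // Materialize later, once a loader (the "context") is available.
        int count = DeferredInput.create(DeferredInputDemo::fakeLoad, input);
        System.out.println(count); // prints 2
    }
}
```

Separating the description of the input from its materialization is what allows an instance to be passed around, wrapped, or remapped before any data is read.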
  • Field Details

    • path

      protected String path
    • inputFormatClass

      protected Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass
    • keyClass

      protected Class<K> keyClass
    • valueClass

      protected Class<V> valueClass
    • configuration

      protected org.apache.hadoop.conf.Configuration configuration
    • mapper

      protected Function<org.apache.spark.api.java.JavaPairRDD<K,V>,X> mapper
  • Constructor Details

    • HadoopInputData

      public HadoopInputData(String path, Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, org.apache.hadoop.conf.Configuration configuration, Function<org.apache.spark.api.java.JavaPairRDD<K,V>,X> mapper)
  • Method Details

    • getPath

      public String getPath()
    • getKeyClass

      public Class<K> getKeyClass()
    • getValueClass

      public Class<V> getValueClass()
    • getInputFormatClass

      public Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> getInputFormatClass()
    • getConfiguration

      public org.apache.hadoop.conf.Configuration getConfiguration()
    • getMapper

      public Function<org.apache.spark.api.java.JavaPairRDD<K,V>,X> getMapper()
    • map

      public <Y> HadoopInputData<K,V,Y> map(Function<? super X,Y> nextMapper)
      Returns a new HadoopInputData instance whose mapper applies nextMapper to the result of the current mapper.
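
The composition that map describes can be illustrated with a simplified stand-in. The sketch below is not the library's implementation: InputSketch is a hypothetical class that replaces JavaPairRDD<K,V> with a plain List<String> and assumes the mapper behaves like java.util.function.Function, so it runs without Spark dependencies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for HadoopInputData: the raw input is a List<String>
// instead of a JavaPairRDD<K,V>.
final class InputSketch<X> {
    private final String path;
    private final Function<List<String>, X> mapper;

    InputSketch(String path, Function<List<String>, X> mapper) {
        this.path = path;
        this.mapper = mapper;
    }

    String getPath() { return path; }

    Function<List<String>, X> getMapper() { return mapper; }

    // Mirrors the documented contract of map: return a new instance whose
    // mapper applies nextMapper to the result of the current mapper. The
    // receiver itself is left unchanged.
    <Y> InputSketch<Y> map(Function<? super X, Y> nextMapper) {
        return new InputSketch<>(path, mapper.andThen(nextMapper));
    }
}

class MapSketchDemo {
    public static void main(String[] args) {
        // Start with a mapper that passes the raw records through unchanged.
        InputSketch<List<String>> raw = new InputSketch<>("data.ttl", lines -> lines);

        // Derive a new instance that counts records; the path is carried over.
        InputSketch<Integer> counted = raw.map(lines -> lines.size());

        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(counted.getMapper().apply(records)); // prints 3
        System.out.println(raw.getMapper().apply(records));     // prints [a, b, c]
    }
}
```

Because each call returns a new instance, the original description stays usable and several differently mapped views can be derived from the same captured arguments.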