Package com.yahoo.vespa.hadoop.mapreduce
Class VespaSimpleJsonInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
com.yahoo.vespa.hadoop.mapreduce.VespaSimpleJsonInputFormat
public class VespaSimpleJsonInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable>
Simple JSON reader which splits the input file along JSON object boundaries.
There are two cases handled here:
1. Each line contains a JSON object, i.e. { ... }
2. The file contains an array of objects with arbitrary line breaks, i.e. [ {...}, {...} ]
Not suitable for cases where you want to extract objects from some other arbitrary structure.
TODO: Support config which points to a array in the JSON as start point for object extraction,
ala how it is done in VespaHttpClient.parseResultJson, i.e. support rootNode config.
- Author:
- lesters
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
-
Field Summary
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionorg.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,
org.apache.hadoop.io.NullWritable> createRecordReader
(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
-
Constructor Details
-
VespaSimpleJsonInputFormat
public VespaSimpleJsonInputFormat()
-
-
Method Details
-
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException - Specified by:
createRecordReader
in classorg.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,
org.apache.hadoop.io.NullWritable> - Throws:
IOException
InterruptedException
-