net.sansa_stack.spark.io.rdf.output.RddRdfWriter<T>

Type Parameters: T - the type of the records in the RDD (e.g. Triple, Quad, Graph, Model)

public class RddRdfWriter<T> extends RddRdfWriterSettings<RddRdfWriter<T>>

Important: Instances of this class should only be created using RddRdfWriterFactory because the factory is RDD-independent and can validate settings at an early stage.

This class implements a fluent API for configuring how an RDD of RDF data is saved to disk. It uniformly handles Triples, Quads, Models, Datasets, etc., using a set of lambdas for the relevant conversions. Instances of this class should be created using the appropriate createFor[Type] methods.
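The idea of handling every record type through one code path plus a per-type conversion lambda can be sketched with the JDK alone. This is a hypothetical illustration, not SANSA's implementation: UniformWriterSketch, its nested Triple/Quad records (stand-ins for Jena's classes), forTriple()/forQuad() and write() are all invented names, and terms are assumed to already be in N-Triples syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical sketch: one generic writer; only the per-type lambda differs.
class UniformWriterSketch<T> {
    // Stand-ins for Jena's Triple and Quad (terms assumed to be N-Triples syntax).
    record Triple(String s, String p, String o) {}
    record Quad(String g, String s, String p, String o) {}

    // Pushes one record into a line-oriented sink.
    private final BiConsumer<? super T, Consumer<String>> sendRecord;

    private UniformWriterSketch(BiConsumer<? super T, Consumer<String>> sendRecord) {
        this.sendRecord = sendRecord;
    }

    // Analogous in spirit to createForTriple(): supplies the triple lambda.
    static UniformWriterSketch<Triple> forTriple() {
        return new UniformWriterSketch<>(
            (t, out) -> out.accept(t.s() + " " + t.p() + " " + t.o() + " ."));
    }

    // Analogous in spirit to createForQuad(): N-Quads order is s p o g.
    static UniformWriterSketch<Quad> forQuad() {
        return new UniformWriterSketch<>(
            (q, out) -> out.accept(q.s() + " " + q.p() + " " + q.o() + " " + q.g() + " ."));
    }

    // Shared code path, identical for every record type.
    List<String> write(Iterable<? extends T> records) {
        List<String> lines = new ArrayList<>();
        for (T r : records) {
            sendRecord.accept(r, lines::add);
        }
        return lines;
    }
}
```

Keeping the conversion in a lambda means features such as sorting, partition handling or console output only need to be written once for all record types.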

Field Summary

Fields

Modifier and Type

Field

Description

protected RddRdfOpsImpl<T>

dispatcher

References the lambdas in RddRdfOpsImpl directly (saves one entry in the call stack per record)

protected org.apache.hadoop.conf.Configuration

hadoopConfiguration

protected org.apache.spark.api.java.JavaRDD<? extends T>

rdd

protected org.apache.spark.api.java.JavaSparkContext

sparkContext

Fields inherited from class net.sansa_stack.spark.io.rdf.output.RddRdfWriterSettings
deferOutputForUsedPrefixes, globalPrefixMapping, mapQuadsToTriplesForTripleLangs, outputFormat

Fields inherited from class net.sansa_stack.spark.io.rdf.output.RddWriterSettings
allowOverwriteFiles, consoleOutSupplier, deletePartitionFolderAfterMerge, partitionFolder, partitionFolderFs, partitionsAsIndependentFiles, postProcessingSettings, targetFile, targetFileFs, useCoalesceOne

Constructor Summary

Constructors

Constructor

Description

RddRdfWriter(RddRdfOpsImpl<T> dispatcher)

Method Summary

Modifier and Type

Method

Description

static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetOneNg>

createForDataset()

static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetGraphOneNg>

createForDatasetGraph()

static RddRdfWriter<org.apache.jena.graph.Graph>

createForGraph()

static RddRdfWriter<org.apache.jena.rdf.model.Model>

createForModel()

static RddRdfWriter<org.apache.jena.sparql.core.Quad>

createForQuad()

static RddRdfWriter<org.apache.jena.graph.Triple>

createForTriple()

static Function<OutputStream,org.apache.jena.riot.system.StreamRDF>

createStreamRDFFactory(org.apache.jena.riot.RDFFormat rdfFormat, boolean mapQuadsToTriplesForTripleLangs, org.apache.jena.shared.PrefixMapping prefixMapping)

Create a function that can create a StreamRDF instance that is backed by the given OutputStream.

org.apache.spark.api.java.JavaRDD<T>

getEffectiveRdd(RdfPostProcessingSettings settings)

Create the effective RDD w.r.t. the configuration (sort, unique, optimize prefixes).

org.apache.spark.api.java.JavaRDD<? extends T>

getRdd()

static Iterator<String>

partitionMapperNQuads(Iterator<org.apache.jena.sparql.core.Quad> it)

static Iterator<String>

partitionMapperNTriples(Iterator<org.apache.jena.graph.Triple> it)

static <T> org.aksw.commons.lambda.throwing.ThrowingFunction<Iterator<T>,Iterator<String>>

partitionMapperRDFStream(Function<OutputStream,org.apache.jena.riot.system.StreamRDF> streamRDFFactory, BiConsumer<? super T,org.apache.jena.riot.system.StreamRDF> sendRecordToWriter)

void

run()

void

runActual(RddWriterSettings<?> cxt)

protected void

runOutputToConsole()

void

runSpark()

Run the save action according to configuration

void

runUnchecked()

Same as run(), but does not throw the checked IOException.

static <T> void

saveToFolder(org.apache.spark.api.java.JavaRDD<T> javaRdd, String path, org.apache.jena.riot.RDFFormat rdfFormat, boolean mapQuadsToTriplesForTripleLangs, org.apache.jena.shared.PrefixMapping globalPrefixMapping, BiConsumer<T,org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF)

Deprecated.

static <T> void

saveUsingElephas(org.apache.spark.api.java.JavaRDD<T> rdd, org.apache.hadoop.fs.Path path, org.apache.jena.riot.Lang lang, org.aksw.commons.lambda.serializable.SerializableFunction<? super T,?> recordToWritable)

static <T> void

sendToStreamRDF(org.apache.spark.api.java.JavaRDD<T> javaRdd, org.aksw.commons.lambda.serializable.SerializableBiConsumer<T,org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF, org.aksw.commons.lambda.serializable.SerializableSupplier<org.apache.jena.riot.system.StreamRDF> streamRdfSupplier)

RddRdfWriter<T>

setRdd(org.apache.spark.api.java.JavaRDD<? extends T> rdd)

static String

toString(org.apache.jena.shared.PrefixMapping prefixMapping, org.apache.jena.riot.RDFFormat rdfFormat)

Convert a prefix mapping to a string

static void

validate(RddRdfWriterSettings<?> settings)

Methods inherited from class net.sansa_stack.spark.io.rdf.output.RddRdfWriterSettings
configureFrom, getFallbackOutputFormat, getGlobalPrefixMapping, getOutputFormat, isMapQuadsToTriplesForTripleLangs, isPartitionsAsIndependentFiles, mutate, self, setDeferOutputForUsedPrefixes, setGlobalPrefixMapping, setGlobalPrefixMapping, setMapQuadsToTriplesForTripleLangs, setOutputFormat, setOutputFormat, setPartitionsAsIndependentFiles

Methods inherited from class net.sansa_stack.spark.io.rdf.output.RddWriterSettings
configureFrom, getConsoleOutSupplier, getHadoopConfiguration, getPartitionFolder, getPartitionFolderFs, getPostProcessingSettings, getTargetFile, getTargetFileFs, isAllowOverwriteFiles, isConsoleOutput, isDeletePartitionFolderAfterMerge, isUseCoalesceOne, setAllowOverwriteFiles, setConsoleOutput, setConsoleOutSupplier, setDeletePartitionFolderAfterMerge, setHadoopConfiguration, setPartitionFolder, setPartitionFolder, setPartitionFolderFs, setPostProcessingSettings, setTargetFile, setTargetFile, setTargetFileFs, setUseCoalesceOne

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- dispatcher
  
  protected RddRdfOpsImpl<T> dispatcher
  
  References the lambdas in RddRdfOpsImpl directly (saves one entry in the call stack per record)
- sparkContext
  
  protected org.apache.spark.api.java.JavaSparkContext sparkContext
- rdd
  
  protected org.apache.spark.api.java.JavaRDD<? extends T> rdd
- hadoopConfiguration
  
  protected org.apache.hadoop.conf.Configuration hadoopConfiguration
Constructor Details
- RddRdfWriter
  
  public RddRdfWriter(RddRdfOpsImpl<T> dispatcher)
Method Details
- setRdd
  
  public RddRdfWriter<T> setRdd(org.apache.spark.api.java.JavaRDD<? extends T> rdd)
- getRdd
  
  public org.apache.spark.api.java.JavaRDD<? extends T> getRdd()
- runUnchecked
  
  public void runUnchecked()
  
  Same as run(), but does not throw the checked IOException.
- run
  
  public void run() throws IOException
  
  Throws:
  
  IOException
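  runUnchecked() is documented only as the variant of run() that does not throw the checked IOException. A common way to build such a variant is to rethrow the exception as java.io.UncheckedIOException; whether RddRdfWriter does exactly this is an assumption. RunUncheckedSketch and IoAction below are hypothetical names.

  ```java
  import java.io.IOException;
  import java.io.UncheckedIOException;

  // Hypothetical sketch of a checked/unchecked method pair such as run()/runUnchecked().
  class RunUncheckedSketch {
      // Any action that may throw a checked IOException (e.g. a run() method).
      interface IoAction {
          void run() throws IOException;
      }

      // Assumption: the checked exception is wrapped, not swallowed.
      static void runUnchecked(IoAction action) {
          try {
              action.run();
          } catch (IOException e) {
              throw new UncheckedIOException(e);
          }
      }
  }
  ```

  The unchecked form is convenient inside lambdas and streams, where checked exceptions cannot propagate.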
- getEffectiveRdd
  
  public org.apache.spark.api.java.JavaRDD<T> getEffectiveRdd(RdfPostProcessingSettings settings)
  
  Create the effective RDD w.r.t. the configuration (sort, unique, optimize prefixes). If prefix optimization is enabled, invoking this method immediately performs that analysis. Currently, this writer's prefix map is updated to the used prefixes. However, this is subject to change, so that a new writer instance with the used prefixes may be created instead.
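  The post-processing steps named above can be illustrated on plain strings. This is a minimal, JDK-only sketch, not the library's code: UsedPrefixesSketch is hypothetical, a Map<String, String> of prefix to namespace stands in for Jena's PrefixMapping, and "used" is assumed to mean that the namespace is a leading substring of at least one IRI in the data.

  ```java
  import java.util.Collection;
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;
  import java.util.stream.Collectors;

  // Hypothetical sketch of "optimize prefixes", "sort" and "unique".
  class UsedPrefixesSketch {
      // Keeps only the mappings whose namespace starts at least one used IRI.
      // Naive scan over prefixes x IRIs; on an RDD this would be a distributed aggregation.
      static Map<String, String> usedPrefixes(Map<String, String> prefixToNs, Collection<String> iris) {
          Map<String, String> result = new TreeMap<>();
          for (Map.Entry<String, String> e : prefixToNs.entrySet()) {
              for (String iri : iris) {
                  if (iri.startsWith(e.getValue())) {
                      result.put(e.getKey(), e.getValue());
                      break;
                  }
              }
          }
          return result;
      }

      // "sort" and "unique" applied to serialized records.
      static List<String> sortedDistinct(Collection<String> records) {
          return records.stream().distinct().sorted().collect(Collectors.toList());
      }
  }
  ```

  Because determining the used prefixes needs a pass over all the data, it is plausible that this is why the analysis runs eagerly when getEffectiveRdd is called.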
- runOutputToConsole
  
  protected void runOutputToConsole() throws IOException
  
  Throws:
  
  IOException
- runActual
  
  public void runActual(RddWriterSettings<?> cxt) throws IOException
  
  Throws:
  
  IOException
- runSpark
  
  public void runSpark() throws IOException
  
  Run the save action according to configuration
  
  Throws:
  
  IOException
- toString
  
  public static String toString(org.apache.jena.shared.PrefixMapping prefixMapping, org.apache.jena.riot.RDFFormat rdfFormat)
  
  Convert a prefix mapping to a string
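  The description does not specify the layout of the resulting string or how rdfFormat influences it. Assuming a Turtle/TriG-style target, a prefix header could be rendered as below. PrefixHeaderSketch is hypothetical, and the plain Map stands in for what Jena's PrefixMapping.getNsPrefixMap() returns.

  ```java
  import java.util.Map;
  import java.util.TreeMap;

  // Hypothetical sketch: render prefix mappings as Turtle/TriG @prefix directives.
  class PrefixHeaderSketch {
      // One directive per line, sorted by prefix for deterministic output.
      static String toTurtlePrefixes(Map<String, String> prefixToNs) {
          StringBuilder sb = new StringBuilder();
          for (Map.Entry<String, String> e : new TreeMap<>(prefixToNs).entrySet()) {
              sb.append("@prefix ").append(e.getKey()).append(": <")
                .append(e.getValue()).append("> .\n");
          }
          return sb.toString();
      }
  }
  ```

  For example, the mapping ex -> http://example.org/ yields the single line `@prefix ex: <http://example.org/> .`.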
- partitionMapperNTriples
  
  public static Iterator<String> partitionMapperNTriples(Iterator<org.apache.jena.graph.Triple> it)
- partitionMapperNQuads
  
  public static Iterator<String> partitionMapperNQuads(Iterator<org.apache.jena.sparql.core.Quad> it)
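  partitionMapperNTriples and partitionMapperNQuads have no description. By their signatures they turn one partition's iterator of triples or quads into an iterator of strings, presumably one N-Triples or N-Quads line per record. The JDK-only sketch below shows a lazy per-partition mapping of that shape. PartitionMapperSketch and its Triple record are hypothetical, and terms are assumed to already be in N-Triples syntax. Real serialization, including escaping, would be done by Jena.

  ```java
  import java.util.Iterator;
  import java.util.function.Function;

  // Hypothetical sketch of an Iterator<Triple> -> Iterator<String> partition mapper.
  class PartitionMapperSketch {
      // Stand-in for Jena's Triple; terms assumed to be in N-Triples syntax.
      record Triple(String s, String p, String o) {}

      // Lazy mapping: records are converted one by one as the caller pulls them,
      // so a partition never has to be held in memory as a whole.
      static <T> Iterator<String> mapLazily(Iterator<T> it, Function<? super T, String> toLine) {
          return new Iterator<>() {
              @Override public boolean hasNext() { return it.hasNext(); }
              @Override public String next() { return toLine.apply(it.next()); }
          };
      }

      static Iterator<String> nTriplesLines(Iterator<Triple> it) {
          return mapLazily(it, t -> t.s() + " " + t.p() + " " + t.o() + " .");
      }
  }
  ```

  A function of this shape is what Spark's JavaRDD.mapPartitions expects, since it receives and returns an Iterator per partition.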
- createStreamRDFFactory
  
  public static Function<OutputStream,org.apache.jena.riot.system.StreamRDF> createStreamRDFFactory(org.apache.jena.riot.RDFFormat rdfFormat, boolean mapQuadsToTriplesForTripleLangs, org.apache.jena.shared.PrefixMapping prefixMapping)
  
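  Create a function that, given an OutputStream, creates a StreamRDF instance that writes to that stream.
  
  Parameters:
  
  rdfFormat - the RDF format to write
  
  mapQuadsToTriplesForTripleLangs - whether quads should be mapped to triples when rdfFormat is a triple-only language
  
  prefixMapping - the prefix mapping to use for the output
  
  Returns:
  
  a function that maps an OutputStream to a StreamRDF writing to it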
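  The factory pattern described above, including the effect of mapQuadsToTriplesForTripleLangs, can be sketched without Jena. This is a hypothetical illustration: the Sink interface is a simplified stand-in for Jena's StreamRDF, which also has start()/triple()/quad()/finish() methods but takes Triple and Quad objects. StreamFactorySketch and the tripleLang flag are invented. Rejecting quads when mapping is disabled is an assumption about the unmapped case.

  ```java
  import java.io.OutputStream;
  import java.io.PrintStream;
  import java.nio.charset.StandardCharsets;
  import java.util.function.Function;

  // Hypothetical sketch of a Function<OutputStream, sink> factory.
  class StreamFactorySketch {
      // Simplified stand-in for StreamRDF; terms are plain N-Triples strings.
      interface Sink {
          void start();
          void triple(String s, String p, String o);
          void quad(String g, String s, String p, String o);
          void finish();
      }

      // tripleLang: whether the target format can only hold triples (e.g. N-Triples, Turtle).
      static Function<OutputStream, Sink> createSinkFactory(boolean tripleLang, boolean mapQuadsToTriples) {
          return os -> {
              PrintStream out = new PrintStream(os, false, StandardCharsets.UTF_8);
              return new Sink() {
                  @Override public void start() {}

                  @Override public void triple(String s, String p, String o) {
                      out.print(s + " " + p + " " + o + " .\n");
                  }

                  @Override public void quad(String g, String s, String p, String o) {
                      if (!tripleLang) {
                          out.print(s + " " + p + " " + o + " " + g + " .\n"); // N-Quads style line
                      } else if (mapQuadsToTriples) {
                          triple(s, p, o); // graph name is dropped
                      } else {
                          throw new IllegalStateException("Cannot write a quad in a triple-only format");
                      }
                  }

                  @Override public void finish() { out.flush(); }
              };
          };
      }
  }
  ```

  Returning a factory rather than a ready-made writer lets each Spark partition open its own output stream and obtain its own writer.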
- partitionMapperRDFStream
  
  public static <T> org.aksw.commons.lambda.throwing.ThrowingFunction<Iterator<T>,Iterator<String>> partitionMapperRDFStream(Function<OutputStream,org.apache.jena.riot.system.StreamRDF> streamRDFFactory, BiConsumer<? super T,org.apache.jena.riot.system.StreamRDF> sendRecordToWriter)
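  By its signature, partitionMapperRDFStream combines a writer factory (OutputStream to StreamRDF) with a per-record lambda into a function from a partition's records to output strings. The sketch below is one possible shape of that combination, under stated assumptions. RdfStreamPartitionSketch, its Sink (a stand-in for StreamRDF's start()/finish() lifecycle) and LineSink are hypothetical. java.util.function.Function replaces the library's ThrowingFunction. For simplicity, the whole partition is buffered in memory, which the real method may not do.

  ```java
  import java.io.ByteArrayOutputStream;
  import java.io.OutputStream;
  import java.io.PrintStream;
  import java.nio.charset.StandardCharsets;
  import java.util.Arrays;
  import java.util.Collections;
  import java.util.Iterator;
  import java.util.function.BiConsumer;
  import java.util.function.Function;

  // Hypothetical sketch: writer factory + per-record lambda -> partition mapper.
  class RdfStreamPartitionSketch {
      // Stand-in for the start()/finish() lifecycle of Jena's StreamRDF.
      interface Sink {
          void start();
          void finish();
      }

      // Example sink: one output line per call, flushed on finish().
      static final class LineSink implements Sink {
          private final PrintStream out;
          public LineSink(OutputStream os) { this.out = new PrintStream(os, false, StandardCharsets.UTF_8); }
          @Override public void start() {}
          public void line(String s) { out.print(s + "\n"); }
          @Override public void finish() { out.flush(); }
      }

      static <T, S extends Sink> Function<Iterator<T>, Iterator<String>> partitionMapper(
              Function<OutputStream, S> sinkFactory,
              BiConsumer<? super T, ? super S> sendRecordToWriter) {
          return it -> {
              // Write the whole partition into an in-memory buffer through the sink.
              ByteArrayOutputStream buffer = new ByteArrayOutputStream();
              S sink = sinkFactory.apply(buffer);
              sink.start();
              while (it.hasNext()) {
                  sendRecordToWriter.accept(it.next(), sink);
              }
              sink.finish();
              // Hand the serialized text back as lines.
              String text = buffer.toString(StandardCharsets.UTF_8);
              return text.isEmpty()
                      ? Collections.<String>emptyIterator()
                      : Arrays.asList(text.split("\n")).iterator();
          };
      }
  }
  ```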
- saveToFolder
  
  @Deprecated public static <T> void saveToFolder(org.apache.spark.api.java.JavaRDD<T> javaRdd, String path, org.apache.jena.riot.RDFFormat rdfFormat, boolean mapQuadsToTriplesForTripleLangs, org.apache.jena.shared.PrefixMapping globalPrefixMapping, BiConsumer<T,org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF) throws IOException
  
  Deprecated.
  
  Save the data in TriG/Turtle format or one of their sub-formats (N-Quads/N-Triples). If prefixes should be written out, they have to be provided via the globalPrefixMapping parameter. The prefix mapping is broadcast to and processed in a per-partition (mapPartitions) operation. If the prefix mapping is non-empty, the first part file written out contains the prefixes; no other partition writes out prefixes.
  
  Parameters:
  
  path - the folder into which the file(s) will be written
  
  Throws:
  
  IOException
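  The "prefixes only in the first part file" rule can be illustrated with a function shaped like the argument to Spark's JavaRDD.mapPartitionsWithIndex. This is a JDK-only sketch: FirstPartitionPrefixSketch is hypothetical, and using partition index 0 to identify the "first part file" is an assumption. It buffers the partition in a list for clarity; a streaming version would chain iterators instead.

  ```java
  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;

  // Hypothetical sketch: only partition 0 emits the (broadcast) prefix header.
  class FirstPartitionPrefixSketch {
      static Iterator<String> withPrefixHeader(int partitionIndex, Iterator<String> records, List<String> prefixLines) {
          List<String> out = new ArrayList<>();
          if (partitionIndex == 0) {
              out.addAll(prefixLines); // header goes only into the first part file
          }
          records.forEachRemaining(out::add);
          return out.iterator();
      }
  }
  ```

  One consequence is that, if later part files use prefixed names, they are valid Turtle/TriG only when read after the first part file (for example, after concatenation).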
- saveUsingElephas
  
  public static <T> void saveUsingElephas(org.apache.spark.api.java.JavaRDD<T> rdd, org.apache.hadoop.fs.Path path, org.apache.jena.riot.Lang lang, org.aksw.commons.lambda.serializable.SerializableFunction<? super T,?> recordToWritable)
- createForTriple
  
  public static RddRdfWriter<org.apache.jena.graph.Triple> createForTriple()
- createForQuad
  
  public static RddRdfWriter<org.apache.jena.sparql.core.Quad> createForQuad()
- createForGraph
  
  public static RddRdfWriter<org.apache.jena.graph.Graph> createForGraph()
- createForDatasetGraph
  
  public static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetGraphOneNg> createForDatasetGraph()
- createForModel
  
  public static RddRdfWriter<org.apache.jena.rdf.model.Model> createForModel()
- createForDataset
  
  public static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetOneNg> createForDataset()
- validate
  
  public static void validate(RddRdfWriterSettings<?> settings)
- sendToStreamRDF
  
  public static <T> void sendToStreamRDF(org.apache.spark.api.java.JavaRDD<T> javaRdd, org.aksw.commons.lambda.serializable.SerializableBiConsumer<T,org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF, org.aksw.commons.lambda.serializable.SerializableSupplier<org.apache.jena.riot.system.StreamRDF> streamRdfSupplier)
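  sendToStreamRDF has no description. Its signature suggests that each record is passed, through sendRecordToStreamRDF, to a StreamRDF obtained from a serializable supplier. The serializable supplier hints that this happens on the executors, e.g. per partition inside JavaRDD.foreachPartition; that, like the rest of this sketch, is an assumption. SendToSinkSketch, its Sink stand-in for StreamRDF's start()/finish() lifecycle, and CollectingSink are hypothetical.

  ```java
  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;
  import java.util.function.BiConsumer;
  import java.util.function.Supplier;

  // Hypothetical sketch of feeding one partition's records into a supplied sink.
  class SendToSinkSketch {
      // Stand-in for the start()/finish() lifecycle of Jena's StreamRDF.
      interface Sink {
          void start();
          void finish();
      }

      // Obtains a fresh sink, brackets all records with start()/finish(),
      // and forwards each record through the per-type lambda.
      static <T, S extends Sink> void sendPartition(Iterator<T> partition, BiConsumer<T, S> sendRecord, Supplier<S> sinkSupplier) {
          S sink = sinkSupplier.get();
          sink.start();
          try {
              partition.forEachRemaining(rec -> sendRecord.accept(rec, sink));
          } finally {
              sink.finish(); // always close the stream, even on failure
          }
      }

      // Example sink that records what it received, for demonstration only.
      static final class CollectingSink implements Sink {
          public final List<String> received = new ArrayList<>();
          public boolean started;
          public boolean finished;
          @Override public void start() { started = true; }
          @Override public void finish() { finished = true; }
      }
  }
  ```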
