Class RddRdfWriter<T>
java.lang.Object
net.sansa_stack.spark.io.rdf.output.RddWriterSettings<SELF>
net.sansa_stack.spark.io.rdf.output.RddRdfWriterSettings<RddRdfWriter<T>>
net.sansa_stack.spark.io.rdf.output.RddRdfWriter<T>
Type Parameters:
T
Important: Instances of this class should only be created using RddRdfWriterFactory, because the factory is RDD-independent and can validate settings at an early stage.

This class implements a fluent API for configuring how an RDD of RDF data is saved to disk. It uniformly handles Triples, Quads, Models, Datasets, etc. using a set of lambdas for the relevant conversions.

Instances of this class should be created using the appropriate createFor[Type] methods.
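A minimal usage sketch (assuming the SANSA and Spark dependencies are on the classpath; `obtainTriples()` is a hypothetical helper, and chaining of the setters is assumed from the fluent-API contract described above):

```java
import java.io.IOException;

import org.apache.jena.graph.Triple;
import org.apache.jena.riot.RDFFormat;
import org.apache.spark.api.java.JavaRDD;

public class WriteTriplesExample {
    public static void writeTriples(JavaRDD<Triple> triples) throws IOException {
        // Setter names come from this class and its RddWriterSettings /
        // RddRdfWriterSettings superclasses.
        RddRdfWriter<Triple> writer = RddRdfWriter.createForTriple();
        writer.setRdd(triples)                     // the data to save
              .setOutputFormat(RDFFormat.NTRIPLES) // target serialization
              .setTargetFile("triples.nt")         // merged single-file output
              .run();                              // performs the save; may throw IOException
    }
}
```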
Field Summary

protected RddRdfOpsImpl<T> dispatcher
    References the lambdas in RddRdfOpsImpl directly (saves one entry in the call stack per record)
protected org.apache.hadoop.conf.Configuration hadoopConfiguration
protected org.apache.spark.api.java.JavaRDD<? extends T> rdd
protected org.apache.spark.api.java.JavaSparkContext sparkContext
Fields inherited from class net.sansa_stack.spark.io.rdf.output.RddRdfWriterSettings
deferOutputForUsedPrefixes, globalPrefixMapping, mapQuadsToTriplesForTripleLangs, outputFormat
Fields inherited from class net.sansa_stack.spark.io.rdf.output.RddWriterSettings
allowOverwriteFiles, consoleOutSupplier, deletePartitionFolderAfterMerge, partitionFolder, partitionFolderFs, partitionsAsIndependentFiles, postProcessingSettings, targetFile, targetFileFs, useCoalesceOne
Constructor Summary

RddRdfWriter
Method Summary

static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetOneNg> createForDataset()
static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetGraphOneNg> createForDatasetGraph()
static RddRdfWriter<org.apache.jena.graph.Graph> createForGraph()
static RddRdfWriter<org.apache.jena.rdf.model.Model> createForModel()
static RddRdfWriter<org.apache.jena.sparql.core.Quad> createForQuad()
static RddRdfWriter<org.apache.jena.graph.Triple> createForTriple()
static Function<OutputStream, org.apache.jena.riot.system.StreamRDF> createStreamRDFFactory(org.apache.jena.riot.RDFFormat rdfFormat, boolean mapQuadsToTriplesForTripleLangs, org.apache.jena.shared.PrefixMapping prefixMapping)
    Create a function that can create a StreamRDF instance that is backed by the given OutputStream.
org.apache.spark.api.java.JavaRDD<T> getEffectiveRdd(RdfPostProcessingSettings settings)
    Create the effective RDD w.r.t. configuration (sort, unique, optimize prefixes).
org.apache.spark.api.java.JavaRDD<? extends T> getRdd()
partitionMapperNQuads(Iterator<org.apache.jena.sparql.core.Quad> it)
partitionMapperNTriples(Iterator<org.apache.jena.graph.Triple> it)
static <T> org.aksw.commons.lambda.throwing.ThrowingFunction<Iterator<T>, Iterator<String>> partitionMapperRDFStream(Function<OutputStream, org.apache.jena.riot.system.StreamRDF> streamRDFFactory, BiConsumer<? super T, org.apache.jena.riot.system.StreamRDF> sendRecordToWriter)
void run()
void runActual(RddWriterSettings<?> cxt)
protected void runOutputToConsole()
void runSpark()
    Run the save action according to configuration.
void runUnchecked()
    Same as run() but without the checked IOException.
static <T> void saveToFolder(org.apache.spark.api.java.JavaRDD<T> javaRdd, String path, org.apache.jena.riot.RDFFormat rdfFormat, boolean mapQuadsToTriplesForTripleLangs, org.apache.jena.shared.PrefixMapping globalPrefixMapping, BiConsumer<T, org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF)
    Deprecated.
static <T> void saveUsingElephas(org.apache.spark.api.java.JavaRDD<T> rdd, org.apache.hadoop.fs.Path path, org.apache.jena.riot.Lang lang, org.aksw.commons.lambda.serializable.SerializableFunction<? super T, ?> recordToWritable)
static <T> void sendToStreamRDF(org.apache.spark.api.java.JavaRDD<T> javaRdd, org.aksw.commons.lambda.serializable.SerializableBiConsumer<T, org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF, org.aksw.commons.lambda.serializable.SerializableSupplier<org.apache.jena.riot.system.StreamRDF> streamRdfSupplier)
static String toString(org.apache.jena.shared.PrefixMapping prefixMapping, org.apache.jena.riot.RDFFormat rdfFormat)
    Convert a prefix mapping to a string.
static void validate(RddRdfWriterSettings<?> settings)

Methods inherited from class net.sansa_stack.spark.io.rdf.output.RddRdfWriterSettings
configureFrom, getFallbackOutputFormat, getGlobalPrefixMapping, getOutputFormat, isMapQuadsToTriplesForTripleLangs, isPartitionsAsIndependentFiles, mutate, self, setDeferOutputForUsedPrefixes, setGlobalPrefixMapping, setGlobalPrefixMapping, setMapQuadsToTriplesForTripleLangs, setOutputFormat, setOutputFormat, setPartitionsAsIndependentFiles
Methods inherited from class net.sansa_stack.spark.io.rdf.output.RddWriterSettings
configureFrom, getConsoleOutSupplier, getHadoopConfiguration, getPartitionFolder, getPartitionFolderFs, getPostProcessingSettings, getTargetFile, getTargetFileFs, isAllowOverwriteFiles, isConsoleOutput, isDeletePartitionFolderAfterMerge, isUseCoalesceOne, setAllowOverwriteFiles, setConsoleOutput, setConsoleOutSupplier, setDeletePartitionFolderAfterMerge, setHadoopConfiguration, setPartitionFolder, setPartitionFolder, setPartitionFolderFs, setPostProcessingSettings, setTargetFile, setTargetFile, setTargetFileFs, setUseCoalesceOne
Field Details

dispatcher
protected RddRdfOpsImpl<T> dispatcher
References the lambdas in RddRdfOpsImpl directly (saves one entry in the call stack per record)

sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext

rdd
protected org.apache.spark.api.java.JavaRDD<? extends T> rdd

hadoopConfiguration
protected org.apache.hadoop.conf.Configuration hadoopConfiguration
Constructor Details

RddRdfWriter
Method Details

setRdd

getRdd

runUnchecked
public void runUnchecked()
Same as run() but without the checked IOException.

run
Throws:
IOException
getEffectiveRdd
Create the effective RDD w.r.t. configuration (sort, unique, optimize prefixes). If optimize prefixes is enabled, invoking this method will immediately perform that analysis. The current behavior is that this writer's prefix map will be updated to the used prefixes; however, this is subject to change such that a new writer instance with the used prefixes is created instead.
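For illustration, the method can be used to materialize the post-processed RDD without triggering the save action; this is a hypothetical sketch in which `writer` is an already-configured `RddRdfWriter<Triple>` and its own settings are passed back in via the inherited `getPostProcessingSettings()`:

```java
// Applies the configured post-processing (sort, unique, prefix analysis)
// and returns the resulting RDD; note the prefix-analysis side effect
// on the writer's prefix map described above.
JavaRDD<Triple> effective = writer.getEffectiveRdd(writer.getPostProcessingSettings());
```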
runOutputToConsole
Throws:
IOException

runActual
Throws:
IOException

runSpark
Run the save action according to configuration.
Throws:
IOException

partitionMapperNTriples

partitionMapperNQuads

partitionMapperRDFStream
public static <T> org.aksw.commons.lambda.throwing.ThrowingFunction<Iterator<T>, Iterator<String>> partitionMapperRDFStream(Function<OutputStream, org.apache.jena.riot.system.StreamRDF> streamRDFFactory, BiConsumer<? super T, org.apache.jena.riot.system.StreamRDF> sendRecordToWriter)
saveUsingElephas
public static <T> void saveUsingElephas(org.apache.spark.api.java.JavaRDD<T> rdd, org.apache.hadoop.fs.Path path, org.apache.jena.riot.Lang lang, org.aksw.commons.lambda.serializable.SerializableFunction<? super T, ?> recordToWritable)

createForTriple

createForQuad

createForGraph

createForDatasetGraph
public static RddRdfWriter<org.aksw.jenax.arq.dataset.api.DatasetGraphOneNg> createForDatasetGraph()

createForModel

createForDataset

validate

sendToStreamRDF
public static <T> void sendToStreamRDF(org.apache.spark.api.java.JavaRDD<T> javaRdd, org.aksw.commons.lambda.serializable.SerializableBiConsumer<T, org.apache.jena.riot.system.StreamRDF> sendRecordToStreamRDF, org.aksw.commons.lambda.serializable.SerializableSupplier<org.apache.jena.riot.system.StreamRDF> streamRdfSupplier)