public class HoodieDeltaStreamer extends Object implements Serializable
A utility that incrementally takes the output from HiveIncrementalPuller and applies it to the target table. It does not maintain any state of its own; at runtime it queries how far the target table is behind the source table. This behavior can be overridden to force a sync from a given timestamp.
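The stateless resume behavior described above can be illustrated with a minimal sketch. It assumes the resume point is a plain string checkpoint read from the target table, and that an operator-supplied override takes precedence. The class `ResumePointSketch` and its method `resolve` are hypothetical names for illustration only and do not mirror the internals of HoodieDeltaStreamer.

```java
import java.util.Optional;

/** Hypothetical sketch: pick the resume point for the next sync without keeping local state. */
final class ResumePointSketch {

    /**
     * @param lastCommittedCheckpoint checkpoint read from the target table at runtime, if any
     * @param forcedCheckpoint        operator-supplied override, if any
     * @return the checkpoint to resume from; empty means "nothing recorded, start from the beginning"
     */
    static Optional<String> resolve(Optional<String> lastCommittedCheckpoint,
                                    Optional<String> forcedCheckpoint) {
        // An explicit override wins; this models forcing a sync from a timestamp.
        if (forcedCheckpoint.isPresent()) {
            return forcedCheckpoint;
        }
        // Otherwise resume from whatever the target table last recorded.
        return lastCommittedCheckpoint;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.of("20210101000000"), Optional.empty()));
        System.out.println(resolve(Optional.of("20210101000000"), Optional.of("20201201000000")));
        System.out.println(resolve(Optional.empty(), Optional.empty()));
    }
}
```

Because the resume point is derived from the target table on every run, restarting the job needs no external bookkeeping in this model; only the override has to be passed in explicitly.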
In continuous mode, DeltaStreamer runs in a loop and performs the following operations each cycle: (a) pull from source, (b) write to sink, (c) schedule compactions if needed, and (d) conditionally sync to Hive. For a Merge-on-Read (MOR) table with continuous mode enabled, a separate compactor thread is allocated to execute compactions.
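The shape of one continuous-mode cycle can be sketched roughly as follows. This is a simplified model, not the DeltaSyncService implementation: the class `ContinuousLoopSketch`, the step labels, the `compactionNeeded` predicate, and the `hiveSync` flag are all assumptions made for illustration. The only point it aims to show is that steps (a) through (d) run in order on the loop thread while compactions execute on a separate single thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a continuous-mode loop with a dedicated compactor thread. */
final class ContinuousLoopSketch {

    /**
     * Runs a fixed number of cycles and returns the steps taken on the loop thread, in order.
     * Compactions run on a separate thread and are appended to compactionLog.
     */
    static List<String> run(int cycles, IntPredicate compactionNeeded, boolean hiveSync,
                            List<String> compactionLog) {
        List<String> steps = new CopyOnWriteArrayList<>();
        // Single worker thread standing in for the separate compactor thread.
        ExecutorService compactor = Executors.newSingleThreadExecutor();
        try {
            for (int i = 1; i <= cycles; i++) {
                final int cycle = i;
                steps.add("pull-" + cycle);                       // (a) pull from source
                steps.add("write-" + cycle);                      // (b) write to sink
                if (compactionNeeded.test(cycle)) {               // (c) schedule compaction if needed
                    steps.add("schedule-compaction-" + cycle);
                    compactor.execute(() -> compactionLog.add("compact-" + cycle));
                }
                if (hiveSync) {                                   // (d) conditionally sync to Hive
                    steps.add("hive-sync-" + cycle);
                }
            }
        } finally {
            // Stop accepting work and wait for already scheduled compactions to finish.
            compactor.shutdown();
            try {
                compactor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> compactions = new CopyOnWriteArrayList<>();
        System.out.println(run(3, cycle -> cycle % 2 == 0, true, compactions));
        System.out.println(compactions);
    }
}
```

Handing compaction to its own executor keeps a long-running compaction from delaying the next pull and write, which is the motivation the description gives for the separate thread.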
Modifier and Type | Class and Description |
---|---|
static class | HoodieDeltaStreamer.Config |
static class | HoodieDeltaStreamer.DeltaSyncService: Syncs data either in single-run or in continuous mode. |
Modifier and Type | Field and Description |
---|---|
protected HoodieDeltaStreamer.Config | cfg |
static String | CHECKPOINT_KEY |
static String | CHECKPOINT_RESET_KEY |
static String | DELTASYNC_POOL_NAME |
protected Option<HoodieDeltaStreamer.DeltaSyncService> | deltaSyncService |
Constructor and Description |
---|
HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc) |
HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf) |
HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, Option<TypedProperties> props) |
HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc, Option<TypedProperties> props) |
Modifier and Type | Method and Description |
---|---|
HoodieDeltaStreamer.Config | getConfig() |
static HoodieDeltaStreamer.Config | getConfig(String[] args) |
HoodieDeltaStreamer.DeltaSyncService | getDeltaSyncService() |
static void | main(String[] args) |
void | shutdownGracefully() |
void | sync(): Main method to start syncing. |
public static final String CHECKPOINT_KEY
public static final String CHECKPOINT_RESET_KEY
protected final transient HoodieDeltaStreamer.Config cfg
protected transient Option<HoodieDeltaStreamer.DeltaSyncService> deltaSyncService
public static final String DELTASYNC_POOL_NAME
public HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc) throws IOException
Throws: IOException
public HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc, Option<TypedProperties> props) throws IOException
Throws: IOException
public HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf) throws IOException
Throws: IOException
public HoodieDeltaStreamer(HoodieDeltaStreamer.Config cfg, org.apache.spark.api.java.JavaSparkContext jssc, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, Option<TypedProperties> props) throws IOException
Throws: IOException
public void shutdownGracefully()
public HoodieDeltaStreamer.Config getConfig()
public static final HoodieDeltaStreamer.Config getConfig(String[] args)
public HoodieDeltaStreamer.DeltaSyncService getDeltaSyncService()
Copyright © 2021 The Apache Software Foundation. All rights reserved.