activity

Type Members

case class ActivityFields[A <: ResourceObject](runsOn: Resource[A], dependsOn: Seq[PipelineActivity[_]] = Seq.empty, preconditions: Seq[Precondition] = Seq.empty, onFailAlarms: Seq[SnsAlarm] = Seq.empty, onSuccessAlarms: Seq[SnsAlarm] = Seq.empty, onLateActionAlarms: Seq[SnsAlarm] = Seq.empty, maximumRetries: Option[HInt] = None, attemptTimeout: Option[HDuration] = None, lateAfterTimeout: Option[HDuration] = None, retryDelay: Option[HDuration] = None, failureAndRerunMode: Option[FailureAndRerunMode] = None, maxActiveInstances: Option[HInt] = None) extends Product with Serializable
trait BaseShellCommandActivity extends PipelineActivity[Ec2Resource]
case class CopyActivity extends PipelineActivity[Ec2Resource] with Product with Serializable

The activity that copies data from one data node to the other.

Note
It seems that both the input and the output need to use CsvDataFormat for this copy to work properly, and it has to be a specific variant of CSV. For more information, see http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html. In our experience, TsvDataFormat is very hard to use for either import or export, especially for tasks involving RedshiftCopyActivity. As a rule of thumb, always use the default CsvDataFormat for tasks that involve both exporting to S3 and copying to Redshift.
trait EmrActivity[A <: EmrCluster] extends PipelineActivity[A]

The base trait for activities that run on an Amazon EMR cluster.
trait EmrTaskActivity[A <: EmrCluster] extends EmrActivity[A]
case class EmrTaskActivityFields(preActivityTaskConfig: Option[ShellScriptConfig] = None, postActivityTaskConfig: Option[ShellScriptConfig] = None) extends Product with Serializable
trait FailureAndRerunMode extends AnyRef
case class HadoopActivity[A <: EmrCluster] extends EmrTaskActivity[A] with Product with Serializable

Runs a MapReduce job on a cluster. The cluster can be an EMR cluster managed by AWS Data Pipeline or another resource if you use TaskRunner. Use HadoopActivity when you want to run work in parallel. This allows you to use the scheduling resources of the YARN framework or the MapReduce resource negotiator in Hadoop 1. If you would like to run work sequentially using the Amazon EMR Step action, you can still use EmrActivity.
case class HiveActivity[A <: EmrCluster] extends EmrTaskActivity[A] with Product with Serializable

Runs a Hive query on an Amazon EMR cluster. HiveActivity makes it easier to set up an Amazon EMR activity and automatically creates Hive tables based on input data coming in from either Amazon S3 or Amazon RDS. All you need to specify is the HiveQL to run on the source data. AWS Data Pipeline automatically creates Hive tables named ${input1}, ${input2}, etc. based on the input fields in the HiveActivity object. For S3 inputs, the dataFormat field is used to create the Hive column names. For MySQL (RDS) inputs, the column names for the SQL query are used to create the Hive column names.
case class HiveCopyActivity[A <: EmrCluster] extends EmrTaskActivity[A] with Product with Serializable

Runs a Hive query on an Amazon EMR cluster. HiveCopyActivity makes it easier to copy data between Amazon S3 and DynamoDB. HiveCopyActivity accepts a HiveQL statement to filter input data from Amazon S3 or DynamoDB at the column and row level.
case class JarActivity extends BaseShellCommandActivity with WithS3Input with WithS3Output with Product with Serializable

A shell command activity that runs a given JAR.
class MainClass extends AnyRef
case class MapReduceActivity[A <: EmrCluster] extends EmrActivity[A] with Product with Serializable

Runs MapReduce steps on an Amazon EMR cluster.
case class MapReduceStep extends Product with Serializable

A MapReduce step that runs on a MapReduce cluster.
case class PigActivity[A <: EmrCluster] extends EmrTaskActivity[A] with Product with Serializable

PigActivity provides native support for Pig scripts in AWS Data Pipeline without the requirement to use ShellCommandActivity or EmrActivity. In addition, PigActivity supports data staging. When the stage field is set to true, AWS Data Pipeline stages the input data as a schema in Pig without additional code from the user.
trait PipelineActivity[A <: ResourceObject] extends NamedPipelineObject

The activity trait. All activities should mix in this trait.
case class RedshiftCopyActivity extends PipelineActivity[Ec2Resource] with Product with Serializable

Copies data directly from DynamoDB or Amazon S3 to Amazon Redshift. You can load data into a new table, or easily merge data into an existing table.
case class RedshiftCopyOption(repr: Seq[String]) extends Product with Serializable
trait RedshiftUnloadOption extends AnyRef
sealed trait Script extends AnyRef
sealed case class ScriptContent(content: Option[HString]) extends Script with Product with Serializable
sealed case class ScriptUri(uri: Option[HS3Uri]) extends Script with Product with Serializable
case class ShellCommandActivity extends BaseShellCommandActivity with WithS3Input with WithS3Output with Product with Serializable

Runs a command or script.
case class ShellCommandActivityFields(script: Script, scriptArguments: Seq[HString] = Seq.empty, stdout: Option[HString] = None, stderr: Option[HString] = None, stage: Option[HBoolean] = None, input: Seq[S3DataNode] = Seq.empty, output: Seq[S3DataNode] = Seq.empty) extends Product with Serializable
case class ShellScriptConfig(baseFields: BaseFields, scriptUri: HS3Uri, scriptArguments: Seq[HString]) extends NamedPipelineObject with Product with Serializable
case class SparkActivity extends EmrActivity[SparkCluster] with Product with Serializable

Runs Spark steps on a given Spark cluster with Amazon EMR.
case class SparkStep extends Product with Serializable

A Spark step that runs on a Spark cluster.
case class SparkTaskActivity extends EmrTaskActivity[SparkCluster] with Product with Serializable

Runs a Spark job on a cluster. The cluster can be an EMR cluster managed by AWS Data Pipeline or another resource if you use TaskRunner. Use SparkTaskActivity when you want to run work in parallel. This allows you to use the scheduling resources of the YARN framework or the MapReduce resource negotiator in Hadoop 1. If you would like to run work sequentially using the Amazon EMR Step action, you can still use SparkActivity.
case class SqlActivity extends PipelineActivity[Ec2Resource] with Product with Serializable

Runs an SQL query on a Redshift cluster. If the query writes out to a table that does not exist, a new table with that name is created.
trait WithS3Input extends AnyRef
trait WithS3Output extends AnyRef
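
The ActivityFields signature above takes one required parameter (runsOn) and gives every other field a default. The sketch below illustrates that pattern in isolation. It is a minimal sketch, not the library's code: ActivityFieldsSketch and its String, Int and Long fields are hypothetical stand-ins for the real Resource[A], PipelineActivity[_], HInt and HDuration types, and attemptTimeoutMinutes is an invented simplification of attemptTimeout.

```scala
// Hypothetical, simplified stand-in for ActivityFields.
// The real class uses Resource[A], HInt, HDuration, etc.; plain types are assumed here.
final case class ActivityFieldsSketch(
  runsOn: String,                              // required: where the activity runs
  dependsOn: Seq[String] = Seq.empty,          // optional: upstream activity names
  maximumRetries: Option[Int] = None,          // optional: None means "not set"
  attemptTimeoutMinutes: Option[Long] = None   // optional: assumed unit, for illustration
)

object ActivityFieldsDemo {
  // Only the required field is supplied; every optional field keeps its default.
  val base: ActivityFieldsSketch = ActivityFieldsSketch(runsOn = "ec2-worker")

  // copy returns a new value with the selected fields overridden; base is unchanged.
  val tuned: ActivityFieldsSketch =
    base.copy(maximumRetries = Some(3), dependsOn = Seq("extract-step"))

  def main(args: Array[String]): Unit = {
    println(base)
    println(tuned)
  }
}
```

Modelling unset values as Option with a None default keeps "not configured" distinct from any concrete value, which is presumably why the real signature uses Option for its retry and timeout fields.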

Value Members

object CopyActivity extends RunnableObject with Serializable
object FailureAndRerunMode
object HadoopActivity extends RunnableObject with Serializable
object HiveActivity extends RunnableObject with Serializable
object HiveCopyActivity extends RunnableObject with Serializable
object JarActivity extends RunnableObject with Serializable
object MainClass
object MapReduceActivity extends RunnableObject with Serializable
object MapReduceStep extends Serializable
object PigActivity extends RunnableObject with Serializable
object RedshiftCopyActivity extends Enumeration with RunnableObject
object RedshiftCopyOption extends Serializable
object RedshiftUnloadOption
object Script
object ShellCommandActivity extends RunnableObject with Serializable
object ShellScriptConfig extends Serializable
object SparkActivity extends RunnableObject with SparkCommandRunner with Serializable
object SparkStep extends Serializable
object SparkTaskActivity extends RunnableObject with SparkCommandRunner with Serializable
object SqlActivity extends RunnableObject with Serializable
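
Script is listed as a sealed trait with two cases (ScriptContent and ScriptUri) plus a companion object. The sketch below shows how such a sealed hierarchy is typically consumed with an exhaustive pattern match. It is a minimal sketch under stated assumptions: the Sketch-suffixed names are hypothetical, String stands in for HString and HS3Uri, and the fromString and describe helpers are invented for illustration; the contents of the real Script companion are not shown in this listing.

```scala
// Hypothetical stand-ins for Script, ScriptContent and ScriptUri.
sealed trait ScriptSketch
final case class ScriptContentSketch(content: Option[String]) extends ScriptSketch
final case class ScriptUriSketch(uri: Option[String]) extends ScriptSketch

object ScriptSketch {
  // Invented helper: treat "s3://" strings as URIs and anything else as inline content.
  def fromString(s: String): ScriptSketch =
    if (s.startsWith("s3://")) ScriptUriSketch(Some(s))
    else ScriptContentSketch(Some(s))

  // Because the trait is sealed, the compiler can check that this match is exhaustive.
  def describe(script: ScriptSketch): String = script match {
    case ScriptUriSketch(Some(uri))      => s"run script at $uri"
    case ScriptContentSketch(Some(body)) => s"run inline command: $body"
    case ScriptUriSketch(None) | ScriptContentSketch(None) => "no script set"
  }
}
```

A call such as ScriptSketch.describe(ScriptSketch.fromString("s3://bucket/run.sh")) takes the URI branch, while a plain command string takes the inline branch.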
