com.amazonaws.services.elasticmapreduce.util
Class StepFactory

java.lang.Object
  extended by com.amazonaws.services.elasticmapreduce.util.StepFactory

public class StepFactory
extends Object

This class provides helper methods for creating common Elastic MapReduce step types. To use StepFactory, you should construct it with the appropriate bucket for your region. The official bucket format is "<region>.elasticmapreduce", so us-east-1 would use the bucket "us-east-1.elasticmapreduce".
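
For example, a minimal sketch of constructing the factory for a non-default region (here eu-west-1; substitute the bucket for your own region):

   // Load step resources from the eu-west-1 regional bucket ("<region>.elasticmapreduce").
   StepFactory stepFactory = new StepFactory("eu-west-1.elasticmapreduce");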

Example usage: creating an interactive Hive job flow with debugging enabled:

   AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
   AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

   StepFactory stepFactory = new StepFactory();

   StepConfig enableDebugging = new StepConfig()
       .withName("Enable Debugging")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

   StepConfig installHive = new StepConfig()
       .withName("Install Hive")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newInstallHiveStep());

   RunJobFlowRequest request = new RunJobFlowRequest()
       .withName("Hive Interactive")
       .withSteps(enableDebugging, installHive)
       .withLogUri("s3://log-bucket/")
       .withInstances(new JobFlowInstancesConfig()
           .withEc2KeyName("keypair")
           .withHadoopVersion("0.20")
           .withInstanceCount(5)
           .withKeepJobFlowAliveWhenNoSteps(true)
           .withMasterInstanceType("m1.small")
           .withSlaveInstanceType("m1.small"));

   RunJobFlowResult result = emr.runJobFlow(request);
 


Nested Class Summary
static class StepFactory.HiveVersion
          The available Hive versions.
 
Constructor Summary
StepFactory()
          Creates a new StepFactory using the default Elastic MapReduce bucket (us-east-1.elasticmapreduce) for the default region (us-east-1).
StepFactory(String bucket)
          Creates a new StepFactory using the specified Amazon S3 bucket to load resources.
 
Method Summary
 HadoopJarStepConfig newEnableDebuggingStep()
          When run as the first step in your job flow, enables the Hadoop debugging UI in the AWS Management Console.
 HadoopJarStepConfig newInstallHiveStep()
          Step that installs the default version of Hive on your job flow.
 HadoopJarStepConfig newInstallHiveStep(StepFactory.HiveVersion... hiveVersions)
          Step that installs the specified versions of Hive on your job flow.
 HadoopJarStepConfig newInstallHiveStep(String... hiveVersions)
          Step that installs the specified versions of Hive on your job flow.
 HadoopJarStepConfig newInstallPigStep()
          Step that installs the default version of Pig on your job flow.
 HadoopJarStepConfig newInstallPigStep(String... pigVersions)
          Step that installs the specified versions of Pig on your job flow.
 HadoopJarStepConfig newRunHiveScriptStep(String script, String... args)
          Step that runs a Hive script on your job flow using the default Hive version.
 HadoopJarStepConfig newRunHiveScriptStepVersioned(String script, String hiveVersion, String... scriptArgs)
          Step that runs a Hive script on your job flow using the specified Hive version.
 HadoopJarStepConfig newRunPigScriptStep(String script, String... scriptArgs)
          Step that runs a Pig script on your job flow using the default Pig version.
 HadoopJarStepConfig newRunPigScriptStep(String script, String pigVersion, String... scriptArgs)
          Step that runs a Pig script on your job flow using the specified Pig version.
 HadoopJarStepConfig newScriptRunnerStep(String script, String... args)
          Runs a specified script on the master node of your cluster.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StepFactory

public StepFactory()
Creates a new StepFactory using the default Elastic MapReduce bucket (us-east-1.elasticmapreduce) for the default region (us-east-1).


StepFactory

public StepFactory(String bucket)
Creates a new StepFactory using the specified Amazon S3 bucket to load resources.

The official bucket format is "<region>.elasticmapreduce", so if you're using the us-east-1 region, you should use the bucket "us-east-1.elasticmapreduce".

Parameters:
bucket - The Amazon S3 bucket from which to load resources.

Method Detail

newScriptRunnerStep

public HadoopJarStepConfig newScriptRunnerStep(String script,
                                               String... args)
Runs a specified script on the master node of your cluster.

Parameters:
script - The script to run.
args - Arguments that get passed to the script.
Returns:
HadoopJarStepConfig that can be passed to your job flow.
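
For example, a hedged sketch of wrapping this step in a StepConfig; the S3 script location and the argument below are placeholders, not real resources:

   // Run a custom script on the master node. The path and argument are
   // placeholders; the args are passed straight through to the script.
   StepConfig runScript = new StepConfig()
       .withName("Run Setup Script")
       .withActionOnFailure("CONTINUE")
       .withHadoopJarStep(stepFactory.newScriptRunnerStep(
           "s3://my-bucket/scripts/setup.sh", "--verbose"));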

newEnableDebuggingStep

public HadoopJarStepConfig newEnableDebuggingStep()
When run as the first step in your job flow, enables the Hadoop debugging UI in the AWS Management Console.

Returns:
HadoopJarStepConfig that can be passed to your job flow.

newInstallHiveStep

public HadoopJarStepConfig newInstallHiveStep(StepFactory.HiveVersion... hiveVersions)
Step that installs the specified versions of Hive on your job flow.

Parameters:
hiveVersions - the versions of Hive to install
Returns:
HadoopJarStepConfig that can be passed to your job flow.

newInstallHiveStep

public HadoopJarStepConfig newInstallHiveStep(String... hiveVersions)
Step that installs the specified versions of Hive on your job flow.

Parameters:
hiveVersions - the versions of Hive to install
Returns:
HadoopJarStepConfig that can be passed to your job flow.
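
For instance, a minimal sketch of installing Hive 0.5 by its version string (the StepFactory.HiveVersion overload above works the same way, with enum constants in place of strings):

   // Install a specific Hive version; "0.5" is the default for Hadoop 0.20.
   StepConfig installHive = new StepConfig()
       .withName("Install Hive 0.5")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newInstallHiveStep("0.5"));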

newInstallHiveStep

public HadoopJarStepConfig newInstallHiveStep()
Step that installs the default version of Hive on your job flow. This is 0.4 for Hadoop 0.18 and 0.5 for Hadoop 0.20.

Returns:
HadoopJarStepConfig that can be passed to your job flow.

newRunHiveScriptStepVersioned

public HadoopJarStepConfig newRunHiveScriptStepVersioned(String script,
                                                         String hiveVersion,
                                                         String... scriptArgs)
Step that runs a Hive script on your job flow using the specified Hive version.

Parameters:
script - The script to run.
hiveVersion - The Hive version to use.
scriptArgs - Arguments that get passed to the script.
Returns:
HadoopJarStepConfig that can be passed to your job flow.
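
For example, a sketch of running a script under Hive 0.5; the S3 paths and the -d variable are placeholders, and the trailing arguments are simply passed through to the script:

   // Run a Hive script under a pinned Hive version, defining one Hive
   // variable for it. Paths below are placeholders.
   StepConfig runHiveScript = new StepConfig()
       .withName("Run Hive Script")
       .withActionOnFailure("CANCEL_AND_WAIT")
       .withHadoopJarStep(stepFactory.newRunHiveScriptStepVersioned(
           "s3://my-bucket/scripts/query.q", "0.5",
           "-d", "INPUT=s3://my-bucket/input/"));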

newRunHiveScriptStep

public HadoopJarStepConfig newRunHiveScriptStep(String script,
                                                String... args)
Step that runs a Hive script on your job flow using the default Hive version.

Parameters:
script - The script to run.
args - Arguments that get passed to the script.
Returns:
HadoopJarStepConfig that can be passed to your job flow.

newInstallPigStep

public HadoopJarStepConfig newInstallPigStep()
Step that installs the default version of Pig on your job flow.

Returns:
HadoopJarStepConfig that can be passed to your job flow.

newInstallPigStep

public HadoopJarStepConfig newInstallPigStep(String... pigVersions)
Step that installs the specified versions of Pig on your job flow.

Parameters:
pigVersions - the versions of Pig to install.
Returns:
HadoopJarStepConfig that can be passed to your job flow.

newRunPigScriptStep

public HadoopJarStepConfig newRunPigScriptStep(String script,
                                               String pigVersion,
                                               String... scriptArgs)
Step that runs a Pig script on your job flow using the specified Pig version.

Parameters:
script - The script to run.
pigVersion - The Pig version to use.
scriptArgs - Arguments that get passed to the script.
Returns:
HadoopJarStepConfig that can be passed to your job flow.

newRunPigScriptStep

public HadoopJarStepConfig newRunPigScriptStep(String script,
                                               String... scriptArgs)
Step that runs a Pig script on your job flow using the default Pig version.

Parameters:
script - The script to run.
scriptArgs - Arguments that get passed to the script.
Returns:
HadoopJarStepConfig that can be passed to your job flow.
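
For example, a hedged sketch that installs the default Pig version and then runs a script with it; the S3 paths and the -param value are placeholders:

   // Install the default Pig version before any Pig script can run.
   StepConfig installPig = new StepConfig()
       .withName("Install Pig")
       .withActionOnFailure("TERMINATE_JOB_FLOW")
       .withHadoopJarStep(stepFactory.newInstallPigStep());

   // Run a Pig script; the arguments are passed through to the script,
   // and all paths below are placeholders.
   StepConfig runPigScript = new StepConfig()
       .withName("Run Pig Script")
       .withActionOnFailure("CANCEL_AND_WAIT")
       .withHadoopJarStep(stepFactory.newRunPigScriptStep(
           "s3://my-bucket/scripts/analysis.pig",
           "-param", "INPUT=s3://my-bucket/input/"));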


Copyright © 2010 Amazon Web Services, Inc. All Rights Reserved.