public class GraphJob extends JobConf
GraphJob extends JobConf and is used to define, submit, and manage an ODPS Graph job.
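An ODPS Graph job is a kind of BSP (Bulk Synchronous Parallel) program. It builds a directed graph and then iteratively edits and processes that graph to complete the computation. The condition that terminates the iteration can be customized.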
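The logic of an ODPS Graph program is as follows:
1. Build a directed graph made up of vertices and edges (Vertex/Edge), where both vertices and edges carry values;
2. A GraphLoader parses the records of the input table into vertices and their out-edges;
3. During iteration, an Aggregator aggregates information into global information;
4. Iteration terminates when a termination condition is met, for example when the Aggregator's terminate method returns true;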
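To make the BSP flow above concrete, the following plain-Java sketch simulates it in memory without the ODPS Graph SDK. The class `MiniBsp` and its method are hypothetical names used only for illustration. Each vertex propagates the largest value it has seen along its out-edges; a global count of changed vertices plays the role of an Aggregator, and iteration stops when that aggregated count is zero, analogous to terminate returning true. In the real framework, loading, message passing, and aggregation are distributed across Workers.

```java
import java.util.Arrays;

// Hypothetical in-memory illustration of a BSP loop; this is not the ODPS Graph API.
final class MiniBsp {
    private MiniBsp() {}

    /**
     * Propagates the maximum value along out-edges until no vertex changes.
     *
     * @param values   vertex values indexed by vertex id; updated in place
     * @param outEdges outEdges[v] lists the targets of vertex v's out-edges
     * @return the number of supersteps executed
     */
    static int propagateMax(long[] values, int[][] outEdges) {
        int supersteps = 0;
        while (true) {
            // Send phase: each vertex sends its current value to its out-neighbours.
            // Messages are buffered, so every vertex reads values from the previous superstep.
            long[] inbox = new long[values.length];
            Arrays.fill(inbox, Long.MIN_VALUE);
            for (int v = 0; v < values.length; v++) {
                for (int target : outEdges[v]) {
                    inbox[target] = Math.max(inbox[target], values[v]);
                }
            }
            // Barrier, then compute phase: each vertex consumes the messages it received.
            long changed = 0; // global aggregate, like an Aggregator summing per-vertex flags
            for (int v = 0; v < values.length; v++) {
                if (inbox[v] > values[v]) {
                    values[v] = inbox[v];
                    changed++;
                }
            }
            supersteps++;
            // Terminate check on the aggregated value: stop once nothing changed.
            if (changed == 0) {
                return supersteps;
            }
        }
    }
}
```

For example, on the chain 0 -> 1 -> 2 with values {3, 1, 2}, the value 3 reaches vertex 2 after two supersteps, and a third superstep detects that nothing changed, so the method returns 3.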
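GraphJob provides two kinds of interfaces:
The first kind is used to define an ODPS Graph job. These interfaces are inherited from JobConf and mainly include:
Specifying the implementation classes of the ODPS Graph job: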
JobConf.setWorkerComputerClass(Class)
JobConf.setGraphLoaderClass(Class)
JobConf.setVertexClass(Class)
JobConf.setAggregatorClass(Class)
JobConf.setAggregatorClass(Class...)
JobConf.setPartitionerClass(Class)
JobConf.setCombinerClass(Class)
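Implementations of GraphLoader and Vertex must be provided; the others are optional, depending on your needs.
Specifying the job's input and output: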
JobConf.addInput(TableInfo)
JobConf.addInput(TableInfo, String[])
JobConf.addOutput(TableInfo)
JobConf.addOutput(TableInfo, boolean)
addOutput(TableInfo, String)
addOutput(TableInfo, String, boolean)
Declaring the ODPS resources used by this job:
JobConf.addCacheResources(String)
Has the same effect as declaring resources with jar -resources;
JobConf.addCacheResourcesToClassPath(String)
Has the same effect as declaring resources with jar -libjars;
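Specifying advanced options that tell the ODPS Graph execution framework how to run the job, for example:
JobConf.setSplitSize(long)
Sets the input split size (in MB, default 256), which affects the number of Workers;
JobConf.setRuntimePartitioning(boolean)
Indicates whether Workers redistribute vertices after loading them; the default is true;
JobConf.setMaxIteration(int)
Sets the maximum number of iterations, which is one of the conditions that terminate the iteration. The default is -1; a value <= 0 means the maximum iteration count is not used as a termination condition;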
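The setMaxIteration rule can be expressed as a small predicate. The helper below is hypothetical and not part of the SDK; it models only the two conditions mentioned in this section (the iteration limit and an Aggregator's terminate returning true) and ignores any other stopping conditions the framework may apply.

```java
// Hypothetical helper modelling the termination rules described above; not an ODPS Graph API.
final class IterationLimit {
    private IterationLimit() {}

    /**
     * @param completedIterations  number of iterations (supersteps) already executed
     * @param maxIteration         the value passed to setMaxIteration(int); the default is -1
     * @param aggregatorTerminated whether an Aggregator's terminate method returned true
     * @return true if the iteration should stop
     */
    static boolean shouldStop(int completedIterations, int maxIteration, boolean aggregatorTerminated) {
        if (aggregatorTerminated) {
            return true;
        }
        // maxIteration <= 0 disables the iteration limit as a termination condition.
        return maxIteration > 0 && completedIterations >= maxIteration;
    }
}
```

With the default of -1, only the Aggregator can stop the loop in this model; with setMaxIteration(30), the loop stops after the 30th iteration at the latest.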
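The second kind is used to submit and manage an ODPS Graph job, and mainly includes:
run()
Submits the job and waits for it to finish; throws an exception if the job fails. Blocking (synchronous) mode;
submit()
Submits the job and returns immediately. Non-blocking (asynchronous) mode;
isComplete()
Queries whether the job has finished (succeeded, failed, or was killed); typically used when the job was submitted in non-blocking mode;
isSuccessful()
Queries whether the job succeeded; typically used when the job was submitted in non-blocking mode;
getCounters()
Gets the job's counter information;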
Code example, taken from PageRank:
public static void main(String[] args) throws IOException {
    GraphJob job = new GraphJob();
    job.setGraphLoaderClass(PageRankGraphLoader.class);
    job.setVertexClass(PageRankVertex.class);
    job.addInput(new TableInfo(args[0]));
    job.addOutput(new TableInfo(args[1]));
    job.setMaxIteration(30);
    job.run();
}
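The excerpt above only configures and runs the job; the per-vertex computation in PageRankVertex is not shown in this section. For intuition only, the plain-Java sketch below runs 30 synchronous supersteps (matching setMaxIteration(30)) of the textbook PageRank update, assuming a damping factor of 0.85. The class `PageRankSketch` is hypothetical, and this is not the implementation of PageRankVertex.

```java
import java.util.Arrays;

// Hypothetical, single-process PageRank sketch; not the ODPS Graph PageRankVertex.
final class PageRankSketch {
    private PageRankSketch() {}

    /**
     * Runs a fixed number of synchronous PageRank supersteps.
     *
     * @param outEdges   outEdges[v] lists the targets of v's out-edges; every vertex is assumed
     *                   to have at least one out-edge
     * @param iterations number of supersteps to run
     * @param damping    damping factor, for example 0.85
     * @return the rank of each vertex
     */
    static double[] run(int[][] outEdges, int iterations, double damping) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] incoming = new double[n];
            // Each vertex splits its rank evenly among its out-edges (its "messages").
            for (int v = 0; v < n; v++) {
                double share = rank[v] / outEdges[v].length;
                for (int target : outEdges[v]) {
                    incoming[target] += share;
                }
            }
            // After the barrier, each vertex combines the messages it received.
            for (int v = 0; v < n; v++) {
                rank[v] = (1 - damping) / n + damping * incoming[v];
            }
        }
        return rank;
    }
}
```

Because every vertex here has at least one out-edge, the ranks keep summing to 1 across supersteps; handling vertices without out-edges would need an extra rule that this sketch leaves out.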
See also: Vertex, GraphLoader, Aggregator, WorkerComputer
Nested classes inherited from JobConf: JobConf.JobState
Constructor | Description |
---|---|
GraphJob() | Constructs an ODPS Graph job. |
GraphJob(boolean loadDefaults) | Deprecated. |
GraphJob(Configuration conf) | Constructs an ODPS Graph job. |
GraphJob(Configuration conf, JobConf.JobState js) | Deprecated. |
GraphJob(String config) | Deprecated. |
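Modifier and Type | Method | Description |
---|---|---|
Counters | getCounters() | Gets the Counters of the job's running instance; the ODPS Graph runtime aggregates the Counters set by all Workers. |
boolean | isComplete() | Queries whether the job has finished. |
boolean | isSuccessful() | Queries whether the job instance ran successfully. |
void | killJob() | Kills the running instance of this job. |
void | run() | Submits the ODPS Graph job in blocking (synchronous) mode and waits for it to finish. |
void | submit() | Submits the ODPS Graph job in non-blocking (asynchronous) mode and returns immediately. |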
Methods inherited from class JobConf:
addCacheResources, addCacheResourcesToClassPath, addInput, addInput, addOutput, addOutput, getAggregatorOwnerPartitionerClass, getAggregatorTreeDepth, getBroadcastMessageEnable, getCombinerClass, getComputingVertexResolver, getComputingVertexResolverClass, getGraphLoaderClass, getJobPriority, getLoadingVertexResolver, getLoadingVertexResolverClass, getMaxIteration, getMemoryThreshold, getNumWorkers, getPartitionerClass, getRuntimePartitioning, getSplitSize, getSyncBetweenResolveCompute, getUseDiskBackedMessage, getUseDiskBackedMutation, getUseMultipleInputOutput, getUseTreeAggregator, getVertexClass, getWorkerComputerClass, getWorkerCPU, getWorkerMemory, setAggregatorClass, setAggregatorClass, setAggregatorOwnerPartitionerClass, setAggregatorTreeDepth, setBroadcastMessageEnable, setCheckpointSuperstepFrequency, setCombinerClass, setComputingVertexResolver, setComputingVertexResolverClass, setGraphLoaderClass, setJobPriority, setLoadingVertexResolver, setLoadingVertexResolverClass, setLogLevel, setMaxIteration, setMemoryThreshold, setNumWorkers, setPartitionerClass, setRuntimePartitioning, setSplitSize, setSyncBetweenResolveCompute, setUseDiskBackedMessage, setUseDiskBackedMutation, setUseMultipleInputOutput, setUseTreeAggregator, setVertexClass, setWorkerComputerClass, setWorkerCPU, setWorkerMemory
Methods inherited from class Configuration:
addDefaultResource, addResource, addResource, addResource, clear, get, get, getBoolean, getClass, getClass, getClassByName, getClasses, getClassLoader, getConfResourceAsInputStream, getConfResourceAsReader, getFile, getFloat, getInt, getLong, getRange, getRaw, getResource, getStringCollection, getStrings, getStrings, iterator, readFields, reloadConfiguration, set, setBoolean, setBooleanIfUnset, setClass, setClassLoader, setFloat, setIfUnset, setInt, setLong, setQuietMode, setStrings, size, toString, write, writeXml
Methods inherited from interface java.lang.Iterable:
forEach, spliterator
public GraphJob()
@Deprecated public GraphJob(boolean loadDefaults)
loadDefaults - whether to load the odps-graph.xml configuration file found on the CLASSPATH
@Deprecated public GraphJob(Configuration conf, JobConf.JobState js)
conf - the configuration manager
js - the initial job state: the define state or the running state
public GraphJob(Configuration conf)
conf - the configuration manager
@Deprecated public GraphJob(String config)
The XML configuration file has the following format:
<configuration>
  <property>
    <name>com.mycomp.xxx</name>
    <value>xxx</value>
  </property>
  ... ...
</configuration>
config - an XML configuration file in Configuration format
public boolean isComplete() throws IOException
IOException
public boolean isSuccessful() throws IOException
IOException
public void killJob() throws IOException
IOException
public void submit() throws IOException
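Throws IOException only when an exception occurs while submitting the job (note: this differs from the exception behavior of run(), which throws an exception when the job fails).
After submitting a job with this method, you can poll its status, for example: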
GraphJob job = new GraphJob();
... // config job
job.submit();
while (!job.isComplete()) {
    Thread.sleep(4000); // do your work or sleep
}
if (job.isSuccessful()) {
    System.out.println("Job Success!");
} else {
    System.err.println("Job Failed!");
}
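The loop above polls indefinitely. A variant with an overall deadline is sketched below against a hypothetical `PollableJob` interface that mirrors the isComplete() and isSuccessful() methods of GraphJob (both declared to throw IOException), so the logic can run without the SDK. The interface, class, and method names are assumptions for illustration. What to do after a timeout, such as calling killJob(), is left to the caller.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical interface mirroring GraphJob's isComplete() and isSuccessful().
interface PollableJob {
    boolean isComplete() throws IOException;

    boolean isSuccessful() throws IOException;
}

final class JobPoller {
    private JobPoller() {}

    /**
     * Polls the job every pollMillis milliseconds until it completes or timeoutMillis elapses.
     *
     * @return true if the job completed and succeeded; false if it failed
     *         or was still running when the deadline passed
     */
    static boolean waitForSuccess(PollableJob job, long pollMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        // System.nanoTime is monotonic, so wall-clock adjustments do not affect the deadline.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!job.isComplete()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller may decide to call killJob()
            }
            Thread.sleep(pollMillis);
        }
        return job.isSuccessful();
    }
}
```

Returning a single boolean keeps the sketch short; a caller that must distinguish "failed" from "timed out" can call isComplete() again after a false result.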
IOException - thrown if job submission fails
public void run() throws IOException
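Throws IOException in the following cases: an exception occurs while submitting the job, an exception occurs while polling the job status, or the job fails. Note that this differs from the exception behavior of submit().
The job's main program (the main function) should handle this exception carefully, because it affects the console's return value:
if the exception is not caught, it is thrown when the job fails and the console returns a non-zero value; if the exception is caught and not rethrown, the console returns 0 even if the job fails.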
GraphJob job = new GraphJob();
... // config job
job.run();
IOException - if an exception occurs while submitting the job or polling its status, or if the job fails
See also: submit()
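One way to follow the advice above is to catch the exception only to log it and then rethrow it, so that a failed job still produces a non-zero console return value. The sketch below uses a hypothetical `BlockingJob` interface as a stand-in for GraphJob.run(); the interface and class names are assumptions for illustration.

```java
import java.io.IOException;

// Hypothetical stand-in for a blocking call such as GraphJob.run().
@FunctionalInterface
interface BlockingJob {
    void run() throws IOException;
}

final class JobMain {
    private JobMain() {}

    /**
     * Runs the job; on failure, logs the error and rethrows it so the exception still
     * propagates out of main and the console reports a non-zero return value.
     */
    static void runAndPropagate(BlockingJob job) throws IOException {
        try {
            job.run();
        } catch (IOException e) {
            System.err.println("ODPS Graph job failed: " + e.getMessage());
            throw e; // swallowing the exception here would make the console return 0
        }
    }
}
```

In a main method declared with throws IOException, this could be called as JobMain.runAndPropagate(job::run), since run() takes no arguments and throws IOException.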
public Counters getCounters() throws IOException
IOException
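The method table states that the runtime aggregates the Counters set by all Workers. Assuming the usual semantics in which counters with the same group and name are summed (an assumption not confirmed by this page), the merge step can be sketched with plain maps. The class `CounterMerge` and the "group:name" key format are hypothetical and stand in for the real Counters type.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of summing per-Worker counters; not the ODPS Counters implementation.
final class CounterMerge {
    private CounterMerge() {}

    /** Sums the counters from all Workers that share the same "group:name" key. */
    static Map<String, Long> merge(List<Map<String, Long>> perWorker) {
        Map<String, Long> total = new TreeMap<>(); // sorted keys for stable output
        for (Map<String, Long> worker : perWorker) {
            for (Map.Entry<String, Long> e : worker.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }
}
```

A counter reported by only some Workers simply keeps the sum of the values that were reported.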
Copyright © 2015 Alibaba Cloud Computing. All rights reserved.