Package tensorflow
Interface CoordinationConfig.CoordinationServiceConfigOrBuilder
- All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder,com.google.protobuf.MessageOrBuilder
- All Known Implementing Classes:
CoordinationConfig.CoordinationServiceConfig,CoordinationConfig.CoordinationServiceConfig.Builder
- Enclosing class:
CoordinationConfig
public static interface CoordinationConfig.CoordinationServiceConfigOrBuilder
extends com.google.protobuf.MessageOrBuilder
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| boolean | getAgentDestructionWithoutShutdown() | If set, agents do not make an explicit Shutdown() call. |
| boolean | getAllowNewIncarnationToReconnect() | If a task restarts with a new incarnation, we may allow it to reconnect silently. |
| long | getClusterRegisterTimeoutInMs() | Maximum wait time for all members in the cluster to be registered. |
| boolean | getClusterRegisterWithBarrier() | Denotes if we should synchronize the agents' register attempts by blocking on a barrier. |
| CoordinationConfig.CoordinatedJob | getCoordinatedJobList(int index) | repeated .tensorflow.CoordinatedJob coordinated_job_list = 10; |
| int | getCoordinatedJobListCount() | repeated .tensorflow.CoordinatedJob coordinated_job_list = 10; |
| List<CoordinationConfig.CoordinatedJob> | getCoordinatedJobListList() | repeated .tensorflow.CoordinatedJob coordinated_job_list = 10; |
| CoordinationConfig.CoordinatedJobOrBuilder | getCoordinatedJobListOrBuilder(int index) | repeated .tensorflow.CoordinatedJob coordinated_job_list = 10; |
| List<? extends CoordinationConfig.CoordinatedJobOrBuilder> | getCoordinatedJobListOrBuilderList() | repeated .tensorflow.CoordinatedJob coordinated_job_list = 10; |
| boolean | getEnableHealthCheck() | Whether to enable the health check mechanism. |
| boolean | getForceDisable() | Disables coordination service. |
| long | getHeartbeatTimeoutInMs() | Heartbeat timeout, if a task does not record heartbeat in this time window, it will be considered disconnected. |
| boolean | getPollForErrorFromServiceAtStartup() | Use long polling to get error from coordination service as the error propagation mechanism. |
| String | getRecoverableJobs(int index) | The list of jobs which are recoverable. |
| com.google.protobuf.ByteString | getRecoverableJobsBytes(int index) | The list of jobs which are recoverable. |
| int | getRecoverableJobsCount() | The list of jobs which are recoverable. |
| List<String> | getRecoverableJobsList() | The list of jobs which are recoverable. |
| String | getServiceLeader() | Address where the coordination service instance is hosted. |
| com.google.protobuf.ByteString | getServiceLeaderBytes() | Address where the coordination service instance is hosted. |
| String | getServiceType() | Type of coordination service implementation to enable. |
| com.google.protobuf.ByteString | getServiceTypeBytes() | Type of coordination service implementation to enable. |
| long | getShutdownBarrierTimeoutInMs() | Denotes how long to wait for all coordination agents to reach the barriers (after the first shutdown request) before disconnecting together. |

Methods inherited from interface com.google.protobuf.MessageLiteOrBuilder
isInitialized
Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getDefaultInstanceForType, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
-
Method Details
-
getServiceType
String getServiceType()
Type of coordination service implementation to enable. For example, setting the service type to "standalone" starts a service instance on the leader task to provide coordination services such as heartbeats and a consistent key-value store.
string service_type = 1;
- Returns:
- The serviceType.
-
getServiceTypeBytes
com.google.protobuf.ByteString getServiceTypeBytes()
Type of coordination service implementation to enable. For example, setting the service type to "standalone" starts a service instance on the leader task to provide coordination services such as heartbeats and a consistent key-value store.
string service_type = 1;
- Returns:
- The bytes for serviceType.
-
getServiceLeader
String getServiceLeader()
Address where the coordination service instance is hosted.
string service_leader = 2;
- Returns:
- The serviceLeader.
-
getServiceLeaderBytes
com.google.protobuf.ByteString getServiceLeaderBytes()
Address where the coordination service instance is hosted.
string service_leader = 2;
- Returns:
- The bytes for serviceLeader.
-
getEnableHealthCheck
boolean getEnableHealthCheck()
Whether to enable the health check mechanism.
bool enable_health_check = 3;
- Returns:
- The enableHealthCheck.
-
getClusterRegisterTimeoutInMs
long getClusterRegisterTimeoutInMs()
Maximum wait time for all members in the cluster to be registered.
int64 cluster_register_timeout_in_ms = 4;
- Returns:
- The clusterRegisterTimeoutInMs.
-
getClusterRegisterWithBarrier
boolean getClusterRegisterWithBarrier()
Denotes if we should synchronize the agents' register attempts by blocking on a barrier. This is useful for synchronized restarts.
bool cluster_register_with_barrier = 14;
- Returns:
- The clusterRegisterWithBarrier.
-
getHeartbeatTimeoutInMs
long getHeartbeatTimeoutInMs()
Heartbeat timeout. If a task does not record a heartbeat within this time window, it is considered disconnected. Note: This is also used as a grace period to accept any heartbeats after the agent has disconnected, to account for the lag time between the service recording the state change and the agent stopping heartbeats.
int64 heartbeat_timeout_in_ms = 5;
- Returns:
- The heartbeatTimeoutInMs.
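The two roles of the heartbeat timeout described above (disconnect threshold and grace period) can be sketched as follows. This is a minimal illustration, not the service's implementation: the class `HeartbeatWindow`, its method names, and the exact boundary handling (strict versus inclusive comparison) are all assumptions.

```java
/** Hypothetical helper, not part of TensorFlow: models the two uses of heartbeat_timeout_in_ms. */
final class HeartbeatWindow {
    private HeartbeatWindow() {}

    /** True when no heartbeat was recorded within the timeout window (boundary choice is an assumption). */
    static boolean isDisconnected(long lastHeartbeatMs, long nowMs, long heartbeatTimeoutInMs) {
        return nowMs - lastHeartbeatMs > heartbeatTimeoutInMs;
    }

    /** True when a late heartbeat from an already disconnected agent still falls inside the grace period. */
    static boolean withinGracePeriod(long disconnectedAtMs, long heartbeatAtMs, long heartbeatTimeoutInMs) {
        return heartbeatAtMs - disconnectedAtMs <= heartbeatTimeoutInMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000L; // stands in for a value read via getHeartbeatTimeoutInMs()
        System.out.println(isDisconnected(0L, 15_000L, timeoutMs));
        System.out.println(withinGracePeriod(15_000L, 20_000L, timeoutMs));
    }
}
```

In real code the timeout would come from `getHeartbeatTimeoutInMs()` on a config instance rather than a literal.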
-
getCoordinatedJobListList
List<CoordinationConfig.CoordinatedJob> getCoordinatedJobListList()
repeated .tensorflow.CoordinatedJob coordinated_job_list = 10;
-
getCoordinatedJobList
CoordinationConfig.CoordinatedJob getCoordinatedJobList(int index)
repeated .tensorflow.CoordinatedJob coordinated_job_list = 10;
-
getCoordinatedJobListCount
int getCoordinatedJobListCount()
repeated .tensorflow.CoordinatedJob coordinated_job_list = 10;
-
getCoordinatedJobListOrBuilderList
List<? extends CoordinationConfig.CoordinatedJobOrBuilder> getCoordinatedJobListOrBuilderList()
repeated .tensorflow.CoordinatedJob coordinated_job_list = 10;
-
getCoordinatedJobListOrBuilder
CoordinationConfig.CoordinatedJobOrBuilder getCoordinatedJobListOrBuilder(int index)
repeated .tensorflow.CoordinatedJob coordinated_job_list = 10;
-
getShutdownBarrierTimeoutInMs
long getShutdownBarrierTimeoutInMs()
Denotes how long to wait for all coordination agents to reach the barriers (after the first shutdown request) before disconnecting together. If set to 0, no barrier is imposed upon shutdown and each worker can disconnect individually.
int64 shutdown_barrier_timeout_in_ms = 7;
- Returns:
- The shutdownBarrierTimeoutInMs.
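The shutdown barrier semantics above, including the special case of 0, can be modeled with a standard `CountDownLatch`. This is only a sketch under stated assumptions: `ShutdownBarrierSketch` and `awaitShutdown` are hypothetical names, and the latch stands in for whatever mechanism the service actually uses to track agents that have not yet requested shutdown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of shutdown_barrier_timeout_in_ms, not the service's implementation. */
final class ShutdownBarrierSketch {
    private ShutdownBarrierSketch() {}

    /**
     * Returns true if the caller may disconnect cleanly: either no barrier is
     * configured (timeout 0) or every agent reached the barrier in time.
     */
    static boolean awaitShutdown(CountDownLatch agentsRemaining, long shutdownBarrierTimeoutInMs) {
        if (shutdownBarrierTimeoutInMs == 0) {
            return true; // no barrier: each worker disconnects individually
        }
        try {
            return agentsRemaining.await(shutdownBarrierTimeoutInMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch oneAgentMissing = new CountDownLatch(1);
        System.out.println(awaitShutdown(oneAgentMissing, 0L));  // barrier disabled
        System.out.println(awaitShutdown(oneAgentMissing, 20L)); // barrier times out
    }
}
```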
-
getAgentDestructionWithoutShutdown
boolean getAgentDestructionWithoutShutdown()
If set, agents do not make an explicit Shutdown() call. The service will only find out about the disconnected agent via stale heartbeats. Used for testing.
bool agent_destruction_without_shutdown = 8;
- Returns:
- The agentDestructionWithoutShutdown.
-
getRecoverableJobsList
List<String> getRecoverableJobsList()
The list of jobs which are recoverable. If a task in a job on this list fails, it will not propagate the error to other tasks. If the list is empty, no jobs are recoverable and every task failure propagates an error to other tasks.
repeated string recoverable_jobs = 9;
- Returns:
- A list containing the recoverableJobs.
-
getRecoverableJobsCount
int getRecoverableJobsCount()
The list of jobs which are recoverable. If a task in a job on this list fails, it will not propagate the error to other tasks. If the list is empty, no jobs are recoverable and every task failure propagates an error to other tasks.
repeated string recoverable_jobs = 9;
- Returns:
- The count of recoverableJobs.
-
getRecoverableJobs
String getRecoverableJobs(int index)
The list of jobs which are recoverable. If a task in a job on this list fails, it will not propagate the error to other tasks. If the list is empty, no jobs are recoverable and every task failure propagates an error to other tasks.
repeated string recoverable_jobs = 9;
- Parameters:
index - The index of the element to return.
- Returns:
- The recoverableJobs at the given index.
-
getRecoverableJobsBytes
com.google.protobuf.ByteString getRecoverableJobsBytes(int index)
The list of jobs which are recoverable. If a task in a job on this list fails, it will not propagate the error to other tasks. If the list is empty, no jobs are recoverable and every task failure propagates an error to other tasks.
repeated string recoverable_jobs = 9;
- Parameters:
index - The index of the value to return.
- Returns:
- The bytes of the recoverableJobs at the given index.
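The count and index accessors for `recoverable_jobs` follow the usual protobuf pattern for repeated fields. The sketch below shows how a caller might use them to decide whether a task failure should propagate. To stay self-contained it declares a hypothetical stand-in interface, `RecoverableJobsView`, containing only the two documented accessors; `shouldPropagateError` and `viewOf` are likewise illustrative names, not part of the library.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of consuming the recoverable_jobs accessors. */
final class RecoverableJobsSketch {
    private RecoverableJobsSketch() {}

    /** Stand-in exposing only the two documented accessors used below. */
    interface RecoverableJobsView {
        int getRecoverableJobsCount();
        String getRecoverableJobs(int index);
    }

    /** A failure propagates unless the failed task's job is listed as recoverable. */
    static boolean shouldPropagateError(RecoverableJobsView config, String failedJob) {
        for (int i = 0; i < config.getRecoverableJobsCount(); i++) {
            if (config.getRecoverableJobs(i).equals(failedJob)) {
                return false;
            }
        }
        return true; // also covers the empty list: every failure propagates
    }

    /** Test helper that backs the view with an in-memory list. */
    static RecoverableJobsView viewOf(final List<String> jobs) {
        return new RecoverableJobsView() {
            @Override public int getRecoverableJobsCount() { return jobs.size(); }
            @Override public String getRecoverableJobs(int index) { return jobs.get(index); }
        };
    }

    public static void main(String[] args) {
        RecoverableJobsView view = viewOf(Arrays.asList("worker"));
        System.out.println(shouldPropagateError(view, "worker"));
        System.out.println(shouldPropagateError(view, "chief"));
    }
}
```

With the real interface, the same loop would run against a `CoordinationConfig.CoordinationServiceConfigOrBuilder` instance in place of the stand-in view.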
-
getAllowNewIncarnationToReconnect
boolean getAllowNewIncarnationToReconnect()
If a task restarts with a new incarnation, we may allow it to reconnect silently. This is useful when we know that a task can immediately resume work upon re-connecting to the service.
bool allow_new_incarnation_to_reconnect = 11;
- Returns:
- The allowNewIncarnationToReconnect.
-
getForceDisable
boolean getForceDisable()
Disables coordination service. Some libraries enable coordination service by default even if the user did not specify any config. This field allows users to explicitly disable coordination service under all situations.
bool force_disable = 12;
- Returns:
- The forceDisable.
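The precedence that `force_disable` is described as having can be sketched as a small decision function. The names here are hypothetical, and the rule that a non-empty service type or a library default enables the service is an assumption made for illustration; only the "force_disable wins in all situations" part comes from the description above.

```java
/** Hypothetical decision helper illustrating force_disable precedence. */
final class ForceDisableSketch {
    private ForceDisableSketch() {}

    static boolean coordinationEnabled(
            boolean forceDisable, String serviceType, boolean libraryEnablesByDefault) {
        if (forceDisable) {
            return false; // explicit opt-out overrides everything else
        }
        // Assumption: a configured service type or a library default turns the service on.
        return !serviceType.isEmpty() || libraryEnablesByDefault;
    }

    public static void main(String[] args) {
        System.out.println(coordinationEnabled(true, "standalone", true));
        System.out.println(coordinationEnabled(false, "", true));
    }
}
```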
-
getPollForErrorFromServiceAtStartup
boolean getPollForErrorFromServiceAtStartup()
Use long polling to get errors from the coordination service as the error propagation mechanism.
bool poll_for_error_from_service_at_startup = 13;
- Returns:
- The pollForErrorFromServiceAtStartup.
-