Package org.tensorflow.framework
Class GPUOptions.Experimental
java.lang.Object
com.google.protobuf.AbstractMessageLite
com.google.protobuf.AbstractMessage
com.google.protobuf.GeneratedMessageV3
org.tensorflow.framework.GPUOptions.Experimental
- All Implemented Interfaces:
com.google.protobuf.Message, com.google.protobuf.MessageLite, com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder, Serializable, GPUOptions.ExperimentalOrBuilder
- Enclosing class:
GPUOptions
public static final class GPUOptions.Experimental
extends com.google.protobuf.GeneratedMessageV3
implements GPUOptions.ExperimentalOrBuilder
Protobuf type tensorflow.GPUOptions.Experimental
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | GPUOptions.Experimental.Builder | Protobuf type tensorflow.GPUOptions.Experimental
static final class | GPUOptions.Experimental.StreamMergeOptions | Whether to merge data transfer streams into the compute stream in the same stream group.
static interface | GPUOptions.Experimental.StreamMergeOptionsOrBuilder |
static final class | GPUOptions.Experimental.VirtualDevices | Configuration for breaking down a visible GPU into multiple "virtual" devices.
static interface | GPUOptions.Experimental.VirtualDevicesOrBuilder |
Nested classes/interfaces inherited from class com.google.protobuf.GeneratedMessageV3
com.google.protobuf.GeneratedMessageV3.BuilderParent, com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<MessageT extends com.google.protobuf.GeneratedMessageV3.ExtendableMessage<MessageT>,BuilderT extends com.google.protobuf.GeneratedMessageV3.ExtendableBuilder<MessageT, BuilderT>>, com.google.protobuf.GeneratedMessageV3.ExtendableMessage<MessageT extends com.google.protobuf.GeneratedMessageV3.ExtendableMessage<MessageT>>, com.google.protobuf.GeneratedMessageV3.ExtendableMessageOrBuilder<MessageT extends com.google.protobuf.GeneratedMessageV3.ExtendableMessage<MessageT>>, com.google.protobuf.GeneratedMessageV3.FieldAccessorTable, com.google.protobuf.GeneratedMessageV3.UnusedPrivateParameter
Nested classes/interfaces inherited from class com.google.protobuf.AbstractMessageLite
com.google.protobuf.AbstractMessageLite.InternalOneOfEnum
-
Field Summary
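Fields
Modifier and Type | Field
static final int | COLLECTIVE_RING_ORDER_FIELD_NUMBER
static final int | DISALLOW_RETRY_ON_ALLOCATION_FAILURE_FIELD_NUMBER
static final int | GPU_HOST_MEM_DISALLOW_GROWTH_FIELD_NUMBER
static final int | GPU_HOST_MEM_LIMIT_IN_MB_FIELD_NUMBER
static final int | GPU_SYSTEM_MEMORY_SIZE_IN_MB_FIELD_NUMBER
static final int | INTERNAL_FRAGMENTATION_FRACTION_FIELD_NUMBER
static final int | KERNEL_TRACKER_MAX_BYTES_FIELD_NUMBER
static final int | KERNEL_TRACKER_MAX_INTERVAL_FIELD_NUMBER
static final int | KERNEL_TRACKER_MAX_PENDING_FIELD_NUMBER
static final int | NODE_ID_FIELD_NUMBER
static final int | NUM_DEV_TO_DEV_COPY_STREAMS_FIELD_NUMBER
static final int | NUM_VIRTUAL_DEVICES_PER_GPU_FIELD_NUMBER
static final int | POPULATE_PJRT_GPU_CLIENT_CREATION_INFO_FIELD_NUMBER
static final int | STREAM_MERGE_OPTIONS_FIELD_NUMBER
static final int | TIMESTAMPED_ALLOCATOR_FIELD_NUMBER
static final int | USE_CUDA_MALLOC_ASYNC_FIELD_NUMBER
static final int | USE_UNIFIED_MEMORY_FIELD_NUMBER
static final int | VIRTUAL_DEVICES_FIELD_NUMBER
Fields inherited from class com.google.protobuf.GeneratedMessageV3
alwaysUseFieldBuilders, unknownFields
Fields inherited from class com.google.protobuf.AbstractMessage
memoizedSize
Fields inherited from class com.google.protobuf.AbstractMessageLite
memoizedHashCode
-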
Method Summary
Modifier and Type | Method | Description
boolean | equals(Object obj) |
String | getCollectiveRingOrder() | If non-empty, defines a good GPU ring order on a single worker based on device interconnect.
com.google.protobuf.ByteString | getCollectiveRingOrderBytes() | If non-empty, defines a good GPU ring order on a single worker based on device interconnect.
static GPUOptions.Experimental | getDefaultInstance() |
GPUOptions.Experimental | getDefaultInstanceForType() |
static final com.google.protobuf.Descriptors.Descriptor | getDescriptor() |
boolean | getDisallowRetryOnAllocationFailure() | By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime.
boolean | getGpuHostMemDisallowGrowth() | If true, then the host allocator allocates its max memory all upfront and never grows.
float | getGpuHostMemLimitInMb() | Memory limit for "GPU host allocator", aka pinned memory allocator.
int | getGpuSystemMemorySizeInMb() | Memory limit for gpu system.
double | getInternalFragmentationFraction() | BFC Allocator can return an allocated chunk of memory up to 2x the requested size.
int | getKernelTrackerMaxBytes() | If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n.
int | getKernelTrackerMaxInterval() | Parameters for GPUKernelTracker.
int | getKernelTrackerMaxPending() | If kernel_tracker_max_pending > 0 then no more than this many tracking events can be outstanding at a time.
int | getNodeId() | node_id for use when creating a PjRt GPU client with remote devices, which enumerates jobs*tasks from a ServerDef.
int | getNumDevToDevCopyStreams() | If > 1, the number of device-to-device copy streams to create for each GPUDevice.
int | getNumVirtualDevicesPerGpu() | The number of virtual devices to create on each visible GPU.
com.google.protobuf.Parser<GPUOptions.Experimental> | getParserForType() |
boolean | getPopulatePjrtGpuClientCreationInfo() | If true, save information needed for creating a PjRt GPU client with remote devices.
int | getSerializedSize() |
GPUOptions.Experimental.StreamMergeOptions | getStreamMergeOptions() | .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
GPUOptions.Experimental.StreamMergeOptionsOrBuilder | getStreamMergeOptionsOrBuilder() | .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
boolean | getTimestampedAllocator() | If true then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete so that we can know when a nominally free memory chunk is really not subject to pending use.
boolean | getUseCudaMallocAsync() | When true, use CUDA cudaMallocAsync API instead of TF gpu allocator.
boolean | getUseUnifiedMemory() | If true, uses CUDA unified memory for memory allocations.
GPUOptions.Experimental.VirtualDevices | getVirtualDevices(int index) | The multi virtual device settings.
int | getVirtualDevicesCount() | The multi virtual device settings.
List<GPUOptions.Experimental.VirtualDevices> | getVirtualDevicesList() | The multi virtual device settings.
GPUOptions.Experimental.VirtualDevicesOrBuilder | getVirtualDevicesOrBuilder(int index) | The multi virtual device settings.
List<? extends GPUOptions.Experimental.VirtualDevicesOrBuilder> | getVirtualDevicesOrBuilderList() | The multi virtual device settings.
int | hashCode() |
boolean | hasStreamMergeOptions() | .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
protected com.google.protobuf.GeneratedMessageV3.FieldAccessorTable | internalGetFieldAccessorTable() |
final boolean | isInitialized() |
static GPUOptions.Experimental.Builder | newBuilder() |
static GPUOptions.Experimental.Builder | newBuilder(GPUOptions.Experimental prototype) |
GPUOptions.Experimental.Builder | newBuilderForType() |
protected GPUOptions.Experimental.Builder | newBuilderForType(com.google.protobuf.GeneratedMessageV3.BuilderParent parent) |
protected Object | newInstance(com.google.protobuf.GeneratedMessageV3.UnusedPrivateParameter unused) |
static GPUOptions.Experimental | parseDelimitedFrom(InputStream input) |
static GPUOptions.Experimental | parseDelimitedFrom(InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) |
static GPUOptions.Experimental | parseFrom(byte[] data) |
static GPUOptions.Experimental | parseFrom(byte[] data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) |
static GPUOptions.Experimental | parseFrom(com.google.protobuf.ByteString data) |
static GPUOptions.Experimental | parseFrom(com.google.protobuf.ByteString data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) |
static GPUOptions.Experimental | parseFrom(com.google.protobuf.CodedInputStream input) |
static GPUOptions.Experimental | parseFrom(com.google.protobuf.CodedInputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) |
static GPUOptions.Experimental | parseFrom(InputStream input) |
static GPUOptions.Experimental | parseFrom(InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) |
static GPUOptions.Experimental | parseFrom(ByteBuffer data) |
static GPUOptions.Experimental | parseFrom(ByteBuffer data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) |
static com.google.protobuf.Parser<GPUOptions.Experimental> | parser() |
GPUOptions.Experimental.Builder | toBuilder() |
void | writeTo(com.google.protobuf.CodedOutputStream output) |
Methods inherited from class com.google.protobuf.GeneratedMessageV3
canUseUnsafe, computeStringSize, computeStringSizeNoTag, emptyBooleanList, emptyDoubleList, emptyFloatList, emptyIntList, emptyList, emptyLongList, getAllFields, getDescriptorForType, getField, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof, internalGetMapField, internalGetMapFieldReflection, isStringEmpty, makeExtensionsImmutable, makeMutableCopy, makeMutableCopy, mergeFromAndMakeImmutableInternal, mutableCopy, mutableCopy, mutableCopy, mutableCopy, mutableCopy, newBooleanList, newBuilderForType, newDoubleList, newFloatList, newIntList, newLongList, parseDelimitedWithIOException, parseDelimitedWithIOException, parseUnknownField, parseUnknownFieldProto3, parseWithIOException, parseWithIOException, parseWithIOException, parseWithIOException, serializeBooleanMapTo, serializeIntegerMapTo, serializeLongMapTo, serializeStringMapTo, writeReplace, writeString, writeStringNoTag
Methods inherited from class com.google.protobuf.AbstractMessage
findInitializationErrors, getInitializationErrorString, hashBoolean, hashEnum, hashEnumList, hashFields, hashLong, toString
Methods inherited from class com.google.protobuf.AbstractMessageLite
addAll, addAll, checkByteStringIsUtf8, toByteArray, toByteString, writeDelimitedTo, writeTo
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface com.google.protobuf.MessageLite
toByteArray, toByteString, writeDelimitedTo, writeTo
Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
-
Field Details
-
VIRTUAL_DEVICES_FIELD_NUMBER
public static final int VIRTUAL_DEVICES_FIELD_NUMBER
-
NUM_VIRTUAL_DEVICES_PER_GPU_FIELD_NUMBER
public static final int NUM_VIRTUAL_DEVICES_PER_GPU_FIELD_NUMBER
-
USE_UNIFIED_MEMORY_FIELD_NUMBER
public static final int USE_UNIFIED_MEMORY_FIELD_NUMBER
-
NUM_DEV_TO_DEV_COPY_STREAMS_FIELD_NUMBER
public static final int NUM_DEV_TO_DEV_COPY_STREAMS_FIELD_NUMBER
-
COLLECTIVE_RING_ORDER_FIELD_NUMBER
public static final int COLLECTIVE_RING_ORDER_FIELD_NUMBER
-
TIMESTAMPED_ALLOCATOR_FIELD_NUMBER
public static final int TIMESTAMPED_ALLOCATOR_FIELD_NUMBER
-
KERNEL_TRACKER_MAX_INTERVAL_FIELD_NUMBER
public static final int KERNEL_TRACKER_MAX_INTERVAL_FIELD_NUMBER
-
KERNEL_TRACKER_MAX_BYTES_FIELD_NUMBER
public static final int KERNEL_TRACKER_MAX_BYTES_FIELD_NUMBER
-
KERNEL_TRACKER_MAX_PENDING_FIELD_NUMBER
public static final int KERNEL_TRACKER_MAX_PENDING_FIELD_NUMBER
-
INTERNAL_FRAGMENTATION_FRACTION_FIELD_NUMBER
public static final int INTERNAL_FRAGMENTATION_FRACTION_FIELD_NUMBER
-
USE_CUDA_MALLOC_ASYNC_FIELD_NUMBER
public static final int USE_CUDA_MALLOC_ASYNC_FIELD_NUMBER
-
DISALLOW_RETRY_ON_ALLOCATION_FAILURE_FIELD_NUMBER
public static final int DISALLOW_RETRY_ON_ALLOCATION_FAILURE_FIELD_NUMBER
-
GPU_HOST_MEM_LIMIT_IN_MB_FIELD_NUMBER
public static final int GPU_HOST_MEM_LIMIT_IN_MB_FIELD_NUMBER
-
GPU_HOST_MEM_DISALLOW_GROWTH_FIELD_NUMBER
public static final int GPU_HOST_MEM_DISALLOW_GROWTH_FIELD_NUMBER
-
GPU_SYSTEM_MEMORY_SIZE_IN_MB_FIELD_NUMBER
public static final int GPU_SYSTEM_MEMORY_SIZE_IN_MB_FIELD_NUMBER
-
POPULATE_PJRT_GPU_CLIENT_CREATION_INFO_FIELD_NUMBER
public static final int POPULATE_PJRT_GPU_CLIENT_CREATION_INFO_FIELD_NUMBER
-
NODE_ID_FIELD_NUMBER
public static final int NODE_ID_FIELD_NUMBER
-
STREAM_MERGE_OPTIONS_FIELD_NUMBER
public static final int STREAM_MERGE_OPTIONS_FIELD_NUMBER
-
Method Details
-
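newInstance
protected Object newInstance(com.google.protobuf.GeneratedMessageV3.UnusedPrivateParameter unused)
- Overrides:
newInstance in class com.google.protobuf.GeneratedMessageV3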
-
getDescriptor
public static final com.google.protobuf.Descriptors.Descriptor getDescriptor()
-
internalGetFieldAccessorTable
protected com.google.protobuf.GeneratedMessageV3.FieldAccessorTable internalGetFieldAccessorTable()
- Specified by:
internalGetFieldAccessorTable in class com.google.protobuf.GeneratedMessageV3
-
getVirtualDevicesList
public List<GPUOptions.Experimental.VirtualDevices> getVirtualDevicesList()
The multi virtual device settings. If empty (not set), a single virtual device is created on each visible GPU, according to the settings in "visible_device_list" above. Otherwise, the number of elements in the list must be the same as the number of visible GPUs (after "visible_device_list" filtering if it is set), and the string-represented device names (e.g. /device:GPU:<id>) will refer to the virtual devices and have the <id> field assigned sequentially starting from 0, according to the order of the virtual devices determined by device_ordinal and the location in the virtual device list.
For example,
    visible_device_list = "1,0"
    virtual_devices { memory_limit: 1GB memory_limit: 2GB }
    virtual_devices { memory_limit: 3GB memory_limit: 4GB }
will create 4 virtual devices as:
    /device:GPU:0 -> visible GPU 1 with 1GB memory
    /device:GPU:1 -> visible GPU 1 with 2GB memory
    /device:GPU:2 -> visible GPU 0 with 3GB memory
    /device:GPU:3 -> visible GPU 0 with 4GB memory
but
    visible_device_list = "1,0"
    virtual_devices { memory_limit: 1GB memory_limit: 2GB device_ordinal: 10 device_ordinal: 20 }
    virtual_devices { memory_limit: 3GB memory_limit: 4GB device_ordinal: 10 device_ordinal: 20 }
will create 4 virtual devices as:
    /device:GPU:0 -> visible GPU 1 with 1GB memory (ordinal 10)
    /device:GPU:1 -> visible GPU 0 with 3GB memory (ordinal 10)
    /device:GPU:2 -> visible GPU 1 with 2GB memory (ordinal 20)
    /device:GPU:3 -> visible GPU 0 with 4GB memory (ordinal 20)
NOTE:
1. It's invalid to set both this and "per_process_gpu_memory_fraction" at the same time.
2. Currently this setting is per-process, not per-session. Using different settings in different sessions within the same process will result in undefined behavior.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
- Specified by:
getVirtualDevicesList in interface GPUOptions.ExperimentalOrBuilder
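The naming rule in the description of virtual_devices can be sketched in plain Java. This is only an illustrative model of the documented ordering (sort by device_ordinal, then by position in the virtual device list), not TensorFlow's implementation; the class VirtualDeviceNaming, its assign method, and the array-based inputs are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: models how <id> values are assigned to virtual devices.
final class VirtualDeviceNaming {

    /**
     * visibleGpus[i] is the GPU named by the i-th entry of visible_device_list.
     * memoryLimitsGb[i][j] and ordinals[i][j] describe the j-th virtual device on it.
     * Pass ordinals == null to model "device_ordinal not set" (treated as 0 here).
     * The returned list is ordered by the assigned id: index 0 is /device:GPU:0.
     */
    static List<String> assign(int[] visibleGpus, int[][] memoryLimitsGb, int[][] ordinals) {
        List<int[]> entries = new ArrayList<>(); // each entry: {ordinal, gpu, memoryGb}
        for (int i = 0; i < visibleGpus.length; i++) {
            for (int j = 0; j < memoryLimitsGb[i].length; j++) {
                int ordinal = (ordinals == null) ? 0 : ordinals[i][j];
                entries.add(new int[] {ordinal, visibleGpus[i], memoryLimitsGb[i][j]});
            }
        }
        // List.sort is stable, so entries with equal ordinals keep their list position.
        entries.sort(Comparator.comparingInt((int[] e) -> e[0]));

        List<String> names = new ArrayList<>();
        for (int id = 0; id < entries.size(); id++) {
            int[] e = entries.get(id);
            names.add("/device:GPU:" + id + " -> visible GPU " + e[1]
                    + " with " + e[2] + "GB memory");
        }
        return names;
    }

    public static void main(String[] args) {
        // Second example from the description: ordinals 10 and 20 on both GPUs.
        int[] visible = {1, 0};
        int[][] limits = {{1, 2}, {3, 4}};
        int[][] ordinals = {{10, 20}, {10, 20}};
        for (String line : assign(visible, limits, ordinals)) {
            System.out.println(line);
        }
    }
}
```

With ordinals left unset, the same method reproduces the first example, where each GPU's virtual devices are numbered consecutively.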
-
getVirtualDevicesOrBuilderList
public List<? extends GPUOptions.Experimental.VirtualDevicesOrBuilder> getVirtualDevicesOrBuilderList()
The multi virtual device settings. The full description, examples, and notes are given under getVirtualDevicesList.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
- Specified by:
getVirtualDevicesOrBuilderList in interface GPUOptions.ExperimentalOrBuilder
-
getVirtualDevicesCount
public int getVirtualDevicesCount()
The multi virtual device settings. The full description, examples, and notes are given under getVirtualDevicesList.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
- Specified by:
getVirtualDevicesCount in interface GPUOptions.ExperimentalOrBuilder
-
getVirtualDevices
public GPUOptions.Experimental.VirtualDevices getVirtualDevices(int index)
The multi virtual device settings. The full description, examples, and notes are given under getVirtualDevicesList.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
- Specified by:
getVirtualDevices in interface GPUOptions.ExperimentalOrBuilder
-
getVirtualDevicesOrBuilder
public GPUOptions.Experimental.VirtualDevicesOrBuilder getVirtualDevicesOrBuilder(int index)
The multi virtual device settings. The full description, examples, and notes are given under getVirtualDevicesList.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
- Specified by:
getVirtualDevicesOrBuilder in interface GPUOptions.ExperimentalOrBuilder
-
getNumVirtualDevicesPerGpu
public int getNumVirtualDevicesPerGpu()
The number of virtual devices to create on each visible GPU. The available memory will be split equally among all virtual devices. If the field `memory_limit_mb` in `VirtualDevices` is not empty, this field will be ignored.
int32 num_virtual_devices_per_gpu = 15;
- Specified by:
getNumVirtualDevicesPerGpu in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The numVirtualDevicesPerGpu.
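The interaction between num_virtual_devices_per_gpu and memory_limit_mb can be illustrated with a small stand-alone sketch. The class PerGpuSplit and its method are hypothetical. Treating an unset count (0) as one device and using integer division for the split are assumptions of this example, not documented behavior.

```java
import java.util.Arrays;

// Hypothetical helper: models the per-GPU memory split described for this field.
final class PerGpuSplit {

    /**
     * Returns the memory limit (in MB) of each virtual device on one GPU.
     * If explicit limits (memory_limit_mb) are given, the count is ignored.
     */
    static long[] memoryLimitsMb(long availableMb, int numVirtualDevicesPerGpu,
                                 long[] explicitLimitsMb) {
        if (explicitLimitsMb != null && explicitLimitsMb.length > 0) {
            return explicitLimitsMb.clone(); // explicit limits win
        }
        // Assumption for this sketch: an unset count (0) behaves like 1.
        int n = Math.max(1, numVirtualDevicesPerGpu);
        long[] limits = new long[n];
        Arrays.fill(limits, availableMb / n); // equal split, rounded down
        return limits;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(memoryLimitsMb(8000, 4, new long[0])));
        System.out.println(Arrays.toString(memoryLimitsMb(8000, 4, new long[] {1024, 2048})));
    }
}
```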
-
getUseUnifiedMemory
public boolean getUseUnifiedMemory()
If true, uses CUDA unified memory for memory allocations. If per_process_gpu_memory_fraction option is greater than 1.0, then unified memory is used regardless of the value for this field. See comments for per_process_gpu_memory_fraction field for more details and requirements of the unified memory. This option is useful to oversubscribe memory if multiple processes are sharing a single GPU while individually using less than 1.0 per process memory fraction.
bool use_unified_memory = 2;
- Specified by:
getUseUnifiedMemory in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The useUnifiedMemory.
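The rule stated above reduces to a single boolean expression. The sketch below restates it in plain Java; UnifiedMemoryRule and its method are hypothetical names used only for illustration and are not part of the TensorFlow API.

```java
// Hypothetical helper: restates when unified memory ends up being used.
final class UnifiedMemoryRule {

    /** True if the flag is set, or if the memory fraction oversubscribes the GPU. */
    static boolean usesUnifiedMemory(boolean useUnifiedMemory,
                                     double perProcessGpuMemoryFraction) {
        return useUnifiedMemory || perProcessGpuMemoryFraction > 1.0;
    }

    public static void main(String[] args) {
        System.out.println(usesUnifiedMemory(false, 1.5)); // fraction above 1.0
        System.out.println(usesUnifiedMemory(true, 0.5));  // flag set explicitly
        System.out.println(usesUnifiedMemory(false, 0.5)); // neither condition holds
    }
}
```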
-
getNumDevToDevCopyStreams
public int getNumDevToDevCopyStreams()
If > 1, the number of device-to-device copy streams to create for each GPUDevice. Default value is 0, which is automatically converted to 1.
int32 num_dev_to_dev_copy_streams = 3;
- Specified by:
getNumDevToDevCopyStreams in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The numDevToDevCopyStreams.
-
getCollectiveRingOrder
public String getCollectiveRingOrder()
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
string collective_ring_order = 4;
- Specified by:
getCollectiveRingOrder in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The collectiveRingOrder.
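A caller that reads collective_ring_order may want the ring as a list of device indices. The following sketch shows one way to parse the comma-separated form. RingOrderParser is a hypothetical helper, and returning an empty array for an empty string (meaning "no override") is a convention chosen for this example.

```java
import java.util.Arrays;

// Hypothetical helper: parses the comma-separated ring order string.
final class RingOrderParser {

    /** Returns the device indices in ring order, or an empty array if no order is set. */
    static int[] parse(String collectiveRingOrder) {
        if (collectiveRingOrder == null || collectiveRingOrder.trim().isEmpty()) {
            return new int[0]; // empty string: automatic ring order generation applies
        }
        String[] parts = collectiveRingOrder.split(",");
        int[] order = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            order[i] = Integer.parseInt(parts[i].trim());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("3,2,1,0,7,6,5,4")));
    }
}
```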
-
getCollectiveRingOrderBytes
public com.google.protobuf.ByteString getCollectiveRingOrderBytes()
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
string collective_ring_order = 4;
- Specified by:
getCollectiveRingOrderBytes in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The bytes for collectiveRingOrder.
-
getTimestampedAllocator
public boolean getTimestampedAllocator()
If true then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete so that we can know when a nominally free memory chunk is really not subject to pending use.
bool timestamped_allocator = 5;
- Specified by:
getTimestampedAllocator in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The timestampedAllocator.
-
getKernelTrackerMaxInterval
public int getKernelTrackerMaxInterval()
Parameters for GPUKernelTracker. By default no kernel tracking is done. Note that timestamped_allocator is only effective if some tracking is specified. If kernel_tracker_max_interval = n > 0, then a tracking event is inserted after every n kernels without an event.
int32 kernel_tracker_max_interval = 7;
- Specified by:
getKernelTrackerMaxInterval in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The kernelTrackerMaxInterval.
-
getKernelTrackerMaxBytes
public int getKernelTrackerMaxBytes()
If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n. If one kernel allocates b * n bytes, then one event will be inserted after it, but it will count as b against the pending limit.
int32 kernel_tracker_max_bytes = 8;
- Specified by:
getKernelTrackerMaxBytes in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The kernelTrackerMaxBytes.
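One possible reading of the kernel_tracker_max_interval and kernel_tracker_max_bytes rules is simulated below. This is a toy model, not GPUKernelTracker itself: the class KernelTrackerSketch, the per-kernel byte array, and the choice to reset both counters whenever an event is inserted are assumptions made for illustration. The pending-limit weighting is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of where tracking events would be inserted.
final class KernelTrackerSketch {

    /**
     * kernelBytes[i] is the memory allocated by the i-th kernel.
     * Returns the indices of kernels after which a tracking event is inserted.
     * A limit of 0 disables the corresponding rule.
     */
    static List<Integer> eventPositions(long[] kernelBytes, int maxInterval, long maxBytes) {
        List<Integer> events = new ArrayList<>();
        int kernelsSinceEvent = 0;
        long bytesSinceEvent = 0;
        for (int i = 0; i < kernelBytes.length; i++) {
            kernelsSinceEvent++;
            bytesSinceEvent += kernelBytes[i];
            boolean intervalHit = maxInterval > 0 && kernelsSinceEvent >= maxInterval;
            boolean bytesHit = maxBytes > 0 && bytesSinceEvent >= maxBytes;
            if (intervalHit || bytesHit) {
                events.add(i);
                kernelsSinceEvent = 0; // assumption: both counters restart after an event
                bytesSinceEvent = 0;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        long[] kernels = {10, 10, 10, 100, 10};
        // Interval of 3 kernels, byte threshold of 50.
        System.out.println(eventPositions(kernels, 3, 50));
    }
}
```

With both limits at 0 the method returns no events, matching the statement that no kernel tracking is done by default.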
-
getKernelTrackerMaxPending
public int getKernelTrackerMaxPending()
If kernel_tracker_max_pending > 0 then no more than this many tracking events can be outstanding at a time. An attempt to launch an additional kernel will stall until an event completes.
int32 kernel_tracker_max_pending = 9;
- Specified by:
getKernelTrackerMaxPending in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The kernelTrackerMaxPending.
-
getInternalFragmentationFraction
public double getInternalFragmentationFraction()
BFC Allocator can return an allocated chunk of memory up to 2x the requested size. For virtual devices with tight memory constraints and proportionately large allocation requests, this can lead to a significant reduction in available memory. The threshold below controls when a chunk should be split if the chunk size exceeds the requested memory size. It is expressed as a fraction of the total available memory for the TF device. For example, setting it to 0.05 implies that a chunk needs to be split if its size exceeds the requested memory by 5% of the total virtual device/GPU memory size.
double internal_fragmentation_fraction = 10;
- Specified by:
getInternalFragmentationFraction in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The internalFragmentationFraction.
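The split threshold can be worked through numerically. The sketch below is not the BFC allocator: FragmentationThreshold and shouldSplit are hypothetical, and using a strict "greater than" comparison at the boundary is an assumption of this example.

```java
// Hypothetical helper: evaluates the documented chunk-splitting threshold.
final class FragmentationThreshold {

    /**
     * True if the chunk exceeds the request by more than
     * fraction * totalDeviceBytes, i.e. it should be split.
     */
    static boolean shouldSplit(long chunkBytes, long requestedBytes,
                               long totalDeviceBytes, double fraction) {
        double allowedExcess = fraction * totalDeviceBytes;
        return (chunkBytes - requestedBytes) > allowedExcess;
    }

    public static void main(String[] args) {
        long total = 1000; // total memory of the (virtual) device, in arbitrary units
        // With fraction 0.05 the allowed excess is 50 units.
        System.out.println(shouldSplit(160, 100, total, 0.05)); // excess 60
        System.out.println(shouldSplit(140, 100, total, 0.05)); // excess 40
    }
}
```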
-
getUseCudaMallocAsync
public boolean getUseCudaMallocAsync()
When true, use CUDA cudaMallocAsync API instead of TF gpu allocator.
bool use_cuda_malloc_async = 11;
- Specified by:
getUseCudaMallocAsync in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The useCudaMallocAsync.
-
getDisallowRetryOnAllocationFailure
public boolean getDisallowRetryOnAllocationFailure()
By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime. Setting this to true disables the sleep; instead we'll OOM immediately.
bool disallow_retry_on_allocation_failure = 12;
- Specified by:
getDisallowRetryOnAllocationFailure in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The disallowRetryOnAllocationFailure.
-
getGpuHostMemLimitInMb
public float getGpuHostMemLimitInMb()
Memory limit for "GPU host allocator", aka pinned memory allocator. This can also be set via the envvar TF_GPU_HOST_MEM_LIMIT_IN_MB.
float gpu_host_mem_limit_in_mb = 13;
- Specified by:
getGpuHostMemLimitInMb in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The gpuHostMemLimitInMb.
-
getGpuHostMemDisallowGrowth
public boolean getGpuHostMemDisallowGrowth()
If true, then the host allocator allocates its max memory all upfront and never grows. This can be useful for latency-sensitive systems, because growing the GPU host memory pool can be expensive. You probably only want to use this in combination with gpu_host_mem_limit_in_mb, because the default GPU host memory limit is quite high.
bool gpu_host_mem_disallow_growth = 14;
- Specified by:
getGpuHostMemDisallowGrowth in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The gpuHostMemDisallowGrowth.
-
getGpuSystemMemorySizeInMb
public int getGpuSystemMemorySizeInMb()
Memory limit for the GPU system. This can also be set by TF_DEVICE_MIN_SYS_MEMORY_IN_MB, which takes precedence over gpu_system_memory_size_in_mb. This lets users configure the GPU system memory size for better resource estimation in multi-tenancy use cases (one GPU serving multiple models).
int32 gpu_system_memory_size_in_mb = 16;
- Specified by:
getGpuSystemMemorySizeInMb in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The gpuSystemMemorySizeInMb.
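The precedence between the environment variable and the proto field can be expressed as a small resolution function. This sketch only models the documented precedence; SystemMemoryResolver is hypothetical, and how TensorFlow handles an unset or malformed variable is not covered here.

```java
// Hypothetical helper: the environment variable, when set, overrides the option.
final class SystemMemoryResolver {

    /** envValue is the raw value of TF_DEVICE_MIN_SYS_MEMORY_IN_MB, or null if unset. */
    static int resolveMb(String envValue, int optionValueMb) {
        if (envValue != null && !envValue.trim().isEmpty()) {
            return Integer.parseInt(envValue.trim()); // environment variable takes precedence
        }
        return optionValueMb; // fall back to gpu_system_memory_size_in_mb
    }

    public static void main(String[] args) {
        String env = System.getenv("TF_DEVICE_MIN_SYS_MEMORY_IN_MB");
        System.out.println(resolveMb(env, 512));
    }
}
```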
-
getPopulatePjrtGpuClientCreationInfo
public boolean getPopulatePjrtGpuClientCreationInfo()
If true, save information needed for creating a PjRt GPU client with remote devices.
bool populate_pjrt_gpu_client_creation_info = 17;
- Specified by:
getPopulatePjrtGpuClientCreationInfo in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The populatePjrtGpuClientCreationInfo.
-
getNodeId
public int getNodeId()
node_id for use when creating a PjRt GPU client with remote devices, which enumerates jobs*tasks from a ServerDef.
int32 node_id = 18;
- Specified by:
getNodeId in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The nodeId.
-
hasStreamMergeOptions
public boolean hasStreamMergeOptions()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Specified by:
hasStreamMergeOptions in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- Whether the streamMergeOptions field is set.
-
getStreamMergeOptions
public GPUOptions.Experimental.StreamMergeOptions getStreamMergeOptions()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Specified by:
getStreamMergeOptions in interface GPUOptions.ExperimentalOrBuilder
- Returns:
- The streamMergeOptions.
-
getStreamMergeOptionsOrBuilder
public GPUOptions.Experimental.StreamMergeOptionsOrBuilder getStreamMergeOptionsOrBuilder()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Specified by:
getStreamMergeOptionsOrBuilder in interface GPUOptions.ExperimentalOrBuilder
-
isInitialized
public final boolean isInitialized()
- Specified by:
isInitialized in interface com.google.protobuf.MessageLiteOrBuilder
- Overrides:
isInitialized in class com.google.protobuf.GeneratedMessageV3
-
writeTo
public void writeTo(com.google.protobuf.CodedOutputStream output) throws IOException
- Specified by:
writeTo in interface com.google.protobuf.MessageLite
- Overrides:
writeTo in class com.google.protobuf.GeneratedMessageV3
- Throws:
IOException
-
getSerializedSize
public int getSerializedSize()
- Specified by:
getSerializedSize in interface com.google.protobuf.MessageLite
- Overrides:
getSerializedSize in class com.google.protobuf.GeneratedMessageV3
-
equals
public boolean equals(Object obj)
- Specified by:
equals in interface com.google.protobuf.Message
- Overrides:
equals in class com.google.protobuf.AbstractMessage
-
hashCode
public int hashCode()
- Specified by:
hashCode in interface com.google.protobuf.Message
- Overrides:
hashCode in class com.google.protobuf.AbstractMessage
-
parseFrom
public static GPUOptions.Experimental parseFrom(ByteBuffer data) throws com.google.protobuf.InvalidProtocolBufferException
- Throws:
com.google.protobuf.InvalidProtocolBufferException
-
parseFrom
public static GPUOptions.Experimental parseFrom(ByteBuffer data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws com.google.protobuf.InvalidProtocolBufferException
- Throws:
com.google.protobuf.InvalidProtocolBufferException
-
parseFrom
public static GPUOptions.Experimental parseFrom(com.google.protobuf.ByteString data) throws com.google.protobuf.InvalidProtocolBufferException
- Throws:
com.google.protobuf.InvalidProtocolBufferException
-
parseFrom
public static GPUOptions.Experimental parseFrom(com.google.protobuf.ByteString data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws com.google.protobuf.InvalidProtocolBufferException
- Throws:
com.google.protobuf.InvalidProtocolBufferException
-
parseFrom
public static GPUOptions.Experimental parseFrom(byte[] data) throws com.google.protobuf.InvalidProtocolBufferException
- Throws:
com.google.protobuf.InvalidProtocolBufferException
-
parseFrom
public static GPUOptions.Experimental parseFrom(byte[] data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws com.google.protobuf.InvalidProtocolBufferException
- Throws:
com.google.protobuf.InvalidProtocolBufferException
-
parseFrom
public static GPUOptions.Experimental parseFrom(InputStream input) throws IOException
- Throws:
IOException
-
parseFrom
public static GPUOptions.Experimental parseFrom(InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws IOException
- Throws:
IOException
-
parseDelimitedFrom
public static GPUOptions.Experimental parseDelimitedFrom(InputStream input) throws IOException
- Throws:
IOException
-
parseDelimitedFrom
public static GPUOptions.Experimental parseDelimitedFrom(InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws IOException
- Throws:
IOException
-
parseFrom
public static GPUOptions.Experimental parseFrom(com.google.protobuf.CodedInputStream input) throws IOException
- Throws:
IOException
-
parseFrom
public static GPUOptions.Experimental parseFrom(com.google.protobuf.CodedInputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws IOException
- Throws:
IOException
-
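newBuilderForType
public GPUOptions.Experimental.Builder newBuilderForType()
- Specified by:
newBuilderForType in interface com.google.protobuf.Message
- Specified by:
newBuilderForType in interface com.google.protobuf.MessageLite
-
newBuilder
public static GPUOptions.Experimental.Builder newBuilder()
-
newBuilder
public static GPUOptions.Experimental.Builder newBuilder(GPUOptions.Experimental prototype)
-
toBuilder
public GPUOptions.Experimental.Builder toBuilder()
- Specified by:
toBuilder in interface com.google.protobuf.Message
- Specified by:
toBuilder in interface com.google.protobuf.MessageLite
-
newBuilderForType
protected GPUOptions.Experimental.Builder newBuilderForType(com.google.protobuf.GeneratedMessageV3.BuilderParent parent)
- Specified by:
newBuilderForType in class com.google.protobuf.GeneratedMessageV3
-
getDefaultInstance
public static GPUOptions.Experimental getDefaultInstance()
-
parser
public static com.google.protobuf.Parser<GPUOptions.Experimental> parser()
-
getParserForType
public com.google.protobuf.Parser<GPUOptions.Experimental> getParserForType()
- Specified by:
getParserForType in interface com.google.protobuf.Message
- Specified by:
getParserForType in interface com.google.protobuf.MessageLite
- Overrides:
getParserForType in class com.google.protobuf.GeneratedMessageV3
-
getDefaultInstanceForType
public GPUOptions.Experimental getDefaultInstanceForType()
- Specified by:
getDefaultInstanceForType in interface com.google.protobuf.MessageLiteOrBuilder
- Specified by:
getDefaultInstanceForType in interface com.google.protobuf.MessageOrBuilder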
-