Package org.tensorflow.framework
Interface GPUOptions.ExperimentalOrBuilder
- All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
- All Known Implementing Classes:
GPUOptions.Experimental, GPUOptions.Experimental.Builder
- Enclosing class:
GPUOptions
public static interface GPUOptions.ExperimentalOrBuilder
extends com.google.protobuf.MessageOrBuilder
-
Method Summary
- String getCollectiveRingOrder(): If non-empty, defines a good GPU ring order on a single worker based on device interconnect.
- com.google.protobuf.ByteString getCollectiveRingOrderBytes(): If non-empty, defines a good GPU ring order on a single worker based on device interconnect.
- boolean getDisallowRetryOnAllocationFailure(): By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime.
- boolean getGpuHostMemDisallowGrowth(): If true, then the host allocator allocates its max memory all upfront and never grows.
- float getGpuHostMemLimitInMb(): Memory limit for the "GPU host allocator", a.k.a. pinned memory allocator.
- int getGpuSystemMemorySizeInMb(): Memory limit for GPU system memory.
- double getInternalFragmentationFraction(): BFC Allocator can return an allocated chunk of memory up to 2x the requested size.
- int getKernelTrackerMaxBytes(): If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n.
- int getKernelTrackerMaxInterval(): Parameters for GPUKernelTracker.
- int getKernelTrackerMaxPending(): If kernel_tracker_max_pending > 0, then no more than this many tracking events can be outstanding at a time.
- int getNodeId(): node_id for use when creating a PjRt GPU client with remote devices, which enumerates jobs*tasks from a ServerDef.
- int getNumDevToDevCopyStreams(): If > 1, the number of device-to-device copy streams to create for each GPUDevice.
- int getNumVirtualDevicesPerGpu(): The number of virtual devices to create on each visible GPU.
- boolean getPopulatePjrtGpuClientCreationInfo(): If true, save the information needed to create a PjRt GPU client with remote devices.
- GPUOptions.Experimental.StreamMergeOptions getStreamMergeOptions(): .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- GPUOptions.Experimental.StreamMergeOptionsOrBuilder getStreamMergeOptionsOrBuilder(): .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- boolean getTimestampedAllocator(): If true, then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete, so that we can know when a nominally free memory chunk is really not subject to pending use.
- boolean getUseCudaMallocAsync(): When true, use the CUDA cudaMallocAsync API instead of the TF GPU allocator.
- boolean getUseUnifiedMemory(): If true, uses CUDA unified memory for memory allocations.
- GPUOptions.Experimental.VirtualDevices getVirtualDevices(int index): The multi virtual device settings.
- int getVirtualDevicesCount(): The multi virtual device settings.
- List<GPUOptions.Experimental.VirtualDevices> getVirtualDevicesList(): The multi virtual device settings.
- GPUOptions.Experimental.VirtualDevicesOrBuilder getVirtualDevicesOrBuilder(int index): The multi virtual device settings.
- List<? extends GPUOptions.Experimental.VirtualDevicesOrBuilder> getVirtualDevicesOrBuilderList(): The multi virtual device settings.
- boolean hasStreamMergeOptions(): .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
Methods inherited from interface com.google.protobuf.MessageLiteOrBuilder
isInitialized
Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getDefaultInstanceForType, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
-
Method Details
-
getVirtualDevicesList
List<GPUOptions.Experimental.VirtualDevices> getVirtualDevicesList()
The multi virtual device settings. If empty (not set), a single virtual device will be created on each visible GPU, according to the settings in "visible_device_list" above. Otherwise, the number of elements in the list must be the same as the number of visible GPUs (after "visible_device_list" filtering, if it is set), and the string-represented device names (e.g. /device:GPU:<id>) will refer to the virtual devices, with the <id> field assigned sequentially starting from 0 according to the order of the virtual devices determined by device_ordinal and the position in the virtual device list. For example,
  visible_device_list = "1,0"
  virtual_devices { memory_limit: 1GB memory_limit: 2GB }
  virtual_devices { memory_limit: 3GB memory_limit: 4GB }
will create 4 virtual devices as:
  /device:GPU:0 -> visible GPU 1 with 1GB memory
  /device:GPU:1 -> visible GPU 1 with 2GB memory
  /device:GPU:2 -> visible GPU 0 with 3GB memory
  /device:GPU:3 -> visible GPU 0 with 4GB memory
but
  visible_device_list = "1,0"
  virtual_devices { memory_limit: 1GB memory_limit: 2GB device_ordinal: 10 device_ordinal: 20 }
  virtual_devices { memory_limit: 3GB memory_limit: 4GB device_ordinal: 10 device_ordinal: 20 }
will create 4 virtual devices as:
  /device:GPU:0 -> visible GPU 1 with 1GB memory (ordinal 10)
  /device:GPU:1 -> visible GPU 0 with 3GB memory (ordinal 10)
  /device:GPU:2 -> visible GPU 1 with 2GB memory (ordinal 20)
  /device:GPU:3 -> visible GPU 0 with 4GB memory (ordinal 20)
NOTE:
1. It's invalid to set both this and "per_process_gpu_memory_fraction" at the same time.
2. Currently this setting is per-process, not per-session. Using different settings in different sessions within the same process will result in undefined behavior.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevices
GPUOptions.Experimental.VirtualDevices getVirtualDevices(int index)
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevicesCount
int getVirtualDevicesCount()
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevicesOrBuilderList
List<? extends GPUOptions.Experimental.VirtualDevicesOrBuilder> getVirtualDevicesOrBuilderList()
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevicesOrBuilder
GPUOptions.Experimental.VirtualDevicesOrBuilder getVirtualDevicesOrBuilder(int index)
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
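As a worked illustration, the following minimal Java sketch builds the first configuration described above through the generated builder API. The setter names (setVisibleDeviceList on GPUOptions.Builder, addVirtualDevices and addMemoryLimitMb on the experimental builders) are inferred from the standard protobuf codegen for the proto field names shown on this page, so treat them as an assumption, not a verbatim reference.

  import org.tensorflow.framework.GPUOptions;

  public class VirtualDevicesExample {
    public static void main(String[] args) {
      GPUOptions gpuOptions =
          GPUOptions.newBuilder()
              .setVisibleDeviceList("1,0")
              .setExperimental(
                  GPUOptions.Experimental.newBuilder()
                      // Visible GPU 1: split into 1 GB and 2 GB virtual devices.
                      .addVirtualDevices(
                          GPUOptions.Experimental.VirtualDevices.newBuilder()
                              .addMemoryLimitMb(1024f)
                              .addMemoryLimitMb(2048f))
                      // Visible GPU 0: split into 3 GB and 4 GB virtual devices.
                      .addVirtualDevices(
                          GPUOptions.Experimental.VirtualDevices.newBuilder()
                              .addMemoryLimitMb(3072f)
                              .addMemoryLimitMb(4096f)))
              .build();

      // Read back through this OrBuilder interface.
      GPUOptions.ExperimentalOrBuilder exp = gpuOptions.getExperimental();
      System.out.println(exp.getVirtualDevicesCount()); // 2 entries, 4 virtual devices total
    }
  }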
getNumVirtualDevicesPerGpu
int getNumVirtualDevicesPerGpu()
The number of virtual devices to create on each visible GPU. The available memory will be split equally among all virtual devices. If the field `memory_limit_mb` in `VirtualDevices` is not empty, this field will be ignored.
int32 num_virtual_devices_per_gpu = 15;
- Returns:
- The numVirtualDevicesPerGpu.
-
getUseUnifiedMemory
boolean getUseUnifiedMemory()
If true, uses CUDA unified memory for memory allocations. If the per_process_gpu_memory_fraction option is greater than 1.0, then unified memory is used regardless of the value of this field. See the comments on the per_process_gpu_memory_fraction field for more details and the requirements of unified memory. This option is useful for oversubscribing memory when multiple processes share a single GPU while each uses less than 1.0 of the per-process memory fraction.
bool use_unified_memory = 2;
- Returns:
- The useUnifiedMemory.
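A minimal sketch of the oversubscription setup described above, assuming the standard protobuf-generated setters (setPerProcessGpuMemoryFraction, setUseUnifiedMemory); the 0.5 fraction is a placeholder value:

  import org.tensorflow.framework.GPUOptions;

  public class UnifiedMemoryExample {
    public static void main(String[] args) {
      GPUOptions gpuOptions =
          GPUOptions.newBuilder()
              // Each process claims half the device; unified memory lets the
              // combined footprint across processes oversubscribe the GPU.
              .setPerProcessGpuMemoryFraction(0.5)
              .setExperimental(
                  GPUOptions.Experimental.newBuilder().setUseUnifiedMemory(true))
              .build();
      System.out.println(gpuOptions.getExperimental().getUseUnifiedMemory()); // true
    }
  }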
-
getNumDevToDevCopyStreams
int getNumDevToDevCopyStreams()
If > 1, the number of device-to-device copy streams to create for each GPUDevice. The default value is 0, which is automatically converted to 1.
int32 num_dev_to_dev_copy_streams = 3;
- Returns:
- The numDevToDevCopyStreams.
-
getCollectiveRingOrder
String getCollectiveRingOrder()
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
string collective_ring_order = 4;
- Returns:
- The collectiveRingOrder.
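For instance, the doc comment's example order can be set through the generated builder as sketched below (setCollectiveRingOrder is the standard codegen setter for this field, inferred rather than documented here):

  import org.tensorflow.framework.GPUOptions;

  public class RingOrderExample {
    public static void main(String[] args) {
      // Override automatic ring-order generation with an explicit device order.
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setCollectiveRingOrder("3,2,1,0,7,6,5,4")
              .build();
      System.out.println(exp.getCollectiveRingOrder());
    }
  }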
-
getCollectiveRingOrderBytes
com.google.protobuf.ByteString getCollectiveRingOrderBytes()
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
string collective_ring_order = 4;
- Returns:
- The bytes for collectiveRingOrder.
-
getTimestampedAllocator
boolean getTimestampedAllocator()
If true, then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete, so that we can know when a nominally free memory chunk is really not subject to pending use.
bool timestamped_allocator = 5;
- Returns:
- The timestampedAllocator.
-
getKernelTrackerMaxInterval
int getKernelTrackerMaxInterval()
Parameters for GPUKernelTracker. By default no kernel tracking is done. Note that timestamped_allocator is only effective if some tracking is specified. If kernel_tracker_max_interval = n > 0, then a tracking event is inserted after every n kernels without an event.
int32 kernel_tracker_max_interval = 7;
- Returns:
- The kernelTrackerMaxInterval.
-
getKernelTrackerMaxBytes
int getKernelTrackerMaxBytes()
If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n. If one kernel allocates b * n bytes, then one event will be inserted after it, but it will count as b against the pending limit.
int32 kernel_tracker_max_bytes = 8;
- Returns:
- The kernelTrackerMaxBytes.
-
getKernelTrackerMaxPending
int getKernelTrackerMaxPending()
If kernel_tracker_max_pending > 0, then no more than this many tracking events can be outstanding at a time. An attempt to launch an additional kernel will stall until an event completes.
int32 kernel_tracker_max_pending = 9;
- Returns:
- The kernelTrackerMaxPending.
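Since timestamped_allocator only takes effect when some tracking is configured, a combined sketch may help; the setter names follow the standard protobuf codegen and the numeric values are illustrative placeholders:

  import org.tensorflow.framework.GPUOptions;

  public class KernelTrackerExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setTimestampedAllocator(true)       // needs tracking below to be effective
              .setKernelTrackerMaxInterval(8)      // event after every 8 kernels
              .setKernelTrackerMaxBytes(1 << 20)   // or after >= 1 MiB allocated
              .setKernelTrackerMaxPending(4)       // at most 4 outstanding events
              .build();
      System.out.println(exp.getKernelTrackerMaxPending()); // 4
    }
  }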
-
getInternalFragmentationFraction
double getInternalFragmentationFraction()
BFC Allocator can return an allocated chunk of memory up to 2x the requested size. For virtual devices with tight memory constraints, and proportionately large allocation requests, this can lead to a significant reduction in available memory. This threshold controls when a chunk should be split if its size exceeds the requested memory size. It is expressed as a fraction of the total available memory for the TF device. For example, setting it to 0.05 would imply a chunk needs to be split if its size exceeds the requested memory by 5% of the total virtual device/GPU memory size.
double internal_fragmentation_fraction = 10;
- Returns:
- The internalFragmentationFraction.
-
getUseCudaMallocAsync
boolean getUseCudaMallocAsync()
When true, use the CUDA cudaMallocAsync API instead of the TF GPU allocator.
bool use_cuda_malloc_async = 11;
- Returns:
- The useCudaMallocAsync.
-
getDisallowRetryOnAllocationFailure
boolean getDisallowRetryOnAllocationFailure()
By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime. Setting this to true disables the sleep; instead we'll OOM immediately.
bool disallow_retry_on_allocation_failure = 12;
- Returns:
- The disallowRetryOnAllocationFailure.
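The two allocator-behavior flags above are often considered together; a hedged sketch, again assuming the standard generated setters:

  import org.tensorflow.framework.GPUOptions;

  public class AllocatorBehaviorExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setUseCudaMallocAsync(true)               // cudaMallocAsync instead of the TF allocator
              .setDisallowRetryOnAllocationFailure(true) // fail fast instead of sleeping on OOM
              .build();
      System.out.println(exp.getUseCudaMallocAsync()); // true
    }
  }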
-
getGpuHostMemLimitInMb
float getGpuHostMemLimitInMb()
Memory limit for the "GPU host allocator", a.k.a. the pinned memory allocator. This can also be set via the environment variable TF_GPU_HOST_MEM_LIMIT_IN_MB.
float gpu_host_mem_limit_in_mb = 13;
- Returns:
- The gpuHostMemLimitInMb.
-
getGpuHostMemDisallowGrowth
boolean getGpuHostMemDisallowGrowth()
If true, then the host allocator allocates its max memory all upfront and never grows. This can be useful for latency-sensitive systems, because growing the GPU host memory pool can be expensive. You probably only want to use this in combination with gpu_host_mem_limit_in_mb, because the default GPU host memory limit is quite high.
bool gpu_host_mem_disallow_growth = 14;
- Returns:
- The gpuHostMemDisallowGrowth.
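As the comment above suggests, disallowing growth is typically paired with an explicit limit; a minimal sketch with a placeholder 1 GB cap, assuming the standard generated setters:

  import org.tensorflow.framework.GPUOptions;

  public class HostMemoryExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setGpuHostMemLimitInMb(1024f)      // cap pinned host memory at 1 GB
              .setGpuHostMemDisallowGrowth(true)  // allocate it all upfront, never grow
              .build();
      System.out.println(exp.getGpuHostMemLimitInMb()); // 1024.0
    }
  }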
-
getGpuSystemMemorySizeInMb
int getGpuSystemMemorySizeInMb()
Memory limit for GPU system memory. This can also be set by TF_DEVICE_MIN_SYS_MEMORY_IN_MB, which takes precedence over gpu_system_memory_size_in_mb. With this, the user can configure the GPU system memory size for better resource estimation in multi-tenancy (one GPU with multiple models) use cases.
int32 gpu_system_memory_size_in_mb = 16;
- Returns:
- The gpuSystemMemorySizeInMb.
-
getPopulatePjrtGpuClientCreationInfo
boolean getPopulatePjrtGpuClientCreationInfo()
If true, save the information needed to create a PjRt GPU client, for creating a client with remote devices.
bool populate_pjrt_gpu_client_creation_info = 17;
- Returns:
- The populatePjrtGpuClientCreationInfo.
-
getNodeId
int getNodeId()
node_id for use when creating a PjRt GPU client with remote devices, which enumerates jobs*tasks from a ServerDef.
int32 node_id = 18;
- Returns:
- The nodeId.
-
hasStreamMergeOptions
boolean hasStreamMergeOptions()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Returns:
- Whether the streamMergeOptions field is set.
-
getStreamMergeOptions
GPUOptions.Experimental.StreamMergeOptions getStreamMergeOptions()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Returns:
- The streamMergeOptions.
-
getStreamMergeOptionsOrBuilder
GPUOptions.Experimental.StreamMergeOptionsOrBuilder getStreamMergeOptionsOrBuilder()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
-
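As with any protobuf message field, hasStreamMergeOptions() distinguishes an explicitly set field from the default instance that getStreamMergeOptions() returns when the field is unset; a short usage sketch:

  import org.tensorflow.framework.GPUOptions;

  public class StreamMergeExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp = GPUOptions.Experimental.getDefaultInstance();
      // Guard against reading an unset message field, which would silently
      // return the default StreamMergeOptions instance.
      if (exp.hasStreamMergeOptions()) {
        System.out.println(exp.getStreamMergeOptions());
      } else {
        System.out.println("stream_merge_options not set");
      }
    }
  }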