Package org.tensorflow.framework
Interface GPUOptions.ExperimentalOrBuilder
- All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
- All Known Implementing Classes:
GPUOptions.Experimental, GPUOptions.Experimental.Builder
- Enclosing class:
GPUOptions
public static interface GPUOptions.ExperimentalOrBuilder
extends com.google.protobuf.MessageOrBuilder
-
Method Summary
- String getCollectiveRingOrder(): If non-empty, defines a good GPU ring order on a single worker based on device interconnect.
- com.google.protobuf.ByteString getCollectiveRingOrderBytes(): If non-empty, defines a good GPU ring order on a single worker based on device interconnect.
- boolean getDisallowRetryOnAllocationFailure(): By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime.
- boolean getGpuHostMemDisallowGrowth(): If true, then the host allocator allocates its max memory all upfront and never grows.
- float getGpuHostMemLimitInMb(): Memory limit for the "GPU host allocator", a.k.a. pinned memory allocator.
- int getGpuSystemMemorySizeInMb(): Memory limit for GPU system memory.
- double getInternalFragmentationFraction(): BFC Allocator can return an allocated chunk of memory up to 2x the requested size.
- int getKernelTrackerMaxBytes(): If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n.
- int getKernelTrackerMaxInterval(): Parameters for GPUKernelTracker.
- int getKernelTrackerMaxPending(): If kernel_tracker_max_pending > 0, then no more than this many tracking events can be outstanding at a time.
- int getNodeId(): node_id for use when creating a PjRt GPU client with remote devices, which enumerates jobs*tasks from a ServerDef.
- int getNumDevToDevCopyStreams(): If > 1, the number of device-to-device copy streams to create for each GPUDevice.
- int getNumVirtualDevicesPerGpu(): The number of virtual devices to create on each visible GPU.
- boolean getPopulatePjrtGpuClientCreationInfo(): If true, save the information needed to create a PjRt GPU client with remote devices.
- GPUOptions.Experimental.StreamMergeOptions getStreamMergeOptions(): .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- GPUOptions.Experimental.StreamMergeOptionsOrBuilder getStreamMergeOptionsOrBuilder(): .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- boolean getTimestampedAllocator(): If true, then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete, so that we can know when a nominally free memory chunk is really not subject to pending use.
- boolean getUseCudaMallocAsync(): When true, use the CUDA cudaMallocAsync API instead of the TF GPU allocator.
- boolean getUseUnifiedMemory(): If true, uses CUDA unified memory for memory allocations.
- GPUOptions.Experimental.VirtualDevices getVirtualDevices(int index): The multi virtual device settings.
- int getVirtualDevicesCount(): The multi virtual device settings.
- List<GPUOptions.Experimental.VirtualDevices> getVirtualDevicesList(): The multi virtual device settings.
- GPUOptions.Experimental.VirtualDevicesOrBuilder getVirtualDevicesOrBuilder(int index): The multi virtual device settings.
- List<? extends GPUOptions.Experimental.VirtualDevicesOrBuilder> getVirtualDevicesOrBuilderList(): The multi virtual device settings.
- boolean hasStreamMergeOptions(): .tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
Methods inherited from interface com.google.protobuf.MessageLiteOrBuilder
isInitialized
Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getDefaultInstanceForType, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
-
Method Details
-
getVirtualDevicesList
List<GPUOptions.Experimental.VirtualDevices> getVirtualDevicesList()
The multi virtual device settings. If empty (not set), a single virtual device will be created on each visible GPU, according to the settings in "visible_device_list" above. Otherwise, the number of elements in the list must be the same as the number of visible GPUs (after "visible_device_list" filtering, if it is set), and the string-represented device names (e.g. /device:GPU:<id>) will refer to the virtual devices, with the <id> field assigned sequentially starting from 0 according to the order of the virtual devices determined by device_ordinal and the position in the virtual device list. For example,
  visible_device_list = "1,0"
  virtual_devices { memory_limit: 1GB memory_limit: 2GB }
  virtual_devices { memory_limit: 3GB memory_limit: 4GB }
will create 4 virtual devices as:
  /device:GPU:0 -> visible GPU 1 with 1GB memory
  /device:GPU:1 -> visible GPU 1 with 2GB memory
  /device:GPU:2 -> visible GPU 0 with 3GB memory
  /device:GPU:3 -> visible GPU 0 with 4GB memory
but
  visible_device_list = "1,0"
  virtual_devices { memory_limit: 1GB memory_limit: 2GB device_ordinal: 10 device_ordinal: 20 }
  virtual_devices { memory_limit: 3GB memory_limit: 4GB device_ordinal: 10 device_ordinal: 20 }
will create 4 virtual devices as:
  /device:GPU:0 -> visible GPU 1 with 1GB memory (ordinal 10)
  /device:GPU:1 -> visible GPU 0 with 3GB memory (ordinal 10)
  /device:GPU:2 -> visible GPU 1 with 2GB memory (ordinal 20)
  /device:GPU:3 -> visible GPU 0 with 4GB memory (ordinal 20)
NOTE:
1. It's invalid to set both this and "per_process_gpu_memory_fraction" at the same time.
2. Currently this setting is per-process, not per-session. Using different settings in different sessions within the same process will result in undefined behavior.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevices
GPUOptions.Experimental.VirtualDevices getVirtualDevices(int index)
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevicesCount
int getVirtualDevicesCount()
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevicesOrBuilderList
List<? extends GPUOptions.Experimental.VirtualDevicesOrBuilder> getVirtualDevicesOrBuilderList()
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
getVirtualDevicesOrBuilder
GPUOptions.Experimental.VirtualDevicesOrBuilder getVirtualDevicesOrBuilder(int index)
The multi virtual device settings; see getVirtualDevicesList() for the full semantics.
repeated .tensorflow.GPUOptions.Experimental.VirtualDevices virtual_devices = 1;
-
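As a worked illustration, the following minimal Java sketch builds the first configuration described above through the generated builder API. The setter names (setVisibleDeviceList on GPUOptions.Builder, addVirtualDevices and addMemoryLimitMb on the experimental builders) are inferred from the standard protobuf codegen for the proto field names shown on this page, so treat them as an assumption, not a verbatim reference.

  import org.tensorflow.framework.GPUOptions;

  public class VirtualDevicesExample {
    public static void main(String[] args) {
      GPUOptions gpuOptions =
          GPUOptions.newBuilder()
              .setVisibleDeviceList("1,0")
              .setExperimental(
                  GPUOptions.Experimental.newBuilder()
                      // Visible GPU 1: split into 1 GB and 2 GB virtual devices.
                      .addVirtualDevices(
                          GPUOptions.Experimental.VirtualDevices.newBuilder()
                              .addMemoryLimitMb(1024f)
                              .addMemoryLimitMb(2048f))
                      // Visible GPU 0: split into 3 GB and 4 GB virtual devices.
                      .addVirtualDevices(
                          GPUOptions.Experimental.VirtualDevices.newBuilder()
                              .addMemoryLimitMb(3072f)
                              .addMemoryLimitMb(4096f)))
              .build();

      // Read back through this OrBuilder interface.
      GPUOptions.ExperimentalOrBuilder exp = gpuOptions.getExperimental();
      System.out.println(exp.getVirtualDevicesCount()); // 2 entries, 4 virtual devices total
    }
  }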
getNumVirtualDevicesPerGpu
int getNumVirtualDevicesPerGpu()
The number of virtual devices to create on each visible GPU. The available memory will be split equally among all virtual devices. If the field `memory_limit_mb` in `VirtualDevices` is not empty, this field will be ignored.
int32 num_virtual_devices_per_gpu = 15;
- Returns:
- The numVirtualDevicesPerGpu.
-
getUseUnifiedMemory
boolean getUseUnifiedMemory()
If true, uses CUDA unified memory for memory allocations. If the per_process_gpu_memory_fraction option is greater than 1.0, then unified memory is used regardless of the value of this field. See the comments on the per_process_gpu_memory_fraction field for more details and the requirements of unified memory. This option is useful for oversubscribing memory when multiple processes share a single GPU while each uses less than 1.0 of the per-process memory fraction.
bool use_unified_memory = 2;
- Returns:
- The useUnifiedMemory.
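A minimal sketch of the oversubscription setup described above, assuming the standard protobuf-generated setters (setPerProcessGpuMemoryFraction, setUseUnifiedMemory); the 0.5 fraction is a placeholder value:

  import org.tensorflow.framework.GPUOptions;

  public class UnifiedMemoryExample {
    public static void main(String[] args) {
      GPUOptions gpuOptions =
          GPUOptions.newBuilder()
              // Each process claims half the device; unified memory lets the
              // combined footprint across processes oversubscribe the GPU.
              .setPerProcessGpuMemoryFraction(0.5)
              .setExperimental(
                  GPUOptions.Experimental.newBuilder().setUseUnifiedMemory(true))
              .build();
      System.out.println(gpuOptions.getExperimental().getUseUnifiedMemory()); // true
    }
  }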
-
getNumDevToDevCopyStreams
int getNumDevToDevCopyStreams()
If > 1, the number of device-to-device copy streams to create for each GPUDevice. The default value is 0, which is automatically converted to 1.
int32 num_dev_to_dev_copy_streams = 3;
- Returns:
- The numDevToDevCopyStreams.
-
getCollectiveRingOrder
String getCollectiveRingOrder()
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
string collective_ring_order = 4;
- Returns:
- The collectiveRingOrder.
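For instance, the doc comment's example order can be set through the generated builder as sketched below (setCollectiveRingOrder is the standard codegen setter for this field, inferred rather than documented here):

  import org.tensorflow.framework.GPUOptions;

  public class RingOrderExample {
    public static void main(String[] args) {
      // Override automatic ring-order generation with an explicit device order.
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setCollectiveRingOrder("3,2,1,0,7,6,5,4")
              .build();
      System.out.println(exp.getCollectiveRingOrder());
    }
  }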
-
getCollectiveRingOrderBytes
com.google.protobuf.ByteString getCollectiveRingOrderBytes()
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
string collective_ring_order = 4;
- Returns:
- The bytes for collectiveRingOrder.
-
getTimestampedAllocator
boolean getTimestampedAllocator()
If true, then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete, so that we can know when a nominally free memory chunk is really not subject to pending use.
bool timestamped_allocator = 5;
- Returns:
- The timestampedAllocator.
-
getKernelTrackerMaxInterval
int getKernelTrackerMaxInterval()
Parameters for GPUKernelTracker. By default no kernel tracking is done. Note that timestamped_allocator is only effective if some tracking is specified. If kernel_tracker_max_interval = n > 0, then a tracking event is inserted after every n kernels without an event.
int32 kernel_tracker_max_interval = 7;
- Returns:
- The kernelTrackerMaxInterval.
-
getKernelTrackerMaxBytes
int getKernelTrackerMaxBytes()
If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n. If one kernel allocates b * n bytes, then one event will be inserted after it, but it will count as b against the pending limit.
int32 kernel_tracker_max_bytes = 8;
- Returns:
- The kernelTrackerMaxBytes.
-
getKernelTrackerMaxPending
int getKernelTrackerMaxPending()
If kernel_tracker_max_pending > 0, then no more than this many tracking events can be outstanding at a time. An attempt to launch an additional kernel will stall until an event completes.
int32 kernel_tracker_max_pending = 9;
- Returns:
- The kernelTrackerMaxPending.
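Since timestamped_allocator only takes effect when some tracking is configured, a combined sketch may help; the setter names follow the standard protobuf codegen and the numeric values are illustrative placeholders:

  import org.tensorflow.framework.GPUOptions;

  public class KernelTrackerExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setTimestampedAllocator(true)       // needs tracking below to be effective
              .setKernelTrackerMaxInterval(8)      // event after every 8 kernels
              .setKernelTrackerMaxBytes(1 << 20)   // or after >= 1 MiB allocated
              .setKernelTrackerMaxPending(4)       // at most 4 outstanding events
              .build();
      System.out.println(exp.getKernelTrackerMaxPending()); // 4
    }
  }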
-
getInternalFragmentationFraction
double getInternalFragmentationFraction()
BFC Allocator can return an allocated chunk of memory up to 2x the requested size. For virtual devices with tight memory constraints, and proportionately large allocation requests, this can lead to a significant reduction in available memory. This threshold controls when a chunk should be split if its size exceeds the requested memory size. It is expressed as a fraction of the total available memory for the TF device. For example, setting it to 0.05 would imply a chunk needs to be split if its size exceeds the requested memory by 5% of the total virtual device/GPU memory size.
double internal_fragmentation_fraction = 10;
- Returns:
- The internalFragmentationFraction.
-
getUseCudaMallocAsync
boolean getUseCudaMallocAsync()
When true, use the CUDA cudaMallocAsync API instead of the TF GPU allocator.
bool use_cuda_malloc_async = 11;
- Returns:
- The useCudaMallocAsync.
-
getDisallowRetryOnAllocationFailure
boolean getDisallowRetryOnAllocationFailure()
By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime. Setting this to true disables the sleep; instead we'll OOM immediately.
bool disallow_retry_on_allocation_failure = 12;
- Returns:
- The disallowRetryOnAllocationFailure.
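The two allocator-behavior flags above are often considered together; a hedged sketch, again assuming the standard generated setters:

  import org.tensorflow.framework.GPUOptions;

  public class AllocatorBehaviorExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setUseCudaMallocAsync(true)               // cudaMallocAsync instead of the TF allocator
              .setDisallowRetryOnAllocationFailure(true) // fail fast instead of sleeping on OOM
              .build();
      System.out.println(exp.getUseCudaMallocAsync()); // true
    }
  }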
-
getGpuHostMemLimitInMb
float getGpuHostMemLimitInMb()
Memory limit for the "GPU host allocator", a.k.a. the pinned memory allocator. This can also be set via the environment variable TF_GPU_HOST_MEM_LIMIT_IN_MB.
float gpu_host_mem_limit_in_mb = 13;
- Returns:
- The gpuHostMemLimitInMb.
-
getGpuHostMemDisallowGrowth
boolean getGpuHostMemDisallowGrowth()
If true, then the host allocator allocates its max memory all upfront and never grows. This can be useful for latency-sensitive systems, because growing the GPU host memory pool can be expensive. You probably only want to use this in combination with gpu_host_mem_limit_in_mb, because the default GPU host memory limit is quite high.
bool gpu_host_mem_disallow_growth = 14;
- Returns:
- The gpuHostMemDisallowGrowth.
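As the comment above suggests, disallowing growth is typically paired with an explicit limit; a minimal sketch with a placeholder 1 GB cap, assuming the standard generated setters:

  import org.tensorflow.framework.GPUOptions;

  public class HostMemoryExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp =
          GPUOptions.Experimental.newBuilder()
              .setGpuHostMemLimitInMb(1024f)      // cap pinned host memory at 1 GB
              .setGpuHostMemDisallowGrowth(true)  // allocate it all upfront, never grow
              .build();
      System.out.println(exp.getGpuHostMemLimitInMb()); // 1024.0
    }
  }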
-
getGpuSystemMemorySizeInMb
int getGpuSystemMemorySizeInMb()
Memory limit for GPU system memory. This can also be set by TF_DEVICE_MIN_SYS_MEMORY_IN_MB, which takes precedence over gpu_system_memory_size_in_mb. With this, the user can configure the GPU system memory size for better resource estimation in multi-tenancy (one GPU with multiple models) use cases.
int32 gpu_system_memory_size_in_mb = 16;
- Returns:
- The gpuSystemMemorySizeInMb.
-
getPopulatePjrtGpuClientCreationInfo
boolean getPopulatePjrtGpuClientCreationInfo()
If true, save the information needed to create a PjRt GPU client, for creating a client with remote devices.
bool populate_pjrt_gpu_client_creation_info = 17;
- Returns:
- The populatePjrtGpuClientCreationInfo.
-
getNodeId
int getNodeId()
node_id for use when creating a PjRt GPU client with remote devices, which enumerates jobs*tasks from a ServerDef.
int32 node_id = 18;
- Returns:
- The nodeId.
-
hasStreamMergeOptions
boolean hasStreamMergeOptions()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Returns:
- Whether the streamMergeOptions field is set.
-
getStreamMergeOptions
GPUOptions.Experimental.StreamMergeOptions getStreamMergeOptions()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
- Returns:
- The streamMergeOptions.
-
getStreamMergeOptionsOrBuilder
GPUOptions.Experimental.StreamMergeOptionsOrBuilder getStreamMergeOptionsOrBuilder()
.tensorflow.GPUOptions.Experimental.StreamMergeOptions stream_merge_options = 19;
-
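As with any protobuf message field, hasStreamMergeOptions() distinguishes an explicitly set field from the default instance that getStreamMergeOptions() returns when the field is unset; a short usage sketch:

  import org.tensorflow.framework.GPUOptions;

  public class StreamMergeExample {
    public static void main(String[] args) {
      GPUOptions.Experimental exp = GPUOptions.Experimental.getDefaultInstance();
      // Guard against reading an unset message field, which would silently
      // return the default StreamMergeOptions instance.
      if (exp.hasStreamMergeOptions()) {
        System.out.println(exp.getStreamMergeOptions());
      } else {
        System.out.println("stream_merge_options not set");
      }
    }
  }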