Package org.nd4j.autodiff.samediff.optimize.optimizations
Class Summary

BaseOptimizerSet
ConstantFunctionOptimizations
    This set of optimizations looks for functions that are applied to constants and "pre-executes" them, so they don't have to be recalculated (returning the same value) on each run.
ConstantFunctionOptimizations.FoldConstantFunctions
CuDNNFunctionOptimizations
CuDNNFunctionOptimizations.CudnnConv2dNCHWtoNHWCConversion
    See https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html#tensor-layout
    For Tensor Cores, we want NHWC layout. Section 7.3.1: "Layout choice has an effect on performance, as convolutions implemented for Tensor Cores require NHWC layout and are fastest when input tensors are laid out in NHWC." "To maximize performance, we recommend using NHWC tensor layout."
    As for the weights format: the cuDNN docs are vague, but TF uses NCHW+OIHW or NHWC+OHWI.
IdentityFunctionOptimizations
IdentityFunctionOptimizations.RemoveIdentityOps
    Remove identity(x).
IdentityFunctionOptimizations.RemoveIdentityPermute
    Remove permute(0,1,2,...,rank-1), as this is a no-op.
OptimizationUtils
ShapeFunctionOptimizations
ShapeFunctionOptimizations.FuseChainedConcatOps
    Fuse concat(concat(concat(x,y,dim=D), z, dim=D), a, dim=D) into a single concat op, concat(x,y,z,a, dim=D), as long as the intermediate outputs aren't needed elsewhere.
ShapeFunctionOptimizations.FuseChainedPermutes
    Fuse [permute1 -> permute2 -> ...] into a single permute op.
ShapeFunctionOptimizations.FuseChainedReshapes
    Fuse [reshape1 -> reshape2 -> ...] into a single reshape op.
UnusedFunctionOptimizations
UnusedFunctionOptimizations.RemoveUnusedConstants
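To illustrate the idea behind ConstantFunctionOptimizations.FoldConstantFunctions, here is a minimal toy sketch in plain Java (not the SameDiff implementation; the class and method names are hypothetical): when every input of an op is a compile-time constant, the op can be executed once at optimization time and replaced in the graph by its precomputed result.

```java
import java.util.function.DoubleBinaryOperator;

// Toy sketch of constant folding: execute an op over constants once,
// so the runtime graph stores only the resulting constant value.
public class ConstantFoldSketch {

    // Executes op(a, b) a single time at "optimization" time; at runtime
    // only the returned constant would remain in the graph.
    public static double foldBinaryOp(DoubleBinaryOperator op, double a, double b) {
        return op.applyAsDouble(a, b);
    }

    public static void main(String[] args) {
        // add(3, 4) over two constants is replaced by the constant 7 up front
        System.out.println(foldBinaryOp(Double::sum, 3.0, 4.0)); // prints 7.0
    }
}
```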
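The two permute rewrites above (RemoveIdentityPermute and FuseChainedPermutes) can be sketched with plain permutation arrays; this is an illustration, not the actual optimizer code, and the names are hypothetical. A permute with dimensions 0,1,...,rank-1 does nothing and can be removed, and two chained permutes compose into a single one.

```java
import java.util.Arrays;

// Toy sketch of the identity-permute and chained-permute rewrites.
public class PermuteRewriteSketch {

    // True when perm is 0,1,...,rank-1, i.e. the permute op is a no-op.
    public static boolean isIdentityPermutation(int[] perm) {
        for (int i = 0; i < perm.length; i++)
            if (perm[i] != i) return false;
        return true;
    }

    // permute(permute(x, first), second) == permute(x, fused), where
    // fused[i] = first[second[i]]: output axis i of the second permute reads
    // axis second[i] of the intermediate, which is axis first[second[i]] of x.
    public static int[] fusePermutes(int[] first, int[] second) {
        int[] fused = new int[second.length];
        for (int i = 0; i < second.length; i++)
            fused[i] = first[second[i]];
        return fused;
    }

    public static void main(String[] args) {
        System.out.println(isIdentityPermutation(new int[]{0, 1, 2})); // true
        // A permute followed by its inverse fuses into the identity permutation,
        // which the first rewrite can then remove entirely.
        int[] fused = fusePermutes(new int[]{1, 2, 0}, new int[]{2, 0, 1});
        System.out.println(Arrays.toString(fused)); // [0, 1, 2]
    }
}
```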
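Similarly, the FuseChainedConcatOps rewrite relies on concat being "flattenable" along a fixed dimension. The sketch below demonstrates this with 1-D arrays standing in for tensors (again a hypothetical illustration, not the SameDiff code): chaining concats yields the same result as a single variadic concat.

```java
import java.util.Arrays;

// Toy sketch showing why chained concats along the same dimension
// can be fused into one variadic concat op.
public class ConcatFusionSketch {

    // Concatenates 1-D arrays along their only dimension.
    public static double[] concat(double[]... inputs) {
        int total = 0;
        for (double[] in : inputs) total += in.length;
        double[] out = new double[total];
        int pos = 0;
        for (double[] in : inputs) {
            System.arraycopy(in, 0, out, pos, in.length);
            pos += in.length;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1}, y = {2}, z = {3}, a = {4};
        // Chained form: concat(concat(concat(x,y), z), a)
        double[] chained = concat(concat(concat(x, y), z), a);
        // Fused form the optimizer would produce: concat(x, y, z, a)
        double[] fused = concat(x, y, z, a);
        System.out.println(Arrays.equals(chained, fused)); // true
    }
}
```

Note the caveat from the class description: the fusion is only valid when the intermediate concat outputs are not consumed by any other op in the graph.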