All Superinterfaces: FastCompositeExpression, Savable, Serializable

All Known Implementing Classes: BulkTurboEvaluator.BatchedVectorCompositeExpression

public interface SIMDCompositeExpression extends FastCompositeExpression

SIMD-oriented vector expression evaluator interface. It exposes a dual-path API: each bulk method accepts either a 2D array of variable rows or a single pre-grouped flat array. The API targets large parallel time-series calculations, machine learning transformations, and fused token attention operations.

Author: GBEMIRO

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | applyBulk(double[][] variables, double[] output, boolean useBlocks) | Convenience path: evaluates the compiled expression over a 2D array of variables. |
| void | applyBulk(double[] flatVariables, double[] output, boolean useBlocks) | Warp Speed Path (Power Users): Evaluates a single pre-grouped flat array directly, with no intermediate copies or allocations. |
| void | applyBulkBatched(double[][] variables, double[] output, int batchSize, boolean useBlocks) | Evaluates a 2D variable structure in windows of an explicit size, chosen so that each window stays within the CPU L1/L2 data cache. |
| void | applyBulkBatched(double[] flatVariables, double[] output, int batchSize, boolean useBlocks) | Warp Speed Path (Power Users): Evaluates a pre-grouped flat array in caller-defined batches, so the batch size can be tuned to the L1/L2 cache of the target hardware. |
| void | applyBulkParallel(double[][] variables, double[] output) | Distributes the evaluation of a 2D array dataset evenly across multiple threads using fork-join block chunking. |
| void | applyBulkParallel(double[] flatVariables, double[] output) | Warp Speed Path (Power Users): Evaluates a pre-grouped flat array concurrently by dividing it into segments across the available CPU cores. |
| void | applyMatrixKernel(FlatMatrix[] inputs, FlatMatrix output, String operation) | Applies a named, fused neural-network kernel directly to pre-allocated double-precision matrices. |
| void | applyMatrixKernel(FlatMatrixF[] inputs, FlatMatrixF output, String op) | Applies a named, fused neural-network kernel directly to pre-allocated single-precision (float) matrices. |

Methods inherited from interface com.github.gbenroscience.parser.turbo.tools.FastCompositeExpression
apply, applyMatrix, applyScalar, applyString, applyVector, checkErrorLogs, getCompiler, getRoot

Methods inherited from interface com.github.gbenroscience.interfaces.Savable
serialize

Method Details
- applyBulk
  
  void applyBulk(double[][] variables, double[] output, boolean useBlocks)
  
  Convenience path: evaluates the compiled expression over a 2D array of variables. Internally, this method flattens the rows into one grouped, contiguous block of memory before delegating to the core vectorized execution kernel.
  
  Parameters:
  
  variables - A 2D array of variable channels where each outer row represents an entire vector for a specific variable slot, e.g., [[x1, x2... xn], [y1, y2... yn], [z1, z2... zn]].
  
  output - The pre-allocated target array where the evaluated results are written.
  
  useBlocks - If true, processes data in L1/L2 cache-bounded blocks to prevent cache thrashing on large datasets. If false, evaluates the data in a single sequential pass without blocking, which is intended for small datasets.
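
The flattening step described for this overload can be pictured with a minimal sketch. It assumes the grouped layout that the flat-array overloads document ([x1..xn, y1..yn, ...]). GroupedLayoutSketch and toGrouped are hypothetical names used for illustration only; they are not part of the library, and the library's internal implementation may differ.

```java
final class GroupedLayoutSketch {

    /**
     * Copies each variable row back-to-back into one flat array,
     * producing [x1..xn, y1..yn, ...]. Hypothetical helper, not library code.
     */
    static double[] toGrouped(double[][] variables) {
        int slots = variables.length;
        int n = slots == 0 ? 0 : variables[0].length;
        double[] flat = new double[slots * n];
        for (int s = 0; s < slots; s++) {
            if (variables[s].length != n) {
                throw new IllegalArgumentException(
                        "row " + s + " has length " + variables[s].length + ", expected " + n);
            }
            // Row s occupies the contiguous range [s * n, (s + 1) * n).
            System.arraycopy(variables[s], 0, flat, s * n, n);
        }
        return flat;
    }
}
```

Calling toGrouped on {{1, 2, 3}, {10, 20, 30}} yields {1, 2, 3, 10, 20, 30}, which is the shape the flat-array overloads expect.
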
- applyBulkParallel
  
  void applyBulkParallel(double[][] variables, double[] output)
  
  Distributes the evaluation of a 2D array dataset evenly across multiple threads by splitting it into blocks that are processed with a fork-join scheme.
  
  Parameters:
  
  variables - A 2D array of variable channels where each outer row represents an entire vector for a specific variable slot, e.g., [[x1, x2... xn], [y1, y2... yn]].
  
  output - The pre-allocated target array where the parallel calculations will be written.
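
As a rough illustration of block chunking across threads, the sketch below splits the output range into one contiguous chunk per available processor and evaluates the chunks on a parallel stream, which runs on the JVM's common fork-join pool. ParallelChunkSketch, ElementKernel, and evalParallel are hypothetical stand-ins, and the chunk-per-processor policy is an assumption; the library's actual scheduling is not documented here.

```java
import java.util.stream.IntStream;

final class ParallelChunkSketch {

    /** Hypothetical stand-in for a compiled expression evaluated at element i. */
    interface ElementKernel {
        double eval(double[][] variables, int i);
    }

    /** Evaluates output[i] for every i, one contiguous chunk per worker. */
    static void evalParallel(double[][] variables, double[] output, ElementKernel kernel) {
        int n = output.length;
        if (n == 0) {
            return;
        }
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors());
        int chunk = (n + workers - 1) / workers;  // ceiling division
        int chunks = (n + chunk - 1) / chunk;
        // Each chunk writes a disjoint slice of output, so no locking is needed.
        IntStream.range(0, chunks).parallel().forEach(c -> {
            int start = c * chunk;
            int end = Math.min(n, start + chunk);
            for (int i = start; i < end; i++) {
                output[i] = kernel.eval(variables, i);
            }
        });
    }
}
```

Because every worker owns a separate slice of the output array, the result is the same as a sequential loop regardless of how the chunks are scheduled.
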
- applyBulkBatched
  
  void applyBulkBatched(double[][] variables, double[] output, int batchSize, boolean useBlocks)
  
  Evaluates a 2D variable structure in windows of an explicit size, chosen by the caller so that each window stays within the CPU L1/L2 data cache.
  
  Parameters:
  
  variables - A 2D array of variable channels where each outer row represents an entire vector for a specific variable slot, e.g., [[x1, x2... xn], [y1, y2... yn]].
  
  output - The pre-allocated target array where the evaluated blocks will be written.
  
  batchSize - The window length processed in each pass of the batch loop.
  
  useBlocks - If true, additionally tiles each batch window into smaller internal blocks. If false, evaluates each batch window sequentially in a single pass.
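
The windowing idea can be sketched as an outer loop over fixed-size windows with a plain sequential inner loop, which corresponds to the useBlocks = false description above. BatchedWindowSketch, ElementKernel, and evalBatched are hypothetical names, and treating batchSize as a count of elements is an assumption made for this sketch.

```java
final class BatchedWindowSketch {

    /** Hypothetical stand-in for a compiled expression evaluated at element i. */
    interface ElementKernel {
        double eval(double[][] variables, int i);
    }

    /**
     * Evaluates the data window by window and returns the number of windows processed.
     * The last window is shorter when output.length is not a multiple of batchSize.
     */
    static int evalBatched(double[][] variables, double[] output, int batchSize, ElementKernel kernel) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int n = output.length;
        int windows = 0;
        for (int start = 0; start < n; start += batchSize) {
            int end = Math.min(n, start + batchSize);
            // All reads and writes in this pass stay inside [start, end).
            for (int i = start; i < end; i++) {
                output[i] = kernel.eval(variables, i);
            }
            windows++;
        }
        return windows;
    }
}
```

With 10 elements and a batch size of 4, the loop visits the windows [0, 4), [4, 8), and [8, 10).
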
- applyBulk
  
  void applyBulk(double[] flatVariables, double[] output, boolean useBlocks)
  
  Warp Speed Path (Power Users): Evaluates a single pre-grouped flat array directly, with no intermediate copies or allocations, so the data is read sequentially from memory.
  
  Parameters:
  
  flatVariables - A single flat array containing variables contiguously grouped back-to-back by variable slot. CRITICAL: Data must use a Grouped structure, e.g., [x1, x2... xn, y1, y2... yn, z1, z2... zn]. Interleaved arrays, e.g., [x1, y1, z1, x2, y2, z2...], will produce incorrect results.
  
  output - The pre-allocated target array where the evaluated results are written.
  
  useBlocks - If true, processes the flat segments in cache-sized tiles. If false, walks the flat segments sequentially in one pass, leaving prefetching to the CPU.
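
If existing data is interleaved, it has to be regrouped before it is passed to the flat-array overloads. The sketch below shows one way to do that conversion; InterleavedToGroupedSketch and regroup are hypothetical names, and the helper allocates a new array, so it is a preparation step rather than part of the zero-copy path.

```java
final class InterleavedToGroupedSketch {

    /**
     * Converts [x1, y1, z1, x2, y2, z2, ...] into [x1..xn, y1..yn, z1..zn].
     * Hypothetical helper, not library code.
     */
    static double[] regroup(double[] interleaved, int slots) {
        if (slots <= 0 || interleaved.length % slots != 0) {
            throw new IllegalArgumentException("length must be a positive multiple of slots");
        }
        int n = interleaved.length / slots;  // elements per variable
        double[] grouped = new double[interleaved.length];
        for (int i = 0; i < n; i++) {
            for (int s = 0; s < slots; s++) {
                // Element i of slot s: interleaved index i * slots + s, grouped index s * n + i.
                grouped[s * n + i] = interleaved[i * slots + s];
            }
        }
        return grouped;
    }
}
```

For two slots, regroup turns {1, 10, 2, 20, 3, 30} into {1, 2, 3, 10, 20, 30}.
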
- applyBulkParallel
  
  void applyBulkParallel(double[] flatVariables, double[] output)
  
  Warp Speed Path (Power Users): Evaluates a pre-grouped flat array concurrently by dividing it into segments across the available CPU cores, so that multiple threads share the memory bandwidth.
  
  Parameters:
  
  flatVariables - A single flat array containing variables contiguously grouped back-to-back by variable slot. CRITICAL: Data must use a Grouped structure, e.g., [x1, x2... xn, y1, y2... yn, z1, z2... zn]. Do NOT pass interleaved data.
  
  output - The pre-allocated target array where the parallel workers write their results.
- applyBulkBatched
  
  void applyBulkBatched(double[] flatVariables, double[] output, int batchSize, boolean useBlocks)
  
  Warp Speed Path (Power Users): Evaluates a pre-grouped flat array in caller-defined batches, so the batch size can be tuned to the L1/L2 cache of the target hardware.
  
  Parameters:
  
  flatVariables - A single flat array containing variables contiguously grouped back-to-back by variable slot. CRITICAL: Data must use a Grouped structure, e.g., [x1, x2... xn, y1, y2... yn, z1, z2... zn]. Do NOT pass interleaved data.
  
  output - The pre-allocated target array where the evaluated blocks will be written.
  
  batchSize - The window length processed in each pass of the batch loop.
  
  useBlocks - If true, applies inner tiling within each batch. If false, uses the caller-provided batch size as the sequential block length.
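
One possible way to pick a batch size is to budget the working set of a single batch against a cache size. The sketch below is a heuristic under stated assumptions, not a rule from the library: it assumes batchSize counts elements, that each element touches one double per variable slot plus one double of output, and that the caller supplies the cache budget in bytes. BatchSizeSketch and batchSizeFor are hypothetical names.

```java
final class BatchSizeSketch {

    /**
     * Returns the largest batch whose inputs and output fit in cacheBytes,
     * assuming (variableSlots + 1) doubles are touched per element.
     * Always returns at least 1.
     */
    static int batchSizeFor(int cacheBytes, int variableSlots) {
        if (cacheBytes <= 0 || variableSlots <= 0) {
            throw new IllegalArgumentException("cacheBytes and variableSlots must be positive");
        }
        long bytesPerElement = (long) (variableSlots + 1) * Double.BYTES;
        return (int) Math.max(1L, cacheBytes / bytesPerElement);
    }
}
```

For three variable slots and a budget of 32768 bytes, each element costs 4 * 8 = 32 bytes, so the helper returns 1024. Measuring with the real workload is still the more reliable way to settle on a value.
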
- applyMatrixKernel
  
  void applyMatrixKernel(FlatMatrix[] inputs, FlatMatrix output, String operation)
  
  Applies a named, fused neural-network kernel directly to pre-allocated double-precision matrices.
  
  Parameters:
  
  inputs - An ordered array of input matrices, e.g., weights, targets, scales, or bias vectors.
  
  output - The pre-allocated destination matrix that receives the result of the operation.
  
  operation - The name of the kernel to execute (e.g., "matmul", "rms_norm", "swiglu", "q8_quantize").
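
To make the name-based dispatch concrete, here is a small sketch that routes two of the operation names listed above to plain reference loops. Everything in it is an assumption for illustration: Mat is a hypothetical row-major stand-in and not the library's FlatMatrix, the input ordering (left and right operands for "matmul", data and per-column weight for "rms_norm") is guessed, and the epsilon value is arbitrary.

```java
final class MatrixKernelSketch {

    /** Hypothetical row-major matrix stand-in; NOT the library's FlatMatrix. */
    static final class Mat {
        final int rows;
        final int cols;
        final double[] data;

        Mat(int rows, int cols, double[] data) {
            if (data.length != rows * cols) {
                throw new IllegalArgumentException("shape mismatch");
            }
            this.rows = rows;
            this.cols = cols;
            this.data = data;
        }
    }

    /** Dispatches on the operation name and writes into the pre-allocated output. */
    static void apply(Mat[] inputs, Mat output, String operation) {
        switch (operation) {
            case "matmul":
                matmul(inputs[0], inputs[1], output);
                break;
            case "rms_norm":
                rmsNorm(inputs[0], inputs[1], output);
                break;
            default:
                throw new IllegalArgumentException("unknown operation: " + operation);
        }
    }

    /** out = a * b, with a (m x k), b (k x n), out (m x n). */
    static void matmul(Mat a, Mat b, Mat out) {
        if (a.cols != b.rows || out.rows != a.rows || out.cols != b.cols) {
            throw new IllegalArgumentException("incompatible shapes for matmul");
        }
        for (int i = 0; i < a.rows; i++) {
            for (int j = 0; j < b.cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < a.cols; k++) {
                    sum += a.data[i * a.cols + k] * b.data[k * b.cols + j];
                }
                out.data[i * out.cols + j] = sum;
            }
        }
    }

    /** Per row: out = x / sqrt(mean(x^2) + eps) * weight, with weight of shape (1 x cols). */
    static void rmsNorm(Mat x, Mat weight, Mat out) {
        if (weight.rows * weight.cols != x.cols || out.rows != x.rows || out.cols != x.cols) {
            throw new IllegalArgumentException("incompatible shapes for rms_norm");
        }
        final double eps = 1e-6;  // assumed value for this sketch
        for (int r = 0; r < x.rows; r++) {
            double sumSquares = 0.0;
            for (int c = 0; c < x.cols; c++) {
                double v = x.data[r * x.cols + c];
                sumSquares += v * v;
            }
            double scale = 1.0 / Math.sqrt(sumSquares / x.cols + eps);
            for (int c = 0; c < x.cols; c++) {
                out.data[r * x.cols + c] = x.data[r * x.cols + c] * scale * weight.data[c];
            }
        }
    }
}
```

The output matrix is allocated once by the caller and reused across calls, which mirrors the pre-allocated destination described for this method.
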
- applyMatrixKernel
  
  void applyMatrixKernel(FlatMatrixF[] inputs, FlatMatrixF output, String op)
  
  Applies a named, fused neural-network kernel directly to pre-allocated single-precision (float) matrices.
  
  Parameters:
  
  inputs - An ordered array of single-precision input matrices, intended for transformer workloads.
  
  output - The pre-allocated single-precision destination matrix that receives the result of the operation.
  
  op - The name of the kernel to execute (e.g., "matmul_bias_gelu", "rope_split", "mha_attention").
