Interface SIMDCompositeExpression

All Superinterfaces:
FastCompositeExpression, Savable, Serializable
All Known Implementing Classes:
BulkTurboEvaluator.BatchedVectorCompositeExpression

public interface SIMDCompositeExpression extends FastCompositeExpression
SIMD-aligned vector expression evaluator interface. Exposes a dual-path API (2D-array overloads for convenience, flat-array overloads for throughput) aimed at massively parallel time-series calculation, machine learning transformations, and fused token attention operations.
Author: GBEMIRO
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    applyBulk(double[][] variables, double[] output, boolean useBlocks)
    Executes the compiled expression over a convenient 2D array of variables.
    void
    applyBulk(double[] flatVariables, double[] output, boolean useBlocks)
    Warp Speed Path (Power Users): Evaluates a single, raw, pre-grouped flat array at maximum hardware throughput with zero memory copies, zero allocations, and direct sequential prefetching.
    void
    applyBulkBatched(double[][] variables, double[] output, int batchSize, boolean useBlocks)
    Evaluates a 2D variable structure using an explicit, cache-bounded window chunk size to enforce strict CPU L1/L2 data cache localization.
    void
    applyBulkBatched(double[] flatVariables, double[] output, int batchSize, boolean useBlocks)
    Warp Speed Path (Power Users): Evaluates a pre-grouped flat array using caller-defined batch boundaries tuned to the target hardware's L1/L2 caches.
    void
    applyBulkParallel(double[][] variables, double[] output)
    Distributes the evaluation of a 2D array dataset evenly across multiple processing threads using a fork-join block chunking methodology.
    void
    applyBulkParallel(double[] flatVariables, double[] output)
    Warp Speed Path (Power Users): Concurrently evaluates a pre-grouped flat array by cleanly dividing segments across active CPU cores for optimal multi-threaded memory bandwidth consumption.
    void
    applyMatrixKernel(FlatMatrix[] inputs, FlatMatrix output, String operation)
    Applies fused deep-learning and neural-network transformations directly to pre-allocated double-precision matrices.
    void
    applyMatrixKernel(FlatMatrixF[] inputs, FlatMatrixF output, String op)
    Applies fused deep-learning and neural-network transformations directly to pre-allocated single-precision (float) matrices.

    Methods inherited from interface com.github.gbenroscience.parser.turbo.tools.FastCompositeExpression

    apply, applyMatrix, applyScalar, applyString, applyVector, checkErrorLogs, getCompiler, getRoot

    Methods inherited from interface com.github.gbenroscience.interfaces.Savable

    serialize
  • Method Details

    • applyBulk

      void applyBulk(double[][] variables, double[] output, boolean useBlocks)
      Executes the compiled expression over a convenient 2D array of variables. Under the hood, this method flattens and groups data into sequential memory blocks before delegating to the core vectorized execution kernel.
      Parameters:
      variables - A 2D array of variable channels where each outer row represents an entire vector for a specific variable slot, e.g., [[x1, x2... xn], [y1, y2... yn], [z1, z2... zn]].
      output - The pre-allocated target array that receives the evaluated results.
      useBlocks - If true, processes data in L1/L2 cache-bounded blocks to prevent memory thrashing on large datasets. If false, runs a plain sequential scalar pass, which favors small datasets.
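      The flattening step described above can be pictured with a minimal sketch. GroupedLayout is a hypothetical helper, not part of the library, and it assumes every variable row has the same length; the library's internal routine is not shown in this documentation and may differ.

```java
import java.util.Arrays;

// Hypothetical helper (not a library class): illustrates the grouped layout only.
final class GroupedLayout {

    /** Copies each variable row back-to-back: [x1..xn, y1..yn, ...]. */
    static double[] flatten(double[][] variables) {
        int slots = variables.length;
        int n = (slots == 0) ? 0 : variables[0].length;
        double[] flat = new double[slots * n];
        for (int s = 0; s < slots; s++) {
            if (variables[s].length != n) {
                throw new IllegalArgumentException(
                        "row " + s + " has length " + variables[s].length + ", expected " + n);
            }
            // Slot s occupies the contiguous range [s * n, (s + 1) * n).
            System.arraycopy(variables[s], 0, flat, s * n, n);
        }
        return flat;
    }

    public static void main(String[] args) {
        double[][] vars = { {1, 2, 3}, {10, 20, 30} };
        // Prints [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
        System.out.println(Arrays.toString(flatten(vars)));
    }
}
```

      Callers that already hold data in this grouped form can skip the copy by using the flat-array overload instead.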
    • applyBulkParallel

      void applyBulkParallel(double[][] variables, double[] output)
      Distributes the evaluation of a 2D array dataset evenly across multiple processing threads using a fork-join block chunking methodology.
      Parameters:
      variables - A 2D array of variable channels where each outer row represents an entire vector for a specific variable slot, e.g., [[x1, x2... xn], [y1, y2... yn]].
      output - The pre-allocated target array where the parallel calculations will be written.
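      As a rough illustration of block chunking across threads, the sketch below splits the index range into disjoint chunks and evaluates them with a parallel stream, which runs on the common fork-join pool. The class ParallelChunkSketch, the stand-in expression f(x, y) = x * y + 1, and the chunkSize parameter are all assumptions made for the example; the library's own scheduling strategy is not documented here.

```java
import java.util.stream.IntStream;

// Hypothetical sketch (not a library class): fork-join style chunked evaluation.
final class ParallelChunkSketch {

    /** Stand-in for a compiled expression: f(x, y) = x * y + 1. */
    static double eval(double x, double y) {
        return x * y + 1.0;
    }

    static void evaluateParallel(double[][] variables, double[] output, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int n = output.length;
        int chunks = (n + chunkSize - 1) / chunkSize; // ceiling division
        // Each chunk writes to its own disjoint slice of output, so no locking is needed.
        IntStream.range(0, chunks).parallel().forEach(c -> {
            int start = c * chunkSize;
            int end = Math.min(start + chunkSize, n);
            for (int i = start; i < end; i++) {
                output[i] = eval(variables[0][i], variables[1][i]);
            }
        });
    }
}
```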
    • applyBulkBatched

      void applyBulkBatched(double[][] variables, double[] output, int batchSize, boolean useBlocks)
      Evaluates a 2D variable structure using an explicit, cache-bounded window chunk size to enforce strict CPU L1/L2 data cache localization.
      Parameters:
      variables - A 2D array of variable channels where each outer row represents an entire vector for a specific variable slot, e.g., [[x1, x2... xn], [y1, y2... yn]].
      output - The pre-allocated target array where the evaluated blocks will be written.
      batchSize - The window size processed in each cache-bounded loop pass.
      useBlocks - If true, overlays internal sub-tiling structures on top of the batch slice boundaries. If false, evaluates the raw batch length sequentially in a single pass.
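      The idea of a fixed batch window can be sketched as a plain outer loop over windows with an inner loop over the elements of each window. BatchWindowSketch and the stand-in expression f(x, y) = x + 2y are hypothetical, and the sketch treats batchSize as an element count, which is an assumption; it does not model the optional inner tiling controlled by useBlocks.

```java
// Hypothetical sketch (not a library class): evaluation in fixed-size batch windows.
final class BatchWindowSketch {

    /** Stand-in for a compiled expression: f(x, y) = x + 2y. */
    static double eval(double x, double y) {
        return x + 2.0 * y;
    }

    /** Evaluates in windows of batchSize elements and returns the number of passes. */
    static int evaluateBatched(double[][] variables, double[] output, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int passes = 0;
        for (int start = 0; start < output.length; start += batchSize) {
            // The last window may be shorter than batchSize.
            int end = Math.min(start + batchSize, output.length);
            for (int i = start; i < end; i++) {
                output[i] = eval(variables[0][i], variables[1][i]);
            }
            passes++;
        }
        return passes;
    }
}
```

      With 10 elements and a batch size of 4, this sketch makes three passes covering indices 0-3, 4-7, and 8-9.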
    • applyBulk

      void applyBulk(double[] flatVariables, double[] output, boolean useBlocks)
      Warp Speed Path (Power Users): Evaluates a single, raw, pre-grouped flat array at maximum hardware throughput with zero memory copies, zero allocations, and direct sequential prefetching.
      Parameters:
      flatVariables - A single flat array containing variables contiguously grouped back-to-back by variable slot. CRITICAL: Data must use a Grouped structure, e.g., [x1, x2... xn, y1, y2... yn, z1, z2... zn]. Interleaved arrays will yield corrupted data.
      output - The pre-allocated target array to which the evaluated results are written directly.
      useBlocks - If true, maps memory segments into cache-sized micro-tiles. If false, allows the CPU prefetcher to step uninhibited sequentially through the flat segments.
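      Data that arrives interleaved, e.g., [x1, y1, x2, y2, ...], has to be regrouped before it is passed to the flat-array overloads. The sketch below shows one way to do that; InterleavedToGrouped is a hypothetical helper, not part of the library, and it assumes the slot count is known to the caller.

```java
// Hypothetical helper (not a library class): converts interleaved data to grouped form.
final class InterleavedToGrouped {

    /** Rearranges [x1, y1, x2, y2, ...] into [x1, x2, ..., y1, y2, ...]. */
    static double[] regroup(double[] interleaved, int slots) {
        if (slots <= 0 || interleaved.length % slots != 0) {
            throw new IllegalArgumentException("length must be a positive multiple of slots");
        }
        int n = interleaved.length / slots; // elements per variable slot
        double[] grouped = new double[interleaved.length];
        for (int i = 0; i < n; i++) {
            for (int s = 0; s < slots; s++) {
                // Element i of slot s moves from index i * slots + s to index s * n + i.
                grouped[s * n + i] = interleaved[i * slots + s];
            }
        }
        return grouped;
    }
}
```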
    • applyBulkParallel

      void applyBulkParallel(double[] flatVariables, double[] output)
      Warp Speed Path (Power Users): Concurrently evaluates a pre-grouped flat array by cleanly dividing segments across active CPU cores for optimal multi-threaded memory bandwidth consumption.
      Parameters:
      flatVariables - A single flat array containing variables contiguously grouped back-to-back by variable slot. CRITICAL: Data must use a Grouped structure, e.g., [x1, x2... xn, y1, y2... yn, z1, z2... zn]. Do NOT pass interleaved data.
      output - The pre-allocated target array to which the parallel workers write their results.
    • applyBulkBatched

      void applyBulkBatched(double[] flatVariables, double[] output, int batchSize, boolean useBlocks)
      Warp Speed Path (Power Users): Evaluates a pre-grouped flat array using caller-defined batch boundaries tuned to the target hardware's L1/L2 caches.
      Parameters:
      flatVariables - A single flat array containing variables contiguously grouped back-to-back by variable slot. CRITICAL: Data must use a Grouped structure, e.g., [x1, x2... xn, y1, y2... yn, z1, z2... zn]. Do NOT pass interleaved data.
      output - The pre-allocated target array where the evaluated blocks will be written.
      batchSize - The length of the processing window applied on each loop pass.
      useBlocks - If true, enforces inner tiling within the specified batch limits. If false, trusts the user-provided batch size as the definitive sequential block length.
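      In a grouped flat array holding n elements per slot, element i of slot s sits at index s * n + i, so each batch window touches one contiguous range per variable. The sketch below illustrates that indexing for two slots; FlatBatchSketch and the stand-in expression f(x, y) = x - y are hypothetical, and treating batchSize as an element count is an assumption.

```java
// Hypothetical sketch (not a library class): batched evaluation over a grouped flat array.
final class FlatBatchSketch {

    /** Stand-in for a compiled expression: f(x, y) = x - y. */
    static double eval(double x, double y) {
        return x - y;
    }

    static void evaluateFlatBatched(double[] flat, double[] output, int batchSize) {
        int n = output.length;
        if (batchSize <= 0 || flat.length != 2 * n) {
            throw new IllegalArgumentException("expected two grouped slots of length " + n);
        }
        int yBase = n; // slot 0 (x) starts at index 0, slot 1 (y) starts at index n
        for (int start = 0; start < n; start += batchSize) {
            int end = Math.min(start + batchSize, n);
            // Each window reads flat[start..end) and flat[yBase + start..yBase + end).
            for (int i = start; i < end; i++) {
                output[i] = eval(flat[i], flat[yBase + i]);
            }
        }
    }
}
```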
    • applyMatrixKernel

      void applyMatrixKernel(FlatMatrix[] inputs, FlatMatrix output, String operation)
      Applies fused deep-learning and neural-network transformations directly to pre-allocated double-precision matrices.
      Parameters:
      inputs - An ordered array of input matrices, e.g., weights, targets, scales, or vector biases.
      output - The pre-initialized destination matrix that receives the result of the operation.
      operation - The identifier of the matrix kernel to execute (e.g., "matmul", "rms_norm", "swiglu", "q8_quantize").
    • applyMatrixKernel

      void applyMatrixKernel(FlatMatrixF[] inputs, FlatMatrixF output, String op)
      Applies fused deep-learning and neural-network transformations directly to pre-allocated single-precision (float) matrices.
      Parameters:
      inputs - An ordered array of single-precision (float) input matrices, intended for transformer workloads.
      output - The pre-allocated single-precision destination matrix that receives the result of the operation.
      op - The identifier of the matrix kernel to execute (e.g., "matmul_bias_gelu", "rope_split", "mha_attention").