WakewordTrigger (Spokestack Library for Android 11.4.2 API)

java.lang.Object
- io.spokestack.spokestack.wakeword.WakewordTrigger

All Implemented Interfaces:

SpeechProcessor, AutoCloseable
```
public final class WakewordTrigger
extends Object
implements SpeechProcessor
```
wakeword detection pipeline component

WakewordTrigger is a speech pipeline component that provides wakeword detection for activating downstream components. It uses a TensorFlow Lite binary classifier to detect keyword phrases. Once a wakeword phrase is detected, the pipeline is activated. The pipeline remains active until the user stops talking or the activation timeout is reached.

The incoming raw audio signal is first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a "filter" TensorFlow model. These mel frames are batched together into a sliding window.
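The normalization step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name `AudioPreprocessor`, the use of `float` samples in [-1, 1], the clipping, and the exact update formulas are assumptions chosen to match the descriptions of rms-target, rms-alpha, and pre-emphasis below.

```java
/** Illustrative pre-processing sketch (hypothetical class, not part of the library). */
final class AudioPreprocessor {
    private final float rmsTarget;   // desired RMS energy (rms-target)
    private final float rmsAlpha;    // EWMA update rate (rms-alpha), 0 disables normalization
    private final float preEmphasis; // filter weight (pre-emphasis), 0 disables the filter
    private float rmsValue;          // running EWMA estimate of the signal's RMS energy
    private float prevSample;        // last normalized sample of the previous frame

    AudioPreprocessor(float rmsTarget, float rmsAlpha, float preEmphasis) {
        this.rmsTarget = rmsTarget;
        this.rmsAlpha = rmsAlpha;
        this.preEmphasis = preEmphasis;
        this.rmsValue = rmsTarget;
    }

    /** Normalizes one frame in place, then applies pre-emphasis to it. */
    void process(float[] frame) {
        if (frame.length == 0) {
            return;
        }
        if (rmsAlpha > 0) {
            // update the running RMS estimate with this frame's energy
            double energy = 0;
            for (float sample : frame) {
                energy += sample * sample;
            }
            float rms = (float) Math.sqrt(energy / frame.length);
            rmsValue = rmsAlpha * rms + (1 - rmsAlpha) * rmsValue;
            if (rmsValue > 0) {
                // scale toward the target energy, clipping to the valid range
                float gain = rmsTarget / rmsValue;
                for (int i = 0; i < frame.length; i++) {
                    frame[i] = Math.max(-1f, Math.min(1f, frame[i] * gain));
                }
            }
        }
        if (preEmphasis > 0) {
            // first-order high-pass: y[n] = x[n] - w * x[n - 1]
            for (int i = 0; i < frame.length; i++) {
                float current = frame[i];
                frame[i] = current - preEmphasis * prevSample;
                prevSample = current;
            }
        }
    }
}
```

The previous sample is carried across calls so that the pre-emphasis filter stays continuous at frame boundaries.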

The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an "encode" TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is fed back into the encoder at each step to perform the running autoregressive transduction over the mel frames.
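The sliding window of encoded vectors can be pictured as a fixed-size buffer that drops its oldest entry each time the encoder emits a new vector. The sketch below is one simple way to do this; `EncoderWindow` and its methods are hypothetical names, and the library may well use a different buffer structure internally.

```java
import java.util.Arrays;

/** Illustrative sliding window over encoder outputs (hypothetical class). */
final class EncoderWindow {
    // rows are ordered oldest first; shape is [encodeLength][encodeWidth]
    private final float[][] frames;

    EncoderWindow(int encodeLength, int encodeWidth) {
        if (encodeLength < 1 || encodeWidth < 1) {
            throw new IllegalArgumentException("window dimensions must be positive");
        }
        this.frames = new float[encodeLength][encodeWidth];
    }

    /** Drops the oldest encoded vector and appends a copy of the newest one. */
    void push(float[] encoded) {
        if (encoded.length != frames[0].length) {
            throw new IllegalArgumentException("unexpected encoded vector width");
        }
        System.arraycopy(frames, 1, frames, 0, frames.length - 1);
        frames[frames.length - 1] = encoded.clone();
    }

    /** Returns a copy of the window, shaped like the classifier input. */
    float[][] snapshot() {
        float[][] copy = new float[frames.length][];
        for (int i = 0; i < frames.length; i++) {
            copy[i] = frames[i].clone();
        }
        return copy;
    }

    /** Clears the window, for example when the stage is reset. */
    void reset() {
        for (float[] row : frames) {
            Arrays.fill(row, 0f);
        }
    }
}
```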

The "detect" TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. Values closer to 1 indicate a detected keyword phrase; values closer to 0 indicate non-keyword speech. This classifier is commonly implemented as an attention mechanism over the encoder window.

The detector's output is then compared against a configured threshold to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.

Activations have configurable minimum/maximum lengths. The minimum length prevents the activation from being aborted if the user pauses after saying the wakeword (which untriggers the VAD). The maximum activation length allows the activation to time out if the user doesn't say anything after saying the wakeword.
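The interaction between the threshold and the minimum/maximum activation lengths can be sketched as a small state machine. This is an illustration under stated assumptions: `ActivationPolicy`, its parameters, and the millisecond clock passed to `update` are hypothetical and do not correspond to named library APIs or configuration properties.

```java
/** Illustrative activation policy (hypothetical class, not part of the library). */
final class ActivationPolicy {
    private final long minActiveMs; // activation cannot end earlier than this
    private final long maxActiveMs; // activation always ends by this point
    private boolean active;
    private long activatedAt;

    ActivationPolicy(long minActiveMs, long maxActiveMs) {
        this.minActiveMs = minActiveMs;
        this.maxActiveMs = maxActiveMs;
    }

    /**
     * Updates the policy for one audio frame.
     *
     * @param nowMs     current time in milliseconds
     * @param posterior classifier output in [0, 1]
     * @param threshold configured activation threshold
     * @param speech    whether the VAD currently reports speech
     * @return whether the pipeline is active after this frame
     */
    boolean update(long nowMs, float posterior, float threshold, boolean speech) {
        if (!active) {
            if (posterior > threshold) {
                active = true;
                activatedAt = nowMs;
            }
            return active;
        }
        long elapsed = nowMs - activatedAt;
        // a pause only ends the activation once the minimum length has passed
        boolean speechEnded = !speech && elapsed >= minActiveMs;
        boolean timedOut = elapsed >= maxActiveMs;
        if (speechEnded || timedOut) {
            active = false;
        }
        return active;
    }
}
```

With a 500 ms minimum, a pause 100 ms after the wakeword keeps the pipeline active, while the same pause at 600 ms deactivates it.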

The wakeword detector can be used in a multi-turn dialogue system. In such an environment, the user is not expected to say the wakeword during each turn. Therefore, an application can manually activate the pipeline by calling setActive (after a system turn), and the wakeword detector will apply its minimum/maximum activation lengths to control the duration of the activation.

This pipeline component supports the following configuration properties:
- wake-filter-path (string, required): file system path to the "filter" TensorFlow Lite model, which is used to calculate a mel spectrogram frame from the linear STFT; its inputs should be shaped [fft-width], and its outputs [mel-width]
- wake-encode-path (string, required): file system path to the "encode" TensorFlow Lite model, which is used to perform each autoregressive step over the mel frames; its inputs should be shaped [mel-length, mel-width], and its outputs [encode-width], with an additional state input/output shaped [state-width]
- wake-detect-path (string, required): file system path to the "detect" TensorFlow Lite model; its inputs should be shaped [encode-length, encode-width], and its outputs [1]
- rms-target (double): the desired linear Root Mean Squared (RMS) signal energy, which is used for signal normalization and should be tuned to the RMS target used during training
- rms-alpha (double): the Exponentially-Weighted Moving Average (EWMA) update rate for the current RMS signal energy (0 for no RMS normalization)
- pre-emphasis (double): the pre-emphasis filter weight to apply to the normalized audio signal (0 for no pre-emphasis)
- fft-window-size (integer): the size of the signal window used to calculate the STFT, in number of samples; should be a power of 2 for maximum efficiency
- fft-window-type (string): the name of the windowing function to apply to each audio frame before calculating the STFT; currently the "hann" window is supported
- fft-hop-length (integer): the length of time to skip each time the overlapping STFT is calculated, in milliseconds
- mel-frame-length (integer): the length of the mel spectrogram used as an input to the encoder, in milliseconds
- mel-frame-width (integer): the size of each mel spectrogram frame, in number of filterbank components
- wake-encode-length (integer): the length of the sliding window of encoder output used as an input to the classifier, in milliseconds
- wake-encode-width (integer): the size of the encoder output, in vector units
- wake-state-width (integer): the size of the encoder state, in vector units (defaults to wake-encode-width)
- wake-threshold (double): the threshold of the classifier's posterior output, above which the trigger activates the pipeline, in the range [0, 1]
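Several of the properties above are given in milliseconds, while the model shapes ([fft-width], [mel-length], [encode-length]) are counted in samples or frames. The sketch below shows one plausible conversion. It assumes that each STFT hop yields one mel frame and one encoded vector, and that [fft-width] is the number of magnitude bins of a real-valued FFT; the class `WakewordShapes`, the sample rate parameter, and the numbers in the usage note are illustrative and are not taken from the library.

```java
/** Illustrative shape arithmetic (hypothetical helper, not part of the library). */
final class WakewordShapes {
    private WakewordShapes() {
    }

    /** Converts a hop length in milliseconds to a hop length in samples. */
    static int hopSamples(int sampleRateHz, int hopLengthMs) {
        return hopLengthMs * sampleRateHz / 1000;
    }

    /** Number of frames in a sliding window, assuming one frame per hop. */
    static int frameCount(int windowLengthMs, int hopLengthMs) {
        return windowLengthMs / hopLengthMs;
    }

    /** Number of magnitude bins produced by a real-valued FFT of this size. */
    static int fftWidth(int fftWindowSize) {
        return fftWindowSize / 2 + 1;
    }
}
```

For example, at a 16000 Hz sample rate with a 512-sample window and a 10 ms hop, `hopSamples(16000, 10)` is 160 and `fftWidth(512)` is 257; a 400 ms mel window then holds `frameCount(400, 10)` = 40 frames.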

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`DEFAULT_FFT_HOP_LENGTH` default fft-hop-length configuration value.
`static int`	`DEFAULT_FFT_WINDOW_SIZE` default fft-window-size configuration value.
`static String`	`DEFAULT_FFT_WINDOW_TYPE` default fft-window-type configuration value.
`static int`	`DEFAULT_MEL_FRAME_LENGTH` default mel-frame-length configuration value.
`static int`	`DEFAULT_MEL_FRAME_WIDTH` default mel-frame-width configuration value.
`static float`	`DEFAULT_PRE_EMPHASIS` default pre-emphasis configuration value.
`static float`	`DEFAULT_RMS_ALPHA` default rms-alpha configuration value.
`static float`	`DEFAULT_RMS_TARGET` default rms-target configuration value.
`static int`	`DEFAULT_WAKE_ENCODE_LENGTH` default wake-encode-length configuration value.
`static int`	`DEFAULT_WAKE_ENCODE_WIDTH` default wake-encode-width configuration value.
`static float`	`DEFAULT_WAKE_THRESHOLD` default wake-threshold value.
`static String`	`FFT_WINDOW_TYPE_HANN` the hann fft-window-type.

Constructor Summary

Constructors
Constructor and Description
`WakewordTrigger(SpeechConfig config)` constructs a new trigger instance.
`WakewordTrigger(SpeechConfig config, TensorflowModel.Loader loader)` constructs a new trigger instance, for testing.

Method Summary

Modifier and Type	Method and Description
`void`	`close()` releases resources associated with the wakeword detector.
`void`	`process(SpeechContext context, ByteBuffer buffer)` processes a frame of audio.
`void`	`reset()` resets all state internal to the stage.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - FFT_WINDOW_TYPE_HANN
```
public static final String FFT_WINDOW_TYPE_HANN
```
    the hann fft-window-type.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FFT_WINDOW_TYPE
```
public static final String DEFAULT_FFT_WINDOW_TYPE
```
    default fft-window-type configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_RMS_TARGET
```
public static final float DEFAULT_RMS_TARGET
```
    default rms-target configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_RMS_ALPHA
```
public static final float DEFAULT_RMS_ALPHA
```
    default rms-alpha configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_PRE_EMPHASIS
```
public static final float DEFAULT_PRE_EMPHASIS
```
    default pre-emphasis configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FFT_WINDOW_SIZE
```
public static final int DEFAULT_FFT_WINDOW_SIZE
```
    default fft-window-size configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FFT_HOP_LENGTH
```
public static final int DEFAULT_FFT_HOP_LENGTH
```
    default fft-hop-length configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MEL_FRAME_LENGTH
```
public static final int DEFAULT_MEL_FRAME_LENGTH
```
    default mel-frame-length configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MEL_FRAME_WIDTH
```
public static final int DEFAULT_MEL_FRAME_WIDTH
```
    default mel-frame-width configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_WAKE_ENCODE_LENGTH
```
public static final int DEFAULT_WAKE_ENCODE_LENGTH
```
    default wake-encode-length configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_WAKE_ENCODE_WIDTH
```
public static final int DEFAULT_WAKE_ENCODE_WIDTH
```
    default wake-encode-width configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_WAKE_THRESHOLD
```
public static final float DEFAULT_WAKE_THRESHOLD
```
    default wake-threshold value.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - WakewordTrigger
```
public WakewordTrigger(SpeechConfig config)
```
    constructs a new trigger instance.
    
    Parameters:
    
    config - the pipeline configuration instance
  - WakewordTrigger
```
public WakewordTrigger(SpeechConfig config,
                       TensorflowModel.Loader loader)
```
    constructs a new trigger instance, for testing.
    
    Parameters:
    
    config - the pipeline configuration instance
    
    loader - TensorFlow model loader
- Method Detail
  - close
```
public void close()
           throws Exception
```
    releases resources associated with the wakeword detector.
    
    Specified by:
    
    close in interface AutoCloseable
    
    Throws:
    
    Exception - on error
  - reset
```
public void reset()
```
    Description copied from interface: SpeechProcessor
    
    resets all state internal to the stage.
    
    Specified by:
    
    reset in interface SpeechProcessor
  - process
```
public void process(SpeechContext context,
                    ByteBuffer buffer)
             throws Exception
```
    processes a frame of audio.
    
    Specified by:
    
    process in interface SpeechProcessor
    
    Parameters:
    
    context - the current speech context
    
    buffer - the audio frame to detect
    
    Throws:
    
    Exception - on error
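Because the class implements AutoCloseable, an instance can be managed with try-with-resources so that close is called even if processing fails. The sketch below demonstrates the pattern with a hypothetical stand-in stage, `DemoStage`, so that it runs without the library or model files. Note that the real close and process methods declare `throws Exception`, so callers of the real class must also catch or propagate that exception.

```java
/** Hypothetical stand-in for a pipeline stage; not the real WakewordTrigger. */
final class DemoStage implements AutoCloseable {
    private boolean closed;
    private int framesSeen;

    /** Counts a frame, mimicking a stage that consumes audio. */
    void process(byte[] frame) {
        if (closed) {
            throw new IllegalStateException("stage is closed");
        }
        framesSeen++;
    }

    /** Resets internal state between utterances. */
    void reset() {
        framesSeen = 0;
    }

    int framesSeen() {
        return framesSeen;
    }

    boolean isClosed() {
        return closed;
    }

    /** Releases resources; narrowed here so that it throws no checked exception. */
    @Override
    public void close() {
        closed = true;
    }
}

final class DemoStageUsage {
    private DemoStageUsage() {
    }

    /** Processes all frames, closing the stage when done or on failure. */
    static DemoStage run(byte[][] frames) {
        DemoStage stage = new DemoStage();
        try (DemoStage managed = stage) {
            for (byte[] frame : frames) {
                managed.process(frame);
            }
        }
        return stage;
    }
}
```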
