KeywordRecognizer (Spokestack Library for Android 11.4.2 API)

java.lang.Object
- io.spokestack.spokestack.asr.KeywordRecognizer

All Implemented Interfaces:

SpeechProcessor, AutoCloseable
```
public final class KeywordRecognizer
extends Object
implements SpeechProcessor
```
keyword recognition pipeline component
KeywordRecognizer is a speech pipeline component that provides the ability to recognize one or more keyword phrases during pipeline activation. Its behavior is similar to the other speech recognizer components, albeit for a limited vocabulary (usually just a few words/phrases).

The incoming raw audio signal is first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a "filter" TensorFlow model. These mel frames are batched together into a sliding window.
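
The framing step described above can be sketched roughly as follows. This is a minimal illustration and not the library's implementation: the class `StftFramingSketch` and its method names are hypothetical, the Hann window is assumed to be the periodic variant, and the FFT itself is omitted. The values used in `main` (a 512-sample window, a 160-sample hop, a 0.97 pre-emphasis weight) are illustrative and are not the documented defaults.

```java
import java.util.Arrays;

/** Illustrative signal pre-processing helpers; not part of the Spokestack API. */
final class StftFramingSketch {
    private StftFramingSketch() { }

    /** Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2 * pi * n / size). */
    static float[] hannWindow(int size) {
        float[] window = new float[size];
        for (int n = 0; n < size; n++) {
            window[n] = (float) (0.5 - 0.5 * Math.cos(2 * Math.PI * n / size));
        }
        return window;
    }

    /** Applies y[n] = x[n] - weight * x[n-1]; a weight of 0 leaves the signal unchanged. */
    static float[] preEmphasize(float[] samples, float weight) {
        float[] out = new float[samples.length];
        float previous = 0f;
        for (int n = 0; n < samples.length; n++) {
            out[n] = samples[n] - weight * previous;
            previous = samples[n];
        }
        return out;
    }

    /** Splits the signal into overlapping windowed frames, advancing by hop samples. */
    static float[][] frames(float[] samples, float[] window, int hop) {
        int count = samples.length < window.length
            ? 0
            : 1 + (samples.length - window.length) / hop;
        float[][] result = new float[count][window.length];
        for (int f = 0; f < count; f++) {
            for (int n = 0; n < window.length; n++) {
                result[f][n] = samples[f * hop + n] * window[n];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        float[] signal = new float[1024];
        Arrays.fill(signal, 1f);
        float[][] framed = frames(preEmphasize(signal, 0.97f), hannWindow(512), 160);
        System.out.println(framed.length + " frames of " + framed[0].length + " samples");
    }
}
```

Each windowed frame would then be passed through an FFT, and the resulting magnitudes fed to the "filter" model to obtain one mel frame.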

The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an "encode" TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is used to perform the running autoregressive transduction over the mel frames.
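
As a rough sketch of this loop, the example below carries a state vector from one encoder step to the next and keeps the most recent encoded vectors in a fixed-length sliding window. The `Encoder` interface is a hypothetical stand-in for the "encode" model, and the class and method names are assumptions made for illustration; the library's internal buffering may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative autoregressive encoding loop; not part of the Spokestack API. */
final class EncodeWindowSketch {
    /** Hypothetical stand-in for the "encode" model. */
    interface Encoder {
        /** Returns a two-element array: {encoded vector, next state vector}. */
        float[][] step(float[] features, float[] state);
    }

    private final Encoder encoder;
    private final int windowLength;
    private final int stateWidth;
    private final Deque<float[]> window = new ArrayDeque<>();
    private float[] state;

    EncodeWindowSketch(Encoder encoder, int windowLength, int stateWidth) {
        this.encoder = encoder;
        this.windowLength = windowLength;
        this.stateWidth = stateWidth;
        this.state = new float[stateWidth];
    }

    /** Runs one encoder step, carries the state forward, and slides the window. */
    void push(float[] features) {
        float[][] output = encoder.step(features, state);
        state = output[1];
        window.addLast(output[0]);
        if (window.size() > windowLength) {
            window.removeFirst(); // drop the oldest encoded vector
        }
    }

    /** Clears the window and zeroes the state, e.g. between activations. */
    void reset() {
        window.clear();
        state = new float[stateWidth];
    }

    /** Returns the encoded vectors currently in the window, oldest first. */
    float[][] window() {
        return window.toArray(new float[0][]);
    }

    public static void main(String[] args) {
        // toy encoder: a running sum, so the effect of the state is visible
        Encoder runningSum = (f, s) -> new float[][] {{f[0] + s[0]}, {f[0] + s[0]}};
        EncodeWindowSketch sketch = new EncodeWindowSketch(runningSum, 2, 1);
        for (float value : new float[] {1f, 2f, 3f}) {
            sketch.push(new float[] {value});
        }
        System.out.println(sketch.window().length + " encoded vectors in window");
    }
}
```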

The "detect" TensorFlow model takes the encoded sliding window and outputs a set of independent posterior values in the range [0, 1], one per keyword class.

During detection, the highest scoring posterior is chosen as the recognized class, and if its value is higher than the configured threshold, that class is reported to the client through the speech recognition event. Otherwise, a timeout event occurs. Note that the detection model is only run on the frame in which the speech context is deactivated, similar to the end-of-utterance mechanism used by the other speech recognizers.
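
The decision rule above amounts to an argmax followed by a threshold comparison. The sketch below illustrates it with hypothetical helper names (`KeywordDecisionSketch`, `parseClasses`, `decide`); an empty `Optional` stands in for the timeout case, and the class names are parsed from a comma-separated list in the style of the keyword-classes property.

```java
import java.util.Optional;

/** Illustrative keyword decision rule; not part of the Spokestack API. */
final class KeywordDecisionSketch {
    private KeywordDecisionSketch() { }

    /** Splits a comma-separated, ordered class list and trims whitespace. */
    static String[] parseClasses(String csv) {
        String[] names = csv.split(",");
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].trim();
        }
        return names;
    }

    /**
     * Picks the class with the highest posterior and returns its name only
     * if that posterior is strictly greater than the threshold.
     */
    static Optional<String> decide(float[] posteriors, String[] classes, float threshold) {
        if (posteriors.length != classes.length) {
            throw new IllegalArgumentException("one posterior per class is required");
        }
        if (posteriors.length == 0) {
            return Optional.empty();
        }
        int best = 0;
        for (int i = 1; i < posteriors.length; i++) {
            if (posteriors[i] > posteriors[best]) {
                best = i;
            }
        }
        return posteriors[best] > threshold
            ? Optional.of(classes[best])
            : Optional.empty();
    }

    public static void main(String[] args) {
        String[] classes = parseClasses("up, down, left");
        float[] posteriors = {0.1f, 0.8f, 0.3f};
        System.out.println(decide(posteriors, classes, 0.5f).orElse("(timeout)"));
    }
}
```

Because the posteriors are independent rather than normalized across classes, several of them can exceed the threshold at once; only the highest-scoring class is reported.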

The keyword recognizer can be used as a stand-alone speech recognizer, using the VAD/timeout (or other activator) to manage activations. Alternatively, the recognizer can be used along with a wakeword detector to manage activations, in a two-stage wakeword/recognizer pattern.

This pipeline component supports the following configuration properties:
- keyword-filter-path (string, required): file system path to the "filter" TensorFlow Lite model, which is used to calculate a mel spectrogram frame from the linear STFT; its inputs should be shaped [fft-width], and its outputs [mel-width]
- keyword-encode-path (string, required): file system path to the "encode" TensorFlow Lite model, which is used to perform each autoregressive step over the mel frames; its inputs should be shaped [mel-length, mel-width], and its outputs [encode-width], with an additional state input/output shaped [state-width]
- keyword-detect-path (string, required): file system path to the "detect" TensorFlow Lite model; its inputs should be shaped [encode-length, encode-width], and its outputs [len(classes)]
- keyword-metadata-path (string): file system path to the keyword model's metadata JSON file containing its classes. Required if keyword-classes is not supplied.
- keyword-classes (string): comma-separated ordered list of class names for the keywords; the name corresponding to the most likely class will be returned in the transcript field when the recognition event is raised. Required if keyword-metadata-path is not supplied.
- keyword-pre-emphasis (double): the pre-emphasis filter weight to apply to the audio signal (0 for no pre-emphasis)
- keyword-fft-window-size (integer): the size of the signal window used to calculate the STFT, in number of samples; should be a power of 2 for maximum efficiency
- keyword-fft-window-type (string): the name of the windowing function to apply to each audio frame before calculating the STFT; currently only the "hann" window is supported
- keyword-fft-hop-length (integer): the amount of time to advance between successive overlapping STFT frames, in milliseconds
- keyword-mel-frame-length (integer): the length of the mel spectrogram used as an input to the encoder, in milliseconds
- keyword-mel-frame-width (integer): the size of each mel spectrogram frame, in number of filterbank components
- keyword-encode-length (integer): the length of the sliding window of encoder output used as an input to the classifier, in milliseconds
- keyword-encode-width (integer): the size of the encoder output, in vector units
- keyword-state-width (integer): the size of the encoder state, in vector units (defaults to keyword-encode-width)
- keyword-threshold (double): the threshold of the classifier's posterior output, above which the recognizer raises a recognition event for the most likely keyword class, in the range [0, 1]
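
Several of these properties are given in milliseconds, while the model input shapes are given in frames. The sketch below shows one plausible way to relate the two. It is a hypothetical helper and not the library's code: the 16 kHz sample rate and the property values in `main` are illustrative assumptions rather than documented defaults, and treating fft-width as the number of non-redundant bins of a real FFT is also an assumption.

```java
/** Illustrative unit conversions for the keyword-* properties; not part of the Spokestack API. */
final class KeywordWindowMath {
    private KeywordWindowMath() { }

    /** Converts a hop length in milliseconds to a hop length in samples. */
    static int hopSamples(int hopMillis, int sampleRate) {
        return hopMillis * sampleRate / 1000;
    }

    /** Number of hop-spaced frames that span a window of the given duration. */
    static int frames(int lengthMillis, int hopMillis, int sampleRate) {
        return lengthMillis * sampleRate / 1000 / hopSamples(hopMillis, sampleRate);
    }

    /** Non-redundant magnitude bins produced by a real FFT of the given size. */
    static int fftWidth(int windowSize) {
        return windowSize / 2 + 1;
    }

    /** True if n is a positive power of 2, the efficient case for the FFT window. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        int sampleRate = 16000;   // assumed sample rate, in Hz
        int windowSize = 512;     // illustrative keyword-fft-window-size
        int hopMillis = 10;       // illustrative keyword-fft-hop-length
        int melMillis = 110;      // illustrative keyword-mel-frame-length
        int encodeMillis = 1000;  // illustrative keyword-encode-length

        System.out.println("hop samples:   " + hopSamples(hopMillis, sampleRate));
        System.out.println("fft-width:     " + fftWidth(windowSize));
        System.out.println("mel-length:    " + frames(melMillis, hopMillis, sampleRate));
        System.out.println("encode-length: " + frames(encodeMillis, hopMillis, sampleRate));
        System.out.println("power of 2:    " + isPowerOfTwo(windowSize));
    }
}
```

Under these assumptions, the trained models' input shapes must agree with the values derived from the configuration, so a change to the hop length or window lengths generally requires matching models.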

Field Summary

Fields
| Modifier and Type | Field and Description |
| --- | --- |
| `static int` | `DEFAULT_ENCODE_LENGTH` default keyword-encode-length configuration value. |
| `static int` | `DEFAULT_ENCODE_WIDTH` default keyword-encode-width configuration value. |
| `static int` | `DEFAULT_FFT_HOP_LENGTH` default keyword-fft-hop-length configuration value. |
| `static int` | `DEFAULT_FFT_WINDOW_SIZE` default keyword-fft-window-size configuration value. |
| `static String` | `DEFAULT_FFT_WINDOW_TYPE` default keyword-fft-window-type configuration value. |
| `static int` | `DEFAULT_MEL_FRAME_LENGTH` default keyword-mel-frame-length configuration value. |
| `static int` | `DEFAULT_MEL_FRAME_WIDTH` default keyword-mel-frame-width configuration value. |
| `static float` | `DEFAULT_PRE_EMPHASIS` default keyword-pre-emphasis configuration value. |
| `static float` | `DEFAULT_THRESHOLD` default recognition threshold value. |
| `static String` | `FFT_WINDOW_TYPE_HANN` the hann keyword-fft-window-type. |

Constructor Summary

Constructors
Constructor and Description
`KeywordRecognizer(SpeechConfig config)` constructs a new recognizer instance.
`KeywordRecognizer(SpeechConfig config, TensorflowModel.Loader loader)` constructs a new recognizer instance, for testing.

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `void` | `close()` releases resources associated with the keyword recognizer. |
| `void` | `process(SpeechContext context, ByteBuffer buffer)` processes a frame of audio. |
| `void` | `reset()` resets all state internal to the stage. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - FFT_WINDOW_TYPE_HANN
```
public static final String FFT_WINDOW_TYPE_HANN
```
    the hann keyword-fft-window-type.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FFT_WINDOW_TYPE
```
public static final String DEFAULT_FFT_WINDOW_TYPE
```
    default keyword-fft-window-type configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_PRE_EMPHASIS
```
public static final float DEFAULT_PRE_EMPHASIS
```
    default keyword-pre-emphasis configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FFT_WINDOW_SIZE
```
public static final int DEFAULT_FFT_WINDOW_SIZE
```
    default keyword-fft-window-size configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_FFT_HOP_LENGTH
```
public static final int DEFAULT_FFT_HOP_LENGTH
```
    default keyword-fft-hop-length configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MEL_FRAME_LENGTH
```
public static final int DEFAULT_MEL_FRAME_LENGTH
```
    default keyword-mel-frame-length configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MEL_FRAME_WIDTH
```
public static final int DEFAULT_MEL_FRAME_WIDTH
```
    default keyword-mel-frame-width configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_ENCODE_LENGTH
```
public static final int DEFAULT_ENCODE_LENGTH
```
    default keyword-encode-length configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_ENCODE_WIDTH
```
public static final int DEFAULT_ENCODE_WIDTH
```
    default keyword-encode-width configuration value.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_THRESHOLD
```
public static final float DEFAULT_THRESHOLD
```
    default recognition threshold value.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - KeywordRecognizer
```
public KeywordRecognizer(SpeechConfig config)
```
    constructs a new recognizer instance.
    
    Parameters:
    
    config - the pipeline configuration instance
  - KeywordRecognizer
```
public KeywordRecognizer(SpeechConfig config,
                         TensorflowModel.Loader loader)
```
    constructs a new recognizer instance, for testing.
    
    Parameters:
    
    config - the pipeline configuration instance
    
    loader - tensorflow model loader
- Method Detail
  - close
```
public void close()
           throws Exception
```
    releases resources associated with the keyword recognizer.
    
    Specified by:
    
    close in interface AutoCloseable
    
    Throws:
    
    Exception - on error
  - reset
```
public void reset()
```
    Description copied from interface: SpeechProcessor
    
    resets all state internal to the stage.
    
    Specified by:
    
    reset in interface SpeechProcessor
  - process
```
public void process(SpeechContext context,
                    ByteBuffer buffer)
             throws Exception
```
    processes a frame of audio.
    
    Specified by:
    
    process in interface SpeechProcessor
    
    Parameters:
    
    context - the current speech context
    
    buffer - the audio frame to detect
    
    Throws:
    
    Exception - on error
