public final class KeywordRecognizer extends Object implements SpeechProcessor
KeywordRecognizer is a speech pipeline component that provides the ability to recognize one or more keyword phrases during pipeline activation. Its behavior is similar to the other speech recognizer components, albeit for a limited vocabulary (usually just a few words/phrases).
The incoming raw audio signal is first normalized and then converted to a magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a "filter" TensorFlow model. The resulting mel frames are batched together into a sliding window.
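The windowing and magnitude STFT step described above can be illustrated with a minimal, self-contained sketch. This is not the component's actual implementation: the class and method names (`StftSketch`, `hannWindow`, `magnitude`, `stft`) are hypothetical, and a naive DFT stands in for the FFT that a production pipeline would normally use.

```java
// Hypothetical illustration of a hopped, Hann-windowed magnitude STFT.
// Not part of the library; a naive O(N^2) DFT is used for clarity.
final class StftSketch {
    /** Returns a periodic Hann window of the given size. */
    public static double[] hannWindow(int size) {
        double[] window = new double[size];
        for (int n = 0; n < size; n++) {
            window[n] = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * n / size);
        }
        return window;
    }

    /** Magnitude spectrum (bins 0..size/2) of one windowed frame. */
    public static double[] magnitude(double[] samples, int offset, double[] window) {
        int size = window.length;
        double[] mags = new double[size / 2 + 1];
        for (int k = 0; k < mags.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int n = 0; n < size; n++) {
                double x = samples[offset + n] * window[n];
                double angle = -2.0 * Math.PI * k * n / size;
                re += x * Math.cos(angle);
                im += x * Math.sin(angle);
            }
            mags[k] = Math.hypot(re, im);
        }
        return mags;
    }

    /** Magnitude STFT with one row per hop; partial trailing frames are dropped. */
    public static double[][] stft(double[] samples, int windowSize, int hopLength) {
        if (windowSize <= 0 || hopLength <= 0) {
            throw new IllegalArgumentException("window size and hop length must be positive");
        }
        double[] window = hannWindow(windowSize);
        int frames = samples.length < windowSize
            ? 0
            : 1 + (samples.length - windowSize) / hopLength;
        double[][] out = new double[frames][];
        for (int f = 0; f < frames; f++) {
            out[f] = magnitude(samples, f * hopLength, window);
        }
        return out;
    }
}
```

In this sketch, `windowSize` and `hopLength` play the roles of the keyword-fft-window-size and keyword-fft-hop-length properties; each output row would then be passed on to the mel filter stage.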
The mel spectrogram represents the features passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an "encode" TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is fed back into the encoder to perform the running autoregressive transduction over the mel frames.
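The sliding windows mentioned above (for mel frames and for encoded vectors) can be pictured as fixed-length queues of equal-width frames. The following is a minimal sketch under that assumption; `SlidingWindow` and its methods are hypothetical names, not the library's internal buffer type.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-length window of frames; the oldest frame is
// evicted once the window holds `length` frames.
final class SlidingWindow {
    private final int length;
    private final int width;
    private final Deque<float[]> frames = new ArrayDeque<>();

    public SlidingWindow(int length, int width) {
        if (length <= 0 || width <= 0) {
            throw new IllegalArgumentException("length and width must be positive");
        }
        this.length = length;
        this.width = width;
    }

    /** Appends a frame, dropping the oldest one if the window is full. */
    public void push(float[] frame) {
        if (frame.length != width) {
            throw new IllegalArgumentException("unexpected frame width");
        }
        if (frames.size() == length) {
            frames.removeFirst();
        }
        frames.addLast(frame.clone());
    }

    public boolean isFull() {
        return frames.size() == length;
    }

    /** Returns the buffered frames, oldest first, in row-major order. */
    public float[] flatten() {
        float[] out = new float[frames.size() * width];
        int i = 0;
        for (float[] frame : frames) {
            System.arraycopy(frame, 0, out, i, width);
            i += width;
        }
        return out;
    }

    /** Discards all buffered frames. */
    public void reset() {
        frames.clear();
    }
}
```

Here `length` and `width` correspond loosely to pairs such as keyword-mel-frame-length/keyword-mel-frame-width or keyword-encode-length/keyword-encode-width; the flattened array is the kind of input a downstream model would consume.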
The "detect" TensorFlow model takes the encoded sliding window and outputs a set of independent posterior values in the range [0, 1], one per keyword class.
During detection, the highest scoring posterior is chosen as the recognized class, and if its value is higher than the configured threshold, that class is reported to the client through the speech recognition event. Otherwise, a timeout event occurs. Note that the detection model is only run on the frame in which the speech context is deactivated, similar to the end-of-utterance mechanism used by the other speech recognizers.
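The decision rule described above (pick the highest posterior, then compare it against the threshold) can be sketched as follows. `KeywordDecision.decide` is a hypothetical helper written for illustration; an empty result stands in for the timeout case.

```java
import java.util.Optional;

// Hypothetical illustration of the argmax-plus-threshold decision.
final class KeywordDecision {
    /**
     * Returns the class with the highest posterior if that posterior is
     * strictly greater than the threshold, or empty otherwise.
     */
    public static Optional<String> decide(float[] posteriors, String[] classes, float threshold) {
        if (posteriors.length == 0 || posteriors.length != classes.length) {
            throw new IllegalArgumentException("posteriors and classes must be non-empty and equal in length");
        }
        int best = 0;
        for (int i = 1; i < posteriors.length; i++) {
            if (posteriors[i] > posteriors[best]) {
                best = i;
            }
        }
        return posteriors[best] > threshold
            ? Optional.of(classes[best])
            : Optional.empty();
    }
}
```

Because the posteriors are independent rather than a normalized distribution, several classes may score above the threshold at once; only the single highest one is reported in this sketch.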
The keyword recognizer can be used as a stand-alone speech recognizer, using the VAD/timeout (or other activator) to manage activations. Alternatively, the recognizer can be used along with a wakeword detector to manage activations, in a two-stage wakeword/recognizer pattern.
This pipeline component supports the following configuration properties:
keyword-classes
is not supplied.
keyword-metadata-path
is not supplied.
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_ENCODE_LENGTH: default keyword-encode-length configuration value. |
static int | DEFAULT_ENCODE_WIDTH: default keyword-encode-width configuration value. |
static int | DEFAULT_FFT_HOP_LENGTH: default keyword-fft-hop-length configuration value. |
static int | DEFAULT_FFT_WINDOW_SIZE: default keyword-fft-window-size configuration value. |
static String | DEFAULT_FFT_WINDOW_TYPE: default keyword-fft-window-type configuration value. |
static int | DEFAULT_MEL_FRAME_LENGTH: default keyword-mel-frame-length configuration value. |
static int | DEFAULT_MEL_FRAME_WIDTH: default keyword-mel-frame-width configuration value. |
static float | DEFAULT_PRE_EMPHASIS: default keyword-pre-emphasis configuration value. |
static float | DEFAULT_THRESHOLD: default recognition threshold value. |
static String | FFT_WINDOW_TYPE_HANN: the hann keyword-fft-window-type. |
Constructor and Description |
---|
KeywordRecognizer(SpeechConfig config): constructs a new recognizer instance. |
KeywordRecognizer(SpeechConfig config, TensorflowModel.Loader loader): constructs a new recognizer instance, for testing. |
Modifier and Type | Method and Description |
---|---|
void | close(): releases resources associated with the keyword recognizer. |
void | process(SpeechContext context, ByteBuffer buffer): processes a frame of audio. |
void | reset(): resets all state internal to the stage. |
public static final String FFT_WINDOW_TYPE_HANN
public static final String DEFAULT_FFT_WINDOW_TYPE
public static final float DEFAULT_PRE_EMPHASIS
public static final int DEFAULT_FFT_WINDOW_SIZE
public static final int DEFAULT_FFT_HOP_LENGTH
public static final int DEFAULT_MEL_FRAME_LENGTH
public static final int DEFAULT_MEL_FRAME_WIDTH
public static final int DEFAULT_ENCODE_LENGTH
public static final int DEFAULT_ENCODE_WIDTH
public static final float DEFAULT_THRESHOLD
public KeywordRecognizer(SpeechConfig config)
config - the pipeline configuration instance
public KeywordRecognizer(SpeechConfig config, TensorflowModel.Loader loader)
config - the pipeline configuration instance
loader - tensorflow model loader
public void close() throws Exception
close in interface AutoCloseable
Exception - on error
public void reset()
reset in interface SpeechProcessor
public void process(SpeechContext context, ByteBuffer buffer) throws Exception
process in interface SpeechProcessor
context - the current speech context
buffer - the audio frame to detect
Exception - on error
Copyright © 2021. All rights reserved.