Class RealtimeAudioInputTurnDetection.ServerVad
public final class RealtimeAudioInputTurnDetection.ServerVad
Server-side voice activity detection (VAD) which flips on when user speech is detected and off after a period of silence.
-
-
Nested Class Summary
Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| public final class | RealtimeAudioInputTurnDetection.ServerVad.Builder | A builder for ServerVad. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| final JsonValue | _type() | Type of turn detection, server_vad to turn on simple Server VAD. |
| final Optional<Boolean> | createResponse() | Whether or not to automatically generate a response when a VAD stop event occurs. |
| final Optional<Long> | idleTimeoutMs() | Optional timeout after which a model response will be triggered automatically. |
| final Optional<Boolean> | interruptResponse() | Whether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs. |
| final Optional<Long> | prefixPaddingMs() | Used only for server_vad mode. |
| final Optional<Long> | silenceDurationMs() | Used only for server_vad mode. |
| final Optional<Double> | threshold() | Used only for server_vad mode. |
| final JsonField<Boolean> | _createResponse() | Returns the raw JSON value of createResponse. |
| final JsonField<Long> | _idleTimeoutMs() | Returns the raw JSON value of idleTimeoutMs. |
| final JsonField<Boolean> | _interruptResponse() | Returns the raw JSON value of interruptResponse. |
| final JsonField<Long> | _prefixPaddingMs() | Returns the raw JSON value of prefixPaddingMs. |
| final JsonField<Long> | _silenceDurationMs() | Returns the raw JSON value of silenceDurationMs. |
| final JsonField<Double> | _threshold() | Returns the raw JSON value of threshold. |
| final Map<String, JsonValue> | _additionalProperties() | |
| final RealtimeAudioInputTurnDetection.ServerVad.Builder | toBuilder() | |
| final RealtimeAudioInputTurnDetection.ServerVad | validate() | |
| final Boolean | isValid() | |
| Boolean | equals(Object other) | |
| Integer | hashCode() | |
| String | toString() | |
| final static RealtimeAudioInputTurnDetection.ServerVad.Builder | builder() | Returns a mutable builder for constructing an instance of ServerVad. |
-
Method Detail
-
_type
final JsonValue _type()
Type of turn detection, server_vad to turn on simple Server VAD. Expected to always return the following:
JsonValue.from("server_vad")
However, this method can be useful for debugging and logging (e.g. if the server responded with an unexpected value).
-
createResponse
final Optional<Boolean> createResponse()
Whether or not to automatically generate a response when a VAD stop event occurs.
-
idleTimeoutMs
final Optional<Long> idleTimeoutMs()
Optional timeout after which a model response will be triggered automatically. This is useful for situations in which a long pause from the user is unexpected, such as a phone call. The model will effectively prompt the user to continue the conversation based on the current context.
The timeout value will be applied after the last model response's audio has finished playing, i.e. it's set to the response.done time plus audio playback duration.
An input_audio_buffer.timeout_triggered event (plus events associated with the Response) will be emitted when the timeout is reached. Idle timeout is currently only supported for server_vad mode.
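The timing rule above can be sketched with plain java.time arithmetic. This is a minimal illustration, not SDK code: the class IdleTimeoutEstimate and its method are hypothetical names, and the sketch assumes the idle timer starts at the response.done time plus the playback duration and fires idleTimeoutMs later.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the SDK: estimates when the idle timeout would fire.
final class IdleTimeoutEstimate {
    private IdleTimeoutEstimate() {}

    /**
     * Assumption: the idle timer starts once the last response's audio has finished
     * playing (response.done time plus playback duration) and fires idleTimeoutMs later.
     */
    static Instant expectedTimeout(Instant responseDoneAt, Duration playback, long idleTimeoutMs) {
        Instant playbackFinished = responseDoneAt.plus(playback);
        return playbackFinished.plusMillis(idleTimeoutMs);
    }

    public static void main(String[] args) {
        Instant done = Instant.parse("2024-01-01T00:00:00Z");
        // 4 s of audio playback followed by a 6 s idle timeout.
        System.out.println(expectedTimeout(done, Duration.ofSeconds(4), 6_000L));
    }
}
```

A client could use an estimate like this to decide when to expect the input_audio_buffer.timeout_triggered event; the server remains the source of truth for the actual timing.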
-
interruptResponse
final Optional<Boolean> interruptResponse()
Whether or not to automatically interrupt any ongoing response with output to the default conversation (i.e. conversation of auto) when a VAD start event occurs.
-
prefixPaddingMs
final Optional<Long> prefixPaddingMs()
Used only for server_vad mode. Amount of audio to include before the VAD detected speech (in milliseconds). Defaults to 300ms.
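To get a feel for how much audio a given padding represents, the millisecond value can be converted into samples and bytes. The sketch below is illustrative only: PrefixPadding is a hypothetical helper, and the 24 kHz, 16-bit mono format used in main is an assumption, not something this page specifies.

```java
// Hypothetical helper, not part of the SDK: converts a prefix padding in
// milliseconds into sample and byte counts for a given audio format.
final class PrefixPadding {
    private PrefixPadding() {}

    /** Number of samples covered by the padding at the given sample rate. */
    static long samplesFor(long prefixPaddingMs, int sampleRateHz) {
        return prefixPaddingMs * sampleRateHz / 1000;
    }

    /** Number of bytes covered by the padding for single-channel audio. */
    static long bytesFor(long prefixPaddingMs, int sampleRateHz, int bytesPerSample) {
        return samplesFor(prefixPaddingMs, sampleRateHz) * bytesPerSample;
    }

    public static void main(String[] args) {
        // Assumed format: 24 kHz, 16-bit (2 bytes per sample), mono.
        System.out.println(samplesFor(300L, 24_000));
        System.out.println(bytesFor(300L, 24_000, 2));
    }
}
```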
-
silenceDurationMs
final Optional<Long> silenceDurationMs()
Used only for server_vad mode. Duration of silence to detect speech stop (in milliseconds). Defaults to 500ms. With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
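The trade-off described above can be illustrated with a small frame-based tracker. This is a conceptual sketch only, not the server's actual VAD algorithm: SilenceTracker is a hypothetical class, and it assumes some other component has already classified each audio frame as speech or silence.

```java
// Hypothetical sketch, not the server's implementation: reports a speech stop
// once enough consecutive silence has followed detected speech.
final class SilenceTracker {
    private final long silenceDurationMs;
    private long silentMs = 0;
    private boolean speaking = false;

    SilenceTracker(long silenceDurationMs) {
        this.silenceDurationMs = silenceDurationMs;
    }

    /**
     * Feeds one classified frame. Returns true only on the frame that completes
     * the required silence after speech; any speech frame resets the counter.
     */
    boolean onFrame(boolean isSpeech, long frameMs) {
        if (isSpeech) {
            speaking = true;
            silentMs = 0;
            return false;
        }
        if (!speaking) {
            return false; // silence before any speech never ends a turn
        }
        silentMs += frameMs;
        if (silentMs >= silenceDurationMs) {
            speaking = false;
            silentMs = 0;
            return true;
        }
        return false;
    }
}
```

With 20 ms frames, a 300 ms pause would end the turn for a tracker configured with 200 ms but not for one configured with the 500 ms default, which mirrors the "respond more quickly, but may jump in on short pauses" behavior.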
-
threshold
final Optional<Double> threshold()
Used only for server_vad mode. Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
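Because the accessors return Optional values, calling code that wants the effective settings has to apply the documented defaults itself. A minimal sketch follows, assuming a hypothetical VadDefaults helper that is not part of the SDK; the range check on the threshold is this sketch's own choice, based on the documented 0.0 to 1.0 range.

```java
import java.util.Optional;

// Hypothetical helper, not part of the SDK: applies the defaults documented
// on this page (300 ms, 500 ms, 0.5) when a value was not configured.
final class VadDefaults {
    static final long PREFIX_PADDING_MS = 300L;
    static final long SILENCE_DURATION_MS = 500L;
    static final double THRESHOLD = 0.5;

    private VadDefaults() {}

    static long prefixPaddingMs(Optional<Long> configured) {
        return configured.orElse(PREFIX_PADDING_MS);
    }

    static long silenceDurationMs(Optional<Long> configured) {
        return configured.orElse(SILENCE_DURATION_MS);
    }

    /** Falls back to 0.5 and rejects values outside the documented range (including NaN). */
    static double threshold(Optional<Double> configured) {
        double value = configured.orElse(THRESHOLD);
        if (!(value >= 0.0 && value <= 1.0)) {
            throw new IllegalArgumentException("threshold must be in [0.0, 1.0], got " + value);
        }
        return value;
    }
}
```

In real code the Optional arguments would come from threshold(), prefixPaddingMs(), and silenceDurationMs() on a ServerVad instance.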
-
_createResponse
final JsonField<Boolean> _createResponse()
Returns the raw JSON value of createResponse.
Unlike createResponse, this method doesn't throw if the JSON field has an unexpected type.
-
_idleTimeoutMs
final JsonField<Long> _idleTimeoutMs()
Returns the raw JSON value of idleTimeoutMs.
Unlike idleTimeoutMs, this method doesn't throw if the JSON field has an unexpected type.
-
_interruptResponse
final JsonField<Boolean> _interruptResponse()
Returns the raw JSON value of interruptResponse.
Unlike interruptResponse, this method doesn't throw if the JSON field has an unexpected type.
-
_prefixPaddingMs
final JsonField<Long> _prefixPaddingMs()
Returns the raw JSON value of prefixPaddingMs.
Unlike prefixPaddingMs, this method doesn't throw if the JSON field has an unexpected type.
-
_silenceDurationMs
final JsonField<Long> _silenceDurationMs()
Returns the raw JSON value of silenceDurationMs.
Unlike silenceDurationMs, this method doesn't throw if the JSON field has an unexpected type.
-
_threshold
final JsonField<Double> _threshold()
Returns the raw JSON value of threshold.
Unlike threshold, this method doesn't throw if the JSON field has an unexpected type.
-
_additionalProperties
final Map<String, JsonValue> _additionalProperties()
-
toBuilder
final RealtimeAudioInputTurnDetection.ServerVad.Builder toBuilder()
-
validate
final RealtimeAudioInputTurnDetection.ServerVad validate()
-
builder
final static RealtimeAudioInputTurnDetection.ServerVad.Builder builder()
Returns a mutable builder for constructing an instance of ServerVad.
-