Class TranscriptionSessionUpdate.Session
public final class TranscriptionSessionUpdate.Session
Realtime transcription session object configuration.
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
public final class
TranscriptionSessionUpdate.Session.Builder
A builder for Session.
public final class
TranscriptionSessionUpdate.Session.ClientSecret
Configuration options for the generated client secret.
public final class
TranscriptionSessionUpdate.Session.InputAudioFormat
The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
public final class
TranscriptionSessionUpdate.Session.InputAudioNoiseReduction
Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
public final class
TranscriptionSessionUpdate.Session.InputAudioTranscription
Configuration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
public final class
TranscriptionSessionUpdate.Session.Modality
public final class
TranscriptionSessionUpdate.Session.TurnDetection
Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to null to turn off, in which case the client must manually trigger model response. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech. Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
-
Method Summary
-
-
Method Detail
-
clientSecret
final Optional<TranscriptionSessionUpdate.Session.ClientSecret> clientSecret()
Configuration options for the generated client secret.
-
include
final Optional<List<String>> include()
The set of items to include in the transcription. Current available items are:
item.input_audio_transcription.logprobs
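As a rough illustration of how log-probabilities might be consumed once they are included, the sketch below converts a log-probability into an ordinary probability. It assumes the values are natural logarithms; the class and method names are hypothetical and are not part of this SDK.

```java
// Hypothetical helper, not part of the SDK.
// Assumption: logprob values are natural logarithms of token probabilities.
final class LogprobSketch {
    /** Converts a single log-probability into a probability in (0, 1]. */
    public static double toProbability(double logprob) {
        return Math.exp(logprob);
    }

    /** Geometric-mean probability of a token sequence: exp(mean of logprobs). */
    public static double meanProbability(double[] logprobs) {
        if (logprobs.length == 0) {
            throw new IllegalArgumentException("logprobs must not be empty");
        }
        double sum = 0.0;
        for (double lp : logprobs) {
            sum += lp;
        }
        return Math.exp(sum / logprobs.length);
    }

    public static void main(String[] args) {
        // A logprob of 0.0 corresponds to a probability of exactly 1.
        System.out.println(toProbability(0.0));
    }
}
```

A low mean probability could be used as a signal to flag a transcript segment for review; the threshold for "low" is an application choice.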
-
inputAudioFormat
final Optional<TranscriptionSessionUpdate.Session.InputAudioFormat> inputAudioFormat()
The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
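The pcm16 requirements can be made concrete with a small sketch using only the Java standard library. It shows how many bytes a given duration of audio occupies at 24kHz, 16-bit, mono, and how 16-bit samples are laid out in little-endian order. The class and method names are hypothetical and are not part of this SDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the SDK.
final class Pcm16Sketch {
    static final int SAMPLE_RATE_HZ = 24_000; // 24kHz sample rate
    static final int BYTES_PER_SAMPLE = 2;    // 16-bit samples
    static final int CHANNELS = 1;            // mono

    /** Number of bytes needed to hold the given duration of pcm16 audio. */
    public static long bytesForMillis(long millis) {
        return (long) SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * millis / 1000;
    }

    /** Serializes 16-bit samples as little-endian bytes (low byte first). */
    public static byte[] toLittleEndian(short[] samples) {
        ByteBuffer buf = ByteBuffer
                .allocate(samples.length * BYTES_PER_SAMPLE)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (short s : samples) {
            buf.putShort(s);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // One second of audio: 24000 samples * 2 bytes * 1 channel.
        System.out.println(bytesForMillis(1000)); // prints 48000
    }
}
```

Java's ByteBuffer defaults to big-endian, so the explicit ByteOrder.LITTLE_ENDIAN call is the part that matters here.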
-
inputAudioNoiseReduction
final Optional<TranscriptionSessionUpdate.Session.InputAudioNoiseReduction> inputAudioNoiseReduction()
Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
-
inputAudioTranscription
final Optional<TranscriptionSessionUpdate.Session.InputAudioTranscription> inputAudioTranscription()
Configuration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
-
modalities
final Optional<List<TranscriptionSessionUpdate.Session.Modality>> modalities()
The set of modalities the model can respond with. To disable audio, set this to "text".
-
turnDetection
final Optional<TranscriptionSessionUpdate.Session.TurnDetection> turnDetection()
Configuration for turn detection, either Server VAD or Semantic VAD. This can be set to null to turn off, in which case the client must manually trigger model response. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech. Semantic VAD is more advanced and uses a turn detection model (in conjunction with VAD) to semantically estimate whether the user has finished speaking, then dynamically sets a timeout based on this probability. For example, if user audio trails off with "uhhm", the model will score a low probability of turn end and wait longer for the user to continue speaking. This can be useful for more natural conversations, but may have a higher latency.
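To give a feel for what volume-based detection means, here is a minimal sketch of an energy-threshold detector. It is not the server's actual algorithm: the RMS measure, the threshold, and the silence-frame count are all illustrative assumptions, and the class is hypothetical rather than part of this SDK.

```java
// Hypothetical illustration of volume-based end-of-speech detection.
// This is NOT the server's implementation; all parameters are assumptions.
final class VolumeVadSketch {
    /** Root-mean-square level of one frame of 16-bit samples, scaled to 0..1. */
    public static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (short s : frame) {
            double v = s / 32768.0;
            sum += v * v;
        }
        return Math.sqrt(sum / frame.length);
    }

    /**
     * Returns the index of the frame at which end of speech is declared:
     * the first point where speech has started and then `silenceFrames`
     * consecutive frames fall below `threshold`. Returns -1 if none.
     */
    public static int endOfSpeechFrame(short[][] frames, double threshold, int silenceFrames) {
        boolean speaking = false;
        int quiet = 0;
        for (int i = 0; i < frames.length; i++) {
            if (rms(frames[i]) >= threshold) {
                speaking = true; // speech started or continues
                quiet = 0;       // any loud frame resets the silence counter
            } else if (speaking) {
                quiet++;
                if (quiet >= silenceFrames) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

A fixed silence window like this is what makes a purely volume-based approach cut in on pauses such as "uhhm"; the semantic variant described above instead adjusts its wait time based on an estimated probability that the turn has ended.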
-
_clientSecret
final JsonField<TranscriptionSessionUpdate.Session.ClientSecret> _clientSecret()
Returns the raw JSON value of clientSecret.
Unlike clientSecret, this method doesn't throw if the JSON field has an unexpected type.
-
_include
final JsonField<List<String>> _include()
Returns the raw JSON value of include.
Unlike include, this method doesn't throw if the JSON field has an unexpected type.
-
_inputAudioFormat
final JsonField<TranscriptionSessionUpdate.Session.InputAudioFormat> _inputAudioFormat()
Returns the raw JSON value of inputAudioFormat.
Unlike inputAudioFormat, this method doesn't throw if the JSON field has an unexpected type.
-
_inputAudioNoiseReduction
final JsonField<TranscriptionSessionUpdate.Session.InputAudioNoiseReduction> _inputAudioNoiseReduction()
Returns the raw JSON value of inputAudioNoiseReduction.
Unlike inputAudioNoiseReduction, this method doesn't throw if the JSON field has an unexpected type.
-
_inputAudioTranscription
final JsonField<TranscriptionSessionUpdate.Session.InputAudioTranscription> _inputAudioTranscription()
Returns the raw JSON value of inputAudioTranscription.
Unlike inputAudioTranscription, this method doesn't throw if the JSON field has an unexpected type.
-
_modalities
final JsonField<List<TranscriptionSessionUpdate.Session.Modality>> _modalities()
Returns the raw JSON value of modalities.
Unlike modalities, this method doesn't throw if the JSON field has an unexpected type.
-
_turnDetection
final JsonField<TranscriptionSessionUpdate.Session.TurnDetection> _turnDetection()
Returns the raw JSON value of turnDetection.
Unlike turnDetection, this method doesn't throw if the JSON field has an unexpected type.
-
_additionalProperties
final Map<String, JsonValue> _additionalProperties()
-
toBuilder
final TranscriptionSessionUpdate.Session.Builder toBuilder()
-
validate
final TranscriptionSessionUpdate.Session validate()
-
builder
final static TranscriptionSessionUpdate.Session.Builder builder()
Returns a mutable builder for constructing an instance of Session.
-