Class RealtimeTranscriptionSessionCreateRequest
public final class RealtimeTranscriptionSessionCreateRequest
Realtime transcription session object configuration.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
public final class
RealtimeTranscriptionSessionCreateRequest.Builder
A builder for RealtimeTranscriptionSessionCreateRequest.
public final class
RealtimeTranscriptionSessionCreateRequest.Model
ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, and whisper-1 (which is powered by our open source Whisper V2 model).
public final class
RealtimeTranscriptionSessionCreateRequest.Include
public final class
RealtimeTranscriptionSessionCreateRequest.InputAudioFormat
The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
public final class
RealtimeTranscriptionSessionCreateRequest.InputAudioNoiseReduction
Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
public final class
RealtimeTranscriptionSessionCreateRequest.InputAudioTranscription
Configuration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
public final class
RealtimeTranscriptionSessionCreateRequest.TurnDetection
Configuration for turn detection. Can be set to null to turn off. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
-
Method Summary
-
-
Method Detail
-
model
final RealtimeTranscriptionSessionCreateRequest.Model model()
ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, and whisper-1 (which is powered by our open source Whisper V2 model).
-
_type
final JsonValue _type()
The type of session to create. Always transcription for transcription sessions.
Expected to always return the following:
JsonValue.from("transcription")
However, this method can be useful for debugging and logging (e.g. if the server responded with an unexpected value).
-
include
final Optional<List<RealtimeTranscriptionSessionCreateRequest.Include>> include()
The set of items to include in the transcription. Currently available items are:
item.input_audio_transcription.logprobs
-
inputAudioFormat
final Optional<RealtimeTranscriptionSessionCreateRequest.InputAudioFormat> inputAudioFormat()
The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
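The pcm16 constraints above (16-bit samples, 24kHz, mono, little-endian) can be checked with a small standalone sketch. It uses only the JDK and does not call the SDK; the class name Pcm16Encoder and its two helper methods are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the SDK: prepares raw audio bytes that
// match the pcm16 description (16-bit PCM, 24 kHz, mono, little-endian).
class Pcm16Encoder {
    public static final int SAMPLE_RATE_HZ = 24_000;
    public static final int CHANNELS = 1;
    public static final int BYTES_PER_SAMPLE = 2;

    /** Packs signed 16-bit samples into little-endian bytes. */
    public static byte[] toLittleEndianBytes(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * BYTES_PER_SAMPLE)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (short sample : samples) {
            buffer.putShort(sample); // low byte is written first
        }
        return buffer.array();
    }

    /** Number of bytes occupied by the given duration of pcm16 audio. */
    public static int byteCountForMillis(int millis) {
        long bytesPerSecond = (long) SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE;
        return (int) (bytesPerSecond * millis / 1000);
    }

    public static void main(String[] args) {
        byte[] bytes = toLittleEndianBytes(new short[] {0x0102, -2});
        System.out.println(bytes.length);            // 4
        System.out.println(byteCountForMillis(100)); // 4800
    }
}
```

At these settings one second of audio is 48,000 bytes, which is a useful sanity check when sizing the chunks appended to the input audio buffer.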
-
inputAudioNoiseReduction
final Optional<RealtimeTranscriptionSessionCreateRequest.InputAudioNoiseReduction> inputAudioNoiseReduction()
Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.
-
inputAudioTranscription
final Optional<RealtimeTranscriptionSessionCreateRequest.InputAudioTranscription> inputAudioTranscription()
Configuration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.
-
turnDetection
final Optional<RealtimeTranscriptionSessionCreateRequest.TurnDetection> turnDetection()
Configuration for turn detection. Can be set to null to turn off. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
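To give a feel for what volume-based speech detection means, here is a toy sketch that marks frames as speech when their RMS level crosses a threshold. It is not the service's algorithm, which runs server-side and is not described here; the class EnergyVadSketch, the frame size, and the threshold are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: a naive energy-threshold detector, not the
// server's VAD. All names and parameters here are hypothetical.
class EnergyVadSketch {
    /**
     * Splits samples into fixed-size frames and returns
     * {startFrame, endFrameExclusive} pairs for each run of consecutive
     * frames whose RMS level is at or above rmsThreshold.
     */
    public static List<int[]> detectSpeech(short[] samples, int frameSize, double rmsThreshold) {
        List<int[]> segments = new ArrayList<>();
        int frames = samples.length / frameSize; // trailing partial frame is ignored
        int start = -1;                          // -1 means "currently in silence"
        for (int f = 0; f < frames; f++) {
            double sumSquares = 0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                sumSquares += (double) samples[i] * samples[i];
            }
            boolean loud = Math.sqrt(sumSquares / frameSize) >= rmsThreshold;
            if (loud && start < 0) {
                start = f;                       // speech starts
            } else if (!loud && start >= 0) {
                segments.add(new int[] {start, f}); // speech ends
                start = -1;
            }
        }
        if (start >= 0) {
            segments.add(new int[] {start, frames}); // still speaking at end of input
        }
        return segments;
    }
}
```

A real detector would typically smooth over short pauses before declaring the end of a turn; this sketch omits that to keep the start/end logic visible.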
-
_model
final JsonField<RealtimeTranscriptionSessionCreateRequest.Model> _model()
Returns the raw JSON value of model.
Unlike model, this method doesn't throw if the JSON field has an unexpected type.
-
_include
final JsonField<List<RealtimeTranscriptionSessionCreateRequest.Include>> _include()
Returns the raw JSON value of include.
Unlike include, this method doesn't throw if the JSON field has an unexpected type.
-
_inputAudioFormat
final JsonField<RealtimeTranscriptionSessionCreateRequest.InputAudioFormat> _inputAudioFormat()
Returns the raw JSON value of inputAudioFormat.
Unlike inputAudioFormat, this method doesn't throw if the JSON field has an unexpected type.
-
_inputAudioNoiseReduction
final JsonField<RealtimeTranscriptionSessionCreateRequest.InputAudioNoiseReduction> _inputAudioNoiseReduction()
Returns the raw JSON value of inputAudioNoiseReduction.
Unlike inputAudioNoiseReduction, this method doesn't throw if the JSON field has an unexpected type.
-
_inputAudioTranscription
final JsonField<RealtimeTranscriptionSessionCreateRequest.InputAudioTranscription> _inputAudioTranscription()
Returns the raw JSON value of inputAudioTranscription.
Unlike inputAudioTranscription, this method doesn't throw if the JSON field has an unexpected type.
-
_turnDetection
final JsonField<RealtimeTranscriptionSessionCreateRequest.TurnDetection> _turnDetection()
Returns the raw JSON value of turnDetection.
Unlike turnDetection, this method doesn't throw if the JSON field has an unexpected type.
-
_additionalProperties
final Map<String, JsonValue> _additionalProperties()
-
toBuilder
final RealtimeTranscriptionSessionCreateRequest.Builder toBuilder()
-
validate
final RealtimeTranscriptionSessionCreateRequest validate()
-
builder
final static RealtimeTranscriptionSessionCreateRequest.Builder builder()
Returns a mutable builder for constructing an instance of RealtimeTranscriptionSessionCreateRequest.
The following fields are required:
.model()