Class RealtimeTruncation
public final class RealtimeTruncation
When the number of tokens in a conversation exceeds the model's input token limit, the conversation will be truncated, meaning messages (starting from the oldest) will not be included in the model's context. For example, a 32k context model with 4,096 max output tokens can only include 28,224 tokens in the context before truncation occurs.
Clients can configure truncation behavior to truncate with a lower max token limit, which is an effective way to control token usage and cost. Truncation will reduce the number of cached tokens on the next turn (busting the cache), since messages are dropped from the beginning of the context. However, clients can also configure truncation to retain messages up to a fraction of the maximum context size, which will reduce the need for future truncations and thus improve the cache rate.
Truncation can be disabled entirely, which means the server will never truncate but will instead return an error if the conversation exceeds the model's input token limit.
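The oldest-first behavior described above can be illustrated with a small, self-contained sketch. It does not call the SDK or the server: `TruncationSketch`, `inputBudget`, and `truncateOldestFirst` are hypothetical names, the token counts are made up, and the budget is assumed to be a simple subtraction, which may not match the server's exact accounting.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not part of the SDK.
final class TruncationSketch {
    /** Assumed budget: context window minus the tokens reserved for output. */
    static int inputBudget(int contextWindow, int maxOutputTokens) {
        return contextWindow - maxOutputTokens;
    }

    /** Drops the oldest messages until the remaining token total fits the budget. */
    static Deque<Integer> truncateOldestFirst(List<Integer> messageTokens, int budget) {
        Deque<Integer> kept = new ArrayDeque<>(messageTokens);
        int total = 0;
        for (int tokens : messageTokens) {
            total += tokens;
        }
        while (total > budget && !kept.isEmpty()) {
            total -= kept.removeFirst(); // the oldest message is at the head
        }
        return kept;
    }

    public static void main(String[] args) {
        int budget = inputBudget(1000, 200); // 800 input tokens available
        List<Integer> conversation = Arrays.asList(300, 300, 300, 100); // oldest first
        System.out.println(truncateOldestFirst(conversation, budget)); // [300, 300, 100]
    }
}
```

Because the dropped message sits at the start of the context, any cached prefix that included it no longer matches, which is the cache-busting effect the description mentions.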
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
public interface | RealtimeTruncation.Visitor | An interface that defines how to map each variant of RealtimeTruncation to a value of type T.
public final class | RealtimeTruncation.RealtimeTruncationStrategy | The truncation strategy to use for the session. auto is the default truncation strategy. disabled will disable truncation and emit errors when the conversation exceeds the input token limit.
-
Method Summary
Modifier and Type | Method | Description
final Optional<RealtimeTruncation.RealtimeTruncationStrategy> | strategy() | The truncation strategy to use for the session.
final Optional<RealtimeTruncationRetentionRatio> | retentionRatio() | Retain a fraction of the conversation tokens when the conversation exceeds the input token limit.
final Boolean | isStrategy() |
final Boolean | isRetentionRatio() |
final RealtimeTruncation.RealtimeTruncationStrategy | asStrategy() | The truncation strategy to use for the session.
final RealtimeTruncationRetentionRatio | asRetentionRatio() | Retain a fraction of the conversation tokens when the conversation exceeds the input token limit.
final Optional<JsonValue> | _json() |
final <T extends Any> T | accept(RealtimeTruncation.Visitor<T> visitor) |
final RealtimeTruncation | validate() |
final Boolean | isValid() |
Boolean | equals(Object other) |
Integer | hashCode() |
String | toString() |
final static RealtimeTruncation | ofStrategy(RealtimeTruncation.RealtimeTruncationStrategy strategy) | The truncation strategy to use for the session.
final static RealtimeTruncation | ofRetentionRatio(RealtimeTruncationRetentionRatio retentionRatio) | Retain a fraction of the conversation tokens when the conversation exceeds the input token limit.
-
Method Detail
-
strategy
final Optional<RealtimeTruncation.RealtimeTruncationStrategy> strategy()
The truncation strategy to use for the session.
auto is the default truncation strategy. disabled will disable truncation and emit errors when the conversation exceeds the input token limit.
-
retentionRatio
final Optional<RealtimeTruncationRetentionRatio> retentionRatio()
Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
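To see why retaining only a fraction amortizes truncations, the following self-contained simulation counts truncation events over a run of equal-sized turns. It is a sketch under stated assumptions, not the server's algorithm: `RetentionRatioSketch` and `countTruncations` are hypothetical names, and the model assumed here is that once the limit is exceeded, the oldest messages are dropped until the total is at most `limit * retentionRatio`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation only; not part of the SDK.
final class RetentionRatioSketch {
    /** Counts how many turns trigger a truncation under the assumed model. */
    static int countTruncations(int limit, double retentionRatio, int tokensPerTurn, int turns) {
        Deque<Integer> context = new ArrayDeque<>();
        int target = (int) (limit * retentionRatio); // post-truncation size ceiling
        int total = 0;
        int truncations = 0;
        for (int turn = 0; turn < turns; turn++) {
            context.addLast(tokensPerTurn);
            total += tokensPerTurn;
            if (total > limit) {
                truncations++;
                // Drop oldest messages until the context fits the retained fraction.
                while (total > target && !context.isEmpty()) {
                    total -= context.removeFirst();
                }
            }
        }
        return truncations;
    }

    public static void main(String[] args) {
        // Retaining the full limit truncates on every turn once the limit is reached.
        System.out.println(countTruncations(1000, 1.0, 100, 30)); // 20
        // Retaining half leaves headroom, so truncations are spread out.
        System.out.println(countTruncations(1000, 0.5, 100, 30)); // 4
    }
}
```

In this toy run, fewer truncation events means fewer turns on which the start of the context changes, which is the mechanism by which a retention ratio can improve cached token usage.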
-
isStrategy
final Boolean isStrategy()
-
isRetentionRatio
final Boolean isRetentionRatio()
-
asStrategy
final RealtimeTruncation.RealtimeTruncationStrategy asStrategy()
The truncation strategy to use for the session.
auto is the default truncation strategy. disabled will disable truncation and emit errors when the conversation exceeds the input token limit.
-
asRetentionRatio
final RealtimeTruncationRetentionRatio asRetentionRatio()
Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.
-
accept
final <T extends Any> T accept(RealtimeTruncation.Visitor<T> visitor)
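The visitor-based dispatch can be sketched with stand-in types that mirror the shape of this class. Everything below is hypothetical and self-contained: `Truncation`, `Strategy`, `RetentionRatio`, and the visitor method names `visitStrategy` and `visitRetentionRatio` are assumptions made for illustration, since this page does not list the members of RealtimeTruncation.Visitor.

```java
import java.util.Optional;

// Hypothetical stand-ins that mimic a two-variant union; not the SDK's types.
final class TruncationUnionSketch {
    enum Strategy { AUTO, DISABLED }

    static final class RetentionRatio {
        final double ratio;
        RetentionRatio(double ratio) { this.ratio = ratio; }
    }

    /** Maps each variant to a value of type T (method names are assumed). */
    interface Visitor<T> {
        T visitStrategy(Strategy strategy);
        T visitRetentionRatio(RetentionRatio retentionRatio);
    }

    static final class Truncation {
        private final Strategy strategy;             // set only for the strategy variant
        private final RetentionRatio retentionRatio; // set only for the ratio variant

        private Truncation(Strategy strategy, RetentionRatio retentionRatio) {
            this.strategy = strategy;
            this.retentionRatio = retentionRatio;
        }

        static Truncation ofStrategy(Strategy s) { return new Truncation(s, null); }
        static Truncation ofRetentionRatio(RetentionRatio r) { return new Truncation(null, r); }

        Optional<Strategy> strategy() { return Optional.ofNullable(strategy); }
        Optional<RetentionRatio> retentionRatio() { return Optional.ofNullable(retentionRatio); }
        boolean isStrategy() { return strategy != null; }
        boolean isRetentionRatio() { return retentionRatio != null; }

        /** Dispatches to the visitor method that matches the variant held. */
        <T> T accept(Visitor<T> visitor) {
            return isStrategy()
                    ? visitor.visitStrategy(strategy)
                    : visitor.visitRetentionRatio(retentionRatio);
        }
    }

    static String describe(Truncation truncation) {
        return truncation.accept(new Visitor<String>() {
            @Override public String visitStrategy(Strategy s) { return "strategy=" + s; }
            @Override public String visitRetentionRatio(RetentionRatio r) { return "retain=" + r.ratio; }
        });
    }

    public static void main(String[] args) {
        System.out.println(describe(Truncation.ofStrategy(Strategy.DISABLED)));             // strategy=DISABLED
        System.out.println(describe(Truncation.ofRetentionRatio(new RetentionRatio(0.8)))); // retain=0.8
    }
}
```

A visitor forces callers to handle both variants in one place, whereas the `is…()` and `Optional` accessors suit quick checks for a single variant.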
-
validate
final RealtimeTruncation validate()
-
ofStrategy
final static RealtimeTruncation ofStrategy(RealtimeTruncation.RealtimeTruncationStrategy strategy)
The truncation strategy to use for the session.
auto is the default truncation strategy. disabled will disable truncation and emit errors when the conversation exceeds the input token limit.
-
ofRetentionRatio
final static RealtimeTruncation ofRetentionRatio(RealtimeTruncationRetentionRatio retentionRatio)
Retain a fraction of the conversation tokens when the conversation exceeds the input token limit. This allows you to amortize truncations across multiple turns, which can help improve cached token usage.