Package dev.langchain4j.service
Interface TokenStream
- All Known Implementing Classes:
AiServiceTokenStream
public interface TokenStream
Represents a token stream from the model to which you can subscribe and receive updates
when a new partial response (usually a single token) is available,
when the model finishes streaming, or when an error occurs during streaming.
It is intended to be used as a return type in an AI Service.
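A minimal subscription sketch is shown below. The Assistant interface is a hypothetical user-defined AI Service (in LangChain4j such an interface is typically backed by a streaming chat model via AiServices); the chained calls use only the methods documented on this page:

    import dev.langchain4j.service.TokenStream;

    // Hypothetical user-defined AI Service; its method returns a TokenStream.
    interface Assistant {
        TokenStream chat(String userMessage);
    }

    class StreamingExample {
        void streamAnswer(Assistant assistant) {
            assistant.chat("Tell me a joke")
                    .onPartialResponse(System.out::print)          // each new partial token
                    .onCompleteResponse(r -> System.out.println()) // final ChatResponse arrives here
                    .onError(Throwable::printStackTrace)           // errors during streaming
                    .start();                                      // nothing is sent to the LLM until start()
        }
    }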
-
Method Summary
default TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
    The provided consumer will be invoked right before a tool is executed.
TokenStream ignoreErrors()
    All errors during streaming will be ignored (but will be logged with a WARN log level).
TokenStream onCompleteResponse(Consumer<dev.langchain4j.model.chat.response.ChatResponse> completeResponseHandler)
    The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to an intermediate response (see onIntermediateResponse(Consumer)).
TokenStream onError(Consumer<Throwable> errorHandler)
    The provided consumer will be invoked when an error occurs during streaming.
default TokenStream onIntermediateResponse(Consumer<dev.langchain4j.model.chat.response.ChatResponse> intermediateResponseHandler)
    The provided consumer will be invoked when a language model finishes streaming an intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)).
TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
    The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
default TokenStream onPartialThinking(Consumer<dev.langchain4j.model.chat.response.PartialThinking> partialThinkingHandler)
    The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
TokenStream onRetrieved(Consumer<List<dev.langchain4j.rag.content.Content>> contentHandler)
    The provided consumer will be invoked if any Contents are retrieved using a RetrievalAugmentor.
TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler)
    The provided consumer will be invoked right after a tool is executed.
void start()
    Completes the current token stream building and starts processing.
-
Method Details
-
onPartialResponse
TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
- Parameters:
  - partialResponseHandler - lambda that will be invoked when a model generates a new partial textual response
- Returns:
  - token stream instance used to configure or start stream processing
-
onPartialThinking
@Experimental
default TokenStream onPartialThinking(Consumer<dev.langchain4j.model.chat.response.PartialThinking> partialThinkingHandler)
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
- Parameters:
  - partialThinkingHandler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
- Returns:
  - token stream instance used to configure or start stream processing
- Since:
  - 1.2.0
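For example, thinking tokens can be routed to a different sink than answer tokens (a sketch; tokenStream is assumed to have been obtained from an AI Service method, and the PartialThinking payload is printed directly rather than assuming its accessors):

    // Reasoning text goes to stderr, the answer itself to stdout.
    tokenStream
            .onPartialThinking(thinking -> System.err.print(thinking))
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();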
-
onRetrieved
TokenStream onRetrieved(Consumer<List<dev.langchain4j.rag.content.Content>> contentHandler)
The provided consumer will be invoked if any Contents are retrieved using a RetrievalAugmentor. The invocation happens before any call is made to the language model.
- Parameters:
  - contentHandler - lambda that consumes all retrieved contents
- Returns:
  - token stream instance used to configure or start stream processing
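For example, the retrieved Contents can be logged before the model call (a sketch; tokenStream is assumed to come from an AI Service configured with a RetrievalAugmentor):

    // Fires once, before the request is sent to the language model.
    tokenStream
            .onRetrieved(contents -> contents.forEach(content -> System.out.println("Retrieved: " + content)))
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();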
-
onIntermediateResponse
default TokenStream onIntermediateResponse(Consumer<dev.langchain4j.model.chat.response.ChatResponse> intermediateResponseHandler)
The provided consumer will be invoked when a language model finishes streaming an intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)). Intermediate chat responses contain ToolExecutionRequests; the AI Service will execute them after returning from this consumer.
- Parameters:
  - intermediateResponseHandler - lambda that consumes intermediate chat responses
- Returns:
  - token stream instance used to configure or start stream processing
- Since:
  - 1.2.0
- See Also:
  - onCompleteResponse(Consumer)
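A sketch that observes each tool-calling round (tokenStream is assumed to come from an AI Service with tools configured; aiMessage().toolExecutionRequests() is the usual accessor path on ChatResponse):

    import java.util.concurrent.atomic.AtomicInteger;

    AtomicInteger round = new AtomicInteger();
    tokenStream
            .onIntermediateResponse(intermediate ->
                    // Each intermediate response carries the ToolExecutionRequests that
                    // the AI Service will execute after this consumer returns.
                    System.out.println("Round " + round.incrementAndGet() + ": "
                            + intermediate.aiMessage().toolExecutionRequests()))
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();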
-
beforeToolExecution
default TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
The provided consumer will be invoked right before a tool is executed.
- Parameters:
  - beforeToolExecutionHandler - lambda that consumes BeforeToolExecution
- Returns:
  - token stream instance used to configure or start stream processing
- Since:
  - 1.2.0
-
onToolExecuted
TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler)
The provided consumer will be invoked right after a tool is executed. The invocation happens after the tool method has finished and before any other tool is executed.
- Parameters:
  - toolExecuteHandler - lambda that consumes ToolExecution
- Returns:
  - token stream instance used to configure or start stream processing
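Together with beforeToolExecution(Consumer), this hook brackets every tool call. A sketch (the hook payloads are printed directly rather than assuming their accessors; tokenStream is assumed to come from an AI Service with tools configured):

    // beforeToolExecution fires just before each tool call, onToolExecuted just after.
    tokenStream
            .beforeToolExecution(before -> System.out.println("About to execute: " + before))
            .onToolExecuted(execution -> System.out.println("Executed: " + execution))
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();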
-
onCompleteResponse
TokenStream onCompleteResponse(Consumer<dev.langchain4j.model.chat.response.ChatResponse> completeResponseHandler)
The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to an intermediate response (see onIntermediateResponse(Consumer)).
Please note that ChatResponse.tokenUsage() contains aggregate token usage across all calls to the LLM: it is the sum of the ChatResponse.tokenUsage()s of all intermediate responses (onIntermediateResponse(Consumer)).
- Parameters:
  - completeResponseHandler - lambda that will be invoked when the language model finishes streaming
- Returns:
  - token stream instance used to configure or start stream processing
- See Also:
  - onIntermediateResponse(Consumer)
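When the caller needs to block until streaming finishes, a common pattern is to bridge the callbacks to a CompletableFuture (a sketch; tokenStream is assumed to come from an AI Service method):

    import java.util.concurrent.CompletableFuture;
    import dev.langchain4j.model.chat.response.ChatResponse;

    CompletableFuture<ChatResponse> future = new CompletableFuture<>();
    tokenStream
            .onPartialResponse(System.out::print)
            .onCompleteResponse(future::complete)            // delivers the final ChatResponse
            .onError(future::completeExceptionally)
            .start();

    ChatResponse response = future.join();                   // blocks until streaming completes
    System.out.println("Usage: " + response.tokenUsage());   // aggregate across all LLM calls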
-
onError
TokenStream onError(Consumer<Throwable> errorHandler)
The provided consumer will be invoked when an error occurs during streaming.
- Parameters:
  - errorHandler - lambda that will be invoked when an error occurs
- Returns:
  - token stream instance used to configure or start stream processing
-
ignoreErrors
TokenStream ignoreErrors()
All errors during streaming will be ignored (but will be logged with a WARN log level).
- Returns:
  - token stream instance used to configure or start stream processing
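This enables a fire-and-forget style where onError(Consumer) is not registered (a sketch; tokenStream is assumed to come from an AI Service method):

    // Stream partial tokens; any error is swallowed after being logged at WARN level.
    tokenStream
            .onPartialResponse(System.out::print)
            .ignoreErrors()
            .start();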
-
start
void start()
Completes the current token stream building and starts processing. Sends a request to the LLM and starts response streaming.
-