Interface ChatModelConfig
public interface ChatModelConfig
Field Summary
Fields:
- static final String DEFAULT_INFERENCE_ENDPOINT
Method Summary
- doSample(): Whether or not to use sampling; use greedy decoding otherwise.
- inferenceEndpointUrl(): The URL of the inference endpoint for the chat model.
- logRequests(): Whether chat model requests should be logged.
- logResponses(): Whether chat model responses should be logged.
- maxNewTokens(): Int (0-250).
- repetitionPenalty(): The parameter for repetition penalty. 1.0 means no penalty.
- returnFullText(): If set to false, the returned results will not contain the original query, making it easier for prompting.
- temperature(): Float (0.0-100.0). The temperature of the sampling operation.
- topK(): The number of highest-probability vocabulary tokens to keep for top-k filtering.
- topP(): If set to less than 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
- waitForModel(): If the model is not ready, wait for it instead of receiving a 503.
Field Details
DEFAULT_INFERENCE_ENDPOINT
static final String DEFAULT_INFERENCE_ENDPOINT
See Also:
- Constant Field Values
Method Details
inferenceEndpointUrl
@WithDefault("https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct")
URL inferenceEndpointUrl()
The URL of the inference endpoint for the chat model.
When using Hugging Face with the inference API, the URL is https://api-inference.huggingface.co/models/<model-id>, for example https://api-inference.huggingface.co/models/google/flan-t5-small. When using a deployed inference endpoint, the URL is the URL of the endpoint. When using a local Hugging Face model, the URL is the URL of the local model.
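As a sketch of how this value might be read, assuming the interface is a SmallRye @ConfigMapping (which the @WithDefault annotation suggests) and that the mapping is registered with the running config; in Quarkus the mapping would more commonly be injected:

    import io.smallrye.config.SmallRyeConfig;
    import org.eclipse.microprofile.config.ConfigProvider;

    public class ChatModelConfigLookup {
        public static void main(String[] args) {
            // Programmatic lookup of the mapped configuration; falls back to the
            // @WithDefault endpoint when no property overrides it.
            SmallRyeConfig config = ConfigProvider.getConfig().unwrap(SmallRyeConfig.class);
            ChatModelConfig chatModel = config.getConfigMapping(ChatModelConfig.class);
            System.out.println(chatModel.inferenceEndpointUrl());
        }
    }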
temperature
Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always take the highest score, and 100.0 gets close to a uniform probability distribution.
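A minimal sketch of what temperature means under the standard definition (not necessarily this library's internals): logits are divided by the temperature before the softmax, so low values sharpen the distribution and high values flatten it.

    // Temperature-scaled softmax: as temperature approaches 0 this approaches
    // argmax (always the highest score); large temperatures approach uniform.
    static double[] softmaxWithTemperature(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) max = Math.max(max, logit);
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }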
maxNewTokens
Int (0-250).
returnFullText
If set tofalse, the return results will not contain the original query making it easier for prompting -
waitForModel
If the model is not ready, wait for it instead of receiving a 503. This limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it limits hanging in your application to known places.
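The pattern that advice implies can be sketched with java.net.http. The 503-while-loading behavior and the options.wait_for_model request field are Hugging Face Inference API features; the endpoint, apiToken, and request body here are placeholder assumptions:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    static String generate(HttpClient client, String endpoint, String apiToken) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Authorization", "Bearer " + apiToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"inputs\": \"Hello\"}"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 503) {
            // Model is still loading: retry once, now asking the API to block
            // until the model is ready, so any hang happens at a known place.
            HttpRequest retry = HttpRequest.newBuilder(URI.create(endpoint))
                    .header("Authorization", "Bearer " + apiToken)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"inputs\": \"Hello\", \"options\": {\"wait_for_model\": true}}"))
                    .build();
            response = client.send(retry, HttpResponse.BodyHandlers.ofString());
        }
        return response.body();
    }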
doSample
Whether or not to use sampling; use greedy decoding otherwise.
topK
OptionalInt topK()
The number of highest-probability vocabulary tokens to keep for top-k filtering.
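A minimal sketch of top-k filtering as commonly defined (not this library's internals): keep the k highest-probability tokens, drop the rest, and renormalize before sampling.

    // Keeps the k most probable tokens and renormalizes; assumes 1 <= k.
    // Ties at the threshold probability may keep slightly more than k tokens.
    static double[] topKFilter(double[] probs, int k) {
        double[] sorted = probs.clone();
        java.util.Arrays.sort(sorted);
        double threshold = sorted[Math.max(sorted.length - k, 0)];
        double[] kept = new double[probs.length];
        double sum = 0.0;
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] >= threshold) {
                kept[i] = probs[i];
                sum += probs[i];
            }
        }
        for (int i = 0; i < kept.length; i++) kept[i] /= sum;
        return kept;
    }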
topP
OptionalDouble topP()
If set to less than 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
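The corresponding nucleus (top-p) filtering, again sketched from the standard definition rather than this library's internals: tokens are taken in descending probability order until their cumulative mass reaches top_p, then the kept set is renormalized.

    // Keeps the smallest set of most-probable tokens whose probabilities add
    // up to topP or higher; assumes 0 < topP <= 1.
    static double[] topPFilter(double[] probs, double topP) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        java.util.Arrays.sort(order, (a, b) -> Double.compare(probs[b], probs[a]));
        double[] kept = new double[probs.length];
        double cumulative = 0.0;
        double sum = 0.0;
        for (int index : order) {
            if (cumulative >= topP) break; // enough probability mass kept
            kept[index] = probs[index];
            cumulative += probs[index];
            sum += probs[index];
        }
        for (int i = 0; i < kept.length; i++) kept[i] /= sum;
        return kept;
    }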
repetitionPenalty
OptionalDouble repetitionPenalty()
The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
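The usual formulation of repetition penalty (from the CTRL paper, Keskar et al. 2019, which is likely the paper referenced above) rescales the logits of tokens that have already been generated, as in this sketch:

    // Penalty > 1.0 pushes already-generated tokens toward lower probability;
    // 1.0 leaves the distribution unchanged.
    static void applyRepetitionPenalty(double[] logits, java.util.Set<Integer> generatedTokenIds, double penalty) {
        for (int tokenId : generatedTokenIds) {
            logits[tokenId] = logits[tokenId] > 0
                    ? logits[tokenId] / penalty
                    : logits[tokenId] * penalty;
        }
    }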
logRequests
Whether chat model requests should be logged.
logResponses
Whether chat model responses should be logged.