public class CreateAICallRequest extends AbstractModel
Fields inherited from class AbstractModel: header, skipSign
Constructor and Description |
---|
CreateAICallRequest() |
CreateAICallRequest(CreateAICallRequest source)
NOTE: Any ambiguous key set via .set("AnyKey", "value") will be a shallow copy,
and any explicit key, e.g. Foo, set via .setFoo("value") will be a deep copy.
|
Modifier and Type | Method and Description |
---|---|
String |
getAPIKey()
Get Model API key. For authentication details, refer to each model's official website:
- OpenAI protocol: [GPT](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key), [Hunyuan](https://intl.cloud.tencent.com/document/product/1729/111008?from_cn_redirect=1), [DeepSeek](https://api-docs.deepseek.com/zh-cn/);
- Azure protocol: [Azure GPT](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Ctypescript%2Cpython-new&pivots=programming-language-studio#key-settings);
- Minimax protocol: [Minimax](https://platform.minimaxi.com/document/Fast%20access?key=66701cf51d57f38758d581b2)
|
String |
getAPIUrl()
Get Model interface address
- OpenAI protocol
GPT:"https://api.openai.com/v1/"
Hunyuan:"https://api.hunyuan.cloud.tencent.com/v1"
Deepseek:"https://api.deepseek.com/v1"
- Azure protocol
"https://{your-resource-name}.openai.azure.com?api-version={api-version}"
- Minimax protocol
"https://api.minimax.chat/v1"
|
String |
getCallee()
Get Called number.
|
String[] |
getCallers()
Get List of calling numbers.
|
String |
getCustomTTSConfig()
Get Custom TTS configuration as a JSON string. Set either this field or VoiceType, not both.
|
String |
getEndFunctionDesc()
Get Effective when EndFunctionEnable is true; the description of call_end function calling, default is "End the call when user has to leave (like says bye) or you are instructed to do so."
|
Boolean |
getEndFunctionEnable()
Get Whether the model supports (or enables) call_end function calling
|
Long |
getInterruptMode()
Get Mode for interrupting AI speech. The default is 0. 0: the server interrupts automatically; 1: the server does not interrupt, and the client sends the interrupt signal.
|
Long |
getInterruptSpeechDuration()
Get Used when InterruptMode is 0, unit in milliseconds, default is 500ms.
|
String[] |
getLanguages()
Get Languages supported by ASR. The default is "zh" (Chinese).
The array accepts up to 4 languages: the first is the primary recognition language, and the rest are optional languages.
Note: When the primary language is a Chinese dialect, optional languages are invalid.
The supported language values are listed in the Languages field detail.
|
String |
getLLMType()
Get Model interface protocol types, currently compatible with three protocol types:
- OpenAI protocol (including GPT, Hunyuan, DeepSeek, etc.):"openai"
- Azure protocol:"azure"
- Minimax protocol:"minimax"
|
Long |
getMaxDuration()
Get Maximum waiting duration in milliseconds. The default is 60 seconds; if the user does not speak within this time, the call is automatically terminated.
|
String |
getModel()
Get Model name, such as
- OpenAI protocol
"gpt-4o-mini","gpt-4o","hunyuan-standard", "hunyuan-turbo","deepseek-chat";
- Azure protocol
"gpt-4o-mini", "gpt-4o";
- Minimax protocol
"deepseek-chat".
|
Long |
getNotifyDuration()
Get The duration after which the user hasn't spoken to trigger a notification, minimum 10 seconds, default 10 seconds
|
String |
getNotifyMessage()
Get The AI prompt played when NotifyDuration has elapsed without the user speaking. The default is "Sorry, I didn't hear you clearly. Can you repeat that?"
|
Long |
getSdkAppId()
Get Application ID (required) can be found at https://console.cloud.tencent.com/ccc.
|
String |
getSystemPrompt()
Get System prompt for the AI agent, for example:
## Identity
You are Kate from the appointment department at Retell Health calling Cindy over the phone to prepare for the annual checkup coming up.
|
String |
getVoiceType()
Get Built-in voice type. The values available by default are listed in the VoiceType field detail.
|
String |
getWelcomeMessage()
Get Used to set the AI Agent Greeting.
|
Long |
getWelcomeType()
Get 0: Use welcomeMessage (if empty, the callee speaks first; if not empty, the bot speaks first)
1: Use AI to automatically generate welcomeMessage and speak first based on the prompt
|
void |
setAPIKey(String APIKey)
Set Model API key. For authentication details, refer to each model's official website:
- OpenAI protocol: [GPT](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key), [Hunyuan](https://intl.cloud.tencent.com/document/product/1729/111008?from_cn_redirect=1), [DeepSeek](https://api-docs.deepseek.com/zh-cn/);
- Azure protocol: [Azure GPT](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Ctypescript%2Cpython-new&pivots=programming-language-studio#key-settings);
- Minimax protocol: [Minimax](https://platform.minimaxi.com/document/Fast%20access?key=66701cf51d57f38758d581b2)
|
void |
setAPIUrl(String APIUrl)
Set Model interface address
- OpenAI protocol
GPT:"https://api.openai.com/v1/"
Hunyuan:"https://api.hunyuan.cloud.tencent.com/v1"
Deepseek:"https://api.deepseek.com/v1"
- Azure protocol
"https://{your-resource-name}.openai.azure.com?api-version={api-version}"
- Minimax protocol
"https://api.minimax.chat/v1"
|
void |
setCallee(String Callee)
Set Called number.
|
void |
setCallers(String[] Callers)
Set List of calling numbers.
|
void |
setCustomTTSConfig(String CustomTTSConfig)
Set Custom TTS configuration as a JSON string. Set either this field or VoiceType, not both.
|
void |
setEndFunctionDesc(String EndFunctionDesc)
Set Effective when EndFunctionEnable is true; the description of call_end function calling, default is "End the call when user has to leave (like says bye) or you are instructed to do so."
|
void |
setEndFunctionEnable(Boolean EndFunctionEnable)
Set Whether the model supports (or enables) call_end function calling
|
void |
setInterruptMode(Long InterruptMode)
Set Mode for interrupting AI speech. The default is 0. 0: the server interrupts automatically; 1: the server does not interrupt, and the client sends the interrupt signal.
|
void |
setInterruptSpeechDuration(Long InterruptSpeechDuration)
Set Used when InterruptMode is 0, unit in milliseconds, default is 500ms.
|
void |
setLanguages(String[] Languages)
Set Languages supported by ASR. The default is "zh" (Chinese).
The array accepts up to 4 languages: the first is the primary recognition language, and the rest are optional languages.
Note: When the primary language is a Chinese dialect, optional languages are invalid.
The supported language values are listed in the Languages field detail.
|
void |
setLLMType(String LLMType)
Set Model interface protocol types, currently compatible with three protocol types:
- OpenAI protocol (including GPT, Hunyuan, DeepSeek, etc.):"openai"
- Azure protocol:"azure"
- Minimax protocol:"minimax"
|
void |
setMaxDuration(Long MaxDuration)
Set Maximum waiting duration in milliseconds. The default is 60 seconds; if the user does not speak within this time, the call is automatically terminated.
|
void |
setModel(String Model)
Set Model name, such as
- OpenAI protocol
"gpt-4o-mini","gpt-4o","hunyuan-standard", "hunyuan-turbo","deepseek-chat";
- Azure protocol
"gpt-4o-mini", "gpt-4o";
- Minimax protocol
"deepseek-chat".
|
void |
setNotifyDuration(Long NotifyDuration)
Set The duration after which the user hasn't spoken to trigger a notification, minimum 10 seconds, default 10 seconds
|
void |
setNotifyMessage(String NotifyMessage)
Set The AI prompt played when NotifyDuration has elapsed without the user speaking. The default is "Sorry, I didn't hear you clearly. Can you repeat that?"
|
void |
setSdkAppId(Long SdkAppId)
Set Application ID (required) can be found at https://console.cloud.tencent.com/ccc.
|
void |
setSystemPrompt(String SystemPrompt)
Set System prompt for the AI agent, for example:
## Identity
You are Kate from the appointment department at Retell Health calling Cindy over the phone to prepare for the annual checkup coming up.
|
void |
setVoiceType(String VoiceType)
Set Built-in voice type. The values available by default are listed in the VoiceType field detail.
|
void |
setWelcomeMessage(String WelcomeMessage)
Set Used to set the AI Agent Greeting.
|
void |
setWelcomeType(Long WelcomeType)
Set 0: Use welcomeMessage (if empty, the callee speaks first; if not empty, the bot speaks first)
1: Use AI to automatically generate welcomeMessage and speak first based on the prompt
|
void |
toMap(HashMap<String,String> map,
String prefix)
Internal implementation, normal users should not use it.
|
Methods inherited from class AbstractModel: any, fromJsonString, getBinaryParams, GetHeader, getMultipartRequestParams, getSkipSign, set, SetHeader, setParamArrayObj, setParamArraySimple, setParamObj, setParamSimple, setSkipSign, toJsonString
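The Languages rules in the method summary (at most 4 entries, primary language first, no optional languages when the primary is a Chinese dialect) can be checked on the client before calling setLanguages. The following is a minimal sketch under stated assumptions: the class `LanguagesCheck` and its `check` method are hypothetical helpers, not part of the SDK, and the set of codes is copied from the Languages field description. Whether the service rejects or silently ignores invalid combinations is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LanguagesCheck {
    // Language codes copied from the Languages field description on this page.
    public static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
        "zh", "zh-TW", "zh-dialect", "en", "vi", "ja", "ko", "id", "th", "pt",
        "tr", "ar", "es", "hi", "fr", "ms", "fil", "de", "it", "ru"));

    // Returns a list of human-readable problems; an empty list means no rule was violated.
    public static List<String> check(String[] languages) {
        List<String> problems = new ArrayList<>();
        if (languages == null || languages.length == 0) {
            return problems; // nothing set: the documented default "zh" applies
        }
        if (languages.length > 4) {
            problems.add("at most 4 languages are allowed, got " + languages.length);
        }
        for (String code : languages) {
            if (!SUPPORTED.contains(code)) {
                problems.add("unsupported language code: " + code);
            }
        }
        // The first entry is the primary language; a dialect primary excludes optional languages.
        if ("zh-dialect".equals(languages[0]) && languages.length > 1) {
            problems.add("optional languages are invalid when the primary language is a Chinese dialect");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(new String[] {"en", "zh", "ja"}));
        System.out.println(check(new String[] {"zh-dialect", "en"}));
    }
}
```

An array that passes the check can then be handed to setLanguages(String[] Languages) unchanged.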
public CreateAICallRequest()
public CreateAICallRequest(CreateAICallRequest source)
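The copy-constructor note in the constructor summary distinguishes shallow from deep copies. As a rough illustration of what a deep copy of an explicit field means in practice, the sketch below uses a hypothetical stand-in class (`CopySketch.Request`, not the SDK class) with a single array field and a placeholder value. How the SDK implements its own copy is not shown on this page, so treat this only as a model of the behavior the note describes.

```java
import java.util.Arrays;

public class CopySketch {
    // Hypothetical stand-in with one array field; this is NOT the SDK class.
    public static class Request {
        private String[] callers;

        public Request() {}

        // Deep copy: allocate a new array so later changes to the source array are not shared.
        // (String elements are immutable, so copying the references is sufficient.)
        public Request(Request source) {
            if (source.callers != null) {
                this.callers = Arrays.copyOf(source.callers, source.callers.length);
            }
        }

        public String[] getCallers() { return callers; }
        public void setCallers(String[] callers) { this.callers = callers; }
    }

    public static void main(String[] args) {
        Request original = new Request();
        original.setCallers(new String[] {"caller-1"}); // placeholder value
        Request copy = new Request(original);

        // Mutating the source array after copying does not affect the copy.
        original.getCallers()[0] = "changed";
        System.out.println(copy.getCallers()[0]); // prints "caller-1"
    }
}
```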
public Long getSdkAppId()
public void setSdkAppId(Long SdkAppId)
SdkAppId
- Application ID (required). It can be found at https://console.cloud.tencent.com/ccc.
public String getCallee()
public void setCallee(String Callee)
Callee
- Called number.
public String getSystemPrompt()
public void setSystemPrompt(String SystemPrompt)
SystemPrompt
- System prompt for the AI agent, for example:
## Identity
You are Kate from the appointment department at Retell Health calling Cindy over the phone to prepare for the annual checkup coming up. You are a pleasant and friendly receptionist caring deeply for the user. You don't provide medical advice but would use the medical knowledge to understand user responses.
## Style Guardrails
Be Concise: Respond succinctly, addressing one topic at most.
Embrace Variety: Use diverse language and rephrasing to enhance clarity without repeating content.
Be Conversational: Use everyday language, making the chat feel like talking to a friend.
Be Proactive: Lead the conversation, often wrapping up with a question or next-step suggestion.
Avoid multiple questions in a single response.
Get clarity: If the user only partially answers a question, or if the answer is unclear, keep asking to get clarity.
Use a colloquial way of referring to the date (like Friday, January 14th, or Tuesday, January 12th, 2024 at 8am).
## Response Guideline
Adapt and Guess: Try to understand transcripts that may contain transcription errors. Avoid mentioning "transcription error" in the response.
Stay in Character: Keep conversations within your role's scope, guiding them back creatively without repeating.
Ensure Fluid Dialogue: Respond in a role-appropriate, direct manner to maintain a smooth conversation flow.
## Task
You will follow the steps below, do not skip steps, and only ask up to one question in response.
If at any time the user showed anger or wanted a human agent, call transfer_call to transfer to a human representative.
1. Begin with a self-introduction and verify if callee is Cindy.
- if callee is not Cindy, call end_call to hang up, say sorry for the confusion when hanging up.
- if Cindy is not available, call end_call politely to hang up, say you will call back later when hanging up.
2. Inform Cindy she has an annual body check coming up on April 4th, 2024 at 10am PDT. Check if Cindy is available.
- If not, tell Cindy to reschedule online and jump to step 5.
3. Ask Cindy if there's anything that the doctor should know before the annual checkup.
- Ask followup questions as needed to assess the severity of the issue, and understand how it has progressed.
4. Tell Cindy to not eat or drink that day before the checkup. Also tell Cindy to give you a callback if there's any changes in health condition.
5. Ask Cindy if she has any questions, and if so, answer them until there are no questions.
- If user asks something you do not know, let them know you don't have the answer. Ask them if they have any other questions.
- If the user does not have any questions, call function end_call to hang up.
public String getLLMType()
public void setLLMType(String LLMType)
LLMType
- Model interface protocol type. Three protocol types are currently supported:
- OpenAI protocol (including GPT, Hunyuan, DeepSeek, etc.): "openai"
- Azure protocol: "azure"
- Minimax protocol: "minimax"
public String getModel()
public void setModel(String Model)
Model
- Model name, for example:
- OpenAI protocol
"gpt-4o-mini", "gpt-4o", "hunyuan-standard", "hunyuan-turbo", "deepseek-chat";
- Azure protocol
"gpt-4o-mini", "gpt-4o";
- Minimax protocol
"deepseek-chat".
public String getAPIKey()
public void setAPIKey(String APIKey)
APIKey
- Model API key. For authentication details, refer to each model's official website:
- OpenAI protocol: [GPT](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key), [Hunyuan](https://intl.cloud.tencent.com/document/product/1729/111008?from_cn_redirect=1), [DeepSeek](https://api-docs.deepseek.com/zh-cn/);
- Azure protocol: [Azure GPT](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Ctypescript%2Cpython-new&pivots=programming-language-studio#key-settings);
- Minimax protocol: [Minimax](https://platform.minimaxi.com/document/Fast%20access?key=66701cf51d57f38758d581b2)
public String getAPIUrl()
public void setAPIUrl(String APIUrl)
APIUrl
- Model interface address
- OpenAI protocol
GPT: "https://api.openai.com/v1/"
Hunyuan: "https://api.hunyuan.cloud.tencent.com/v1"
DeepSeek: "https://api.deepseek.com/v1"
- Azure protocol
"https://{your-resource-name}.openai.azure.com?api-version={api-version}"
- Minimax protocol
"https://api.minimax.chat/v1"
public String getVoiceType()
public void setVoiceType(String VoiceType)
VoiceType
- The following voice parameter values are available by default. If you wish to customize the voice type, please leave VoiceType blank and configure it in the CustomTTSConfig parameter.
Chinese:
ZhiMei: Zhimei, customer service female voice
ZhiXi: Zhixi, general female voice
ZhiQi: Zhiqi, customer service female voice
ZhiTian: Zhitian, female child voice
AiXiaoJing: Ai Xiaojing, dialogue female voice
English:
WeRose: English female voice
Monika: English female voice
Japanese:
Nanami
Korean:
SunHi
Indonesian (Indonesia):
Gadis
Malay (Malaysia):
Yasmin
Tamil (Malaysia):
Kani
Thai (Thailand):
Achara
Vietnamese (Vietnam):
HoaiMy
public String[] getCallers()
public void setCallers(String[] Callers)
Callers
- List of calling numbers.
public String getWelcomeMessage()
public void setWelcomeMessage(String WelcomeMessage)
WelcomeMessage
- Used to set the AI agent greeting.
public Long getWelcomeType()
public void setWelcomeType(Long WelcomeType)
WelcomeType
- 0: Use WelcomeMessage (if empty, the callee speaks first; if not empty, the bot speaks first)
1: Use AI to automatically generate the welcome message based on the prompt and speak first
public Long getMaxDuration()
public void setMaxDuration(Long MaxDuration)
MaxDuration
- Maximum waiting duration in milliseconds. The default is 60 seconds; if the user does not speak within this time, the call is automatically terminated.
public String[] getLanguages()
public void setLanguages(String[] Languages)
Languages
- Languages supported by ASR. The default is "zh" (Chinese).
The array accepts up to 4 languages: the first is the primary recognition language, and the rest are optional languages.
Note: When the primary language is a Chinese dialect, optional languages are invalid.
The currently supported languages are listed below. The English name of the language is on the left of the equals sign, and the value to fill in the Languages field is on the right, following ISO 639:
1. Chinese = "zh" # Chinese
2. Chinese_TW = "zh-TW" # Taiwan (China)
3. Chinese_DIALECT = "zh-dialect" # Chinese Dialect
4. English = "en" # English
5. Vietnamese = "vi" # Vietnamese
6. Japanese = "ja" # Japanese
7. Korean = "ko" # Korean
8. Indonesia = "id" # Indonesian
9. Thai = "th" # Thai
10. Portuguese = "pt" # Portuguese
11. Turkish = "tr" # Turkish
12. Arabic = "ar" # Arabic
13. Spanish = "es" # Spanish
14. Hindi = "hi" # Hindi
15. French = "fr" # French
16. Malay = "ms" # Malay
17. Filipino = "fil" # Filipino
18. German = "de" # German
19. Italian = "it" # Italian
20. Russian = "ru" # Russian
public Long getInterruptMode()
public void setInterruptMode(Long InterruptMode)
InterruptMode
- Mode for interrupting AI speech. The default is 0. 0: the server interrupts automatically; 1: the server does not interrupt, and the client sends the interrupt signal.
public Long getInterruptSpeechDuration()
public void setInterruptSpeechDuration(Long InterruptSpeechDuration)
InterruptSpeechDuration
- Used when InterruptMode is 0. Unit: milliseconds; the default is 500 ms. The server interrupts once it detects continuous voice input lasting InterruptSpeechDuration milliseconds.
public Boolean getEndFunctionEnable()
public void setEndFunctionEnable(Boolean EndFunctionEnable)
EndFunctionEnable
- Whether the model supports (or enables) call_end function calling.
public String getEndFunctionDesc()
public void setEndFunctionDesc(String EndFunctionDesc)
EndFunctionDesc
- Effective when EndFunctionEnable is true. Description of the call_end function call; the default is "End the call when user has to leave (like says bye) or you are instructed to do so."
public Long getNotifyDuration()
public void setNotifyDuration(Long NotifyDuration)
NotifyDuration
- Duration of user silence after which a notification is triggered; minimum 10 seconds, default 10 seconds.
public String getNotifyMessage()
public void setNotifyMessage(String NotifyMessage)
NotifyMessage
- The AI prompt played when NotifyDuration has elapsed without the user speaking. The default is "Sorry, I didn't hear you clearly. Can you repeat that?"
public String getCustomTTSConfig()
public void setCustomTTSConfig(String CustomTTSConfig)
CustomTTSConfig
- Custom TTS configuration, as a JSON string. Set either this field or VoiceType, not both: CustomTTSConfig supplies your own TTS service, whereas VoiceType selects one of the built-in voices.
{
"TTSType": "tencent", // String, TTS type; currently supports "tencent" and "minimax", support for other vendors is in progress
"AppId": "Your application ID", // String, required
"SecretId": "Your Secret ID", // String, required
"SecretKey": "Your Secret Key", // String, required
"VoiceType": 101001, // Integer, required. Sound quality ID, covering standard and premium sound quality. Premium sound quality is more realistic and is priced differently from standard sound quality. See the TTS billing overview for details. For the full list of sound quality IDs, see the TTS sound quality list.
"Speed": 1.25, // Number, optional. Speech speed, range: [-2,6], where -2 is 0.6x, -1 is 0.8x, 0 is 1.0x (default), 1 is 1.2x, 2 is 1.5x, and 6 is 2.5x. For finer control, up to two decimal places are accepted, such as 0.5, 1.25, or 2.81. For the mapping from parameter value to actual speed, refer to Speed Conversion.
"Volume": 5, // Integer, optional. Volume level, range: [0,10], corresponding to 11 volume levels; the default is 0, which represents normal volume.
"PrimaryLanguage": 1, // Integer, optional. Primary language: 1 - Chinese (default), 2 - English, 3 - Japanese
"FastVoiceType": "xxxx" // Optional, Fast VRS parameter
}
{
"TTSType": "minimax", // String, TTS type
"Model": "speech-01-turbo",
"APIUrl": "https://api.minimax.chat/v1/t2a_v2",
"APIKey": "eyxxxx",
"GroupId": "181000000000000",
"VoiceType": "female-tianmei-yujie",
"Speed": 1.2
}
For the sound quality type configuration, refer to the Volcano TTS documentation:
TTS Sound Quality List - Voice Technology - Volcano Engine
Large Model TTS Sound Quality List - Voice Technology - Volcano Engine
{
"TTSType": "volcengine", // Required: String, TTS type
"AppId": "xxxxxxxx", // Required: String, AppId assigned by Volcano Engine
"Token": "TY9d4sQXHxxxxxxx", // Required: String, Volcano Engine access token
"Speed": 1.0, // Optional: playback speed, default is 1.0
"Volume": 1.0, // Optional: volume, default is 1.0
"Cluster": "volcano_tts", // Optional: business cluster, default is volcano_tts
"VoiceType": "zh_male_aojiaobazong_moon_bigtts" // Sound quality type; the default is a large model TTS sound quality. If using normal TTS, fill in the corresponding sound quality type. An incorrect sound quality type results in no sound.
}
{
"TTSType": "azure", // Required: String, TTS type
"SubscriptionKey": "xxxxxxxx", // Required: String, subscription key
"Region": "chinanorth3", // Required: String, subscription region
"VoiceName": "zh-CN-XiaoxiaoNeural", // Required: String, timbre name
"Language": "zh-CN", // Required: String, language for synthesis
"Rate": 1 // Optional: float, playback speed, 0.5-2, default is 1
}
Custom TTS
For the protocol specification, refer to the Tencent documentation.
{
"TTSType": "custom", // Required: String
"APIKey": "ApiKey", // Required: String, used for authentication
"APIUrl": "http://0.0.0.0:8080/stream-audio", // Required: String, TTS API URL
"AudioFormat": "wav", // Optional: String, expected audio format, such as mp3, ogg_opus, pcm, or wav; the default is wav. Currently only pcm and wav are supported.
"SampleRate": 16000, // Optional: Integer, audio sample rate; the default is 16000 (16k), which is also the recommended value
"AudioChannel": 1 // Optional: Integer, number of audio channels, 1 or 2; the default is 1
}
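The configurations above are annotated with `//` comments, which are not valid in strict JSON, so the string passed to setCustomTTSConfig presumably needs to be built without them. The sketch below assembles the Azure variant using only the field names shown above. The class `CustomTtsConfigSketch` and its methods are hypothetical helpers, not part of the SDK, and the assumption that the service expects comment-free JSON is not confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CustomTtsConfigSketch {
    // Serializes a flat map of String/Number values to a JSON object string.
    public static String toJson(Map<String, Object> fields) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            Object value = entry.getValue();
            String rendered = (value instanceof Number)
                ? value.toString()
                : quote(String.valueOf(value));
            joiner.add(quote(entry.getKey()) + ":" + rendered);
        }
        return joiner.toString();
    }

    // Quotes a string, escaping backslashes, double quotes, and control characters.
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds the "azure" configuration; the Rate range 0.5-2 is taken from the comment above.
    public static String azureConfig(String subscriptionKey, String region,
                                     String voiceName, String language, double rate) {
        if (rate < 0.5 || rate > 2.0) {
            throw new IllegalArgumentException("Rate must be within 0.5-2, got " + rate);
        }
        Map<String, Object> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("TTSType", "azure");
        fields.put("SubscriptionKey", subscriptionKey);
        fields.put("Region", region);
        fields.put("VoiceName", voiceName);
        fields.put("Language", language);
        fields.put("Rate", rate);
        return toJson(fields);
    }

    public static void main(String[] args) {
        String json = azureConfig("xxxxxxxx", "chinanorth3", "zh-CN-XiaoxiaoNeural", "zh-CN", 1.0);
        System.out.println(json);
    }
}
```

The resulting string would then be passed to setCustomTTSConfig(String CustomTTSConfig), leaving VoiceType unset. In a real project a JSON library would be a more robust choice than the hand-written serializer used here to keep the sketch dependency-free.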
Copyright © 2025. All rights reserved.