public class CreateRecTaskRequest extends AbstractModel
header, skipSign
Constructor and Description |
---|
CreateRecTaskRequest() |
CreateRecTaskRequest(CreateRecTaskRequest source)
NOTE: Any ambiguous key set via .set("AnyKey", "value") will be a shallow copy,
and any explicit key, i.e Foo, set via .setFoo("value") will be a deep copy.
|
Modifier and Type | Method and Description |
---|---|
String |
getCallbackUrl()
Get Callback URL
User-defined service URL for receiving recognition results.
|
Long |
getChannelNum()
Get Number of recognition channels.
|
Long |
getConvertNumMode()
Get Intelligent conversion into Arabic numerals (supported only for engines for recognizing audio in Mandarin).
|
String |
getCustomizationId()
Get This service is not available.
|
String |
getData()
Get Audio file Base64 code.
|
Long |
getDataLen()
Get Data length (before Base64 encoding).
|
Long |
getEmotionalEnergy()
Get Emotional energy value.
|
Long |
getEmotionRecognition()
Get This service is not available.
|
String |
getEngineModelType()
Get Engine model type.
|
String |
getExtra()
Get Additional parameter.
|
Long |
getFilterDirty()
Get Dirty word filtering (supported only for engines for recognizing audio in Mandarin).
|
Long |
getFilterModal()
Get Modal particle filtering (supported only for engines for recognizing audio in Mandarin).
|
Long |
getFilterPunc()
Get Punctuation filtering (supported only for engines for recognizing audio in Mandarin).
|
String |
getHotwordId()
Get This service is not available.
|
String |
getHotwordList()
Get Temporary term list.
|
String[] |
getKeyWordLibIdList()
Get List of keyword IDs for recognition.
|
Long |
getReinforceHotword()
Deprecated.
|
Long |
getResTextFormat()
Get Format of the returned recognition result.
|
Long |
getSentenceMaxLength()
Get The maximum number of characters per line (supported only for engines for recognizing audio in Mandarin).
|
Long |
getSourceType()
Get Audio source.
|
Long |
getSpeakerDiarization()
Get Whether to enable speaker separation.
|
Long |
getSpeakerNumber()
Get Number of speakers to be separated.
|
String |
getUrl()
Get Audio URL.
|
void |
setCallbackUrl(String CallbackUrl)
Set Callback URL
User-defined service URL for receiving recognition results.
|
void |
setChannelNum(Long ChannelNum)
Set Number of recognition channels.
|
void |
setConvertNumMode(Long ConvertNumMode)
Set Intelligent conversion into Arabic numerals (supported only for engines for recognizing audio in Mandarin).
|
void |
setCustomizationId(String CustomizationId)
Set This service is not available.
|
void |
setData(String Data)
Set Audio file Base64 code.
|
void |
setDataLen(Long DataLen)
Set Data length (before Base64 encoding).
|
void |
setEmotionalEnergy(Long EmotionalEnergy)
Set Emotional energy value.
|
void |
setEmotionRecognition(Long EmotionRecognition)
Set This service is not available.
|
void |
setEngineModelType(String EngineModelType)
Set Engine model type.
|
void |
setExtra(String Extra)
Set Additional parameter.
|
void |
setFilterDirty(Long FilterDirty)
Set Dirty word filtering (supported only for engines for recognizing audio in Mandarin).
|
void |
setFilterModal(Long FilterModal)
Set Modal particle filtering (supported only for engines for recognizing audio in Mandarin).
|
void |
setFilterPunc(Long FilterPunc)
Set Punctuation filtering (supported only for engines for recognizing audio in Mandarin).
|
void |
setHotwordId(String HotwordId)
Set This service is not available.
|
void |
setHotwordList(String HotwordList)
Set Temporary term list.
|
void |
setKeyWordLibIdList(String[] KeyWordLibIdList)
Set List of keyword IDs for recognition.
|
void |
setReinforceHotword(Long ReinforceHotword)
Deprecated.
|
void |
setResTextFormat(Long ResTextFormat)
Set Format of the returned recognition result.
|
void |
setSentenceMaxLength(Long SentenceMaxLength)
Set The maximum number of characters per line (supported only for engines for recognizing audio in Mandarin).
|
void |
setSourceType(Long SourceType)
Set Audio source.
|
void |
setSpeakerDiarization(Long SpeakerDiarization)
Set Whether to enable speaker separation.
|
void |
setSpeakerNumber(Long SpeakerNumber)
Set Number of speakers to be separated.
|
void |
setUrl(String Url)
Set Audio URL.
|
void |
toMap(HashMap<String,String> map,
String prefix)
Internal implementation, normal users should not use it.
|
any, fromJsonString, getBinaryParams, GetHeader, getMultipartRequestParams, getSkipSign, isStream, set, SetHeader, setParamArrayObj, setParamArraySimple, setParamObj, setParamSimple, setSkipSign, toJsonString
public CreateRecTaskRequest()
public CreateRecTaskRequest(CreateRecTaskRequest source)
public String getEngineModelType()
public void setEngineModelType(String EngineModelType)
EngineModelType
- Engine model type.
Each recognition engine adopts a specific billing plan. Engines marked with "large model version" adopt the large model billing plan. For product billing instructions, [click here] (https://intl.cloud.tencent.com/document/product/1093/35686?from_cn_redirect=1).
Note: If you want to recognize telecommunication audio but find that a 16k engine is required, you can use a 16k engine as described below for recognition. However, ** the 16k engines are not trained for recognizing telecommunication audio. Therefore, the recognition results cannot be guaranteed. You need to check whether the recognition results can be used. **
Engines for general scenarios:
Note: Use 16k engines for scenarios other than telecommunication. **
16k_zh_large: ** Engine (large model version) for Mandarin, Chinese dialects, and English. This engine supports recognizing audio in Chinese, English, and [various Chinese dialects] (https://intl.cloud.tencent.com/document/product/1093/35682?from_cn_redirect=1). It has a large number of parameters, enhanced performance, and greatly improved recognition accuracy for low-quality audio with loud noise, too much echo, low voice volume, or faint voices. [Click here] (https://console.cloud.tencent.com/asr/demonstrate) to compare the recognition performance of the 16k_zh engine and this one.
16k_multi_lang: ** Engine (large model version) for multiple languages. This engine supports recognizing audio in English, Japanese, Korean, Arabic, Filipino, French, Hindi, Indonesian, Malay, Portuguese, Spanish, Thai, Turkish, Vietnamese, and German (sentence-level or paragraph-level).
16k_zh-PY: ** Engine for Chinese, English, and Cantonese. The engine supports recognizing audio in Mandarin, English, and Cantonese at the same time.
16k_ms: ** Engine for Malay.
16k_id: ** Engine for Indonesian.
16k_th: ** Engine for Thai.public Long getChannelNum()
public void setChannelNum(Long ChannelNum)
ChannelNum
- Number of recognition channels.
1: Mono. (16k engines only support mono. ** Do no t** set to stereo.)
2: Stereo. (Stereo is supported only for 8k engines, and the two channels should correspond to the respective communication parties.)
Note:
16k engines: Only support mono. ** ChannelNum should be set to 1 **.
8k engines: Support both mono and stereo. ** It is recommended to set ChannelNum to 2 (indicating stereo) **. Stereo can physically distinguish speakers to avoid recognition mistakes caused by overlapping speech. It can provide the best speaker separation and recognition effects. Once stereo is set, the speakers are automatically separated. ** You do not need to enable the speaker separation feature **. You can use the default values for related parameters (** SpeakerDiarization and SpeakerNumber **). For speakerId in the returned ResultDetail, the value 0 represents the left channel, and the value 1 represents the right channel.public Long getResTextFormat()
public void setResTextFormat(Long ResTextFormat)
ResTextFormat
- Format of the returned recognition result.
0: The basic recognition result (containing only valid voice timestamps but no word-level [detailed recognition result] (https://intl.cloud.tencent.com/document/api/1093/37824?from_cn_redirect=1#SentenceDetail)).
1: The basic recognition result and word-level [detailed recognition result] (https://intl.cloud.tencent.com/document/api/1093/37824?from_cn_redirect=1#SentenceDetail) (containing word-level timestamps and speech speed value but ** no punctuation **).
2: The basic recognition result and word-level [detailed recognition result] (https://intl.cloud.tencent.com/document/api/1093/37824?from_cn_redirect=1#SentenceDetail) (containing word-level timestamps, speech speed value, and ** punctuation **).
3: The basic recognition result and word-level [detailed recognition result] (https://intl.cloud.tencent.com/document/api/1093/37824?from_cn_redirect=1#SentenceDetail) (containing word-level timestamps, speech speed value, and ** punctuation **). The recognition results are segmented by punctuation. ** This format applies to subtitle scenarios **.
4: ** [Value-added paid feature] ** The basic recognition result and word-level [detailed recognition result] (https://intl.cloud.tencent.com/document/api/1093/37824?from_cn_redirect=1#SentenceDetail) (containing word-level timestamps, speech speed value, and ** punctuation **). The recognition results are segmented by NLP semantics. ** This format applies to scenarios such as meeting and court record transcription ** and is supported only for 8k_zh and 16k_zh engines.
5: ** [Value-added paid feature] ** Basic recognition result and word-level [detailed recognition result] (https://intl.cloud.tencent.com/document/api/1093/37824?from_cn_redirect=1#SentenceDetail) (containing word-level timestamps, speech speed value, and ** punctuation **). The oral-to-written transcription result is also output, which has excluded modal particles and consecutive identical words, optimized expressions, and corrected speech mistakes. ** This format applies to scenarios of generating minutes for online and offline meetings** and is supported only for 8k_zh and 16k_zh engines.
Notes:
If this parameter is set to 4, make sure that a [semantics-based segmentation resource package] (https://intl.cloud.tencent.com/document/product/1093/35686?from_cn_redirect=1#97ae4aa0-29a0-4066-9f07-ccaf8856a16b) is purchased for your account or that your account has enabled post-payment. ** If post-payment is enabled and this parameter is set to 4, [automatic billing] (https://intl.cloud.tencent.com/document/product/1093/35686?from_cn_redirect=1#d912167d-ffd5-41a9-8b1c-2e89845a6852) will apply **.
If this parameter is set to 5, make sure that an [oral-to-written resource package] (https://intl.cloud.tencent.com/document/product/1093/35686?from_cn_redirect=1#97ae4aa0-29a0-4066-9f07-ccaf8856a16b) is purchased for your account or that your account has enabled post-payment. ** If post-payment is enabled and this parameter is set to 5, [automatic billing] (https://intl.cloud.tencent.com/document/product/1093/35686?from_cn_redirect=1#d912167d-ffd5-41a9-8b1c-2e89845a6852) will apply **.public Long getSourceType()
public void setSourceType(Long SourceType)
SourceType
- Audio source.
0: Audio URL.
1: Local audio file (body of the POST request).public String getData()
public void setData(String Data)
Data
- Audio file Base64 code.
This parameter is required if SourceType is set to 1. Otherwise, it can be left blank. **
Note: The audio data size cannot exceed 5 MB.public Long getDataLen()
public void setDataLen(Long DataLen)
DataLen
- Data length (before Base64 encoding).public String getUrl()
public void setUrl(String Url)
Url
- Audio URL. (The audio should be downloadable via a public network browser.)
This parameter is required if SourceType is set to 0. Otherwise, it can be left blank. **
Notes:
1. Make sure that the total audio duration of recording files does not exceed 5 hours. Otherwise, recognition may fail.
2. Pay attention to file download to avoid download failure.public String getCallbackUrl()
public void setCallbackUrl(String CallbackUrl)
CallbackUrl
- Callback URL
User-defined service URL for receiving recognition results.
For the callback format and content, see [Callback Description] (https://intl.cloud.tencent.com/document/product/1093/52632?from_cn_redirect=1).
Notes:
- If you use the polling method to obtain recognition results, this parameter is not required.
- It is recommended to include your business ID and other information in the callback URL for handling business logic.public Long getSpeakerDiarization()
public void setSpeakerDiarization(Long SpeakerDiarization)
SpeakerDiarization
- Whether to enable speaker separation.
0: Disable.
1: Enable. (This value is supported only for the following engines: 8k_zh, 16k_zh, 16k_ms, 16k_en, 16k_id, 16k_zh_large, and 16k_zh_dialect. ChannelNum should be set to 1.)
The default value is 0.
Note:
If an 8k engine is used and ChannelNum is set to 2 (stereo), use the default values for corresponding parameters as stated in the ** ChannelNum ** parameter description.public Long getSpeakerNumber()
public void setSpeakerNumber(Long SpeakerNumber)
SpeakerNumber
- Number of speakers to be separated.
Speaker separation must be enabled. Otherwise, this parameter does not take effect. ** Value range: 0-10.
0: Automatic separation. (Up to 20 speakers can be separated.)
1-10: Specify the number of speakers.
The default value is 0.public String getHotwordId()
public void setHotwordId(String HotwordId)
HotwordId
- This service is not available.@Deprecated public Long getReinforceHotword()
@Deprecated public void setReinforceHotword(Long ReinforceHotword)
ReinforceHotword
- This service is not available.public String getCustomizationId()
public void setCustomizationId(String CustomizationId)
CustomizationId
- This service is not available.public Long getEmotionRecognition()
public void setEmotionRecognition(Long EmotionRecognition)
EmotionRecognition
- This service is not available.public Long getEmotionalEnergy()
public void setEmotionalEnergy(Long EmotionalEnergy)
EmotionalEnergy
- Emotional energy value.
The value is the result of dividing the sound volume in dB by 10. Value range: [1,10]. The higher the value, the stronger the emotion.
0: Disable.
1: Enable.
The default value is 0.public Long getConvertNumMode()
public void setConvertNumMode(Long ConvertNumMode)
ConvertNumMode
- Intelligent conversion into Arabic numerals (supported only for engines for recognizing audio in Mandarin).
0: Do not convert, but directly output Chinese numerals.
1: Intelligently convert into Arabic numerals based on the scenario.
3: Enable conversion for math-related letters.
The default value is 1.public Long getFilterDirty()
public void setFilterDirty(Long FilterDirty)
FilterDirty
- Dirty word filtering (supported only for engines for recognizing audio in Mandarin).
0: Do not filter out dirty words.
1: Filter out dirty words.
2: Replace dirty words with *.
The default value is 0.public Long getFilterPunc()
public void setFilterPunc(Long FilterPunc)
FilterPunc
- Punctuation filtering (supported only for engines for recognizing audio in Mandarin).
0: Do not filter out punctuation.
1: Filter out sentence-ending punctuation.
2: Filter out all punctuation.
The default value is 0.public Long getFilterModal()
public void setFilterModal(Long FilterModal)
FilterModal
- Modal particle filtering (supported only for engines for recognizing audio in Mandarin).
0: Do not filter out modal particles.
1: Filter out specified modal particles.
2: Filter out all modal particles.
The default value is 0.public Long getSentenceMaxLength()
public void setSentenceMaxLength(Long SentenceMaxLength)
SentenceMaxLength
- The maximum number of characters per line (supported only for engines for recognizing audio in Mandarin). A punctuation mark is added if this limit is reached.
This parameter can control the maximum number of characters per line, which applies to subtitle generation scenarios. ** Value range: [6,40].
0: Disable this feature.
The default value is 0.
Note: To enable this feature, ResTextFormat should be set to 3. The recognition result can be obtained from FinalSentence by parsing the list in the returned ResultDetail.public String getExtra()
public void setExtra(String Extra)
Extra
- Additional parameter. ** (This parameter is meaningless. Ignore it.) **public String getHotwordList()
public void setHotwordList(String HotwordList)
HotwordList
- Temporary term list. This parameter is used to improve the recognition accuracy.
- Restrictions for individual terms: The format is "term|weight". Each term can contain no more than 30 characters (or 10 Chinese characters. The weight can be in the range of [1-11] or 100. For example, "Tencent Cloud|5" or "ASR|11".
- Restrictions for the temporary term list: Multiple terms are separated by commas. The list can contain up to 128 terms. For example, "Tencent Cloud|10, Audio Recognition|5, ASR|11".
- Difference between hotword_id (term list) and hotword_list (temporary term list):
- hotword_id: Term list. You need to create a term list in the console or by using the API first and obtain the term list ID.
- hotword_list: Temporary term list. You can directly enter the ID of the temporary term list each time you initiate a request. The temporary term list is not retained on the cloud. This parameter applies to users with a massive number of terms.
Notes:
- If both hotword_id and hotword_list are specified, hotword_list will take effect first.
- When the weight of a term is set to 11, this term becomes a super term. It is recommended that the weight is set to 11 only for critical and necessary terms. The overall accuracy will be affected if the weight is set to 11 for too many terms.
- When the weight of a term is set to 100, the term enhancement feature is enabled to replace homophones of this term. (This feature is supported only for 8k_zh and 16k_zh engines.) For example, if you configure "mizhi 1|100", the recognized word "mizhi 2", which is a homophone of "mizhi 2", will be forcibly replaced with "mizhi 2". It is recommended that you enable this feature based on the actual needs. You can set the weight to 100 for only critical and necessary terms. The overall accuracy will be affected if the weight is set to 100 for too many terms.public String[] getKeyWordLibIdList()
public void setKeyWordLibIdList(String[] KeyWordLibIdList)
KeyWordLibIdList
- List of keyword IDs for recognition. This parameter is left blank by default, indicating that no keyword is recognized. You can enter up to 10 IDs.Copyright © 2025. All rights reserved.