摘要:
A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and a method of automatically searching a multimedia resource.
摘要:
A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and a method of automatically searching a multimedia resource.
摘要:
A method, system and computer readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing assessment result where at least of the steps is carried out using a computer device.
摘要:
A method and system for achieving emotional text to speech. The method includes: receiving text data; generating emotion tag for the text data by a rhythm piece; and achieving TTS to the text data corresponding to the emotion tag, where the emotion tags are expressed as a set of emotion vectors; where each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories. A system for the same includes: a text data receiving module; an emotion tag generating module; and a TTS module for achieving TTS, wherein the emotion tag is expressed as a set of emotion vectors; and wherein emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories.
摘要:
A method and system for achieving emotional text to speech. The method includes: receiving text data; generating emotion tag for the text data by a rhythm piece; and achieving TTS to the text data corresponding to the emotion tag, where the emotion tags are expressed as a set of emotion vectors; where each emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories. A system for the same includes: a text data receiving module; an emotion tag generating module; and a TTS module for achieving TTS, wherein the emotion tag is expressed as a set of emotion vectors; and wherein emotion vector includes a plurality of emotion scores given based on a plurality of emotion categories.
摘要:
A method, system and computer readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing assessment result where at least of the steps is carried out using a computer device.
摘要:
A method for performing speech synthesis on textual content at a client. The method includes the steps of: performing speech synthesis on the textual content based on a current acoustical unit set Scurrent in a corpus at the client; analyzing the textual content and generating a list of target units with corresponding context features, selecting multiple acoustical unit candidates for each target unit according to the context features based on an acoustical unit set Stotal that is more plentiful than the current acoustical unit set Scurrent in the corpus at the client, and determining acoustical units suitable for speech synthesis for the textual content according to the multiple unit candidates; and updating the current acoustical unit set Scurrent in the corpus at the client based on the determined acoustical units.
摘要:
Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
摘要:
Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
摘要:
A method for performing speech synthesis to a textual content at a client. The method includes the steps of: performing speech synthesis to the textual content based on a current acoustical unit set Scurrent in a corpus at the client; analyzing the textual content and generating a list of target units with corresponding context features, selecting multiple acoustical unit candidates for each target unit according to the context features based on an acoustical unit set Stotal that is more plentiful than the current acoustical unit set Scurrent in the corpus at the client, and determining acoustical units suitable for speech synthesis for the textual content according to the multiple unit candidates; and updating the current acoustical unit set Scurrent in the corpus at the client based on the determined acoustical units.
摘要翻译:一种用于对客户端的文本内容执行语音合成的方法。 该方法包括以下步骤:基于在客户端的语料库中的当前声学单元集Scurrent对文本内容执行语音合成; 分析文本内容并生成具有相应上下文特征的目标单元的列表,根据上下文特征,基于声音单元组Stotal选择多个声学单元候选,该声音单元组Stotal比当前声学单元组S Current 客户端的语料库,并根据多个单元候选确定适用于文本内容的语音合成的声学单元; 以及基于所确定的声学单元来更新客户端处的语料库中的当前声学单元组Scurrent。