Iterative text-to-speech with user feedback

    Publication number: US09978359B1

    Publication date: 2018-05-22

    Application number: US14098677

    Filing date: 2013-12-06

    CPC classification number: G10L13/02 G10L13/00 G10L13/06

    Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.
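The iterative loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the `SpeechUnit` type, the toy `UNITS` inventory, the tag names, and the idea of expressing a user correction as a word-index-to-tag mapping are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechUnit:
    text: str     # the word this unit realizes
    tag: str      # extra-segmental feature, e.g. "neutral" or "excited"
    unit_id: str  # hypothetical identifier for the stored audio


# Toy unit inventory (assumption); a real voice database holds many recorded units.
UNITS = [
    SpeechUnit("hello", "neutral", "u1"),
    SpeechUnit("hello", "excited", "u2"),
    SpeechUnit("world", "neutral", "u3"),
    SpeechUnit("world", "excited", "u4"),
]


def select_units(words, corrections=None, default_tag="neutral"):
    """Pick one unit per word, honoring per-word tag corrections when possible."""
    corrections = corrections or {}
    chosen = []
    for i, word in enumerate(words):
        wanted = corrections.get(i, default_tag)
        candidates = [u for u in UNITS if u.text == word]
        if not candidates:
            raise KeyError(f"no unit for {word!r}")
        # Prefer a unit with the requested tag; otherwise fall back to the first candidate.
        chosen.append(next((u for u in candidates if u.tag == wanted), candidates[0]))
    return chosen


def iterate_tts(text, feedback_rounds):
    """Run a preliminary pass, then one refinement pass per round of user feedback."""
    words = text.lower().split()
    corrections = {}
    results = [select_units(words)]
    for feedback in feedback_rounds:
        corrections.update(feedback)  # corrections accumulate across iterations
        results.append(select_units(words, corrections))
    return results


# Two feedback rounds: first mark word 0 as "excited", then word 1.
passes = iterate_tts("Hello world", [{0: "excited"}, {1: "excited"}])
```

Each entry in `passes` is the unit sequence chosen in one iteration, so the effect of every correction can be compared against the preliminary result.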

    PREDICTING PRONUNCIATION IN SPEECH RECOGNITION
    Invention application, under examination (published)

    Publication number: US20150255069A1

    Publication date: 2015-09-10

    Application number: US14196055

    Filing date: 2014-03-04

    Abstract: An automatic speech recognition (ASR) device may be configured to predict pronunciations of textual identifiers (for example, song names, etc.) based on predicting one or more languages of origin of the textual identifier. The one or more languages of origin may be determined based on the textual identifier. The pronunciations may include a hybrid pronunciation including a pronunciation in one language, a pronunciation in a second language and a hybrid pronunciation that combines multiple languages. The pronunciations may be added to a lexicon and matched to the content item (e.g., song) and/or textual identifier. The ASR device may receive a spoken utterance from a user requesting the ASR device to access the content item. The ASR device determines whether the spoken utterance matches one of the pronunciations of the content item in the lexicon. The ASR device then accesses the content when the spoken utterance matches one of the potential textual identifier pronunciations.
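The lexicon-building and matching steps in this abstract can be sketched roughly as below. Everything here is an assumption made for illustration: the `G2P` tables contain made-up phoneme strings, language of origin is guessed by simple table coverage, and a hybrid pronunciation is formed by mixing languages word by word.

```python
from itertools import product

# Toy per-language pronunciation tables with made-up phoneme strings (assumption).
G2P = {
    "en": {"la": "l aa", "vie": "v ay", "rose": "r ow z"},
    "fr": {"la": "l a", "vie": "v i", "rose": "r o z"},
}


def predict_languages(title):
    """Hypothetical origin guess: every language whose table covers all words."""
    words = title.lower().split()
    return [lang for lang, table in G2P.items() if all(w in table for w in words)]


def candidate_pronunciations(title):
    """Single-language pronunciations plus word-level hybrids across languages."""
    words = title.lower().split()
    langs = predict_languages(title)
    # product() yields every per-word language assignment, including the
    # assignments that use one language throughout.
    return {
        " ".join(G2P[lang][word] for lang, word in zip(combo, words))
        for combo in product(langs, repeat=len(words))
    }


def build_lexicon(catalog):
    """Map each candidate pronunciation to the content item it identifies."""
    lexicon = {}
    for content_id, title in catalog.items():
        for pron in candidate_pronunciations(title):
            lexicon[pron] = content_id
    return lexicon


def access_content(lexicon, recognized_phonemes):
    """Return the matching content id, or None when no pronunciation matches."""
    return lexicon.get(recognized_phonemes)


lexicon = build_lexicon({"song-1": "La Vie Rose"})
hit = access_content(lexicon, "l a v i r ow z")  # French, French, English
miss = access_content(lexicon, "x y z")
```

Storing every variant in the lexicon keeps the lookup a plain dictionary access; a real recognizer would score pronunciations rather than require an exact string match.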

    DIRECTION BASED END-POINTING FOR SPEECH RECOGNITION

    Publication number: US20230395095A1

    Publication date: 2023-12-07

    Application number: US18182811

    Filing date: 2023-03-13

    CPC classification number: G10L25/87 G10L15/00 G10L25/78

    Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
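A minimal sketch of per-beam end-pointing, assuming beamforming has already produced one frame sequence per direction. The energy threshold, the hangover count, and the beam labels are illustrative assumptions and are not taken from the patent.

```python
def frame_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame)


def endpoint_beam(energies, threshold, hangover):
    """Return (start, end) frame indices of the first speech segment, or None.

    Speech begins at the first frame at or above the threshold and ends at the
    last active frame before `hangover` consecutive quiet frames.
    """
    start = None
    last_active = None
    quiet = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_active = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                break
    return None if start is None else (start, last_active)


def endpoint_by_direction(beams, threshold=0.01, hangover=2):
    """Run the end-pointer on every beam and keep the beams that contain speech."""
    segments = {}
    for direction, frames in beams.items():
        found = endpoint_beam([frame_energy(f) for f in frames], threshold, hangover)
        if found is not None:
            segments[direction] = found
    return segments


loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.01, -0.01, 0.01, -0.01]
beams = {
    "0deg": [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, loud],
    "90deg": [quiet] * 9,
}
segments = endpoint_by_direction(beams)
```

The single quiet frame at index 3 is shorter than the hangover, so it is treated as a pause inside the utterance rather than as the end of speech.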

    Direction based end-pointing for speech recognition

    Publication number: US11037584B2

    Publication date: 2021-06-15

    Application number: US16715026

    Filing date: 2019-12-16

    Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
