Method and system for building an integrated user profile
    Invention grant (in force)

    Publication Number: US09564123B1

    Publication Date: 2017-02-07

    Application Number: US14704833

    Application Date: 2015-05-05

    Abstract: A system and method are provided for adding user characterization information to a user profile by analyzing a user's speech. User properties such as age, gender, accent, and English proficiency may be inferred by extracting and deriving features from user speech, without the user having to configure such information manually. A feature extraction module that receives audio signals as input extracts acoustic, phonetic, textual, linguistic, and semantic features. The module may be a system component independent of any particular vertical application, or it may be embedded in an application that accepts voice input and performs natural language understanding. A profile generation module receives the features extracted by the feature extraction module, uses classifiers to determine user property values based on the extracted and derived features, and stores these values in a user profile. The resulting profile variables may be globally available to other applications.
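The extraction-then-classification pipeline in the abstract can be sketched in a few lines. This is an illustration only, not the patented implementation: the feature, the threshold classifier, and all names below are hypothetical stand-ins for the trained models a real system would use.

```python
# Hypothetical sketch of the feature-extraction -> classification ->
# profile-update pipeline described in the abstract. The single pitch
# feature and the threshold classifier are invented stand-ins.

def extract_features(pitch_track_hz):
    """Stand-in for the feature extraction module; a real system would
    derive acoustic, phonetic, textual, linguistic, and semantic features."""
    return {"mean_pitch_hz": sum(pitch_track_hz) / len(pitch_track_hz)}

def classify_age_group(features):
    """Toy threshold rule in place of a trained statistical classifier."""
    return "child" if features["mean_pitch_hz"] > 250.0 else "adult"

def update_profile(pitch_track_hz, profile):
    """Stand-in for the profile generation module: derives property
    values and stores them so other applications can read them."""
    features = extract_features(pitch_track_hz)
    profile["age_group"] = classify_age_group(features)
    return profile

profile = update_profile([220.0, 240.0, 230.0], {})
print(profile)  # {'age_group': 'adult'}
```

The point of the design is the separation the abstract stresses: the extractor and classifiers run once, and the stored profile values are then globally readable by other applications.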


    Dynamic interpolation for hybrid language models

    Publication Number: US11295732B2

    Publication Date: 2022-04-05

    Application Number: US16529730

    Application Date: 2019-08-01

    Abstract: To improve the accuracy of ASR, an utterance is transcribed using a plurality of language models, such as an N-gram language model and a neural language model. The language models are trained separately, and each outputs a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While an utterance is being recognized, interpolation weights are chosen or updated dynamically in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
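The interpolation step described above can be sketched as a weighted sum of the two models' scores. The weight heuristic below is an invented example; the patent bases the weights on dynamic variables of the utterance, hypothesis, and context rather than on any particular formula.

```python
# Hypothetical sketch of linear score interpolation between two language
# models with a dynamically chosen weight. Scores are log-probabilities.

def hybrid_score(ngram_logprob, neural_logprob, weight):
    """Interpolate two model scores; `weight` applies to the neural
    model and (1 - weight) to the N-gram model."""
    return weight * neural_logprob + (1.0 - weight) * ngram_logprob

def dynamic_weight(hypothesis_length):
    """Toy heuristic standing in for the patent's context-driven weight
    selection: trust the neural model more as the partial transcription
    hypothesis grows longer, capped at 0.9."""
    return min(0.9, 0.5 + 0.05 * hypothesis_length)

w = dynamic_weight(4)                # ~0.7
score = hybrid_score(-2.0, -1.0, w)  # ~-1.3
```

Because the weight is recomputed per hypothesis rather than fixed at training time, the same pair of models can favor the N-gram model in one context and the neural model in another.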

    Pronunciation guided by automatic speech recognition

    Publication Number: US10319250B2

    Publication Date: 2019-06-11

    Application Number: US15439883

    Application Date: 2017-02-22

    Abstract: Speech synthesis chooses pronunciations of words with multiple acceptable pronunciations based on an indication of a personal, class-based, or global preference, or of an intended non-preferred pronunciation. A speaker's words can be parroted back on personal devices using preferred pronunciations for accent training. Degrees of pronunciation error are computed and indicated to the user in a visual transcription, or audibly as word emphasis in parroted speech. Systems can use sets of phonemes extended beyond those generally recognized for a language. Speakers are classified in order to choose specific phonetic dictionaries or to adapt global ones. User profiles maintain lists of which pronunciations are preferred among those acceptable for words with multiple recognized pronunciations. Systems use multiple correlations of word preferences across users to predict the preferences of unlisted words. Speaker-preferred pronunciations are used to weight the scores of transcription hypotheses based on phoneme sequence hypotheses in speech engines.
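The per-user preference list over acceptable pronunciations can be sketched as a simple lookup with a global fallback. The dictionary entries, profile keys, and ARPAbet-style phoneme strings below are invented examples, not the patent's data structures.

```python
# Hypothetical sketch of choosing among multiple recognized
# pronunciations using a per-user preference list with a global default.

PRONUNCIATIONS = {
    # ARPAbet-style phoneme strings for a word with two accepted forms
    "either": ["IY DH ER", "AY DH ER"],
}

def preferred_pronunciation(word, user_profile):
    """Return the user's preferred pronunciation if it is among the
    accepted ones; otherwise fall back to the global default."""
    options = PRONUNCIATIONS.get(word, [])
    choice = user_profile.get("pronunciation_prefs", {}).get(word)
    if choice in options:
        return choice
    return options[0] if options else None

profile = {"pronunciation_prefs": {"either": "AY DH ER"}}
print(preferred_pronunciation("either", profile))  # AY DH ER
print(preferred_pronunciation("either", {}))       # IY DH ER
```

The same lookup can feed both directions the abstract describes: synthesis reads the preferred form to parrot speech back, and recognition uses it to weight competing phoneme-sequence hypotheses.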

    DYNAMIC INTERPOLATION FOR HYBRID LANGUAGE MODELS

    Publication Number: US20210035569A1

    Publication Date: 2021-02-04

    Application Number: US16529730

    Application Date: 2019-08-01

    Abstract: To improve the accuracy of ASR, an utterance is transcribed using a plurality of language models, such as an N-gram language model and a neural language model. The language models are trained separately, and each outputs a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While an utterance is being recognized, interpolation weights are chosen or updated dynamically in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.

    Method and system for building an integrated user profile

    Publication Number: US10311858B1

    Publication Date: 2019-06-04

    Application Number: US15385493

    Application Date: 2016-12-20

    Abstract: A system and method are provided for adding user characterization information to a user profile by analyzing a user's speech. User properties such as age, gender, accent, and English proficiency may be inferred by extracting and deriving features from user speech, without the user having to configure such information manually. A feature extraction module that receives audio signals as input extracts acoustic, phonetic, textual, linguistic, and semantic features. The module may be a system component independent of any particular vertical application, or it may be embedded in an application that accepts voice input and performs natural language understanding. A profile generation module receives the features extracted by the feature extraction module, uses classifiers to determine user property values based on the extracted and derived features, and stores these values in a user profile. The resulting profile variables may be globally available to other applications.

    PRONUNCIATION GUIDED BY AUTOMATIC SPEECH RECOGNITION

    Publication Number: US20180190269A1

    Publication Date: 2018-07-05

    Application Number: US15439883

    Application Date: 2017-02-22

    CPC classification number: G10L15/26 G09B5/04 G09B19/06 G10L13/00 G10L2015/225

    Abstract: Speech synthesis chooses pronunciations of words with multiple acceptable pronunciations based on an indication of a personal, class-based, or global preference, or of an intended non-preferred pronunciation. A speaker's words can be parroted back on personal devices using preferred pronunciations for accent training. Degrees of pronunciation error are computed and indicated to the user in a visual transcription, or audibly as word emphasis in parroted speech. Systems can use sets of phonemes extended beyond those generally recognized for a language. Speakers are classified in order to choose specific phonetic dictionaries or to adapt global ones. User profiles maintain lists of which pronunciations are preferred among those acceptable for words with multiple recognized pronunciations. Systems use multiple correlations of word preferences across users to predict the preferences of unlisted words. Speaker-preferred pronunciations are used to weight the scores of transcription hypotheses based on phoneme sequence hypotheses in speech engines.
