System and method for enhancing voice-enabled search based on automated demographic identification
    31.
    Granted patent
    Legal status: In force

    Publication number: US09189483B2

    Publication date: 2015-11-17

    Application number: US13847173

    Filing date: 2013-03-19

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
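
Illustrative sketch (not taken from the patent): the abstract says that metadata identified from the voice can be combined with, or override, self-reported demographic information before being passed to the question-answering engine together with the recognized speech. The snippet below shows one possible reading of that step. The function names, the dictionary layout, and the `override` flag are all assumptions made for illustration.

```python
def merge_demographics(self_reported, inferred, override=False):
    """Merge two dicts of demographic features (hypothetical layout).

    override=False: self-reported values win, inferred values fill gaps.
    override=True:  voice-inferred values win, self-reported values fill gaps.
    """
    if override:
        return {**self_reported, **inferred}
    return {**inferred, **self_reported}


def build_engine_request(recognized_text, self_reported, inferred, override=False):
    """Bundle recognized speech and speaker metadata for a downstream engine."""
    return {
        "query": recognized_text,
        "metadata": merge_demographics(self_reported, inferred, override),
    }


profile = {"age": "adult"}                          # self-reported
from_voice = {"age": "child", "region": "US-South"}  # inferred from speech

request = build_engine_request("toy stores near me", profile, from_voice, override=True)
print(request["metadata"])  # {'age': 'child', 'region': 'US-South'}
```

How the demographic features are actually inferred from voice characteristics is outside the scope of this sketch.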

    SYSTEM AND METHOD FOR RECOGNIZING SPEECH WITH DIALECT GRAMMARS
    32.
    Patent application
    Legal status: In force

    Publication number: US20150279362A1

    Publication date: 2015-10-01

    Application number: US14735035

    Filing date: 2015-06-09

    CPC classification number: G10L15/19 G10L15/005 G10L15/1822 G10L15/183

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for recognizing speech. The method includes receiving speech from a user, perceiving at least one speech dialect in the received speech, selecting at least one grammar from a plurality of optimized dialect grammars based on at least one score associated with the perceived speech dialect and the perceived at least one speech dialect, and recognizing the received speech with the selected at least one grammar. Selecting at least one grammar can be further based on a user profile. Multiple grammars can be blended. Predefined parameters can include pronunciation differences, vocabulary, and sentence structure. Optimized dialect grammars can be domain specific. The method can further include recognizing initial received speech with a generic grammar until an optimized dialect grammar is selected. Selecting at least one grammar from a plurality of optimized dialect grammars can be based on a certainty threshold.
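
Illustrative sketch (not taken from the patent): the abstract describes selecting one or more dialect grammars from per-dialect scores, blending grammars, using a certainty threshold, and falling back to a generic grammar until a dialect grammar is selected. The snippet below is a minimal way to express that selection logic. The threshold value, the blend margin, the dialect names, and the weight normalization are assumptions, not details from the source.

```python
GENERIC = "generic"


def select_grammars(dialect_scores, threshold=0.6, blend_margin=0.15):
    """Return {grammar_name: blend_weight} for the next recognition pass.

    dialect_scores maps a dialect name to a score in [0, 1] (assumed scale).
    If no dialect is certain enough, only the generic grammar is used.
    Dialects scoring within blend_margin of the best one are blended.
    """
    if not dialect_scores:
        return {GENERIC: 1.0}
    best = max(dialect_scores.values())
    if best < threshold:
        return {GENERIC: 1.0}
    chosen = {d: s for d, s in dialect_scores.items() if best - s <= blend_margin}
    total = sum(chosen.values())
    return {d: s / total for d, s in chosen.items()}


# Early in a call the evidence is weak, so the generic grammar is kept.
print(select_grammars({"southern_us": 0.4, "scottish": 0.3}))  # {'generic': 1.0}

# Once one dialect is clearly perceived, its grammar is selected alone.
print(select_grammars({"southern_us": 0.9, "scottish": 0.2}))  # {'southern_us': 1.0}
```

A user profile, which the abstract mentions as a further selection input, could be folded in by adjusting `dialect_scores` before calling the function.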

    Method of active learning for automatic speech recognition
    33.
    Granted patent
    Legal status: In force

    Publication number: US08990084B2

    Publication date: 2015-03-24

    Application number: US14176439

    Filing date: 2014-02-10

    CPC classification number: G10L15/063

    Abstract: State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor-intensive and time-consuming. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones with respect to a given cost function for a human to label. The method comprises automatically estimating a confidence score for each word of the utterance and exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. An utterance confidence score is computed based on these word confidence scores; then the utterances are selectively sampled to be transcribed using the utterance confidence scores.
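
Illustrative sketch (not taken from the patent): the abstract outlines a loop in which word confidence scores are combined into an utterance confidence score, and utterances are then selectively sampled for human transcription. The snippet below shows one simple instance of that idea. Using the arithmetic mean as the utterance score and sending the lowest-scoring utterances to the transcriber first are assumptions; the abstract does not specify either choice, and the word confidences here are given rather than derived from a recognizer lattice.

```python
from statistics import mean


def utterance_confidence(word_confidences):
    """Combine per-word confidences into one utterance score (mean, assumed)."""
    return mean(word_confidences) if word_confidences else 0.0


def select_for_transcription(hypotheses, budget):
    """Pick the `budget` least confident utterances for manual transcription.

    hypotheses maps an utterance id to its list of word confidence scores.
    """
    ranked = sorted(hypotheses, key=lambda uid: utterance_confidence(hypotheses[uid]))
    return ranked[:budget]


pool = {
    "u1": [0.90, 0.95],
    "u2": [0.40, 0.60, 0.50],
    "u3": [0.70, 0.80],
}
print(select_for_transcription(pool, budget=2))  # ['u2', 'u3']
```

In an iterative setup, the newly transcribed utterances would be added to the training set, the recognizer retrained, and the remaining pool rescored.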

    Universal semi-word model for vocabulary contraction in automatic speech recognition

    Publication number: US12008986B1

    Publication date: 2024-06-11

    Application number: US16859938

    Filing date: 2020-04-27

    Abstract: A speech recognition system includes, or has access to, conventional speech recognizer data, including a conventional acoustic model and pronunciation dictionary. The speech recognition system generates restructured speech recognizer data from the conventional speech recognizer data. When used at runtime by a speech recognizer module, the restructured speech recognizer data produces more accurate and efficient results than those produced using the conventional speech recognizer data. The restructuring involves segmenting entries of the conventional pronunciation dictionary and acoustic model according to their constituent phonemes and grouping those entries with the same initial N phonemes, for some integer N (e.g., N=3), and deriving a restructured dictionary with a corresponding semi-word acoustic model for the various grouped entries. The decomposition of the conventional pronunciation dictionary into the restructured dictionary with semi-word acoustic model greatly reduces the number of possibilities in the dictionaries (e.g., from potentially unlimited to finite and relatively small), and also improves the accuracy of speech recognition.
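
Illustrative sketch (not taken from the patent): the restructuring step in the abstract groups pronunciation dictionary entries that share the same initial N phonemes. The snippet below shows only that grouping step on a toy dictionary. The phoneme symbols, the example words, and the function name are assumptions, and deriving the corresponding semi-word acoustic model is not attempted here.

```python
from collections import defaultdict


def group_by_initial_phonemes(pron_dict, n=3):
    """Group words by the first n phonemes of their pronunciation.

    pron_dict maps a word to its phoneme sequence (list of strings).
    Returns {phoneme_prefix_tuple: [words sharing that prefix]}.
    """
    groups = defaultdict(list)
    for word, phonemes in pron_dict.items():
        groups[tuple(phonemes[:n])].append(word)
    return dict(groups)


toy_dictionary = {
    "interest": ["IH", "N", "T", "R", "AH", "S", "T"],
    "internet": ["IH", "N", "T", "ER", "N", "EH", "T"],
    "cat": ["K", "AE", "T"],
    "catalog": ["K", "AE", "T", "AH", "L", "AO", "G"],
}

groups = group_by_initial_phonemes(toy_dictionary, n=3)
print(len(toy_dictionary), "entries ->", len(groups), "groups")  # 4 entries -> 2 groups
```

Because the number of distinct N-phoneme prefixes is bounded by the phoneme inventory, the set of groups stays finite even as the word list grows, which matches the reduction the abstract describes.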

    Real-time privacy filter
    35.
    Granted patent

    Publication number: US11210461B2

    Publication date: 2021-12-28

    Application number: US16027202

    Filing date: 2018-07-03

    Abstract: A masking system prevents a human agent from receiving sensitive personal information (SPI) provided by a caller during caller-agent communication. The masking system includes components for detecting the SPI, including automated speech recognition and natural language processing systems. When the caller communicates with the agent, e.g., via a phone call, the masking system processes the incoming caller audio. When the masking system detects SPI in the caller audio stream or when the masking system determines a high likelihood that incoming caller audio will include SPI, the caller audio is masked such that it cannot be heard by the agent. The masking system collects the SPI from the caller audio and sends it to the organization associated with the agent for processing the caller's request or transaction without giving the agent access to caller SPI.

    Bootstrapping multilingual natural language understanding via machine translation

    Publication number: US10891435B1

    Publication date: 2021-01-12

    Application number: US15900687

    Filing date: 2018-02-20

    Abstract: Machine translation is used to leverage the semantic properties (e.g., intent) already known for one natural language for use in another natural language. In a first embodiment, the corpus of a first language is translated to each other language of interest using machine translation, and the corresponding semantic properties are transferred to the translated corpuses. Semantic models can then be generated from the translated corpuses and the transferred semantic properties. In a second embodiment, given a first language for which there is a semantic model, if a query is received in a second, different language lacking its own semantic model, machine translation is used to translate the query into the first language. Then, the semantic model for the first language is applied to the translated query, thereby obtaining the semantic properties for the query, even though no semantic model existed for the language in which the query was specified.
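
Illustrative sketch (not taken from the patent): the second embodiment in the abstract translates a query from a language that has no semantic model into a language that has one, then applies that model to the translated query. The snippet below shows the control flow with toy stand-ins. `toy_translate` and `toy_english_model` are hypothetical placeholders for a real machine translation system and a trained semantic model, and the intent label is invented for the example.

```python
def classify_via_translation(query, query_lang, models, translate, pivot_lang="en"):
    """Return semantic properties for a query, translating first if needed.

    models maps a language code to a callable semantic model.
    translate is a callable (text, source, target) -> translated text.
    """
    if query_lang in models:
        return models[query_lang](query)
    translated = translate(query, source=query_lang, target=pivot_lang)
    return models[pivot_lang](translated)


def toy_translate(text, source, target):
    """Hypothetical lookup-table translator used only for this example."""
    table = {("es", "en"): {"reservar un vuelo": "book a flight"}}
    return table[(source, target)][text]


def toy_english_model(text):
    """Hypothetical English intent model used only for this example."""
    return "book_flight" if "flight" in text else "unknown"


models = {"en": toy_english_model}
print(classify_via_translation("reservar un vuelo", "es", models, toy_translate))  # book_flight
```

The first embodiment works in the opposite direction: the labeled corpus is translated once, offline, and a separate model is trained for each target language.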

    Automated recognition system for natural language understanding

    Publication number: US10147419B2

    Publication date: 2018-12-04

    Application number: US15251868

    Filing date: 2016-08-30

    Abstract: An interactive response system directs input to a software-based router, which is able to intelligently respond to the input by drawing on a combination of human agents, advanced recognition and expert systems. The system utilizes human “intent analysts” for purposes of interpreting customer input. Automated recognition subsystems are trained by coupling customer input with IA-selected intent corresponding to the input, using model-updating subsystems to develop the training information for the automated recognition subsystems.

    UNDERSPECIFICATION OF INTENTS IN A NATURAL LANGUAGE PROCESSING SYSTEM

    Publication number: US20180174578A1

    Publication date: 2018-06-21

    Application number: US15384275

    Filing date: 2016-12-19

    CPC classification number: G06F17/30654 G06F17/2241 G06F17/2785

    Abstract: A natural language processing system has a hierarchy of user intents related to a domain of interest, the hierarchy having specific intents corresponding to leaf nodes of the hierarchy, and more general intents corresponding to ancestor nodes of the leaf nodes. The system also has a trained understanding model that can classify natural language utterances according to user intent. When the understanding model cannot determine with sufficient confidence that a natural language utterance corresponds to one of the specific intents, the natural language processing system traverses the hierarchy of intents to find a more general user intent that is related to the most applicable specific intent of the utterance and for which there is sufficient confidence. The general intent can then be used to prompt the user with questions applicable to the general intent to obtain the missing information needed for a specific intent.
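
Illustrative sketch (not taken from the patent): the abstract describes walking up a hierarchy of intents when no specific (leaf) intent is confident enough, until a more general intent with sufficient confidence is found. The snippet below is one possible reading. Scoring a general intent as the sum of the confidences of the leaves beneath it is an assumption, as are the threshold value and the intent names.

```python
def is_under(leaf, ancestor, parent):
    """True if `ancestor` is `leaf` itself or lies on its path to the root."""
    node = leaf
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False


def resolve_intent(leaf_scores, parent, threshold=0.7):
    """Return the most specific intent whose confidence meets the threshold.

    leaf_scores maps each specific intent to a classifier confidence.
    parent maps each intent to its parent; the root has no entry.
    """
    best_leaf = max(leaf_scores, key=leaf_scores.get)
    if leaf_scores[best_leaf] >= threshold:
        return best_leaf
    node = parent.get(best_leaf)
    while node is not None:
        # Assumed scoring rule: pool the confidence of all leaves under node.
        pooled = sum(s for leaf, s in leaf_scores.items() if is_under(leaf, node, parent))
        if pooled >= threshold:
            return node
        node = parent.get(node)
    return None


parent = {
    "book_flight": "travel",
    "book_hotel": "travel",
    "travel": "root",
    "check_balance": "banking",
    "banking": "root",
}
scores = {"book_flight": 0.45, "book_hotel": 0.40, "check_balance": 0.15}
print(resolve_intent(scores, parent))  # travel
```

With the general intent `travel` resolved, a dialog layer could ask a follow-up question (for example, flight or hotel) to obtain the information needed for a specific intent.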
