DEEP MULTI-CHANNEL ACOUSTIC MODELING
    41.
    Invention application

    Publication number: US20200349928A1

    Publication date: 2020-11-05

    Application number: US16932049

    Filing date: 2020-07-17

    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

    Stochastic modeling of user interactions with a detection system

    Publication number: US09899021B1

    Publication date: 2018-02-20

    Application number: US14136712

    Filing date: 2013-12-20

    CPC classification number: G10L15/142 G10L15/14 G10L15/22 G10L2015/088

    Abstract: Features are disclosed for modeling user interaction with a detection system using a stochastic dynamical model in order to determine or adjust detection thresholds. The model may incorporate numerous features, such as the probability of false rejection and false acceptance of a user utterance and the cost associated with each potential action. The model may determine or adjust detection thresholds so as to minimize the occurrence of false acceptances and false rejections while preserving other desirable characteristics. The model may further incorporate background and speaker statistics. Adjustments to the model or other operation parameters can be implemented based on the model, user statistics, and/or additional data.
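
The cost trade-off mentioned in the abstract above can be illustrated with a much simpler static calculation. The sketch below is not the stochastic dynamical model the patent describes; it only assumes that labeled detection scores are available and picks, from a list of candidates, the threshold with the lowest empirical expected cost. All names (`expected_cost`, `pick_threshold`, `prior_target`, the cost parameters) are hypothetical.

```python
def expected_cost(threshold, target_scores, nontarget_scores,
                  prior_target, cost_false_reject, cost_false_accept):
    """Empirical expected cost of accepting every score >= threshold."""
    # False rejection: a genuine utterance scores below the threshold.
    p_false_reject = sum(s < threshold for s in target_scores) / len(target_scores)
    # False acceptance: a non-target event scores at or above the threshold.
    p_false_accept = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return (prior_target * cost_false_reject * p_false_reject
            + (1.0 - prior_target) * cost_false_accept * p_false_accept)


def pick_threshold(candidates, target_scores, nontarget_scores,
                   prior_target=0.5, cost_false_reject=1.0, cost_false_accept=1.0):
    """Return the candidate threshold with the lowest empirical expected cost."""
    return min(
        candidates,
        key=lambda t: expected_cost(t, target_scores, nontarget_scores,
                                    prior_target, cost_false_reject, cost_false_accept),
    )


# Assumed example data: detector scores for genuine and non-target events.
targets = [0.9, 0.8, 0.7, 0.6]
nontargets = [0.1, 0.2, 0.3, 0.4]
best = pick_threshold([0.25, 0.5, 0.75], targets, nontargets)
```

Raising `cost_false_accept` relative to `cost_false_reject` pushes the chosen threshold upward, which is the kind of adjustment the abstract attributes to its model.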

    Security measures for an electronic device

    Publication number: US09706406B1

    Publication date: 2017-07-11

    Application number: US13747245

    Filing date: 2013-01-22

    CPC classification number: H04W12/08 G06F21/32 H04L63/0861 H04L63/102 H04W12/06

    Abstract: Approaches are described for detecting when an electronic device (such as a mobile phone) has been stolen or is otherwise being used by someone other than an authorized user of the device. At least one sensor of the device can obtain data during a current use of the device, and the device can determine from the data a set of available features. The features can be compared to a corresponding model associated with an owner (or other authorized user) of the device to generate a confidence value indicative of whether the current user operating the device is likely the owner of the device. The confidence value can be compared to at least one confidence threshold, for example, and based on the comparison, the current user can be provided access to at least a portion of functionality of the device and/or a security action can be performed when the confidence value does not at least meet at least one confidence threshold.

    LOW LATENCY AND MEMORY EFFICIENT KEYWORK SPOTTING

    Publication number: US20170098442A1

    Publication date: 2017-04-06

    Application number: US15207183

    Filing date: 2016-07-11

    Abstract: Features are disclosed for spotting keywords in utterance audio data without requiring the entire utterance to first be processed. Likelihoods that a portion of the utterance audio data corresponds to the keyword may be compared to likelihoods that the portion corresponds to background audio (e.g., general speech and/or non-speech sounds). The difference in the likelihoods may be determined, and the keyword may be triggered when the difference exceeds a threshold, or shortly thereafter. Traceback information and other data may be stored during the process so that a second speech processing pass may be performed. For efficient management of system memory, traceback information may only be stored for those frames that may encompass a keyword; the traceback information for older frames may be overwritten by traceback information for newer frames.
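
The trigger rule and the bounded traceback storage described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a hypothetical upstream model that supplies, for each frame, a keyword log-likelihood and a background log-likelihood for the audio portion ending at that frame. The function name `spot_keyword` and its parameters are invented for the example.

```python
from collections import deque


def spot_keyword(frame_scores, threshold, max_keyword_frames):
    """Scan frames in order and trigger as soon as keyword beats background.

    frame_scores: iterable of (keyword_loglik, background_loglik) pairs,
        one per frame (assumed to come from some upstream scorer).
    threshold: minimum log-likelihood difference needed to trigger.
    max_keyword_frames: number of most recent frames whose traceback is kept.

    Returns (trigger_frame_index or None, retained traceback records).
    """
    # A bounded deque drops the oldest record when a new one is appended,
    # so memory stays proportional to the longest keyword, not the utterance.
    traceback = deque(maxlen=max_keyword_frames)
    for index, (keyword_ll, background_ll) in enumerate(frame_scores):
        traceback.append((index, keyword_ll, background_ll))
        if keyword_ll - background_ll > threshold:
            # Trigger without waiting for the rest of the utterance.
            return index, list(traceback)
    return None, list(traceback)


# Assumed example scores: the difference first exceeds 2.5 at frame 2.
scores = [(-5.0, -4.0), (-4.0, -4.5), (-2.0, -5.0), (-1.0, -6.0)]
trigger_frame, kept = spot_keyword(scores, threshold=2.5, max_keyword_frames=2)
```

The retained records are what a second processing pass would consume; frames older than `max_keyword_frames` have already been overwritten.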

    LANGUAGE MODEL SPEECH ENDPOINTING
    48.
    Invention application
    Status: under examination (published)

    Publication number: US20160379632A1

    Publication date: 2016-12-29

    Application number: US14753811

    Filing date: 2015-06-29

    Abstract: An automatic speech recognition (ASR) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The ASR system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis. When the aggregate weighted non-speech exceeds a threshold, an endpoint may be declared.
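
The endpoint test in this abstract amounts to a probability-weighted average of trailing non-speech over the decoder's active hypotheses. The sketch below assumes each hypothesis is represented as a (probability, trailing non-speech frames) pair and normalizes by the total probability mass; both the representation and the normalization are assumptions made for illustration, and the function names are hypothetical.

```python
def expected_nonspeech(hypotheses):
    """Probability-weighted trailing non-speech duration, in frames.

    hypotheses: list of (probability, trailing_nonspeech_frames) pairs
        for the hypotheses currently active in the decoder.
    """
    total_probability = sum(p for p, _ in hypotheses)
    if total_probability == 0:
        return 0.0
    weighted = sum(p * frames for p, frames in hypotheses)
    # Normalize so the result does not depend on how much mass was pruned.
    return weighted / total_probability


def is_endpoint(hypotheses, threshold_frames):
    """Declare an endpoint when the weighted non-speech exceeds the threshold."""
    return expected_nonspeech(hypotheses) > threshold_frames


# Assumed example: the most likely hypothesis has seen 40 frames of silence.
active = [(0.5, 40), (0.25, 20), (0.25, 0)]
weighted_frames = expected_nonspeech(active)
```

Compared with using only the single best hypothesis, the weighted form avoids declaring an endpoint when a competing, still-plausible hypothesis believes the user is mid-word.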

    Speech recognizer with multi-directional decoding
    49.
    Granted invention patent
    Status: in force

    Publication number: US09286897B2

    Publication date: 2016-03-15

    Application number: US14039383

    Filing date: 2013-09-27

    Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.

    SPEECH MODEL RETRIEVAL IN DISTRIBUTED SPEECH RECOGNITION SYSTEMS
    50.
    Invention application
    Status: in force

    Publication number: US20140163977A1

    Publication date: 2014-06-12

    Application number: US13712891

    Filing date: 2012-12-12

    CPC classification number: G10L15/32 G10L15/22 G10L15/30

    Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
