Error tolerant neural network model compression

    Publication No.: US10229356B1

    Publication Date: 2019-03-12

    Application No.: US14581969

    Filing Date: 2014-12-23

    Abstract: Features are disclosed for error-tolerant model compression. Such features can be used to reduce the size of a deep neural network model that includes several hidden node layers. Performing the size reduction in an error-tolerant fashion ensures that predictive applications relying on the model, such as automatic speech recognition, image recognition, and recommendation engines, do not experience performance degradation due to model compression. Partially quantized models are re-trained so that any loss of accuracy is "trained out" of the model, providing improved error tolerance alongside compression.
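The abstract does not spell out the compression procedure, but the general quantize-then-retrain idea it describes can be illustrated with a small, hypothetical NumPy sketch (the bit width, uniform quantization scheme, and function names below are assumptions for illustration, not the patented method):

```python
import numpy as np

def quantize_dequantize(w, num_bits=8):
    """Uniformly quantize a weight matrix to num_bits integer codes and map
    the codes back to floats. Only the codes and the scale would need to be
    stored, shrinking the model roughly 4x for int8 vs. float32."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    scale = max(np.abs(w).max() / qmax, 1e-12)  # guard against all-zero w
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return codes * scale                        # dequantized approximation

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 4))
w_q = quantize_dequantize(w)

# The remaining quantization error is what re-training would "train out":
# the network is fine-tuned after partial quantization so the remaining
# parameters compensate for the approximation.
max_err = np.abs(w - w_q).max()
```

The quantization error is bounded by half a quantization step, which is why a modest amount of fine-tuning can usually recover the lost accuracy.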

    4. Speech recognition with combined grammar and statistical language models (Invention Grant, In Force)

    Publication No.: US09449598B1

    Publication Date: 2016-09-20

    Application No.: US14037975

    Filing Date: 2013-09-26

    Abstract: Features are disclosed for performing speech recognition on utterances using a grammar together with a statistical language model, such as an n-gram model. States of the grammar may correspond to states of the statistical language model. Speech recognition may be initiated using the grammar. At a given state of the grammar, speech recognition may continue at the corresponding state of the statistical language model. Speech recognition may continue using the grammar in parallel with the statistical language model, or it may continue using the statistical language model exclusively. Scores associated with the correspondences between states (e.g., backoff arcs) may be determined heuristically or based on test data.

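As a loose illustration of the decoding idea (not the patented implementation), here is a toy sketch with a hypothetical two-word grammar, a heuristic backoff-arc score, and a tiny bigram model; all names, words, and probabilities are invented:

```python
import math

# Toy grammar: state -> {word: next_state}.  A backoff arc with a heuristic
# log-score hands decoding over to the statistical model when the grammar
# does not cover the next word.
GRAMMAR = {0: {"play": 1}, 1: {"music": 2}}
BACKOFF_SCORE = math.log(0.1)
BIGRAM = {("play", "some"): math.log(0.4), ("some", "jazz"): math.log(0.5)}
UNSEEN = math.log(1e-4)

def score(words):
    """Log-score a word sequence: follow grammar arcs (probability 1) while
    they cover the input, then cross the backoff arc once and continue in
    the bigram model."""
    state, total, in_grammar = 0, 0.0, True
    prev = "<s>"
    for word in words:
        if in_grammar and word in GRAMMAR.get(state, {}):
            state = GRAMMAR[state][word]    # grammar arc, score 0
        else:
            if in_grammar:
                total += BACKOFF_SCORE      # backoff arc into the n-gram model
                in_grammar = False
            total += BIGRAM.get((prev, word), UNSEEN)
        prev = word
    return total

in_grammar_score = score(["play", "music"])    # covered by the grammar
mixed_score = score(["play", "some", "jazz"])  # backs off after "play"
```

The backoff score here is a fixed heuristic constant; per the abstract, such scores could instead be tuned on test data.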

    5. Estimating speaker-specific affine transforms for neural network based speech recognition systems (Invention Grant, In Force)

    Publication No.: US09378735B1

    Publication Date: 2016-06-28

    Application No.: US14135474

    Filing Date: 2013-12-19

    CPC classification number: G10L15/16

    Abstract: Features are disclosed for estimating affine transforms in Log Filter-Bank Energy space ("LFBE" space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least-squares error between the corresponding linear and bias transform parts for the resultant neural network feature vector and a standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression ("cMLLR") techniques. Alternatively, the affine transform may be estimated by minimizing the least-squares error between the resultant transformed neural network feature and a standard speaker-specific feature obtained for a GMM-based acoustic model.

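The second estimation variant amounts to an ordinary least-squares fit of an affine map (A, b) between two feature spaces. A minimal sketch under that reading, using synthetic data rather than the patent's actual feature pipeline:

```python
import numpy as np

def estimate_affine(x, y):
    """Estimate A, b minimizing sum ||A x_i + b - y_i||^2 in closed form.

    x, y: (n_frames, dim) arrays, e.g. network feature vectors and target
    speaker-specific (cMLLR-style) feature vectors.
    """
    n, d = x.shape
    x1 = np.hstack([x, np.ones((n, 1))])        # append a bias column
    sol, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return sol[:d].T, sol[d]                    # A is (d, d), b is (d,)

# Sanity check: recover a known affine transform from synthetic frames.
rng = np.random.default_rng(1)
A_true = rng.normal(size=(3, 3))
b_true = rng.normal(size=3)
x = rng.normal(size=(200, 3))
y = x @ A_true.T + b_true
A_est, b_est = estimate_affine(x, y)
```

With enough frames the normal-equations solution is unique, which is presumably why a closed-form least-squares estimate is attractive for fast per-speaker adaptation.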

    6. Speech model retrieval in distributed speech recognition systems (Invention Grant, In Force)

    Publication No.: US09190057B2

    Publication Date: 2015-11-17

    Application No.: US13712891

    Filing Date: 2012-12-12

    CPC classification number: G10L15/32 G10L15/22 G10L15/30

    Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received, or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update the models and data may also be retrieved asynchronously so that they can be used to perform updates as they become available. The updated models and data may be used immediately to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system, and models and data may be pre-cached based on such predictions.

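A minimal, hypothetical sketch of the asynchronous-retrieval-with-caching pattern the abstract describes, using Python's asyncio (the model contents, names, and re-processing step are stand-ins, not the patented system):

```python
import asyncio

CACHE = {}  # user_id -> speaker-specific model (a label stands in here)

async def fetch_user_model(user_id):
    await asyncio.sleep(0.01)            # stand-in for a remote fetch
    return f"model-for-{user_id}"

async def recognize(user_id, utterance):
    """Decode with the general model right away; when the user-specific
    model arrives (or is already cached), re-process the utterance."""
    fetch = None
    if user_id not in CACHE:
        fetch = asyncio.create_task(fetch_user_model(user_id))
    result = f"general-pass({utterance})"     # first pass, general model
    model = CACHE.get(user_id)
    if fetch is not None:
        model = CACHE[user_id] = await fetch  # cache for later utterances
    if model is not None:
        result = f"rescored({utterance}, {model})"
    return result

first = asyncio.run(recognize("alice", "hello"))
second = asyncio.run(recognize("alice", "hello again"))
```

On the second utterance the model is served from the cache, so no fetch is issued; pre-caching based on predicted usage would simply populate `CACHE` ahead of time.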

    7. Keyword spotting with competitor models (Invention Grant, In Force)

    Publication No.: US09159319B1

    Publication Date: 2015-10-13

    Application No.: US13692775

    Filing Date: 2012-12-03

    CPC classification number: G10L15/08 G10L2015/088

    Abstract: Keyword spotting may be improved by using a competitor model. In some embodiments, audio data is received by a device. At least a portion of the audio data may be compared with a keyword model to obtain a first score. The keyword model may model a keyword. The portion of the audio data may also be compared with a competitor model to obtain a second score. The competitor model may model a competitor word, which may be a word that is similar to the keyword. The device may compare the first score and the second score to determine if a keyword is spoken.

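The core decision the abstract describes, comparing a keyword-model score against a competitor-model score, can be sketched as follows (the scores and the margin parameter are illustrative assumptions):

```python
def detect_keyword(keyword_score, competitor_score, margin=0.0):
    """Accept the keyword only if it out-scores the competitor model.

    Scores are log-likelihoods of the audio under each model. A competitor
    model trained on a confusable word absorbs near-misses that a lone
    keyword model, judged by absolute score alone, would accept.
    """
    return keyword_score - competitor_score > margin

# The comparison, not the absolute keyword score, makes the decision:
hit = detect_keyword(-10.0, -14.0)   # keyword clearly wins
miss = detect_keyword(-10.0, -9.5)   # competitor wins: likely a confusable word
```

A margin above zero would trade missed detections for fewer false accepts.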

    8. SPEECH RECOGNIZER WITH MULTI-DIRECTIONAL DECODING (Invention Application, In Force)

    Publication No.: US20150095026A1

    Publication Date: 2015-04-02

    Application No.: US14039383

    Filing Date: 2013-09-27

    Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.

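One simple way to combine per-channel results, assuming the recognizer has already been run on each beamformed channel, is to keep the hypothesis with the highest confidence. A hypothetical sketch (the patent may combine channels quite differently):

```python
def best_hypothesis(channel_results):
    """channel_results: (direction, hypothesis, confidence) tuples from
    running recognition on each beamformed channel in parallel. Keep the
    hypothesis from the direction where speech was cleanest."""
    return max(channel_results, key=lambda r: r[2])

# Invented example: the speaker is at 0 degrees; a noisy appliance sits at 90.
results = [
    ("0 deg",   "turn on the lights", 0.91),
    ("90 deg",  "urn o the flights",  0.42),
    ("180 deg", "",                   0.05),
]
direction, text, conf = best_hypothesis(results)
```

Decoding every channel and selecting afterwards avoids having to pick the speech direction before recognition, at the cost of running several decoders at once.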

    Device-directed utterance detection

    Publication No.: US12236950B2

    Publication Date: 2025-02-25

    Application No.: US18149181

    Filing Date: 2023-01-03

    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes an on-device interrupt architecture configured to detect when device-directed speech is present and to send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower the volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and restore the volume of the output audio, or may accept the interrupt event, ending the output audio and performing speech processing on the audio data.
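The two-stage accept/reject flow the abstract describes, a fast interrupt detector followed by a slower full-utterance classifier, might be sketched like this (the scores, thresholds, and action names are invented for illustration):

```python
def handle_audio(interrupt_score, directed_score,
                 fast_threshold=0.3, final_threshold=0.7):
    """Two-stage decision: a low-latency interrupt detector ducks the
    output volume early; the slower full-utterance classifier then either
    accepts (stop playback, run speech processing) or rejects (restore
    the volume) the interrupt event."""
    actions = []
    if interrupt_score > fast_threshold:
        actions.append("duck_volume")            # react with low latency
        if directed_score > final_threshold:
            actions += ["stop_playback", "send_for_speech_processing"]
        else:
            actions.append("restore_volume")     # false alarm rejected
    return actions

accepted = handle_audio(0.8, 0.9)   # device-directed speech confirmed
rejected = handle_audio(0.8, 0.2)   # e.g. background conversation
```

Ducking the volume on the cheap detector keeps perceived latency low, while the accurate classifier prevents spurious interruptions from killing playback.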

    10. LANGUAGE MODEL ADAPTATION (Invention Application)

    Publication No.: US20220358908A1

    Publication Date: 2022-11-10

    Application No.: US17706057

    Filing Date: 2022-03-28

    Abstract: Exemplary embodiments relate to adapting a generic language model at runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. Using a trained classifier, the system processes the partial ASR hypothesis generated for the audio data processed so far, and it determines whether to rescore the hypothesis after every few audio frames (which may represent a word in the utterance) are processed by the speech recognition system.
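A hypothetical sketch of rescoring a partial hypothesis by interpolating a generic and a domain-specific language model when a classifier flags the domain (the models, weights, and threshold below are illustrative, not from the patent):

```python
import math

# Toy LM scores for one partial hypothesis (log-probabilities).
GENERIC_LM = {"play the song": math.log(0.02)}
MUSIC_LM   = {"play the song": math.log(0.20)}
UNSEEN = math.log(1e-6)

def maybe_rescore(partial_hypothesis, domain_prob, threshold=0.5, weight=0.5):
    """If the classifier's domain probability clears the threshold,
    interpolate the generic and domain-specific LM scores; otherwise
    keep the generic score unchanged."""
    g = GENERIC_LM.get(partial_hypothesis, UNSEEN)
    if domain_prob <= threshold:
        return g
    d = MUSIC_LM.get(partial_hypothesis, UNSEEN)
    return (1 - weight) * g + weight * d

base = maybe_rescore("play the song", domain_prob=0.1)     # no rescoring
boosted = maybe_rescore("play the song", domain_prob=0.9)  # domain detected
```

Because the decision runs on partial hypotheses every few frames, an in-domain utterance can be boosted before decoding finishes rather than only at the end.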
