Training an automatic speech recognition system using compressed word frequencies

    Publication number: US08543398B1

    Publication date: 2013-09-24

    Application number: US13666223

    Filing date: 2012-11-01

    Applicant: Google Inc.

    CPC classification number: G10L15/063

    Abstract: Respective word frequencies may be determined from a corpus of utterance-to-text-string mappings that contain associations between audio utterances and a respective text string transcription of each audio utterance. Respective compressed word frequencies may be obtained based on the respective word frequencies such that the distribution of the respective compressed word frequencies has a lower variance than the distribution of the respective word frequencies. Sample utterance-to-text-string mappings may be selected from the corpus of utterance-to-text-string mappings based on the compressed word frequencies. An automatic speech recognition (ASR) system may be trained with the sample utterance-to-text-string mappings.
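The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say which compression function is used, so `log(1 + f)` is an assumption, and the helper names (`word_frequencies`, `compress`, `sample_mappings`) and the weighting rule are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

def word_frequencies(corpus):
    """Count how often each word occurs across all transcriptions."""
    counts = Counter()
    for _audio, text in corpus:
        counts.update(text.lower().split())
    return counts

def compress(freqs):
    """Assumed compression: log(1 + f). It narrows the gap between
    very common and rare words, which lowers the variance."""
    return {word: math.log1p(f) for word, f in freqs.items()}

def sample_mappings(corpus, k, seed=0):
    """Draw k mappings (with replacement), weighting each utterance by the
    average compressed-to-raw frequency ratio of its words (an assumed rule)."""
    freqs = word_frequencies(corpus)
    comp = compress(freqs)
    weights = []
    for _audio, text in corpus:
        words = text.lower().split()
        # Utterances dominated by very frequent words get a smaller weight.
        ratio = sum(comp[w] / freqs[w] for w in words) / max(len(words), 1)
        weights.append(ratio)
    rng = random.Random(seed)
    return rng.choices(corpus, weights=weights, k=k)

corpus = [
    ("a1.wav", "call mom"),
    ("a2.wav", "call dad"),
    ("a3.wav", "call mom now"),
    ("a4.wav", "navigate home"),
]
freqs = word_frequencies(corpus)
comp = compress(freqs)
# The compressed distribution has lower variance than the raw one.
print(statistics.pvariance(list(freqs.values())),
      statistics.pvariance(list(comp.values())))
training_sample = sample_mappings(corpus, k=3)
```

The sampled mappings would then be passed to whatever ASR training routine is in use; that step is outside the scope of this sketch.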

    2. Estimating Speech in the Presence of Noise
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20150287406A1

    Publication date: 2015-10-08

    Application number: US13771419

    Filing date: 2013-02-20

    Applicant: Google Inc.

    CPC classification number: G10L15/20 G10L21/0232

    Abstract: A method for estimating a speech signal in the presence of non-stationary noise includes determining a plurality of initial speech estimates by subtracting a plurality of noise spectra, respectively, from an observed spectrum. Each of the noise spectra is represented by a noise component vector obtained from a Gaussian mixture model. The method also includes determining a plurality of initial noise estimates by subtracting a plurality of speech spectra, respectively, from the observed spectrum. Each of the speech spectra is represented by a speech component vector obtained from another Gaussian mixture model. A plurality of scores is determined, each score corresponding to one of the plurality of initial speech estimates, and calculated from a joint distribution defined by a combination of one of the noise component vectors and one of the speech component vectors. A clean speech estimate is determined as a combination of a subset of the scores.
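The steps in this abstract can be illustrated with a small pure-Python sketch. Several details are assumptions rather than facts from the source: the component vectors are treated as Gaussian means, the score is an isotropic Gaussian log-likelihood around the sum of a speech mean and a noise mean, and the final estimate is a score-weighted average of the top-scoring initial speech estimates. All function names are hypothetical.

```python
import math

def subtract(a, b):
    """Element-wise a - b, floored at zero so spectra stay non-negative."""
    return [max(x - y, 0.0) for x, y in zip(a, b)]

def log_score(observed, speech_mean, noise_mean, variance=1.0):
    """Assumed score: isotropic Gaussian log-likelihood (up to a constant)
    of the observation under a joint model with mean speech + noise."""
    sq = sum((o - (s + n)) ** 2
             for o, s, n in zip(observed, speech_mean, noise_mean))
    return -0.5 * sq / variance

def estimate_clean_speech(observed, speech_means, noise_means, top_k=2):
    # One initial speech estimate per noise component vector.
    speech_estimates = [subtract(observed, n) for n in noise_means]
    # One initial noise estimate per speech component vector.
    noise_estimates = [subtract(observed, s) for s in speech_means]
    # One score per initial speech estimate: the best joint score that its
    # noise component reaches with any speech component.
    scores = [max(log_score(observed, s, n) for s in speech_means)
              for n in noise_means]
    # Keep the top_k scores and turn them into normalized weights.
    best = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
    peak = max(scores[j] for j in best)
    weights = [math.exp(scores[j] - peak) for j in best]
    total = sum(weights)
    clean = [sum(w * speech_estimates[j][d] for w, j in zip(weights, best)) / total
             for d in range(len(observed))]
    return clean, noise_estimates

observed = [5.0, 3.0, 2.0]                       # speech [4, 2, 1] + noise [1, 1, 1]
speech_means = [[4.0, 2.0, 1.0], [1.0, 1.0, 4.0]]
noise_means = [[1.0, 1.0, 1.0], [3.0, 0.0, 0.0]]
clean, noise_estimates = estimate_clean_speech(
    observed, speech_means, noise_means, top_k=1)
```

With `top_k=1` the sketch simply returns the initial speech estimate whose noise component best explains the observation; larger values blend several estimates.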

    3. Parallel recognition
    Type: Granted invention patent
    Status: In force

    Publication number: US09286894B1

    Publication date: 2016-03-15

    Application number: US13755070

    Filing date: 2013-01-31

    Applicant: Google Inc.

    CPC classification number: G10L15/32 G10L15/083 G10L2015/025

    Abstract: Recognition techniques may include the following. On a first processing entity, a first recognition process is performed on a first element, where the first recognition process includes: in a first state machine having M (M>1) states, determining a first best path cost in at least a subset of the M states for at least part of the first element. On a second processing entity, a second recognition process is performed on a second element, where the second recognition process includes: in a second state machine having N (N>1) states, determining a second best path cost in at least a subset of the N states for at least part of the second element. At least one of the following is done: (i) passing the first best path cost to the second state machine, or (ii) passing the second best path cost to the first state machine. The foregoing techniques may include one or more of the following features, either alone or in combination.
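The exchange of best path costs between two state machines might look like the sketch below. The abstract does not say what the receiving machine does with the cost, so using it as a pruning bound is an assumption; the `StateMachine` class, the `beam` parameter, and the lockstep driver are hypothetical, and the two "processing entities" are simulated sequentially rather than run on separate hardware.

```python
INF = float("inf")

class StateMachine:
    """Hypothetical min-cost decoder over a small state graph."""

    def __init__(self, transitions, num_states):
        # transitions[(src, dst)] = cost of moving from src to dst
        self.transitions = transitions
        self.costs = [0.0] + [INF] * (num_states - 1)  # start in state 0
        self.external_best = INF  # best path cost received from the peer

    def receive(self, cost):
        """Accept the best path cost passed from the other state machine."""
        self.external_best = cost

    def step(self, emission_costs, beam=5.0):
        """Advance one frame; emission_costs[s] is the frame's cost in state s."""
        new = [INF] * len(self.costs)
        for (src, dst), t_cost in self.transitions.items():
            candidate = self.costs[src] + t_cost + emission_costs[dst]
            if candidate < new[dst]:
                new[dst] = candidate
        best = min(new)
        # Assumed use of the passed cost: prune paths far worse than the
        # better of our own best and the peer's best. The local best path
        # is always kept so the machine never prunes itself empty.
        limit = min(best, self.external_best) + beam
        self.costs = [c if (c <= limit or c == best) else INF for c in new]
        return best

def recognize_in_parallel(machine_a, frames_a, machine_b, frames_b):
    best_a = best_b = INF
    for frame_a, frame_b in zip(frames_a, frames_b):
        best_a = machine_a.step(frame_a)
        best_b = machine_b.step(frame_b)
        machine_b.receive(best_a)  # (i) first best path cost -> second machine
        machine_a.receive(best_b)  # (ii) second best path cost -> first machine
    return best_a, best_b

machine_a = StateMachine({(0, 0): 1.0, (0, 1): 2.0, (1, 1): 0.5}, num_states=2)
machine_b = StateMachine({(0, 0): 0.2, (0, 1): 0.1, (1, 1): 0.1}, num_states=2)
best_a, best_b = recognize_in_parallel(
    machine_a, [[0.0, 1.0], [0.5, 0.0]],
    machine_b, [[0.1, 0.1], [0.1, 0.1]],
)
```

Sharing a bound only makes sense when the two machines' costs are on a comparable scale, which this sketch takes for granted.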

    4. Training an automatic speech recognition system using compressed word frequencies
    Type: Granted invention patent
    Status: In force

    Publication number: US09123331B1

    Publication date: 2015-09-01

    Application number: US13967965

    Filing date: 2013-08-15

    Applicant: Google Inc.

    CPC classification number: G10L15/063

    Abstract: Respective word frequencies may be determined from a corpus of utterance-to-text-string mappings that contain associations between audio utterances and a respective text string transcription of each audio utterance. Respective compressed word frequencies may be obtained based on the respective word frequencies such that the distribution of the respective compressed word frequencies has a lower variance than the distribution of the respective word frequencies. Sample utterance-to-text-string mappings may be selected from the corpus of utterance-to-text-string mappings based on the compressed word frequencies. An automatic speech recognition (ASR) system may be trained with the sample utterance-to-text-string mappings.

    5. Speech recognition process
    Type: Granted invention patent
    Status: In force

    Publication number: US08775177B1

    Publication date: 2014-07-08

    Application number: US13665245

    Filing date: 2012-10-31

    Applicant: Google Inc.

    CPC classification number: G10L15/10 G10L2015/085

    Abstract: A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where comparing includes determining similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
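The comparison and weighting steps in this abstract can be sketched as below. The abstract does not name a similarity metric or a decision rule, so cosine similarity, the weighted average, and the `threshold` parameter are all assumptions, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Assumed similarity metric between two templates of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def audio_matches(first_templates, second_templates, weights, threshold=0.8):
    """Compare paired templates, weight each similarity by the weight tied to
    its second template, and decide whether the audio corresponds."""
    similarities = [cosine_similarity(f, s)
                    for f, s in zip(first_templates, second_templates)]
    weighted = [w * sim for w, sim in zip(weights, similarities)]
    score = sum(weighted) / sum(weights)
    return score >= threshold, score

first_templates = [[1.0, 0.0], [0.0, 1.0]]    # derived from the first audio
second_templates = [[1.0, 0.0], [1.0, 0.0]]   # selected for one candidate
weights = [3.0, 1.0]                          # one weight per second template
is_match, score = audio_matches(first_templates, second_templates, weights)
```

In this toy input the first pair of templates is identical and the second pair is orthogonal, so the heavier weight on the first pair dominates the score.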
