-
Publication number: US20110224982A1
Publication date: 2011-09-15
Application number: US12722556
Filing date: 2010-03-12
IPC classification: G10L15/02
CPC classification: G10L15/08, G10L2015/025
Abstract: Described is a technology in which information retrieval (IR) techniques are used in a speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words). Also described is the use of IR techniques to provide a full large vocabulary continuous speech recognizer (LVCSR).
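To make the TF-IDF idea concrete, here is a minimal sketch under stated assumptions: each vocabulary word is treated as a "document" whose terms are the phone unigrams and bigrams of its pronunciation, the decoded phone string is the query, and the word with the highest cosine similarity is returned. The lexicon, the choice of n-grams and the cosine scoring are all hypothetical illustrations, not the method claimed in the application.

```python
import math
from collections import Counter

# Hypothetical pronunciation lexicon: word -> phone sequence.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "bat": ["b", "ae", "t"],
    "tack": ["t", "ae", "k"],
}


def ngrams(units, n_max=2):
    """Unigrams and bigrams of acoustic units, used as IR 'terms'."""
    terms = []
    for n in range(1, n_max + 1):
        terms += [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]
    return terms


def tfidf_vector(terms, idf):
    # Terms never seen in the lexicon get weight 0.
    counts = Counter(terms)
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(lexicon):
    docs = {w: ngrams(p) for w, p in lexicon.items()}
    df = Counter(t for terms in docs.values() for t in set(terms))
    idf = {t: math.log(len(docs) / d) for t, d in df.items()}
    vectors = {w: tfidf_vector(terms, idf) for w, terms in docs.items()}
    return idf, vectors


def retrieve(decoded_units, idf, vectors):
    """Return the lexicon word whose TF-IDF vector best matches the decoded units."""
    query = tfidf_vector(ngrams(decoded_units), idf)
    return max(vectors, key=lambda w: cosine(query, vectors[w]))


idf, vectors = build_index(LEXICON)
# The first phone was misrecognized as "g", but the retrieved word is still "cat".
print(retrieve(["g", "ae", "t"], idf, vectors))  # cat
```

The retrieval view tolerates a decoding error because matching is done over overlapping unit n-grams rather than an exact sequence match.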
-
Publication number: US08401852B2
Publication date: 2013-03-19
Application number: US12626943
Filing date: 2009-11-30
IPC classification: G10L15/04
Abstract: A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.
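The receiver/selector/generator pipeline can be sketched as follows. This is an illustrative sketch only: the `(label, time)` input format, the span rule, and the exact definitions of the existence, expectation and edit-distance features below are simplified assumptions, and the statistical model that consumes the features (for example a log-linear model) is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each detected unit is (label, time_in_seconds).
DetectedUnit = Tuple[str, float]


def select_span(units: List[DetectedUnit], start: float, end: float) -> List[str]:
    """Selector: keep the labels of units detected inside [start, end)."""
    return [label for label, t in units if start <= t < end]


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two unit sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def generate_features(word: str, expected: List[str], detected: List[str]) -> Dict[str, float]:
    """Generator: existence, expectation and edit-distance features for one word hypothesis."""
    feats: Dict[str, float] = {}
    # Existence: a detected unit co-occurs with the hypothesized word.
    for u in set(detected):
        feats[f"exist:{word}:{u}"] = 1.0
    # Expectation: compare the units the word's pronunciation predicts with those detected.
    exp_set, det_set = set(expected), set(detected)
    feats["expect:correct_accept"] = float(len(exp_set & det_set))
    feats["expect:false_reject"] = float(len(exp_set - det_set))
    feats["expect:false_accept"] = float(len(det_set - exp_set))
    # Edit distance between the expected and detected unit sequences.
    feats["edit_distance"] = float(edit_distance(expected, detected))
    return feats


units = [("k", 0.05), ("ae", 0.12), ("d", 0.20), ("s", 0.40)]
span = select_span(units, 0.0, 0.3)  # ['k', 'ae', 'd']
feats = generate_features("cat", ["k", "ae", "t"], span)
print(feats["edit_distance"])  # 1.0
```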
-
Publication number: US20110131046A1
Publication date: 2011-06-02
Application number: US12626943
Filing date: 2009-11-30
IPC classification: G10L15/04
Abstract: A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.
-
Publication number: US08965765B2
Publication date: 2015-02-24
Application number: US12233826
Filing date: 2008-09-19
Applicants: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
CPC classification: G10L15/1822
Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
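For the simplest case named in the abstract, an exact-words repetition, a joint analysis can be sketched by combining the N-best posteriors of the two utterances. The sketch assumes a uniform prior over entries and that the two acoustic observations are independent given the intended entry, so that P(w | a1, a2) is proportional to P(w | a1) P(w | a2). The `floor` value and the example N-best lists are hypothetical, and extensions, truncations and spellings are not modeled; this is not the generative or maximum entropy model of the patent.

```python
from typing import Dict


def combine_repetitions(first: Dict[str, float], second: Dict[str, float],
                        floor: float = 1e-3) -> Dict[str, float]:
    """Posterior over entries given two utterances assumed to say the same thing.

    Assumes a uniform prior and conditional independence of the two utterances,
    so P(w | a1, a2) is proportional to P(w | a1) * P(w | a2).
    Entries missing from one N-best list receive the hypothetical `floor` score.
    """
    candidates = set(first) | set(second)
    scores = {w: first.get(w, floor) * second.get(w, floor) for w in candidates}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


first = {"austin": 0.5, "boston": 0.4, "houston": 0.1}   # N-best for utterance 1
second = {"boston": 0.6, "austin": 0.3, "aspen": 0.1}    # N-best for the repetition
posterior = combine_repetitions(first, second)
print(max(posterior, key=posterior.get))  # boston
```

Although "austin" wins the first utterance alone, the joint score favors "boston" because it is well supported by both recognitions.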
-
Publication number: US20080281806A1
Publication date: 2008-11-13
Application number: US11746847
Filing date: 2007-05-10
Applicants: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig
Inventors: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig
IPC classification: G06F17/30
CPC classification: G06F17/30663, G06F3/0641, G06F17/3069, G10L15/187, G10L15/197
Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
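A minimal sketch of Tf/Idf search over short listings follows. It assumes binary term frequency (in a listing of a few words a term rarely repeats), a smoothed IDF of log(N/df) + 1, and normalization by the listing's IDF norm. The listings, the tokenizer and these weighting choices are hypothetical illustrations, not the patented algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical directory listings.
LISTINGS = [
    "Joe's Pizza",
    "Pizza Hut",
    "Joe's Auto Repair",
    "Seattle Public Library",
]


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def build_idf(listings):
    df = Counter(t for listing in listings for t in set(tokenize(listing)))
    n = len(listings)
    # Smoothed IDF so that a term present in every listing still counts a little.
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def score(query, listing, idf):
    """IDF-weighted overlap with binary TF, normalized by the listing's IDF norm."""
    q, terms = set(tokenize(query)), set(tokenize(listing))
    overlap = sum(idf[t] for t in q & terms)
    norm = math.sqrt(sum(idf[t] ** 2 for t in terms))
    return overlap / norm if norm else 0.0


def search(query, listings, idf, k=2):
    ranked = sorted(listings, key=lambda l: score(query, l, idf), reverse=True)
    return ranked[:k]


idf = build_idf(LISTINGS)
print(search("joe's pizza", LISTINGS, idf))  # top two: Joe's Pizza, Pizza Hut
```

Normalizing by the listing norm keeps longer listings from winning simply because they contain more terms.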
-
Publication number: US20100076765A1
Publication date: 2010-03-25
Application number: US12233826
Filing date: 2008-09-19
Applicants: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
IPC classification: G10L15/00
CPC classification: G10L15/1822
Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
-
Publication number: US09218412B2
Publication date: 2015-12-22
Application number: US11746847
Filing date: 2007-05-10
Applicants: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig
Inventors: Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Alejandro Acero, Geoffrey G. Zweig
IPC classification: G06F7/00, G06F17/30, G06F3/06, G10L15/187, G10L15/197
CPC classification: G06F17/30663, G06F3/0641, G06F17/3069, G10L15/187, G10L15/197
Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
-
Publication number: US09054764B2
Publication date: 2015-06-09
Application number: US13187235
Filing date: 2011-07-20
Applicants: Ivan Tashev, Alejandro Acero
Inventors: Ivan Tashev, Alejandro Acero
CPC classification: H04B7/0854
Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beamforming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) that improves directivity and signal separation capabilities. The technique works in the so-called instantaneous direction of arrival space, estimates the probability of sound coming from a given incident angle or look-up direction, and applies a time-varying, gain-based, spatio-temporal filter to suppress sounds coming from directions other than the sound source direction, resulting in minimal artifacts and musical noise.
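The instantaneous direction-of-arrival idea can be illustrated for a two-microphone array. In this simplified sketch, the per-bin phase difference of one STFT frame gives an angle estimate via sin(theta) = c * phase / (2 * pi * f * d), and a Gaussian weight around the look direction stands in for the probability of sound arriving from that direction; the result is a per-bin gain to multiply into the beamformer output. The far-field model, the speed of sound, the Gaussian width and the function name are all assumptions; the patented technique's probability estimation and its time/frequency filtering are not reproduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def idoa_gain(X1, X2, freqs, mic_distance, look_deg=0.0, width_deg=10.0):
    """Per-bin spatial gain from the instantaneous DOA of one two-mic frame.

    X1, X2: complex STFT bins of one frame (same shape as freqs).
    Assumes a far-field source and mic spacing small enough to avoid spatial
    aliasing. Bins at 0 Hz carry no phase information and get gain 1.
    """
    X1 = np.asarray(X1, dtype=complex)
    X2 = np.asarray(X2, dtype=complex)
    freqs = np.asarray(freqs, dtype=float)
    # Phase difference equals 2*pi*f*tau for a source whose sound reaches mic 2 tau later.
    phase = np.angle(X1 * np.conj(X2))
    with np.errstate(divide="ignore", invalid="ignore"):
        sin_theta = SPEED_OF_SOUND * phase / (2 * np.pi * freqs * mic_distance)
    sin_theta = np.clip(np.nan_to_num(sin_theta), -1.0, 1.0)
    theta = np.arcsin(sin_theta)
    # Gaussian weight around the look direction (hypothetical stand-in for a probability).
    delta = theta - np.deg2rad(look_deg)
    gain = np.exp(-0.5 * (delta / np.deg2rad(width_deg)) ** 2)
    return np.where(freqs > 0, gain, 1.0)


freqs = np.array([500.0, 1000.0, 2000.0])
X1 = np.ones(3, dtype=complex)
tau = 0.05 * np.sin(np.deg2rad(60.0)) / SPEED_OF_SOUND
X2 = X1 * np.exp(-2j * np.pi * freqs * tau)  # source 60 degrees off broadside
print(idoa_gain(X1, X2, freqs, 0.05))        # near zero: sound is suppressed
```

Because the gain is recomputed for every frame and bin, the filter is time-varying, which is what lets it suppress off-axis sounds without a fixed spatial null.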
-
Publication number: US08818797B2
Publication date: 2014-08-26
Application number: US12978197
Filing date: 2010-12-23
IPC classification: G10L21/00
CPC classification: G10L19/005, G10L15/02, G10L19/20, G10L21/038, G10L2019/0001
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
-
公开(公告)号:US08818002B2
公开(公告)日:2014-08-26
申请号:US13187618
申请日:2011-07-21
申请人: Ivan Tashev , Alejandro Acero , Byung-Jun Yoon
发明人: Ivan Tashev , Alejandro Acero , Byung-Jun Yoon
CPC分类号: G01S3/86 , H04B7/0854 , H04R3/005 , H04R2430/20
摘要: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals as well as isotropic ambient noise.
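One way to picture "incorporating the sound-source presence probability into an adaptive blocking matrix" is a normalized LMS (NLMS) blocking filter whose step size is scaled by the presence probability p(t): the filter learns to cancel the target from the auxiliary channel mainly while the target is likely present, and its output serves as a noise reference. This is a hypothetical time-domain, two-channel sketch; the presence probability is simply an input here (the patent estimates it from instantaneous direction of arrival and voice activity detection), and the rest of a generalized sidelobe canceller is omitted.

```python
import numpy as np


def adaptive_blocking(x_ref, x_aux, presence_prob, taps=8, mu=0.5, eps=1e-8):
    """NLMS blocking filter with step size scaled by the presence probability p(t).

    Output e(t) = x_aux(t) - h . u(t), where u(t) holds the last `taps` samples
    of x_ref. Returns the noise-reference signal and the final filter h.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_aux = np.asarray(x_aux, dtype=float)
    p = np.asarray(presence_prob, dtype=float)
    h = np.zeros(taps)
    u = np.zeros(taps)
    out = np.empty_like(x_aux)
    for t in range(len(x_aux)):
        u = np.roll(u, 1)
        u[0] = x_ref[t]
        e = x_aux[t] - h @ u
        out[t] = e
        # Adapt only as strongly as the target is believed to be present.
        h += mu * p[t] * e * u / (u @ u + eps)
    return out, h


rng = np.random.default_rng(0)
s = rng.standard_normal(4000)                         # synthetic "target" signal
x_aux = 0.8 * np.concatenate(([0.0], s[:-1]))         # delayed, scaled copy at mic 2
out, h = adaptive_blocking(s, x_aux, np.ones(len(s)))
print(round(float(h[1]), 3))  # 0.8
```

With p(t) = 0 throughout, the filter never adapts and the output equals the auxiliary channel, which shows how the probability gates adaptation.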