Patent search: ap:("Michael Lewis Seltzer" OR "Kaustubh Prakash Kalgaonkar" OR "Alejandro Acero") AND inv:"Alejandro Acero", page 6

51.

Patent application
System and method for identifying semantic intent from acoustic information (In force)

Publication No.: US20060129397A1

Publication Date: 2006-06-15

Application No.: US11009630

Filing Date: 2004-12-10

Applicants: Xiao Li, Asela Gunawardana, Alejandro Acero, Milind Mahajan, Dong Yu

Inventors: Xiao Li, Asela Gunawardana, Alejandro Acero, Milind Mahajan, Dong Yu

IPC Classification: G10L15/06

CPC Classification: G10L15/19, G10L15/1815

Abstract: In accordance with one embodiment of the present invention, unanticipated semantic intents are discovered in audio data in an unsupervised manner. For instance, the audio acoustics are clustered based on semantic intent and representative acoustics are chosen for each cluster. The human then need only listen to a small number of representative acoustics for each cluster (and possibly only one per cluster) in order to identify the unforeseen semantic intents.
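The clustering step in this abstract can be sketched as follows. This is a minimal sketch, assuming each utterance has already been summarized as a fixed-length acoustic feature vector and that plain k-means (with a deterministic first-k initialization) stands in for whatever clustering the patent actually uses; `kmeans` and `pick_representatives` are hypothetical names. The representative of each cluster is taken to be the member closest to its centroid, so an annotator would listen to one utterance per cluster.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with a deterministic init (the first k points); an assumed stand-in."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each utterance joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pick_representatives(points: List[Vector], k: int) -> Dict[int, int]:
    """Map each non-empty cluster to the index of the utterance closest to its centroid."""
    labels, centroids = kmeans(points, k)
    reps: Dict[int, int] = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps[c] = min(idx, key=lambda i: _dist2(points[i], centroids[c]))
    return reps
```

The number of clusters `k` would be an analyst's choice here; the abstract does not say how it is set.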

52.

Granted patent
Including the category of environmental noise when processing speech signals (In force)

Publication No.: US06959276B2

Publication Date: 2005-10-25

Application No.: US09965239

Filing Date: 2001-09-27

Applicants: James G. Droppo, Alejandro Acero, Li Deng

Inventors: James G. Droppo, Alejandro Acero, Li Deng

IPC Classification: G10L15/20, G10L21/02

CPC Classification: G10L21/0208, G10L15/20, G10L21/0216

Abstract: A method and apparatus are provided for identifying a noise environment for a frame of an input signal based on at least one feature for that frame. Under one embodiment, the noise environment is identified by determining the probability of each of a set of possible noise environments. For some embodiments, the probabilities of the noise environments for past frames are included in the identification of an environment for a current frame. In one particular embodiment, a count is generated for each environment that indicates the number of past frames for which the environment was the most probable environment. The environment with the highest count is then selected as the environment for the current frame.

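The count-based selection described in this abstract can be sketched as a sliding-window vote. This sketch assumes the per-frame environment probabilities are already available (from some classifier not shown), that "past frames" means a fixed window of the most recent frames including the current one, and that ties go to the lower environment index; `NoiseEnvironmentTracker` and its `window` parameter are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Sequence


class NoiseEnvironmentTracker:
    """Hypothetical helper: selects the environment most often ranked first in recent frames."""

    def __init__(self, window: int = 10) -> None:
        # Most probable environment index for each of the last `window` frames (current one included).
        self.history: Deque[int] = deque(maxlen=window)

    def update(self, env_probs: Sequence[float]) -> int:
        """env_probs[i] is P(environment i) for the current frame; returns the selected environment."""
        most_probable = max(range(len(env_probs)), key=lambda i: env_probs[i])
        self.history.append(most_probable)
        counts = Counter(self.history)
        # Highest count wins; ties go to the lower environment index (an assumption).
        return max(counts, key=lambda env: (counts[env], -env))
```

Voting over a window smooths out single-frame misclassifications, so the selected environment changes only when a new one has dominated several recent frames.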

53.

Patent application
Method and apparatus for predicting word error rates from text (In force)

Publication No.: US20050228670A1

Publication Date: 2005-10-13

Application No.: US11146324

Filing Date: 2005-06-06

Applicants: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela Gunawardana, Ciprian Chelba

Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela Gunawardana, Ciprian Chelba

IPC Classification: G10L15/28, G10L13/08, G10L15/00, G10L15/06, G10L15/14, G10L15/18

CPC Classification: G10L15/197, G10L15/183

Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

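A minimal sketch of a confusion model and the expected error rate it implies. It assumes the actual and predicted unit sequences are already aligned one-to-one (a real system would need an alignment that handles insertions and deletions) and assigns a hypothetical fixed error probability, `unseen_error`, to units never seen in training; `train_confusion` and `expected_error_rate` are illustrative names, not the patent's method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple

ConfusionModel = Dict[str, Dict[str, float]]


def train_confusion(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> ConfusionModel:
    """Estimate P(predicted unit | actual unit) from (actual, predicted) sequence pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for actual, predicted in pairs:
        if len(actual) != len(predicted):
            raise ValueError("this sketch assumes one-to-one aligned sequences")
        for a, p in zip(actual, predicted):
            counts[a][p] += 1
    model: ConfusionModel = {}
    for a, c in counts.items():
        total = sum(c.values())
        model[a] = {p: n / total for p, n in c.items()}
    return model


def expected_error_rate(units: Sequence[str], model: ConfusionModel, unseen_error: float = 0.5) -> float:
    """Average probability that each unit of a text would be misrecognized."""
    if not units:
        return 0.0
    errors = 0.0
    for u in units:
        probs = model.get(u)
        # Error probability = 1 - P(unit recognized as itself).
        errors += unseen_error if probs is None else 1.0 - probs.get(u, 0.0)
    return errors / len(units)
```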

54.

Patent application
Head mounted multi-sensory audio input system (Pending, published)

Publication No.: US20050033571A1

Publication Date: 2005-02-10

Application No.: US10636176

Filing Date: 2003-08-07

Applicants: Xuedong Huang, Zicheng Liu, Zhengyou Zhang, Michael Sinclair, Alejandro Acero

Inventors: Xuedong Huang, Zicheng Liu, Zhengyou Zhang, Michael Sinclair, Alejandro Acero

IPC Classification: G10L11/02, G10L15/20, G10L15/24, H04R1/10, H04R1/14, H04R25/00, G10L15/00

CPC Classification: H04R1/14, G10L15/20, G10L15/24, G10L25/78, H04R1/083, H04R1/1008, H04R5/033, H04R2460/13

Abstract: The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.

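How a speech detector might combine the two signals can be illustrated with a simple energy gate. This is a sketch under stated assumptions: frame energy as the only feature, hand-picked thresholds (`mic_threshold` and `sensor_threshold` are hypothetical values), and a logical AND as the combination rule. The rationale is that a bone or throat sensor responds mainly to the wearer's own speech, so it can veto microphone energy caused by background talkers or noise.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def detect_speech(
    mic_frame: Sequence[float],
    sensor_frame: Sequence[float],
    mic_threshold: float = 1e-3,     # hypothetical value
    sensor_threshold: float = 1e-4,  # hypothetical value
) -> bool:
    """Report speech only when both the speech sensor and the microphone carry enough energy."""
    sensor_active = frame_energy(sensor_frame) > sensor_threshold
    mic_active = frame_energy(mic_frame) > mic_threshold
    return sensor_active and mic_active
```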

55.

Granted patent
Rapid tree-based method for vector quantization (Lapsed)

Publication No.: US5734791A

Publication Date: 1998-03-31

Application No.: US999354

Filing Date: 1992-12-31

Applicants: Alejandro Acero, Kai-Fu Lee, Yen-Lu Chow

Inventors: Alejandro Acero, Kai-Fu Lee, Yen-Lu Chow

IPC Classification: G10L19/02, G10L3/02

CPC Classification: G10L19/038

Abstract: The branching decision for each node in a vector quantization (VQ) binary tree is made by a simple comparison of a pre-selected element of the candidate vector with a stored threshold resulting in a binary decision for reaching the next lower level. Each node has a preassigned element and threshold value. Conventional centroid distance training techniques (such as LBG and k-means) are used to establish code-book indices corresponding to a set of VQ centroids. The set of training vectors are used a second time to select a vector element and threshold value at each node that approximately splits the data evenly. After processing the training vectors through the binary tree using threshold decisions, a histogram is generated for each code-book index that represents the number of times a training vector belonging to a given index set appeared at each index. The final quantization is accomplished by processing and then selecting the nearest centroid belonging to that histogram. Accuracy comparable to that achieved by conventional binary tree VQ is realized but with almost a full magnitude increase in processing speed.


56.

Granted patent
Structured models of repetition for speech recognition (In force)

Publication No.: US08965765B2

Publication Date: 2015-02-24

Application No.: US12233826

Filing Date: 2008-09-19

Applicants: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz

IPC Classification: G10L15/00, G10L15/18

CPC Classification: G10L15/1822

Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model, or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

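For the simplest transformation the abstract lists, an exact repetition, a joint analysis can be sketched by combining the recognizers' n-best posteriors for both utterances. This assumes the two utterances are conditionally independent given the intended words, that all listed posteriors are positive, and that a small `floor` probability stands in for hypotheses missing from one list; the patent's generative or maximum entropy models, and the extension, truncation, and spelling cases, are not modeled. `combine_repetition` is a hypothetical name.

```python
import math
from typing import Dict


def combine_repetition(first: Dict[str, float], second: Dict[str, float], floor: float = 1e-6) -> str:
    """Pick the word sequence with the highest joint score across two utterances of it."""

    def joint_log_score(h: str) -> float:
        # Independence assumption: log P(h | utterance 1) + log P(h | utterance 2).
        return math.log(first.get(h, floor)) + math.log(second.get(h, floor))

    # Sorting the candidates makes tie-breaking deterministic.
    return max(sorted(set(first) | set(second)), key=joint_log_score)
```

A hypothesis that ranks second in both n-best lists can win over one that ranks first in only one of them, which is the intuition behind pooling evidence across the repetition.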

57.

Granted patent
Warped spectral and fine estimate audio encoding (In force)

Publication No.: US08532985B2

Publication Date: 2013-09-10

Application No.: US12959386

Filing Date: 2010-12-03

Applicants: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan

Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan

IPC Classification: G10L15/02

CPC Classification: G10L15/02, G10L15/30, G10L19/02, G10L19/0212

Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.

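One way to picture a warped spectral estimate paired with a fine estimate is a mel-warped band envelope plus a per-bin ratio to that envelope, sketched below for a single magnitude spectrum running from 0 Hz to Nyquist. The mel formula 2595 * log10(1 + f / 700) is a common choice; using band means as the warped estimate and per-bin ratios as the fine estimate are assumptions for illustration, not the encoding the patent claims. `encode`, `decode`, and `band_assignment` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def hz_to_mel(f: float) -> float:
    """A common mel-scale formula, used here as the frequency warping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def band_assignment(n_bins: int, sample_rate: float, n_bands: int) -> List[int]:
    """Assign each linear-frequency bin (0 Hz to Nyquist) to one of n_bands equal-width mel bands."""
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    bands = []
    for k in range(n_bins):
        freq = k * nyquist / (n_bins - 1)
        bands.append(min(n_bands - 1, int(hz_to_mel(freq) / mel_max * n_bands)))
    return bands


def encode(mags: Sequence[float], sample_rate: float, n_bands: int) -> Tuple[List[float], List[float]]:
    """Return (coarse warped envelope, per-bin fine estimate relative to that envelope)."""
    bands = band_assignment(len(mags), sample_rate, n_bands)
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for m, b in zip(mags, bands):
        sums[b] += m
        counts[b] += 1
    coarse = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    # A zero band mean implies all (non-negative) magnitudes in that band are zero.
    fine = [m / coarse[b] if coarse[b] > 0 else 0.0 for m, b in zip(mags, bands)]
    return coarse, fine


def decode(coarse: Sequence[float], fine: Sequence[float], sample_rate: float) -> List[float]:
    """Rebuild the magnitude spectrum from the coarse envelope and the fine estimate."""
    bands = band_assignment(len(fine), sample_rate, len(coarse))
    return [coarse[b] * f for f, b in zip(fine, bands)]
```

In this picture the short `coarse` vector is what a recognizer front end would consume, while `coarse` and `fine` together allow the spectrum to be reconstructed.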

58.

Granted patent
Acoustic echo suppression (In force)

Publication No.: US08325909B2

Publication Date: 2012-12-04

Application No.: US12145579

Filing Date: 2008-06-25

Applicants: Ivan J. Tashev, Alejandro Acero, Nilesh Madhu

Inventors: Ivan J. Tashev, Alejandro Acero, Nilesh Madhu

IPC Classification: H04M9/08, H04B3/20

CPC Classification: H04M9/082

Abstract: Sound signals captured by a microphone are adjusted to provide improved sound quality. More particularly, an Acoustic Echo Reduction system which performs a first stage of echo reduction (e.g., acoustic echo cancellation) on a received signal is configured to perform a second stage of echo reduction (e.g., acoustic echo suppression) by segmenting the received signal into a plurality of frequency bins respectively comprised within a number of frames (e.g., 0.3 s to 0.5 s sound signal segments) for a given block. Data comprised within respective frequency bins is modeled according to a probability density function (e.g., Gaussian distribution). The probability of whether respective frequency bins comprise predominantly near-end signal or predominantly residual echo is calculated. The output of the acoustic echo suppression is computed as a product of the content of a frequency bin in a frame and the probability the frequency bin in a frame comprises predominantly near-end signal, thereby making near-end signals more prominent than residual echoes.

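The per-bin suppression rule in this abstract (output equals the bin content times the probability that the bin is predominantly near-end signal) can be sketched with a two-class Gaussian posterior. The sketch assumes one real-valued feature per bin (such as a magnitude after echo cancellation), Gaussian models whose means and variances are already known for the block and shared by all bins, and an equal prior by default; `suppress_frame` and its parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def suppress_frame(
    bins: Sequence[float],
    near_mean: float,
    near_var: float,
    echo_mean: float,
    echo_var: float,
    prior_near: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Scale each bin by P(predominantly near-end | bin value); returns (outputs, posteriors)."""
    outputs, posteriors = [], []
    for x in bins:
        p_near = prior_near * gaussian_pdf(x, near_mean, near_var)
        p_echo = (1.0 - prior_near) * gaussian_pdf(x, echo_mean, echo_var)
        total = p_near + p_echo
        post = p_near / total if total > 0.0 else prior_near
        posteriors.append(post)
        outputs.append(x * post)  # residual-echo-like bins are attenuated toward zero
    return outputs, posteriors
```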

59.

Patent application
DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION (In force)

Publication No.: US20120254086A1

Publication Date: 2012-10-04

Application No.: US13077978

Filing Date: 2011-03-31

Applicants: Li Deng, Dong Yu, Alejandro Acero

Inventors: Li Deng, Dong Yu, Alejandro Acero

IPC Classification: G06N3/08

CPC Classification: G06N3/08, G06N3/02, G06N3/04, G06N3/0454

Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.

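A toy version of the module stacking and batch convex training described in this abstract is sketched below in pure Python. Assumptions: each module's lower weights are a fixed, seeded random projection (the abstract combines random projections with RBM weights; RBM training is omitted), hidden units are sigmoids, and the upper weights are fitted by ridge-regularized least squares in closed form, which is a convex problem solved on the whole batch. The HMM integration and the sequence-level joint optimization are not shown, and all names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b (a square, non-singular) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative inputs
    return e / (1.0 + e)


class ConvexModule:
    """One module: fixed lower weights W, upper weights U fitted by ridge least squares."""

    def __init__(self, in_dim: int, hidden: int, rng: random.Random, ridge: float = 1e-3) -> None:
        # Assumption: a seeded random projection stands in for W.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(hidden)] for _ in range(in_dim)]
        self.ridge = ridge
        self.u: Matrix = []

    def hidden(self, x: Matrix) -> Matrix:
        return [[sigmoid(v) for v in row] for row in matmul(x, self.w)]

    def fit(self, x: Matrix, t: Matrix) -> Matrix:
        h = self.hidden(x)
        ht = transpose(h)
        gram = matmul(ht, h)
        for i in range(len(gram)):
            gram[i][i] += self.ridge
        # Closed form over the whole batch: U = (H^T H + ridge * I)^-1 H^T T.
        self.u = solve(gram, matmul(ht, t))
        return matmul(h, self.u)


def train_dcn(x: Matrix, t: Matrix, n_modules: int = 2, hidden: int = 4, seed: int = 0) -> List[ConvexModule]:
    rng = random.Random(seed)
    modules: List[ConvexModule] = []
    inp = x
    for _ in range(n_modules):
        mod = ConvexModule(len(inp[0]), hidden, rng)
        y = mod.fit(inp, t)
        modules.append(mod)
        # Stacking: the next module sees the raw input concatenated with this module's output.
        inp = [list(xr) + list(yr) for xr, yr in zip(x, y)]
    return modules
```

Because each module's upper weights have a closed-form solution, the heavy work reduces to matrix products over the batch, which is what makes this style of training straightforward to parallelize.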

60.

Patent application
SEARCH LEXICON EXPANSION (In force)

Publication No.: US20120158703A1

Publication Date: 2012-06-21

Application No.: US12970477

Filing Date: 2010-12-16

Applicants: Xiao Li, Jingjing Liu, Alejandro Acero, Ye-Yi Wang

Inventors: Xiao Li, Jingjing Liu, Alejandro Acero, Ye-Yi Wang

IPC Classification: G06F17/30

CPC Classification: G06F17/30737, G06F17/2735, G06F17/30693, G06F17/30864

Abstract: One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find one or more second documents in a query log that comprise the document patterns, which are associated with query terms used to return the second documents. The query terms for the second documents can be extracted and used to expand the lexicon. Elements within the lexicon may be weighted based upon relevance to different query domains, for example.

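The expansion loop can be sketched with in-memory stand-ins for the search engine and the query log. This sketch assumes that a "document pattern" is simply the host name of a result URL and that an expanded entry's weight is its click count in the log; both are simplifications of what the abstract describes, and `expand_lexicon`, `search`, `query_log`, and `min_count` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import urlparse


def expand_lexicon(
    seeds: Iterable[str],
    search: Dict[str, List[str]],
    query_log: Iterable[Tuple[str, str]],
    min_count: int = 1,
) -> Tuple[Set[str], Counter]:
    seeds = list(seeds)
    # Step 1: documents returned for the seed entries, reduced to patterns (here: host names).
    patterns = {urlparse(url).netloc for s in seeds for url in search.get(s, [])}
    # Step 2: logged queries whose clicked document matches a pattern, with their counts.
    counts = Counter(q for q, url in query_log if urlparse(url).netloc in patterns)
    # Step 3: sufficiently frequent queries join the lexicon; counts serve as crude weights.
    expanded = set(seeds) | {q for q, n in counts.items() if n >= min_count}
    return expanded, counts
```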
