Including the category of environmental noise when processing speech signals
    52.
    Granted patent
    Status: In force

    Publication number: US06959276B2

    Publication date: 2005-10-25

    Application number: US09965239

    Filing date: 2001-09-27

    IPC classification: G10L15/20 G10L21/02

    Abstract: A method and apparatus are provided for identifying a noise environment for a frame of an input signal based on at least one feature for that frame. Under one embodiment, the noise environment is identified by determining the probability of each of a set of possible noise environments. For some embodiments, the probabilities of the noise environments for past frames are included in the identification of an environment for a current frame. In one particular embodiment, a count is generated for each environment that indicates the number of past frames for which the environment was the most probable environment. The environment with the highest count is then selected as the environment for the current frame.
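
The counting embodiment described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `EnvironmentVoter`, the window length `history`, the choice to include the current frame in the window, and the tie-breaking rule are all assumptions, and the per-frame probabilities are taken as given.

```python
from collections import Counter, deque


def most_probable(probs):
    """Return the environment label with the highest probability for one frame."""
    return max(probs, key=probs.get)


class EnvironmentVoter:
    """Pick the environment that was most often the per-frame winner.

    Assumption: the vote runs over a sliding window of recent frames
    (including the current one); the abstract does not fix the window.
    """

    def __init__(self, history=5):
        self.winners = deque(maxlen=history)  # per-frame winners, oldest dropped

    def update(self, probs):
        # Record which environment was most probable for this frame.
        self.winners.append(most_probable(probs))
        # Count how often each environment won within the window.
        counts = Counter(self.winners)
        # Ties go to the environment that entered the window first.
        return counts.most_common(1)[0][0]


voter = EnvironmentVoter(history=5)
frames = [
    {"car": 0.7, "office": 0.3},
    {"car": 0.6, "office": 0.4},
    {"car": 0.2, "office": 0.8},  # a single outlier frame does not flip the decision
    {"car": 0.1, "office": 0.9},
    {"car": 0.3, "office": 0.7},
]
decisions = [voter.update(p) for p in frames]
```

Voting over past winners smooths the decision: one frame that briefly favors another environment does not change the selected environment until that environment has won more frames in the window.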

    Rapid tree-based method for vector quantization
    55.
    Granted patent
    Status: Lapsed

    Publication number: US5734791A

    Publication date: 1998-03-31

    Application number: US999354

    Filing date: 1992-12-31

    IPC classification: G10L19/02 G10L3/02

    CPC classification: G10L19/038

    Abstract: The branching decision for each node in a vector quantization (VQ) binary tree is made by a simple comparison of a pre-selected element of the candidate vector with a stored threshold, resulting in a binary decision for reaching the next lower level. Each node has a preassigned element and threshold value. Conventional centroid distance training techniques (such as LBG and k-means) are used to establish code-book indices corresponding to a set of VQ centroids. The set of training vectors is used a second time to select a vector element and threshold value at each node that approximately splits the data evenly. After processing the training vectors through the binary tree using threshold decisions, a histogram is generated for each code-book index that represents the number of times a training vector belonging to a given index set appeared at each index. The final quantization is accomplished by processing and then selecting the nearest centroid belonging to that histogram. Accuracy comparable to that achieved by conventional binary tree VQ is realized but with almost a full order of magnitude increase in processing speed.
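
The threshold-tree idea in the abstract can be illustrated with a small sketch. It assumes the centroids have already been trained (for example with k-means), splits each node at the median of one vector element, and stores at each leaf the code-book indices seen there during training. The "largest spread" rule for choosing the element, the dictionary-based tree layout, and all function names are assumptions made for illustration; the abstract only requires that the split be approximately even.

```python
from collections import Counter
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids, candidates):
    """Index of the closest centroid among the candidate indices."""
    return min(candidates, key=lambda i: sq_dist(vec, centroids[i]))


def build_tree(vectors, depth):
    """Split on one element per node; the median threshold gives a roughly even split."""
    if depth == 0 or len(vectors) < 2:
        return {"leaf": True, "vectors": vectors}
    dims = range(len(vectors[0]))
    # Assumed selection rule: use the element with the largest value range.
    elem = max(dims, key=lambda d: max(v[d] for v in vectors) - min(v[d] for v in vectors))
    thr = median(v[elem] for v in vectors)
    left = [v for v in vectors if v[elem] <= thr]
    right = [v for v in vectors if v[elem] > thr]
    if not left or not right:  # degenerate split: stop here
        return {"leaf": True, "vectors": vectors}
    return {"leaf": False, "elem": elem, "thr": thr,
            "left": build_tree(left, depth - 1),
            "right": build_tree(right, depth - 1)}


def attach_histograms(node, centroids):
    """Second pass: record which code-book indices reach each leaf."""
    if node["leaf"]:
        hist = Counter(nearest(v, centroids, range(len(centroids)))
                       for v in node.pop("vectors"))
        node["candidates"] = sorted(hist)
    else:
        attach_histograms(node["left"], centroids)
        attach_histograms(node["right"], centroids)


def quantize(vec, tree, centroids):
    """Descend with scalar comparisons, then search only the leaf's candidates."""
    node = tree
    while not node["leaf"]:
        node = node["left"] if vec[node["elem"]] <= node["thr"] else node["right"]
    return nearest(vec, centroids, node["candidates"])


centroids = [(0, 0), (10, 0), (0, 10), (10, 10)]
training = [(0, 1), (1, 0), (9, 1), (10, 0), (1, 9), (0, 10), (9, 10), (10, 9)]
tree = build_tree(training, depth=2)
attach_histograms(tree, centroids)
index = quantize((9.5, 0.5), tree, centroids)
```

The speed gain comes from replacing a full distance computation at every tree level with a single scalar comparison; distances are computed only against the few centroids listed at the leaf.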

    Structured models of repetition for speech recognition
    56.
    Granted patent
    Status: In force

    Publication number: US08965765B2

    Publication date: 2015-02-24

    Application number: US12233826

    Filing date: 2008-09-19

    IPC classification: G10L15/00 G10L15/18

    CPC classification: G10L15/1822

    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

    Warped spectral and fine estimate audio encoding
    57.
    Granted patent
    Status: In force

    Publication number: US08532985B2

    Publication date: 2013-09-10

    Application number: US12959386

    Filing date: 2010-12-03

    IPC classification: G10L15/02

    Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.

    Acoustic echo suppression
    58.
    Granted patent
    Status: In force

    Publication number: US08325909B2

    Publication date: 2012-12-04

    Application number: US12145579

    Filing date: 2008-06-25

    IPC classification: H04M9/08 H04B3/20

    CPC classification: H04M9/082

    Abstract: Sound signals captured by a microphone are adjusted to provide improved sound quality. More particularly, an Acoustic Echo Reduction system which performs a first stage of echo reduction (e.g., acoustic echo cancellation) on a received signal is configured to perform a second stage of echo reduction (e.g., acoustic echo suppression) by segmenting the received signal into a plurality of frequency bins respectively comprised within a number of frames (e.g., 0.3 s to 0.5 s sound signal segments) for a given block. Data comprised within respective frequency bins is modeled according to a probability density function (e.g., Gaussian distribution). The probability of whether respective frequency bins comprise predominantly near-end signal or predominantly residual echo is calculated. The output of the acoustic echo suppression is computed as a product of the content of a frequency bin in a frame and the probability that the frequency bin in a frame comprises predominantly near-end signal, thereby making near-end signals more prominent than residual echoes.
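
The final step in the abstract, scaling each frequency bin by the probability that it is predominantly near-end signal, can be sketched as below. This is a simplified illustration under stated assumptions: each bin is represented by a single real magnitude, the two classes (near-end and residual echo) are each modeled by a one-dimensional Gaussian with known mean and variance, and the prior is fixed. How the patent estimates those parameters is not covered here, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def near_end_probability(x, near, echo, prior_near=0.5):
    """Posterior probability that a bin value x is near-end rather than residual echo.

    `near` and `echo` are (mean, variance) pairs; both are assumed known.
    """
    p_near = prior_near * gaussian_pdf(x, *near)
    p_echo = (1.0 - prior_near) * gaussian_pdf(x, *echo)
    total = p_near + p_echo
    # Fall back to the prior if both densities underflow to zero.
    return p_near / total if total > 0.0 else prior_near


def suppress(frame, near, echo):
    """Output = bin content multiplied by its near-end probability."""
    return [x * near_end_probability(abs(x), near, echo) for x in frame]


near_model = (1.0, 0.04)  # assumed (mean, variance) for near-end magnitudes
echo_model = (0.1, 0.01)  # assumed (mean, variance) for residual-echo magnitudes
out = suppress([1.0, 0.1], near_model, echo_model)
```

Because the gain is a probability between 0 and 1, bins that look like residual echo are attenuated smoothly rather than hard-gated, while bins that look like near-end speech pass through almost unchanged.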

    DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION
    59.
    Patent application
    Status: In force

    Publication number: US20120254086A1

    Publication date: 2012-10-04

    Application number: US13077978

    Filing date: 2011-03-31

    IPC classification: G06N3/08

    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce the output serving as the scores to combine with transition probabilities between states in a hidden Markov model and language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based, convex optimization is performed to learn a portion of the deep convex network's weights, rendering it appropriate for parallel computation to accomplish the training. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using the optimization criterion based on a sequence rather than a set of unrelated frames.

    SEARCH LEXICON EXPANSION
    60.
    Patent application
    Status: In force

    Publication number: US20120158703A1

    Publication date: 2012-06-21

    Application number: US12970477

    Filing date: 2010-12-16

    IPC classification: G06F17/30

    Abstract: One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find one or more second documents in a query log that comprise the document patterns, which are associated with query terms used to return the second documents. The query terms for the second documents can be extracted and used to expand the lexicon. Elements within the lexicon may be weighted based upon relevance to different query domains, for example.
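
The expansion loop in the abstract can be sketched in a few lines. The sketch makes several assumptions that the abstract does not state: the query log is a list of `(query, url)` pairs, a "document pattern" is simply the URL host, and matching is exact and case-insensitive. The function names and the example hosts are hypothetical, and the optional per-domain weighting is omitted.

```python
from urllib.parse import urlparse


def doc_pattern(url):
    """Assumed document pattern: the host part of the document URL."""
    return urlparse(url).netloc


def expand_lexicon(seed, query_log):
    """Grow a seed lexicon with queries that lead to pattern-matching documents."""
    seed = {s.lower() for s in seed}
    # Step 1: documents returned for seed elements yield the patterns.
    patterns = {doc_pattern(url) for query, url in query_log if query.lower() in seed}
    # Step 2: any logged query whose document matches a pattern joins the lexicon.
    expanded = set(seed)
    for query, url in query_log:
        if doc_pattern(url) in patterns:
            expanded.add(query.lower())
    return expanded


log = [
    ("the matrix", "http://movies.example/the-matrix"),
    ("inception", "http://movies.example/inception"),
    ("pizza near me", "http://food.example/pizza"),
]
lexicon = expand_lexicon({"The Matrix"}, log)
```

Here the seed entry "The Matrix" leads to a movie-site pattern, so "inception" is added to the lexicon while the unrelated food query is not.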
