11. Methods and apparatuses for automatic speech recognition
    Granted invention patent (status: in force)

    Publication number: US09431006B2

    Publication date: 2016-08-30

    Application number: US12497511

    Application date: 2009-07-02

    Abstract: Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.

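    A minimal sketch of pairing a discrete parameter representation with a continuous residual representation, assuming vector quantization as the discrete step: each feature frame is mapped to its nearest codeword index, and the leftover difference is kept as a continuous residual. The function name, codebook, and toy data are hypothetical, and the sketch does not model the differing portion sizes or the third (coupling) model parameters described above.

```python
import numpy as np


def two_level_representation(frames: np.ndarray, codebook: np.ndarray):
    """Split feature frames into a discrete index and a continuous residual.

    frames:   (T, D) array of feature vectors (hypothetical input).
    codebook: (K, D) array of codewords (hypothetical, e.g. from k-means).
    Returns (indices, residuals) with frames == codebook[indices] + residuals.
    """
    # Squared Euclidean distance from every frame to every codeword: (T, K).
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    indices = dists.argmin(axis=1)          # discrete parameter per frame
    residuals = frames - codebook[indices]  # continuous residual per frame
    return indices, residuals


frames = np.array([[0.9, 0.1], [2.1, 1.9]])
codebook = np.array([[1.0, 0.0], [2.0, 2.0]])
idx, res = two_level_representation(frames, codebook)
print(idx.tolist())  # [0, 1]
```

    The residuals carry exactly the information the discrete indices discard, so adding them back to the selected codewords reconstructs the original frames.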

12. Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
    Granted invention patent (status: lapsed)

    Publication number: US08719006B2

    Publication date: 2014-05-06

    Application number: US12870542

    Application date: 2010-08-27

    IPC classification: G06F17/27 G06F17/20 G06F17/21

    CPC classification: G10L13/02 G10L13/10

    Abstract: In response to a word of a text sequence, a first part-of-speech (POS) tag is generated using a statistical POS tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence. A second POS tag is generated using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence. A final POS tag is assigned to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag.

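    The abstract does not say how the two tags are reconciled. The sketch below assumes the simplest policy, in which an application-specific rule, when one fires, overrides the statistical tag; the statistical tagger is reduced to a most-frequent-tag lookup. All names, the `voice_memo` application type, and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]


def train_statistical_tagger(corpus: List[TaggedSentence]) -> Dict[str, Counter]:
    """Count how often each word carries each tag in a tagged training corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def statistical_tag(counts: Dict[str, Counter], word: str, default: str = "NN") -> str:
    """First tag: the most frequent tag seen for the word, else a default."""
    seen = counts.get(word.lower())
    return seen.most_common(1)[0][0] if seen else default


def rule_tag(rules: Dict[str, Dict[str, str]], app_type: str, word: str) -> Optional[str]:
    """Second tag: a per-application rule table lookup; None if no rule fires."""
    return rules.get(app_type, {}).get(word.lower())


def final_tag(counts, rules, app_type: str, word: str) -> str:
    """Assumed policy: a firing application rule overrides the statistical tag."""
    first = statistical_tag(counts, word)
    second = rule_tag(rules, app_type, word)
    return second if second is not None else first


corpus = [[("record", "NN")], [("record", "NN")], [("record", "VB")]]
rules = {"voice_memo": {"record": "VB"}}
counts = train_statistical_tagger(corpus)
print(final_tag(counts, rules, "voice_memo", "record"))  # VB
print(final_tag(counts, rules, "email", "record"))       # NN
```

    The same word receives different final tags depending on the application type, which is the behavior the rule-based tagger is meant to add on top of corpus statistics.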

13. Unsupervised document clustering using latent semantic density analysis
    Granted invention patent (status: in force)

    Publication number: US08713021B2

    Publication date: 2014-04-29

    Application number: US12831909

    Application date: 2010-07-07

    IPC classification: G06F17/30

    Abstract: According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.

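    A minimal sketch of the clustering step, assuming a truncated SVD as the latent semantic mapping and modeling the hypersphere diameter as a plain distance threshold (`max_distance`) between length-normalized document vectors. Function names and data are hypothetical.

```python
import numpy as np


def lsm_document_vectors(term_doc: np.ndarray, rank: int) -> np.ndarray:
    """Map documents into a rank-R latent space with a truncated SVD.

    term_doc: (M words, N documents) weighted count matrix.
    Returns an (N, R) array whose rows are the document vectors v_j S.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:rank].T * s[:rank]


def densest_group(doc_vectors: np.ndarray, max_distance: float):
    """Treat each document vector as a centroid, collect the vectors within
    max_distance of it, and return (centroid index, member indices) for the
    largest group."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.where(norms == 0, 1.0, norms)  # compare directions only
    dists = np.linalg.norm(unit[:, None, :] - unit[None, :, :], axis=2)
    groups = [np.flatnonzero(dists[i] <= max_distance) for i in range(len(unit))]
    best = max(range(len(groups)), key=lambda i: len(groups[i]))
    return best, groups[best]


vectors = np.array([[1.0, 0.0], [0.99, 0.1], [0.98, -0.1], [0.0, 1.0]])
best, members = densest_group(vectors, max_distance=0.15)
print(best, members.tolist())
```

    Every document vector is tried as a centroid, so the cost grows quadratically with the collection size; the sketch favors clarity over scalability.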

14. Method for dynamic context scope selection in hybrid N-GRAM+LSA language modeling
    Granted invention patent (status: in force)

    Publication number: US07720673B2

    Publication date: 2010-05-18

    Application number: US11710098

    Application date: 2007-02-23

    IPC classification: G06F17/20

    Abstract: A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in an LSA space is computed. Further, the local probabilities and the global probabilities are combined to produce the language modeling.

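    One way to combine the two kinds of probability, assumed here and not confirmed by the abstract, multiplies the local n-gram probability by the global LSA probability, divides by the word's unigram prior so the prior is not counted twice, and renormalizes over the vocabulary. The probabilities are taken as given; deriving the LSA probabilities from the document vector is not shown, and all names and numbers are hypothetical.

```python
from typing import Dict


def hybrid_probabilities(ngram: Dict[str, float],
                         lsa: Dict[str, float],
                         unigram: Dict[str, float]) -> Dict[str, float]:
    """Combine local (n-gram) and global (LSA) word probabilities.

    ngram:   P(w | preceding n-1 words), the local probabilities.
    lsa:     P(w | current document vector), the global probabilities.
    unigram: P(w), the word prior.
    Assumed rule: P(w | h, d) is proportional to ngram[w] * lsa[w] / unigram[w].
    """
    scores = {w: ngram[w] * lsa[w] / unigram[w] for w in ngram}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


probs = hybrid_probabilities(
    ngram={"bank": 0.5, "river": 0.3, "loan": 0.2},
    lsa={"bank": 0.2, "river": 0.6, "loan": 0.2},
    unigram={"bank": 0.4, "river": 0.4, "loan": 0.2},
)
print(max(probs, key=probs.get))  # river
```

    Here the local model alone would prefer "bank", but a document context that favors "river" shifts the combined prediction.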

15. Unsupervised data-driven pronunciation modeling
    Granted invention patent (status: lapsed)

    Publication number: US07702509B2

    Publication date: 2010-04-20

    Application number: US11603586

    Application date: 2006-11-21

    IPC classification: G10L13/04

    CPC classification: G10L15/187 G10L15/063

    Abstract: Pronunciation for an input word is modeled by generating a set of candidate phoneme strings having pronunciations close to the input word in an orthographic space. Phoneme sub-strings in the set are selected as the pronunciation. In one aspect, a first closeness measure between phoneme strings for words chosen from a dictionary and contexts within the input word is used to determine the candidate phoneme strings. The words are chosen from the dictionary based on a second closeness measure between a representation of the input word in the orthographic space and orthographic anchors corresponding to the words in the dictionary. In another aspect, the phoneme sub-strings are selected by aligning the candidate phoneme strings on common phoneme sub-strings to produce an occurrence count, which is used to choose the phoneme sub-strings for the pronunciation.

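    The alignment-and-count step can be approximated, under a simplifying assumption, by counting how many candidate phoneme strings contain each fixed-length phoneme sub-string and keeping the best-supported one. This count stands in for true alignment; the orthographic-space candidate search is not shown, and the names and phoneme strings are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def substring_counts(candidates: Sequence[Sequence[str]], length: int) -> Counter:
    """Occurrence count of each phoneme sub-string across candidate strings.
    Each sub-string is counted at most once per candidate."""
    counts: Counter = Counter()
    for phones in candidates:
        subs = {tuple(phones[i:i + length]) for i in range(len(phones) - length + 1)}
        counts.update(subs)
    return counts


def best_substring(candidates: Sequence[Sequence[str]],
                   length: int) -> Optional[Tuple[List[str], int]]:
    """Return the most widely shared sub-string and its count, or None."""
    counts = substring_counts(candidates, length)
    if not counts:
        return None
    sub, n = counts.most_common(1)[0]
    return list(sub), n


candidates = [["k", "ae", "t"], ["k", "ae", "t", "s"], ["b", "ae", "t"]]
print(best_substring(candidates, 2))  # (['ae', 't'], 3)
```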

16. Method for dynamic context scope selection in hybrid n-gram+LSA language modeling
    Granted invention patent (status: in force)

    Publication number: US06477488B1

    Publication date: 2002-11-05

    Application number: US09523070

    Application date: 2000-03-10

    IPC classification: G06F17/20

    Abstract: A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in an LSA space is computed. Further, the local probabilities and the global probabilities are combined to produce the language modeling.

17. Fast update implementation for efficient latent semantic language modeling
    Granted invention patent (status: in force)

    Publication number: US06374217B1

    Publication date: 2002-04-16

    Application number: US09267334

    Application date: 1999-03-12

    IPC classification: G10L15/14

    CPC classification: G10L15/1815 G10L15/197

    Abstract: Speech or acoustic signals are processed directly using a hybrid stochastic language model produced by integrating a latent semantic analysis language model into an n-gram probability language model. The latent semantic analysis language model probability is computed using a first pseudo-document vector that is derived from a second pseudo-document vector with the pseudo-document vectors representing pseudo-documents created from the signals received at different times. The first pseudo-document vector is derived from the second pseudo-document vector by updating the second pseudo-document vector directly in latent semantic analysis space in response to at least one addition of a candidate word of the received speech signals to the pseudo-document represented by the second pseudo-document vector. Updating precludes mapping a sparse representation for a pseudo-document into the latent semantic space to produce the first pseudo-document vector. A linguistic message representative of the received speech signals is generated.

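    A minimal sketch of updating a pseudo-document vector directly in LSA space, assuming unweighted word counts: each new word's scaled vector is averaged into the current pseudo-document vector, which gives the same result as re-mapping the full sparse count vector. Any word-specific weighting the patent may apply is omitted; `add_word`, the `decay` parameter, and the random data are hypothetical.

```python
import numpy as np


def fold_in(counts: np.ndarray, U: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Map a sparse pseudo-document (word counts) into LSA space: (c/n)^T U S^-1."""
    return (counts / counts.sum()) @ U / s


def add_word(v_prev: np.ndarray, n_prev: int, word: int,
             U: np.ndarray, s: np.ndarray, decay: float = 1.0) -> np.ndarray:
    """Update the pseudo-document vector directly in LSA space after one word.

    decay < 1 discounts older words (hypothetical parameter); with decay == 1
    the result equals fold_in() of the full count vector.
    """
    n = n_prev + 1
    return (decay * n_prev * v_prev + U[word] / s) / n


rng = np.random.default_rng(0)
U = rng.standard_normal((5, 2))   # word vectors u_i as rows (hypothetical)
s = np.array([3.0, 1.5])          # singular values (hypothetical)
words = [0, 2, 2, 4]              # indices of recognized candidate words

v = np.zeros(2)
for n_prev, w in enumerate(words):
    v = add_word(v, n_prev, w, U, s)

counts = np.bincount(words, minlength=5).astype(float)
print(np.allclose(v, fold_in(counts, U, s)))  # True
```

    Each update costs time proportional to the latent dimension only, whereas re-mapping touches the whole vocabulary-sized count vector; this is the saving the abstract describes.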

18. Automatic handwriting recognition using both static and dynamic parameters
    Granted invention patent (status: lapsed)

    Publication number: US5544264A

    Publication date: 1996-08-06

    Application number: US451001

    Application date: 1995-05-25

    Abstract: Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.

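    A minimal sketch of shape-only features, assuming a binary bitmap with row 0 at the top: ink counts in horizontal and vertical bands stand in for the slices, and the baseline displacement is assumed to be measured from the lowest inked row. Because the features come from the bitmap alone, the order in which the tablet captured the points plays no part. Names and the toy glyph are hypothetical.

```python
import numpy as np


def static_features(bitmap: np.ndarray, n_slices: int, baseline_row: int):
    """Shape-only features from a 0/1 character bitmap (row 0 at the top).

    Returns (horizontal, vertical, displacement):
      horizontal:   ink count in each of n_slices horizontal bands of rows
      vertical:     ink count in each of n_slices vertical bands of columns
      displacement: baseline_row minus the lowest inked row (assumed encoding)
    """
    horizontal = np.array([b.sum() for b in np.array_split(bitmap, n_slices, axis=0)])
    vertical = np.array([b.sum() for b in np.array_split(bitmap, n_slices, axis=1)])
    inked_rows = np.flatnonzero(bitmap.any(axis=1))
    lowest = inked_rows.max() if inked_rows.size else baseline_row
    displacement = np.array([baseline_row - lowest])
    return horizontal, vertical, displacement


glyph = np.zeros((4, 4), dtype=int)
glyph[0:3, 1] = 1  # a short vertical stroke in column 1
h, v, d = static_features(glyph, n_slices=2, baseline_row=3)
print(h.tolist(), v.tolist(), d.tolist())  # [2, 1] [3, 0] [1]
```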

19. Automatic recognition of a consistent message using multiple complimentary sources of information
    Granted invention patent (status: lapsed)

    Publication number: US5502774A

    Publication date: 1996-03-26

    Application number: US300232

    Application date: 1994-09-06

    Abstract: A general approach is provided for the combined use of several sources of information in the automatic recognition of a consistent message. For each message unit (e.g., word) the total likelihood score is assumed to be the weighted sum of the likelihood scores resulting from the separate evaluation of each information source. Emphasis is placed on the estimation of weighting factors used in forming this total likelihood. This method can be applied, for example, to the decoding of a consistent message using both handwriting and speech recognition. The present invention includes three procedures which provide the optimal weighting coefficients.

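    A minimal sketch of the total-likelihood rule, assuming each source reports a log-likelihood score per word and the weights are already known; the three weight-estimation procedures are not shown. Names and scores are hypothetical.

```python
from typing import Dict, Sequence


def combined_scores(source_scores: Sequence[Dict[str, float]],
                    weights: Sequence[float]) -> Dict[str, float]:
    """Total score per message unit: weighted sum of per-source log-likelihoods.
    Only units scored by every source are kept."""
    units = set.intersection(*(set(scores) for scores in source_scores))
    return {u: sum(w * scores[u] for w, scores in zip(weights, source_scores))
            for u in units}


def recognize(source_scores: Sequence[Dict[str, float]],
              weights: Sequence[float]) -> str:
    """Pick the message unit with the highest total score."""
    totals = combined_scores(source_scores, weights)
    return max(totals, key=totals.get)


handwriting = {"cat": -2.0, "cot": -1.5}   # hypothetical log-likelihoods
speech = {"cat": -1.0, "cot": -3.0}
print(recognize([handwriting, speech], [0.5, 0.5]))  # cat
print(recognize([handwriting, speech], [1.0, 0.0]))  # cot
```

    Shifting all weight to handwriting flips the decision from "cat" to "cot", which illustrates why estimating the weights is central to the method.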

Automatic handwriting recognition using both static and dynamic parameters

    Publication number: US5491758A

    Publication date: 1996-02-13

    Application number: US009515

    Application date: 1993-01-27

    Abstract: Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.