Patent search: ap:("Ponani S. Gopalakrishnan" OR "David Nahamoo" OR "Mukund Padmanabhan" OR "Michael Alan Picheny") AND inv:"David Nahamoo" (page 1)

1.

Granted invention patent
Method and apparatus for estimating phone class probabilities a-posteriori using a decision tree (Expired)

Publication No.: US5680509A

Publication date: 1997-10-21

Application No.: US312584

Filing date: 1994-09-27

Applicants: Ponani S. Gopalakrishnan, David Nahamoo, Mukund Padmanabhan, Michael Alan Picheny

Inventors: Ponani S. Gopalakrishnan, David Nahamoo, Mukund Padmanabhan, Michael Alan Picheny

IPC classification: G10L15/06, G10L15/08, G10L5/06

CPC classification: G10L15/063, G10L15/08

Abstract: A method and apparatus for estimating the probability of phones, a-posteriori, in the context of not only the acoustic feature at that time, but also the acoustic features in the vicinity of the current time, and its use in cutting down the search-space in a speech recognition system. The method constructs and uses a decision tree, with the predictors of the decision tree being the vector-quantized acoustic feature vectors at the current time, and in the vicinity of the current time. The process starts with an enumeration of all (predictor, class) events in the training data at the root node, and successively partitions the data at a node according to the most informative split at that node. An iterative algorithm is used to design the binary partitioning. After the construction of the tree is completed, the probability distribution of the predicted class is stored at all of its terminal leaves. The decision tree is used during the decoding process by tracing a path down to one of its leaves, based on the answers to binary questions about the vector-quantized acoustic feature vector at the current time and its vicinity.
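
The decoding step described in this abstract (tracing a path to a leaf by answering binary questions about vector-quantized labels at and around the current time) can be sketched roughly as follows. This is only an illustration: the `Node` layout, the `phone_posterior` helper, the edge clamping, and the toy tree are all assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence


@dataclass
class Node:
    # Internal node: asks "is the VQ label at time t + offset in label_set?"
    offset: int = 0
    label_set: FrozenSet[int] = frozenset()
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    # Leaf: stores the class (phone) probability distribution.
    posterior: Optional[Dict[str, float]] = None


def phone_posterior(root: Node, labels: Sequence[int], t: int) -> Dict[str, float]:
    """Trace a path from the root to a leaf and return the leaf's distribution."""
    node = root
    while node.posterior is None:
        # Assumption: clamp the index so neighbour questions work at utterance edges.
        idx = min(max(t + node.offset, 0), len(labels) - 1)
        node = node.yes if labels[idx] in node.label_set else node.no
    return node.posterior


# Hypothetical toy tree with two phone classes.
leaf_a = Node(posterior={"AA": 0.8, "IY": 0.2})
leaf_b = Node(posterior={"AA": 0.3, "IY": 0.7})
leaf_c = Node(posterior={"AA": 0.1, "IY": 0.9})
inner = Node(offset=-1, label_set=frozenset({3}), yes=leaf_a, no=leaf_b)
root = Node(offset=0, label_set=frozenset({1, 2}), yes=inner, no=leaf_c)

vq_labels = [3, 1, 5]  # one VQ label per frame
print(phone_posterior(root, vq_labels, t=1))
```

A recognizer could use the returned distribution to drop unlikely phones from the search at each frame, which is the pruning role the abstract mentions.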

2.

Granted invention patent
Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data (Expired)

Publication No.: US5278942A

Publication date: 1994-01-11

Application No.: US802678

Filing date: 1991-12-05

Applicants: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, Arthur J. Nadas, David Nahamoo, Michael A. Picheny

Inventors: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, Arthur J. Nadas, David Nahamoo, Michael A. Picheny

IPC classification: G10L19/00, G10L15/02, G10L15/06, G10L15/10, G10L9/02

CPC classification: G10L15/063, G10L15/02

Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
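
The labeling step in this abstract (score each stored prototype against a feature vector and output the identifier of the best match) might look like the sketch below. It assumes Euclidean distance as the match score and uses a hypothetical `label_frame` helper with made-up prototype values; the patent's actual scoring and its speaker-dependent prototype training are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def label_frame(feature: Sequence[float], prototypes: Dict[str, Vector]) -> str:
    """Return the ID of the prototype closest to `feature`.

    Assumption: smaller Euclidean distance means a better prototype match score.
    """
    return min(prototypes, key=lambda pid: math.dist(feature, prototypes[pid]))


# Hypothetical prototypes, each with a unique identification value.
prototypes = {"P1": (0.0, 0.0), "P2": (1.0, 1.0)}
frames: List[Vector] = [(0.1, 0.2), (0.9, 0.8)]

coded = [label_frame(f, prototypes) for f in frames]
print(coded)
```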

3.

Granted invention patent
Speech coding apparatus and method for generating acoustic feature vector component values by combining values of the same features for multiple time intervals (Expired)

Publication No.: US5544277A

Publication date: 1996-08-06

Application No.: US98682

Filing date: 1993-07-28

Applicants: Raimo Bakis, Ponani S. Gopalakrishnan, Dimitri Kanevsky, Arthur J. Nadas, David Nahamoo, Michael A. Picheny, Jan Sedivy

Inventors: Raimo Bakis, Ponani S. Gopalakrishnan, Dimitri Kanevsky, Arthur J. Nadas, David Nahamoo, Michael A. Picheny, Jan Sedivy

IPC classification: G06F3/16, G10L11/00, G10L15/02, G10L15/10, G10L15/20, H03M7/30, G10L9/00

CPC classification: G10L15/02, G10L15/20

Abstract: A speech coding apparatus and method measures the values of at least first and second different features of an utterance during each of a series of successive time intervals. For each time interval, a feature vector signal has a first component value equal to a first weighted combination of the values of only one feature of the utterance for at least two time intervals. The feature vector signal has a second component value equal to a second weighted combination, different from the first weighted combination, of the values of only one feature of the utterance for at least two time intervals. The resulting feature vector signals for a series of successive time intervals form a coded representation of the utterance. In one embodiment, a first weighted mixture signal has a value equal to a first weighted mixture of the values of the features of the utterance during a single time interval. A second weighted mixture signal has a value equal to a second weighted mixture, different from the first weighted mixture, of the values of the features of the utterance during a single time interval. The first component value of each feature vector signal is equal to a first weighted combination of the values of only the first weighted mixture signals for at least two time intervals, and the second component value of each feature vector signal is equal to a second weighted combination, different from the first weighted combination, of the values of only the second weighted mixture for at least two time intervals.
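
As a rough illustration of the first part of this abstract, the sketch below builds each output component from a weighted combination of one feature's values over two consecutive time intervals. The window length, the particular weights (an average and a difference), the start-of-utterance clamping, and the names `component` and `encode` are illustrative assumptions only.

```python
from typing import List, Sequence, Tuple


def component(frames: Sequence[Sequence[float]], feature: int,
              weights: Sequence[float], t: int) -> float:
    """Weighted combination of a single feature over the last len(weights) frames."""
    n = len(weights)
    total = 0.0
    for k, w in enumerate(weights):
        idx = max(t - (n - 1) + k, 0)  # assumption: repeat the first frame at the start
        total += w * frames[idx][feature]
    return total


def encode(frames: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """First component smooths feature 0; second component differences feature 1."""
    return [
        (component(frames, 0, (0.5, 0.5), t), component(frames, 1, (-1.0, 1.0), t))
        for t in range(len(frames))
    ]


# Hypothetical measurements: two features per time interval.
measured = [(1.0, 10.0), (3.0, 14.0), (5.0, 11.0)]
print(encode(measured))
```

The two weight sets differ, so the two components carry different views of the signal across time, which is the property the abstract emphasizes.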

4.

Granted invention patent
Speech coding apparatus and method using classification rules (Expired)

Publication No.: US5522011A

Publication date: 1996-05-28

Application No.: US127392

Filing date: 1993-09-27

Applicants: Mark E. Epstein, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny, Jan Sedivy

Inventors: Mark E. Epstein, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny, Jan Sedivy

IPC classification: G10L15/02, G10L19/00, G10L19/02, H03M7/30, H04B14/04, G10L5/06

CPC classification: G10L19/038

Abstract: A speech coding apparatus and method uses classification rules to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. The classification rules comprise at least first and second sets of classification rules. The first set of classification rules map each feature vector signal from a set of all possible feature vector signals to exactly one of at least two disjoint subsets of feature vector signals. The second set of classification rules map each feature vector signal in a subset of feature vector signals to exactly one of at least two different classes of prototype vector signals. Each class contains a plurality of prototype vector signals. According to the classification rules, a first feature vector signal is mapped to a first class of prototype vector signals. The closeness of the feature value of the first feature vector signal is compared to the parameter values of only the prototype vector signals in the first class of prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal in the first class. At least the identification value of at least the prototype vector signal having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
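
The two-stage idea in this abstract (rules first narrow a feature vector to one class of prototypes, and only that class is scored) can be sketched as below. The threshold rules, the class contents, the Euclidean score, and the function name `classify_then_label` are hypothetical choices made for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Vector = Tuple[float, ...]
Rule = Callable[[Sequence[float]], str]


def classify_then_label(feature: Sequence[float], first_rules: Rule,
                        second_rules: Dict[str, Rule],
                        classes: Dict[str, Dict[str, Vector]]) -> str:
    subset = first_rules(feature)             # stage 1: exactly one disjoint subset
    class_id = second_rules[subset](feature)  # stage 2: exactly one prototype class
    candidates = classes[class_id]            # only these prototypes are scored
    return min(candidates, key=lambda pid: math.dist(feature, candidates[pid]))


# Hypothetical rules: simple thresholds on the two feature components.
first_rules: Rule = lambda f: "low" if f[0] < 0.5 else "high"
second_rules: Dict[str, Rule] = {
    "low": lambda f: "A" if f[1] < 0.5 else "B",
    "high": lambda f: "C" if f[1] < 0.5 else "D",
}
classes = {
    "A": {"a1": (0.1, 0.1), "a2": (0.4, 0.4)},
    "B": {"b1": (0.1, 0.9), "b2": (0.4, 0.6)},
    "C": {"c1": (0.9, 0.1), "c2": (0.6, 0.4)},
    "D": {"d1": (0.9, 0.9), "d2": (0.6, 0.6)},
}

print(classify_then_label((0.2, 0.8), first_rules, second_rules, classes))
```

In this toy setup only two of the eight prototypes are compared per frame, which is where the reduced computation comes from.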

5.

Granted invention patent
Apparatus and method of grouping utterances of a phoneme into context-dependent categories based on sound-similarity for automatic speech recognition (Expired)

Publication No.: US5195167A

Publication date: 1993-03-16

Application No.: US871600

Filing date: 1992-04-17

Applicants: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny

Inventors: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny

IPC classification: G06F7/38, G06F17/27, G10L11/00, G10L15/02, G10L15/06, G10L15/10, G10L15/18

CPC classification: G10L15/063

Abstract: Symbol feature values and contextual feature values of each event in a training set of events are measured. At least two pairs of complementary subsets of observed events are selected. In each pair of complementary subsets of observed events, one subset has contextual features with values in a set $C_n$, and the other subset has contextual features with values in a set $\bar{C}_n$, where the sets $C_n$ and $\bar{C}_n$ are complementary sets of contextual feature values. For each subset of observed events, the similarity values of the symbol features of the observed events in the subsets are calculated. For each pair of complementary sets of observed events, a "goodness of fit" is the sum of the symbol feature value similarity of the subsets. The sets of contextual feature values associated with the subsets of observed events having the best "goodness of fit" are identified and form context-dependent bases for grouping the observed events into two output sets.
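
A minimal sketch of the split selection described here: for each candidate context set, divide the events into those whose context is in the set and those whose context is not, score each side's similarity, and keep the candidate with the best summed score. The similarity measure (negative sum of squared deviations from the subset mean), the scalar symbol feature, and all names are assumptions for illustration; the abstract does not specify them.

```python
from typing import FrozenSet, Iterable, List, Optional, Sequence, Set, Tuple

Event = Tuple[str, float]  # (contextual feature value, symbol feature value)


def similarity(values: Sequence[float]) -> float:
    """Assumed measure: negative sum of squared deviations (higher is more similar)."""
    mean = sum(values) / len(values)
    return -sum((v - mean) ** 2 for v in values)


def best_context_split(events: List[Event], candidate_sets: Iterable[Set[str]]
                       ) -> Optional[Tuple[float, FrozenSet[str]]]:
    best: Optional[Tuple[float, FrozenSet[str]]] = None
    for c_n in candidate_sets:
        inside = [v for ctx, v in events if ctx in c_n]       # contexts in C_n
        outside = [v for ctx, v in events if ctx not in c_n]  # contexts in the complement
        if not inside or not outside:
            continue  # skip candidates that do not split the data
        fit = similarity(inside) + similarity(outside)        # "goodness of fit"
        if best is None or fit > best[0]:
            best = (fit, frozenset(c_n))
    return best


# Hypothetical events: the context is the preceding phone.
events = [("p", 1.0), ("p", 1.2), ("b", 1.1), ("s", 5.0), ("z", 5.2)]
candidates = [{"p"}, {"p", "b"}, {"p", "s"}]
print(best_context_split(events, candidates))
```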

6.

Granted invention patent
Method and apparatus for modeling words with multi-arc markov models (Expired)

Publication No.: US5129001A

Publication date: 1992-07-07

Application No.: US514075

Filing date: 1990-04-25

Applicants: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny

Inventors: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny

IPC classification: G06F7/00, G06F17/18, G10L15/06, G10L15/10, G10L15/14

CPC classification: G10L15/144

Abstract: Modeling a word is done by concatenating a series of elemental models to form a word model. At least one elemental model in the series is a composite elemental model formed by combining the starting states of at least first and second primitive elemental models. Each primitive elemental model represents a speech component. The primitive elemental models are combined by a weighted combination of their parameters in proportion to the values of the weighting factors. To tailor the word model to closely represent variations in the pronunciation of the word, the word is uttered a plurality of times by a plurality of different speakers. Constructing word models from composite elemental models, and constructing composite elemental models from primitive elemental models enables word models to represent many variations in the pronunciation of a word. Providing a relatively small set of primitive elemental models for a relatively large vocabulary of words enables models to be trained to the voice of a new speaker by having the new speaker utter only a small subset of the words in the vocabulary.
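
The weighted combination of primitive model parameters can be illustrated with a small sketch. It assumes, purely for illustration, that each primitive elemental model is summarized by an output-probability table over acoustic labels; the state structure of the Markov models and the way the patent merges starting states are not modeled. `combine_models` is a hypothetical name.

```python
from typing import Dict, Sequence


def combine_models(primitives: Sequence[Dict[str, float]],
                   weights: Sequence[float]) -> Dict[str, float]:
    """Combine parameter tables in proportion to their weighting factors."""
    total = sum(weights)
    labels = set().union(*primitives)  # every label seen in any primitive model
    return {
        lab: sum(w * p.get(lab, 0.0) for p, w in zip(primitives, weights)) / total
        for lab in labels
    }


# Hypothetical primitive models: output probabilities over two acoustic labels.
model_1 = {"L1": 1.0, "L2": 0.0}
model_2 = {"L1": 0.0, "L2": 1.0}

composite = combine_models([model_1, model_2], weights=(0.75, 0.25))
print(composite)
```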

7.

Granted invention patent
Hierarchical labeler in a speech recognition system (Expired)

Publication No.: US6023673A

Publication date: 2000-02-08

Application No.: US869061

Filing date: 1997-06-04

Applicants: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy

Inventors: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy

IPC classification: G10L5/06, G10L9/00

CPC classification: G10L15/083

Abstract: A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
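
The two-level search in this abstract can be sketched as follows: rank the small first-level prototype set, then score only the lower-level prototypes attached to the top-ranked entries. Euclidean distance, the `top_n` cutoff, and the toy prototype values are assumptions; the patent does not fix them in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def hierarchical_label(feature: Sequence[float], level1: Dict[str, Vector],
                       level2: Dict[str, Vector], children: Dict[str, List[str]],
                       top_n: int = 2) -> str:
    # Rank the first-level prototypes by closeness to the feature vector.
    ranked = sorted(level1, key=lambda pid: math.dist(feature, level1[pid]))
    # Score only the second-level prototypes tied to the top-ranked ones.
    candidates = [cid for pid in ranked[:top_n] for cid in children[pid]]
    return min(candidates, key=lambda cid: math.dist(feature, level2[cid]))


# Hypothetical hierarchy: three coarse prototypes, two fine prototypes under each.
level1 = {"A": (0.0, 0.0), "B": (10.0, 10.0), "C": (0.0, 10.0)}
children = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
level2 = {
    "a1": (-1.0, -1.0), "a2": (1.0, 1.0),
    "b1": (9.0, 9.0), "b2": (11.0, 11.0),
    "c1": (0.0, 9.0), "c2": (0.0, 11.0),
}

print(hierarchical_label((2.0, 2.0), level1, level2, children))
```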

8.

Granted invention patent
Speech recognition utilizing multitude of speech features (Expired)

Publication No.: US07464031B2

Publication date: 2008-12-09

Application No.: US10724536

Filing date: 2003-11-28

Applicants: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

IPC classification: G10L15/00, G10L15/20

CPC classification: G10L15/063, G10L15/02, G10L15/14, G10L2015/085

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
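
For orientation, a generic log-linear posterior over linguistic units can be written in a few lines: each unit's score is the sum of the weights of the features that fired, and the scores are normalized with a softmax. The binary feature representation, the `(unit, feature)` weight keys, and the example weights are assumptions for this sketch, not the patent's parameterization.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def log_linear_posterior(units: Sequence[str], active_features: Iterable[str],
                         weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """P(unit | features) proportional to exp(sum of weights of active features)."""
    active = list(active_features)
    # Features without a learned weight contribute 0, so absent or unseen
    # features simply drop out of the score.
    scores = {u: sum(weights.get((u, f), 0.0) for f in active) for u in units}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {u: math.exp(s - top) for u, s in scores.items()}
    z = sum(exps.values())
    return {u: e / z for u, e in exps.items()}


# Hypothetical units, features, and weights.
units = ["yes", "no"]
weights = {("yes", "f_pitch_rise"): 1.0, ("no", "f_nasal"): 2.0}

posterior = log_linear_posterior(units, ["f_pitch_rise"], weights)
print(posterior)
```

Because the score is just a sum over whichever features are present, overlapping or statistically dependent features can be added without changing the model form.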

9.

Invention patent application
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES (Under examination, published)

Publication No.: US20080312921A1

Publication date: 2008-12-18

Application No.: US12195123

Filing date: 2008-08-20

Applicants: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

IPC classification: G10L15/00, G10L15/04

CPC classification: G10L15/063, G10L15/02, G10L15/14, G10L2015/085

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

10.

Granted invention patent
Automatic indexing and aligning of audio and text using speech recognition (Expired)

Publication No.: US5649060A

Publication date: 1997-07-15

Application No.: US547113

Filing date: 1995-10-23

Applicants: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny

Inventors: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny

IPC classification: G03B31/00, G06F17/30, G10L15/00, G10L15/18, G10L15/22, G10L15/26, G11B27/028, G11B27/10, G11B27/28, H04N5/91, G10L9/00

CPC classification: G11B27/28, G06F17/30746, G11B27/028, G11B27/10

Abstract: A method of automatically aligning a written transcript with speech in video and audio clips. The disclosed technique involves as a basic component an automatic speech recognizer. The automatic speech recognizer decodes speech (recorded on a tape) and produces a file with a decoded text. This decoded text is then matched with the original written transcript via identification of similar words or clusters of words. The result of this matching is an alignment of the speech with the original transcript. The method can be used (a) to create indexing of video clips, (b) for "teleprompting" (i.e. showing the next portion of text when someone is reading from a television screen), or (c) to enhance editing of a text that was dictated to a stenographer or recorded on a tape for its subsequent textual reproduction by a typist.
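
The matching step (pairing recognizer output with the written transcript through runs of identical words) can be approximated with Python's standard `difflib.SequenceMatcher`. This is a stand-in technique, not the patent's matching procedure; the `align` helper, the per-word start times, and the sample data are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def align(transcript_words: List[str],
          decoded: List[Tuple[str, float]]) -> Dict[int, float]:
    """Map transcript word positions to start times of matching decoded words.

    `decoded` holds (word, start_time_in_seconds) pairs from a recognizer.
    """
    decoded_words = [word for word, _ in decoded]
    matcher = SequenceMatcher(a=transcript_words, b=decoded_words, autojunk=False)
    index: Dict[int, float] = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            index[block.a + k] = decoded[block.b + k][1]
    return index


# Hypothetical data: the recognizer misheard "brown" as "crown".
transcript = "the quick brown fox jumps".split()
decoded = [("the", 0.0), ("quick", 0.4), ("crown", 0.8), ("fox", 1.2), ("jumps", 1.6)]

time_index = align(transcript, decoded)
print(time_index)
```

Transcript words without a match (position 2 here) get no timestamp; a fuller system could interpolate between the neighbouring anchors.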
