Patent search: ap:("Yuqing Gao" OR "Mukund Padmanabhan" OR "Michael Alan Picheny") AND inv:"Michael Alan Picheny" (page 2)

11.

Granted patent
Hierarchical labeler in a speech recognition system (Expired)

Publication No.: US6023673A

Publication date: 2000-02-08

Application No.: US869061

Filing date: 1997-06-04

Applicants: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy

Inventors: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy

IPC classification: G10L5/06, G10L9/00

CPC classification: G10L15/083

Abstract: A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset. The closeness of the feature value of the first feature vector signal is compared to the parameter values of each prototype vector signal in a second (lower) level subset that is associated with the highest ranking prototype vectors in the first level subset, to obtain a second ranked list of prototype match scores. The identification value of the prototype vector signal in the second ranked list having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
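
The two-level search described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the squared Euclidean distance used as the match score, the `top_k` cutoff, and all names and prototype values are assumptions chosen for the example.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; smaller means closer (assumed match score)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical_label(
    feature: Vector,
    coarse: Dict[str, Vector],       # first-level (smaller) prototype set
    children: Dict[str, List[str]],  # coarse id -> associated fine ids
    fine: Dict[str, Vector],         # second-level (larger) prototype set
    top_k: int = 2,                  # hypothetical cutoff, not from the abstract
) -> str:
    # Level 1: rank every coarse prototype by closeness to the feature vector.
    ranked = sorted(coarse, key=lambda pid: sq_dist(feature, coarse[pid]))
    # Level 2: score only the fine prototypes tied to the top-ranked coarse ones.
    candidates = [fid for pid in ranked[:top_k] for fid in children[pid]]
    # Output the identifier of the best-matching fine prototype.
    return min(candidates, key=lambda fid: sq_dist(feature, fine[fid]))


coarse = {"A": (0.0, 0.0), "B": (10.0, 10.0), "C": (-10.0, 5.0)}
children = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"]}
fine = {
    "a1": (-1.0, 0.0), "a2": (1.0, 1.0),
    "b1": (9.0, 9.0), "b2": (11.0, 12.0),
    "c1": (-10.0, 5.0),
}
label = hierarchical_label((0.8, 1.2), coarse, children, fine)
print(label)  # a2
```

Only three of the five fine prototypes are scored here, which is where the saving in computation comes from.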

12.

Patent application
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES (Under examination, published)

Publication No.: US20080312921A1

Publication date: 2008-12-18

Application No.: US12195123

Filing date: 2008-08-20

Applicants: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig

IPC classification: G10L15/00, G10L15/04

CPC classification: G10L15/063, G10L15/02, G10L15/14, G10L2015/085

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
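
As a rough illustration of the log-linear posterior this abstract refers to, the sketch below computes P(unit | features) as a normalized exponential of weighted feature values. The feature names, weights, and linguistic units are invented for the example, and features missing at recognition time simply contribute nothing to the score, which is one plausible reading of the last sentence of the abstract.

```python
import math
from typing import Dict, List, Tuple


def log_linear_posterior(
    active: Dict[str, float],                  # features observed for this input
    weights: Dict[Tuple[str, str], float],     # (feature, unit) -> trained weight
    units: List[str],
) -> Dict[str, float]:
    # Score each unit as a weighted sum over the features that are present.
    scores = {
        u: sum(weights.get((f, u), 0.0) * v for f, v in active.items())
        for u in units
    }
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores.values())
    exps = {u: math.exp(s - m) for u, s in scores.items()}
    z = sum(exps.values())
    return {u: e / z for u, e in exps.items()}


units = ["yes", "no"]
weights = {("pitch_rise", "yes"): 1.0, ("energy_high", "no"): 0.5}
# "energy_high" was seen in training but is absent here; it is simply skipped.
posterior = log_linear_posterior({"pitch_rise": 1.0}, weights, units)
```

Because the model is a sum over whichever features fire, overlapping or correlated features can be added without the independence assumptions a generative model would need.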

13.

Granted patent
Statistical acoustic processing method and apparatus for speech recognition using a toned phoneme system (Expired)

Publication No.: US5751905A

Publication date: 1998-05-12

Application No.: US404786

Filing date: 1995-03-15

Applicants: Chengjun Julian Chen, Ramesh Ambat Gopinath, Michael Daniel Monkowski, Michael Alan Picheny

Inventors: Chengjun Julian Chen, Ramesh Ambat Gopinath, Michael Daniel Monkowski, Michael Alan Picheny

IPC classification: G10L15/10, G10L11/04, G10L15/14, G10L15/18, G10L5/06

CPC classification: G10L25/90, G10L15/142, G10L25/06, G10L25/15

Abstract: A method and apparatus for acoustic signal processing of speech recognition, the method comprising the following components: 1) Decompose each syllable into two phonemes of comparable length and complexity, the first one being a preme, and the second one being a toneme; 2) Each toneme is assigned a tone value such as high, rising, low, falling, and untoned; 3) No tone value is assigned to premes; 4) Pitch is detected continuously and treated the same way as energy and cepstrals in a Hidden Markov Model to predict the tone of a toneme; 5) The tone of a syllable is defined as the tone of its component toneme.
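
Components 1), 2), 3) and 5) of this abstract can be sketched as a small data structure. The input format is an assumption made for the example: syllables are written as numbered pinyin (for instance `zhang1`), the split point is placed at the first vowel, and the digit-to-tone table is illustrative. The abstract itself does not specify where the preme ends, and the pitch-tracking HMM of component 4) is not modeled here.

```python
from dataclasses import dataclass

# Hypothetical mapping from a trailing tone digit to the abstract's tone values.
TONES = {"1": "high", "2": "rising", "3": "low", "4": "falling", "5": "untoned"}
VOWELS = set("aeiouü")


@dataclass(frozen=True)
class Syllable:
    preme: str   # first phoneme; carries no tone value
    toneme: str  # second phoneme; carries the tone
    tone: str    # tone of the toneme, which is also the tone of the syllable


def decompose(pinyin: str) -> Syllable:
    """Split a numbered-pinyin syllable at its first vowel (assumed boundary)."""
    body, digit = pinyin[:-1], pinyin[-1]
    split = next((i for i, ch in enumerate(body) if ch in VOWELS), len(body))
    return Syllable(preme=body[:split], toneme=body[split:], tone=TONES[digit])


syl = decompose("zhang1")
print(syl)  # Syllable(preme='zh', toneme='ang', tone='high')
```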

14.

Granted patent
Automatic indexing and aligning of audio and text using speech recognition (Expired)

Publication No.: US5649060A

Publication date: 1997-07-15

Application No.: US547113

Filing date: 1995-10-23

Applicants: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny

Inventors: Hamed A. Ellozy, Dimitri Kanevsky, Michelle Y. Kim, David Nahamoo, Michael Alan Picheny, Wlodek Wlodzimierz Zadrozny

IPC classification: G03B31/00, G06F17/30, G10L15/00, G10L15/18, G10L15/22, G10L15/26, G11B27/028, G11B27/10, G11B27/28, H04N5/91, G10L9/00

CPC classification: G11B27/28, G06F17/30746, G11B27/028, G11B27/10

Abstract: A method of automatically aligning a written transcript with speech in video and audio clips. The disclosed technique involves as a basic component an automatic speech recognizer. The automatic speech recognizer decodes speech (recorded on a tape) and produces a file with a decoded text. This decoded text is then matched with the original written transcript via identification of similar words or clusters of words. The result of this matching is an alignment of the speech with the original transcript. The method can be used (a) to create indexing of video clips, (b) for "teleprompting" (i.e. showing the next portion of text when someone is reading from a television screen), or (c) to enhance editing of a text that was dictated to a stenographer or recorded on a tape for its subsequent textual reproduction by a typist.
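
The matching step in this abstract can be approximated with a standard sequence alignment. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in for the patent's word and word-cluster matching, which the abstract does not specify; the decoder output format (word plus start time in seconds) and the sample data are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def align(
    transcript: List[str],
    decoded: List[Tuple[str, float]],  # (recognized word, start time in seconds)
) -> Dict[int, float]:
    """Map transcript word positions to audio times via matching word runs."""
    ref = [w.lower() for w in transcript]
    hyp = [w.lower() for w, _ in decoded]
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    anchors: Dict[int, float] = {}
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words shared by both sequences.
        for k in range(block.size):
            anchors[block.a + k] = decoded[block.b + k][1]
    return anchors


transcript = "the quick brown fox jumps".split()
# The recognizer misheard "brown" as "crown"; the other words still anchor.
decoded = [("the", 0.0), ("quick", 0.4), ("crown", 0.9), ("fox", 1.3), ("jumps", 1.7)]
anchors = align(transcript, decoded)
print(anchors)  # {0: 0.0, 1: 0.4, 3: 1.3, 4: 1.7}
```

Transcript words without an anchor (index 2 here) could be timed by interpolating between their anchored neighbors, which is enough for indexing a clip by transcript position.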

15.

Granted patent
Augmentation of alternate word lists by acoustic confusability criterion (In force)

Publication No.: US06754625B2

Publication date: 2004-06-22

Application No.: US09746892

Filing date: 2000-12-26

Applicants: Peder Andreas Olsen, Michael Alan Picheny, Harry W. Printz, Karthik Visweswariah

Inventors: Peder Andreas Olsen, Michael Alan Picheny, Harry W. Printz, Karthik Visweswariah

IPC classification: G10L15/26

CPC classification: G10L15/06, G10L2015/0636

Abstract: There is provided a method for augmenting an alternate word list generated by a speech recognition system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying at least one acoustically confusable word with respect to the wrongly decoded word. The alternate word list is augmented with the at least one acoustically confusable word.
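
A simplified version of the augmentation step might look like the following. The abstract does not say how acoustic confusability is measured, so this sketch substitutes a phoneme-level edit distance over a toy pronunciation lexicon; the lexicon, the `max_dist` threshold, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def augment(
    alternates: List[str],
    wrong_word: str,
    lexicon: Dict[str, List[str]],  # word -> phoneme sequence (toy data)
    max_dist: int = 1,              # assumed confusability threshold
) -> List[str]:
    target = lexicon[wrong_word]
    scored = sorted(
        (edit_distance(target, phones), word)
        for word, phones in lexicon.items()
        if word != wrong_word and word not in alternates
    )
    # Keep the recognizer's own alternates first, then add confusable words.
    return list(alternates) + [w for d, w in scored if d <= max_dist]


lexicon = {
    "bear": ["B", "EH", "R"],
    "bare": ["B", "EH", "R"],
    "pear": ["P", "EH", "R"],
    "chair": ["CH", "EH", "R"],
    "table": ["T", "EY", "B", "AH", "L"],
}
result = augment(["beer"], "bear", lexicon)
print(result)  # ['beer', 'bare', 'chair', 'pear']
```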

16.

Granted patent
Method and apparatus for a communication device for use by a hearing impaired/mute or deaf person or in silent environments (Expired)

Publication No.: US5995590A

Publication date: 1999-11-30

Application No.: US35493

Filing date: 1998-03-05

Applicants: Peter Thomas Brunet, Abraham P. Ittycheriah, Chandrasekhar Narayanaswami, Michael Alan Picheny, Bhuvana Ramabhadran

Inventors: Peter Thomas Brunet, Abraham P. Ittycheriah, Chandrasekhar Narayanaswami, Michael Alan Picheny, Bhuvana Ramabhadran

IPC classification: G10L13/00, H04M1/247, H04M11/00

CPC classification: H04M1/2475, G10L13/00

Abstract: A method and apparatus is disclosed that allows people to carry on unobtrusive phone conversations in business or other settings where it is either not possible or impolite to talk. In the system of FIG. 1, the telephone user one will listen in the same manner as with a regular telephone. However, he will not speak into the telephone microphone. User one instead employs a unit including a keyboard to enter the text corresponding to what he wants to say. The text is converted into a synthesized speech using TTS apparatus and a voice output is sent to the microphone of the phone apparatus. The telephone apparatus transmits the synthesized voice signal over a standard telephone line to a unit including a conventional telephone speaker 26 and telephone microphone. User two, the party using the telephone at the other end, listens to a synthesized voice, but user one listens to the actual voice of user two with the telephone speaker, unless user two is also using a system similar to that of user one. Handwritten text may also be used in the system by employing a computer with a character recognition program as an input. In such a case handwriting is converted into synthesized sound and inputted into the telephone microphone. The telephone system can be used by the hearing impaired without involving a third party human transcriber.

17.

Granted patent
Method and apparatus for improving acoustic fast match speed using a cache for phone probabilities (Expired)

Publication No.: US5963905A

Publication date: 1999-10-05

Application No.: US957200

Filing date: 1997-10-24

Applicants: Miroslav Novak, Michael Alan Picheny

Inventors: Miroslav Novak, Michael Alan Picheny

IPC classification: G10L5/06

CPC classification: G10L15/14, G10L2015/081

Abstract: Methods and apparatus for performing a tree search based acoustic fast match in a speech recognition system for decoding a speech utterance, the tree having a tree root and tree nodes connected by tree branches, the tree nodes having phonetic models associated therewith, are provided. An illustrative embodiment of the method comprises: providing a cache having cache cells for storing phone probabilities therein; selecting a first branch leading to a next node, said branch selection starting at the tree root; accessing the cache to select a particular cache cell where the probability of a particular match is stored; evaluating the phonetic model to obtain the probability and storing the probability and an associated end time in the cache cell, if the cache cell accessed in the accessing step does not contain the required probability; using the probability value and the associated end time stored in the cache cell, if the cache cell accessed in the accessing step contains the required probability; selecting a new branch to proceed to the next node; and iteratively continuing from the accessing step until the whole tree is traversed and all possible word candidates associated with the speech recognition system are evaluated.
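
The steps listed in this abstract can be sketched as a tree traversal with a memo table. This is an assumption-laden illustration: the cache is keyed by (phone, start time), which the abstract does not state, and `toy_evaluate` is a made-up stand-in for a real phonetic model that returns a log probability and an end time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    phone: str
    word: Optional[str] = None                # set when a word ends at this node
    children: List["Node"] = field(default_factory=list)


def fast_match(
    root: Node,
    evaluate: Callable[[str, int], Tuple[float, int]],
) -> Dict[str, float]:
    cache: Dict[Tuple[str, int], Tuple[float, int]] = {}  # cell: (log prob, end time)
    scores: Dict[str, float] = {}
    stack = [(root, 0, 0.0)]                  # (node, start time, accumulated log prob)
    while stack:
        node, start, logp = stack.pop()
        for child in node.children:           # select a branch to the next node
            key = (child.phone, start)        # assumed cache key
            if key not in cache:              # miss: evaluate the model and store
                cache[key] = evaluate(child.phone, start)
            phone_logp, end = cache[key]      # hit: reuse probability and end time
            total = logp + phone_logp
            if child.word is not None:
                scores[child.word] = total
            stack.append((child, end, total))
    return scores


calls: List[Tuple[str, int]] = []


def toy_evaluate(phone: str, start: int) -> Tuple[float, int]:
    """Hypothetical phone model: fixed one-frame duration, length-based score."""
    calls.append((phone, start))
    return -float(len(phone)), start + 1


tree = Node("", children=[Node("K", children=[
    Node("AE", children=[Node("T", word="cat"), Node("P", word="cap")]),
    Node("IH", children=[Node("T", word="kit")]),
])])
scores = fast_match(tree, toy_evaluate)
print(len(calls))  # 5
```

The tree has six branches but the model is evaluated only five times, because the phone T starting at time 2 occurs on two different branches and the second occurrence is served from the cache.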

18.

Granted patent
Method and apparatus for error correction in a continuous dictation system (Expired)

Publication No.: US5864805A

Publication date: 1999-01-26

Application No.: US770390

Filing date: 1996-12-20

Applicants: Chengjun Julian Chen, Liam David Comerford, Catalina Maria Danis, Satya Dharanipragada, Michael Daniel Monkowski, Peder Andreas Olsen, Michael Alan Picheny

Inventors: Chengjun Julian Chen, Liam David Comerford, Catalina Maria Danis, Satya Dharanipragada, Michael Daniel Monkowski, Peder Andreas Olsen, Michael Alan Picheny

IPC classification: G10L15/22, G10L7/08

CPC classification: G10L15/22

Abstract: A continuous speech recognition system has the ability to correct errors in strings of words. The error correction method stores data in the system's internal state to update probability tables used in developing alternative lists for substitution in misrecognized text.
