Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (lvcsr) system
    1.
    Invention application
    Legal status: In force

    Publication No.: US20050228666A1

    Publication date: 2005-10-13

    Application No.: US10332652

    Filing date: 2001-05-08

    CPC classification: G10L15/187 G10L15/1815

    Abstract: According to one aspect of the invention, a method is provided in which a set of multiple-mixture monophone models is created and trained to generate a set of multiple-mixture context independent models. A set of single-mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered with a decision tree clustering process to obtain a set of tied states. Parameters of the context dependent models are estimated using a data-dependent maximum a posteriori (MAP) adaptation method, in which the parameters of each tied state of the context dependent models are derived by adapting the corresponding parameters of the context independent models using the training data associated with that tied state.
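    The data-dependent MAP step can be illustrated with the standard MAP update for a Gaussian mean: the adapted mean interpolates between the prior (context independent) mean and the tied state's own data, so tied states with little data stay close to the monophone model while well-trained ones move toward their sample mean. This is a minimal sketch, assuming diagonal-Gaussian means, hard frame-to-state assignments, and a hypothetical prior weight `tau`; the abstract does not give the exact update used.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    tau: float = 10.0,  # hypothetical prior weight; larger keeps the result nearer the prior
) -> List[float]:
    """MAP estimate of a Gaussian mean for one tied state.

    prior_mean: mean of the corresponding context independent (monophone) Gaussian.
    frames: feature vectors assigned to this tied state (hard assignment assumed).
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [0.0] * dim
    for x in frames:
        for d in range(dim):
            sums[d] += x[d]
    # mu_map = (tau * mu_prior + sum_t x_t) / (tau + n): the data weight n / (tau + n)
    # grows with the amount of training data the tied state receives.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]
```

    With no frames the prior mean is returned unchanged; with ten frames at 2.0 and `tau=10`, the estimate lands halfway, at 1.0.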


    Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system
    2.
    Invention grant
    Legal status: In force

    Publication No.: US07587321B2

    Publication date: 2009-09-08

    Application No.: US10332652

    Filing date: 2001-05-08

    IPC classification: G10L15/14

    CPC classification: G10L15/187 G10L15/1815

    Abstract: According to one aspect of the invention, a method is provided in which a set of multiple-mixture monophone models is created and trained to generate a set of multiple-mixture context independent models. A set of single-mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered with a decision tree clustering process to obtain a set of tied states. Parameters of the context dependent models are estimated using a data-dependent maximum a posteriori (MAP) adaptation method, in which the parameters of each tied state of the context dependent models are derived by adapting the corresponding parameters of the context independent models using the training data associated with that tied state.


    Method, apparatus, and system for bottom-up tone integration to Chinese continuous speech recognition system
    3.
    Invention grant
    Legal status: In force

    Publication No.: US07181391B1

    Publication date: 2007-02-20

    Application No.: US10148479

    Filing date: 2000-09-30

    IPC classification: G10L15/00 G10L15/02 G10L15/14

    CPC classification: G10L15/18 G10L25/15

    Abstract: According to one aspect of the invention, a method is provided in which knowledge about the tone characteristics of a tonal syllabic language is used to model speech at various levels of a bottom-up speech recognition structure. These levels include the acoustic level, the phonetic level, the word level, and the sentence level. At the acoustic level, pitch is treated as a continuous acoustic variable, and pitch information extracted from the speech signal is included as a component of the feature vectors. At the phonetic level, main vowels that have the same phonetic structure but different tones are defined and modeled as different phonemes. At the word level, a set of tone change rules is used to build transcriptions for the training data and pronunciation lattices for decoding. At the sentence level, a set of sentence-ending words with the light (neutral) tone is also added to the system vocabulary.
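    The word-level tone change rules can be illustrated with the best-known Mandarin rule, third-tone sandhi, in which a third tone followed by another third tone is pronounced as a second tone (for example, ni3 hao3 is pronounced ni2 hao3). The sketch below applies only this one rule, in a simplified form, to tone-numbered pinyin syllables; the function name is hypothetical and the patent's actual rule set is not given in the abstract.

```python
from typing import List


def apply_third_tone_sandhi(syllables: List[str]) -> List[str]:
    """Rewrite tone-numbered pinyin (e.g. "hao3") using a simplified 3-3 -> 2-3 rule.

    Simplification: every third-tone syllable that is immediately followed by
    another third-tone syllable (in the original input) becomes second tone.
    Real sandhi also depends on word and prosodic structure.
    """
    out = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            out[i] = syllables[i][:-1] + "2"
    return out
```

    For a run of three third tones such as zhan3 lan3 guan3, this simplified rule yields zhan2 lan2 guan3, the usual pronunciation; a transcription built this way would supply the tone-changed syllables to training and to the pronunciation lattice.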


    Method and system to scale down a decision tree-based hidden markov model (HMM) for speech recognition
    4.
    Invention grant
    Legal status: In force

    Publication No.: US07472064B1

    Publication date: 2008-12-30

    Application No.: US10019381

    Filing date: 2000-09-30

    IPC classification: G10L15/14

    Abstract: A method and system are provided in which a decision tree-based model (“general model”) is scaled down (“trim-down”) for a given task. The trim-down model can then be adapted to the given task using task-specific data. The general model can be based on a hidden Markov model (HMM). Because the decision tree-based acoustic model can be scaled according to the vocabulary of the given task, the general model can be configured dynamically into a trim-down model, which can improve speech recognition performance and reduce system resource utilization. Furthermore, the trim-down model can be adapted/adjusted according to task-specific data, e.g., the task vocabulary, model size, or other similar task-specific data.
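    One way to picture scaling down a decision tree-based model for a task is to keep only the tied states (tree leaves) that the triphones of the task vocabulary actually reach. The sketch below assumes a single binary tree for one center phone whose questions ask whether the left or right context phone belongs to a phone set; the class and function names are hypothetical, and the subsequent adaptation with task-specific data is not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Triphone = Tuple[str, str, str]  # (left context, center phone, right context)


@dataclass
class TreeNode:
    # Internal node: asks whether the context on `side` ("left" or "right") is in `phones`.
    side: Optional[str] = None
    phones: FrozenSet[str] = frozenset()
    yes: Optional["TreeNode"] = None
    no: Optional["TreeNode"] = None
    # Leaf node: identifier of the tied state.
    state_id: Optional[str] = None


def find_leaf(node: TreeNode, tri: Triphone) -> str:
    """Walk from the root to the tied state that models this triphone."""
    while node.state_id is None:
        context = tri[0] if node.side == "left" else tri[2]
        node = node.yes if context in node.phones else node.no
    return node.state_id


def trim_down(root: TreeNode, task_triphones: Iterable[Triphone]) -> Set[str]:
    """Return the subset of tied states needed by the task vocabulary."""
    return {find_leaf(root, tri) for tri in task_triphones}
```

    Tied states outside the returned set can be dropped from the task model, which is where the memory and computation savings would come from.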


    Search method based on single triphone tree for large vocabulary continuous speech recognizer
    5.
    Invention grant
    Legal status: Lapsed

    Publication No.: US06980954B1

    Publication date: 2005-12-27

    Application No.: US10130857

    Filing date: 2000-09-30

    CPC classification: G10L15/08 G10L2015/022

    Abstract: A search method based on a single triphone tree for a large vocabulary continuous speech recognizer is disclosed, in which speech signals are received. Tokens are propagated in a phonetic tree, integrating a language model, to recognize the received speech signals. Because the propagated tokens are preserved in the tree nodes and record the path history, a single triphone tree can be used in a one-pass search, which reduces speech recognition processing time and system resource use.
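    Token propagation in a single lexical tree can be sketched as follows. Words that share initial phones share tree nodes; each token stores a score and the word history of its path, and a token that reaches a word-end node re-enters the root with that word appended, so one tree serves every word position in a single pass. This is a minimal sketch under several simplifying assumptions: monophone node labels instead of triphones, one phone per node with a self-loop, per-frame phone log-scores given directly as dictionaries, and a hypothetical constant `word_penalty` standing in for the language model score.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

NEG_INF = float("-inf")


@dataclass(eq=False)  # identity-based hashing, so nodes can be dict keys
class Node:
    phone: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    word: Optional[str] = None  # set on the node that ends a word


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Build a phone prefix tree; words with common prefixes share nodes."""
    root = Node("<root>")
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, Node(p))
        node.word = word
    return root


def decode(root: Node, frames: List[Dict[str, float]],
           word_penalty: float = 0.0) -> Optional[Tuple[str, ...]]:
    """One-pass Viterbi token passing; returns the best history ending at a word boundary."""
    tokens: Dict[Node, Tuple[float, Tuple[str, ...]]] = {root: (0.0, ())}
    for frame in frames:
        new: Dict[Node, Tuple[float, Tuple[str, ...]]] = {}

        def relax(node: Node, score: float, hist: Tuple[str, ...]) -> None:
            # Keep only the best-scoring token per node (Viterbi recombination).
            if score != NEG_INF and (node not in new or score > new[node][0]):
                new[node] = (score, hist)

        for node, (score, hist) in tokens.items():
            if node is not root:  # the root emits nothing, so it has no self-loop
                relax(node, score + frame.get(node.phone, NEG_INF), hist)
            for child in node.children.values():
                relax(child, score + frame.get(child.phone, NEG_INF), hist)
        # Tokens at word ends re-enter the root, recording the word in their history.
        for node, (score, hist) in list(new.items()):
            if node.word is not None:
                relax(root, score + word_penalty, hist + (node.word,))
        tokens = new
    best = tokens.get(root)
    return best[1] if best else None
```

    In a full recognizer the word-end step is where the language model score would be added, and the tree nodes would carry triphone HMMs rather than single phone labels.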


    Acoustic modeling using a two-level decision tree in a speech recognition system
    6.
    Invention grant
    Legal status: Lapsed

    Publication No.: US06789063B1

    Publication date: 2004-09-07

    Application No.: US09653402

    Filing date: 2000-09-01

    Applicant: Yonghong Yan

    Inventor: Yonghong Yan

    IPC classification: G10L15/06

    CPC classification: G10L15/06 G10L15/08

    Abstract: In some embodiments, the invention involves receiving phonetic samples and using them to assemble a two-level phonetic decision tree structure. The decision tree has multiple leaf node levels, each having at least one state, wherein at least one node in the second level is assigned a Gaussian of a node in the first level but has its own computed weight.
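    The sharing described above, where a second-level node reuses a first-level node's Gaussians but has its own weights, can be sketched as a mixture whose components come from the parent and whose weights are computed per second-level node. The sketch assumes diagonal-covariance Gaussians, weights estimated by normalizing per-component occupancy counts, and hypothetical class names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    mean: List[float]
    var: List[float]  # diagonal covariance

    def log_pdf(self, x: Sequence[float]) -> float:
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, self.mean, self.var))


@dataclass
class FirstLevelNode:
    gaussians: List[Gaussian]  # Gaussian pool owned by the first-level node


@dataclass
class SecondLevelNode:
    parent: FirstLevelNode  # Gaussians are shared with this node
    weights: List[float]    # mixture weights computed for this node only

    def log_likelihood(self, x: Sequence[float]) -> float:
        terms = [math.log(w) + g.log_pdf(x)
                 for w, g in zip(self.weights, self.parent.gaussians) if w > 0]
        top = max(terms)  # log-sum-exp for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in terms))


def estimate_weights(occupancies: Sequence[float]) -> List[float]:
    """Normalize per-Gaussian occupancy counts gathered for one second-level node."""
    total = sum(occupancies)
    return [c / total for c in occupancies]
```

    Two second-level nodes under the same parent then differ only in their weight vectors, so they can model different contexts without storing additional Gaussians.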


    Method and system for expanding a word graph to a phone graph based on a cross-word acoustical model to improve continuous speech recognition
    7.
    Invention grant
    Legal status: In force

    Publication No.: US08260614B1

    Publication date: 2012-09-04

    Application No.: US10019382

    Filing date: 2000-09-28

    IPC classification: G10L15/00

    Abstract: A method and system that expand a word graph into a phone graph. An unknown speech signal is received. A word graph is generated based on an application task or on information extracted from the unknown speech signal. The word graph is expanded into a phone graph, and the unknown speech signal is recognized using the phone graph. The phone graph can be based on a cross-word acoustical model to improve continuous speech recognition. Compared with the word graph, the phone graph can consume less memory and can greatly reduce the computation cost of the decoding process, thus improving system performance. Furthermore, the continuous speech recognition error rate can be reduced by using the phone graph, which provides a more accurate graph for continuous speech recognition.
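    The expansion step can be sketched by replacing every word arc with a chain of phone arcs labeled with triphones, where a word's boundary phones take their context from the phones of neighbouring words in the graph (the cross-word context). This is a simplified sketch with hypothetical names: when several words meet at a node it merges their boundary phones into one context label joined by `|`, rather than duplicating arcs per context as a full cross-word expansion would, and it uses `sil` when a node has no neighbouring word.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

WordArc = Tuple[Hashable, Hashable, str]   # (from node, to node, word)
PhoneArc = Tuple[Hashable, Hashable, str]  # (from node, to node, triphone label)


def expand_word_graph(arcs: List[WordArc], lexicon: Dict[str, List[str]]) -> List[PhoneArc]:
    last_phones: Dict[Hashable, Set[str]] = defaultdict(set)   # phones ending at a node
    first_phones: Dict[Hashable, Set[str]] = defaultdict(set)  # phones starting at a node
    for u, v, w in arcs:
        first_phones[u].add(lexicon[w][0])
        last_phones[v].add(lexicon[w][-1])

    phone_arcs: List[PhoneArc] = []
    for u, v, w in arcs:
        phones = lexicon[w]
        # New intermediate nodes inside the word; word-graph nodes stay at the ends.
        nodes = [u] + [(u, v, w, i) for i in range(1, len(phones))] + [v]
        for i, p in enumerate(phones):
            # Within a word the context is the neighbouring phone; at word
            # boundaries it comes from adjacent words (cross-word context).
            left = phones[i - 1] if i > 0 else "|".join(sorted(last_phones[u])) or "sil"
            right = (phones[i + 1] if i + 1 < len(phones)
                     else "|".join(sorted(first_phones[v])) or "sil")
            phone_arcs.append((nodes[i], nodes[i + 1], f"{left}-{p}+{right}"))
    return phone_arcs
```

    For the two-arc graph "a cat", this yields the labels sil-ax+k, ax-k+ae, k-ae+t, and ae-t+sil, showing how the first phone of "cat" picks up its left context from the preceding word.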


    Method and system for using rule-based knowledge to build a class-based domain specific statistical language model
    8.
    Invention grant
    Legal status: Lapsed

    Publication No.: US07275033B1

    Publication date: 2007-09-25

    Application No.: US10130860

    Filing date: 2000-09-30

    IPC classification: G10L15/18 G06F17/27

    CPC classification: G10L15/197 G10L15/183

    Abstract: A method and system for deriving a class-based statistical language model representation from rule-based knowledge are disclosed. The class-based language model is generated from a statistical representation of a class-based rule net. The class-based rule net is generated from domain-related rules in which words are replaced by their corresponding, manually defined class tags. The class-based statistical representation derived from the rule net is combined with a class-based statistical representation from a statistical language model to generate a language model. This language model is then enhanced by smoothing/adapting it with a general-purpose and/or domain-related corpus, and the result is used as the final language model. A two-pass search algorithm is applied for speech decoding.
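    The combination step can be illustrated with the usual class-bigram factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) P(w_i | c_i), where class-bigram statistics from sentences generated by the rule net are linearly interpolated with class-bigram statistics from class-tagged corpus text. This is a minimal sketch: the maximum-likelihood estimates, the fixed interpolation weight `lam`, the class tags, and all function names are assumptions, and the smoothing/adaptation and two-pass search steps are not shown.

```python
from collections import Counter
from typing import Callable, Dict, List

ClassBigram = Callable[[str, str], float]


def class_bigram_model(class_sentences: List[List[str]]) -> ClassBigram:
    """Maximum-likelihood P(c_i | c_{i-1}) from class-tag sequences."""
    bigrams: Counter = Counter()
    history: Counter = Counter()
    for sent in class_sentences:
        tags = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tags, tags[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return lambda prev, cur: bigrams[(prev, cur)] / history[prev] if history[prev] else 0.0


def interpolate(p_rule: ClassBigram, p_corpus: ClassBigram, lam: float) -> ClassBigram:
    """Linear interpolation of the rule-net and corpus class-bigram estimates."""
    return lambda prev, cur: lam * p_rule(prev, cur) + (1 - lam) * p_corpus(prev, cur)


def word_bigram_prob(prev_word: str, word: str, class_of: Dict[str, str],
                     p_class: ClassBigram, p_word_given_class: Dict[str, float]) -> float:
    """P(word | prev_word) ~= P(class(word) | class(prev_word)) * P(word | class(word))."""
    return p_class(class_of[prev_word], class_of[word]) * p_word_given_class[word]
```

    For example, if the rule net only ever produces TO CITY while the corpus splits TO between CITY and DATE, interpolating with `lam=0.5` gives P(CITY | TO) = 0.75.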


    Selective merging of segments separated in response to a break in an utterance
    9.
    Invention grant
    Legal status: In force

    Publication No.: US06601028B1

    Publication date: 2003-07-29

    Application No.: US09648591

    Filing date: 2000-08-25

    Applicant: Yonghong Yan

    Inventor: Yonghong Yan

    IPC classification: G10L15/04

    CPC classification: G10L15/04 G10L15/197

    Abstract: In some embodiments, the invention involves a method including segmenting an utterance into at least a first segment and a second segment, wherein the boundary between the first and second segments corresponds to a break in the utterance. The method further includes selecting potential hypothetical paths of potential words in the first and second segments that cross the boundary. The method also includes applying a language model to the potential hypothetical paths crossing the boundary to determine whether to merge the first and second segments, and applying decoding to the merged segments.
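    The merging decision can be sketched as a comparison of language-model scores across the break: if some word pair spanning the boundary scores better than treating the break as a sentence end followed by a new sentence start, the segments are merged and decoded together. The bigram log-probability function `logp`, the `<s>`/`</s>` markers, and the `margin` threshold are assumptions for illustration; the abstract does not state the exact criterion.

```python
from typing import Callable, Sequence

LogProb = Callable[[str, str], float]


def should_merge(left_words: Sequence[str], right_words: Sequence[str],
                 logp: LogProb, margin: float = 0.0) -> bool:
    """Decide whether two segments split at a pause should be decoded as one.

    left_words: candidate last words of the first segment's hypotheses.
    right_words: candidate first words of the second segment's hypotheses.
    """
    # Best language-model score for a path that crosses the boundary.
    cross = max(logp(a, b) for a in left_words for b in right_words)
    # Best score if the break is a true sentence boundary: end one, start another.
    split = (max(logp(a, "</s>") for a in left_words)
             + max(logp("<s>", b) for b in right_words))
    return cross > split + margin
```

    A pause inside "new ... york" would then be merged, because the cross-boundary bigram is far more likely than ending a sentence after "new".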


    Program endpoint time detection apparatus and method, and program information retrieval system
    10.
    Invention grant
    Legal status: In force

    Publication No.: US09009054B2

    Publication date: 2015-04-14

    Application No.: US12914346

    Filing date: 2010-10-28

    IPC classification: G10L21/00 G10L11/06 G06F17/30

    CPC classification: G06F17/30743 G06F17/30749

    Abstract: This invention relates to retrieval for multimedia content and provides a program endpoint time detection apparatus for detecting an endpoint time of a program by processing the audio signals of said program, comprising: an audio classification unit for classifying said audio signals into a speech signal portion and a non-speech signal portion; a keyword retrieval unit for retrieving from said speech signal portion, as a candidate endpoint keyword, an endpoint keyword indicating the start or end of the program; a content analysis unit for performing content analysis on the context of the candidate endpoint keyword retrieved by the keyword retrieval unit to determine whether the candidate endpoint keyword is a valid endpoint keyword; and a program endpoint time determination unit for performing statistical analysis based on the retrieval result of said keyword retrieval unit and the determination result of said content analysis unit, and determining the endpoint time of the program. In addition, this invention provides a program information retrieval system. With the present invention, information about a program of interest to the user can be obtained rapidly.
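    The flow through the units above can be sketched as: keep only keyword hits that fall inside speech segments, keep only those whose context passes a content check, then summarize the surviving hit times. This is a minimal sketch with hypothetical names; the cue-phrase validity test and the use of the median as the "statistical analysis" are assumptions, since the abstract does not specify either.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class KeywordHit:
    time: float   # seconds from the beginning of the recording
    kind: str     # "start" or "end" endpoint keyword
    context: str  # transcript text around the hit


def in_speech(t: float, speech_segments: Sequence[Tuple[float, float]]) -> bool:
    """True if time t lies in a segment the audio classifier labeled as speech."""
    return any(s <= t < e for s, e in speech_segments)


def is_valid(hit: KeywordHit, cue_phrases: Sequence[str]) -> bool:
    # Hypothetical content check: the keyword's context must contain a cue phrase.
    return any(cue in hit.context for cue in cue_phrases)


def endpoint_times(hits: List[KeywordHit],
                   speech_segments: Sequence[Tuple[float, float]],
                   cue_phrases: Sequence[str]) -> Tuple[Optional[float], Optional[float]]:
    valid = [h for h in hits
             if in_speech(h.time, speech_segments) and is_valid(h, cue_phrases)]
    starts = [h.time for h in valid if h.kind == "start"]
    ends = [h.time for h in valid if h.kind == "end"]
    # Median of the valid hits as a simple, outlier-tolerant statistic.
    return (statistics.median(starts) if starts else None,
            statistics.median(ends) if ends else None)
```

    A start keyword such as "welcome to" that occurs mid-program without a matching cue in its context is discarded before the statistic is computed.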
