Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system
    1.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07454341B1

    Publication date: 2008-11-18

    Application No.: US10148028

    Filing date: 2000-09-30

    IPC classification: G10L15/14

    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set, and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for that sub-vector set using a modified K-means clustering process, which dynamically merges and splits clusters during each iteration based upon the size and average distortion of each cluster.
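
    The sub-vector codebook idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the merge and split criteria, so everything specific here is an assumption: clusters with fewer than `min_size` members are dissolved (their members are absorbed by the nearest remaining centroid on the next pass), and the freed centroid is used to split the cluster with the highest average distortion. The function names, thresholds, and the contiguous split of dimensions are hypothetical.

```python
import numpy as np

def modified_kmeans(data, k, iters=20, min_size=2, seed=0):
    """K-means variant (illustrative): dissolve tiny clusters, split high-distortion ones."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        dist = d2[np.arange(len(data)), labels]
        sizes = np.bincount(labels, minlength=k)
        avg = np.zeros(k)
        for j in range(k):
            if sizes[j] > 0:
                # Distortion is measured against the centroid used for assignment.
                avg[j] = dist[labels == j].mean()
                centroids[j] = data[labels == j].mean(axis=0)
        avg[sizes < min_size] = 0.0  # tiny clusters are never chosen for splitting
        for j in np.flatnonzero(sizes < min_size):
            worst = int(avg.argmax())
            if avg[worst] <= 0.0:
                break  # nothing left that is worth splitting
            # Assumed split rule: move two centroids apart along the cluster spread.
            spread = data[labels == worst].std(axis=0) * 0.5
            centroids[j] = centroids[worst] + spread
            centroids[worst] = centroids[worst] - spread
            avg[worst] = 0.0
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def build_codebooks(vectors, n_sub, k):
    """Split the dimensions into n_sub contiguous groups and build one codebook per group."""
    codebooks, indices = [], []
    for sub in np.array_split(vectors, n_sub, axis=1):
        centroids, labels = modified_kmeans(sub, k)
        codebooks.append(centroids)
        indices.append(labels)
    return codebooks, np.stack(indices, axis=1)

# Toy data standing in for the mean vectors of N = 200 six-dimensional Gaussians.
means = np.random.default_rng(1).normal(size=(200, 6))
codebooks, index_table = build_codebooks(means, n_sub=3, k=8)
# Each Gaussian is now stored as 3 small codebook indices instead of 6 floats.
approx_means = np.hstack([cb[index_table[:, s]] for s, cb in enumerate(codebooks)])
```

    The same routine would be applied to the variance vectors. The compactness comes from storing one index per sub-vector per Gaussian plus the shared codebooks.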

    Method, apparatus, and system for bottom-up tone integration to Chinese continuous speech recognition system
    2.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07181391B1

    Publication date: 2007-02-20

    Application No.: US10148479

    Filing date: 2000-09-30

    IPC classification: G10L15/00 G10L15/02 G10L15/14

    CPC classification: G10L15/18 G10L25/15

    Abstract: According to one aspect of the invention, a method is provided in which knowledge about tone characteristics of a tonal syllabic language is used to model speech at various levels in a bottom-up speech recognition structure. The levels in the bottom-up recognition structure include the acoustic level, the phonetic level, the word level, and the sentence level. At the acoustic level, pitch is treated as a continuous acoustic variable, and pitch information extracted from the speech signal is included as a feature component of the feature vectors. At the phonetic level, main vowels having the same phonetic structure but different tones are defined and modeled as different phonemes. At the word level, a set of tone change rules is used to build transcriptions for training data and pronunciation lattices for decoding. At the sentence level, a set of sentence-ending words with light tone is also added to the system vocabulary.
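
    A small sketch may help make the phonetic-level and word-level steps concrete. It assumes Mandarin and uses the well-known third-tone sandhi rule (a third tone directly followed by another third tone is pronounced as a second tone) as the only tone change rule; the patent's actual rule set is not given in the abstract. The `(initial, final, tone)` syllable representation and the function names are hypothetical, and the left-to-right handling of longer third-tone runs is a simplification.

```python
def apply_third_tone_sandhi(tones):
    """Word-level tone change rule: 3 followed by 3 becomes 2 followed by 3."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

def syllables_to_phonemes(syllables):
    """Phonetic level: the same final with a different tone is a different phoneme.

    `syllables` is a list of (initial, final, tone) tuples; tone 5 stands for
    the light (neutral) tone in this sketch.
    """
    tones = apply_third_tone_sandhi([tone for _, _, tone in syllables])
    phonemes = []
    for (initial, final, _), tone in zip(syllables, tones):
        if initial:
            phonemes.append(initial)
        phonemes.append(f"{final}{tone}")  # tone-tagged unit, e.g. "ao3"
    return phonemes

# "ni3 hao3" is transcribed with the sandhi applied to the first syllable.
transcription = syllables_to_phonemes([("n", "i", 3), ("h", "ao", 3)])
```

    With this representation, "ao3" and "ao2" get separate acoustic models, and the transcription used for training reflects the tone that is actually spoken rather than the dictionary tone.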

    Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system
    3.
    Type: Invention application
    Status: In force

    Publication No.: US20050228666A1

    Publication date: 2005-10-13

    Application No.: US10332652

    Filing date: 2001-05-08

    CPC classification: G10L15/187 G10L15/1815

    Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.
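
    The MAP adaptation step can be illustrated with the standard MAP update for a Gaussian mean, in which the context-independent mean acts as the prior and the frames aligned to a tied state pull the estimate toward the data. The abstract does not state the patent's exact formula, so the update below, the prior weight `tau`, and the function name are assumptions.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Standard MAP mean update (illustrative).

    prior_mean : mean of the context-independent Gaussian, shape (D,)
    frames     : training frames associated with the tied state, shape (T, D)
    posteriors : occupation probability of the Gaussian per frame, shape (T,)
    tau        : prior weight (hypothetical value); larger means more trust in the prior
    """
    occupancy = posteriors.sum()
    weighted_sum = posteriors @ frames
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)

prior = np.zeros(2)
frames = np.full((10, 2), 2.0)
adapted = map_adapt_mean(prior, frames, np.ones(10), tau=10.0)
# With no data, the estimate falls back to the context-independent prior.
unseen = map_adapt_mean(prior, np.empty((0, 2)), np.empty(0), tau=10.0)
```

    The update is data dependent in the sense that a tied state with many frames moves close to its own sample mean, while a sparsely observed state stays near the context-independent parameters.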

    Method and system to scale down a decision tree-based hidden Markov model (HMM) for speech recognition
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07472064B1

    Publication date: 2008-12-30

    Application No.: US10019381

    Filing date: 2000-09-30

    IPC classification: G10L15/14

    Abstract: A method and system are provided in which a decision tree-based model (“general model”) is scaled down (“trim-down”) for a given task. The trim-down model can be adapted for the given task using task specific data. The general model can be based on a hidden Markov model (HMM). By allowing a decision tree-based acoustic model (“general model”) to be scaled according to the vocabulary of the given task, the general model can be configured dynamically into a trim-down model, which can be used to improve speech recognition performance and reduce system resource utilization. Furthermore, the trim-down model can be adapted or adjusted according to task specific data, e.g., task vocabulary, model size, or other similar task specific data.
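
    One way to picture the vocabulary-driven part of the trim-down is the sketch below: collect the triphones that the task vocabulary can produce, look up their tied states, and keep only those states. This is an assumption about how the scaling could work, not the patent's procedure. The `tied_state_of` callable is a hypothetical stand-in for the decision tree lookup, and word-internal triphones padded with `sil` are a simplification.

```python
def word_triphones(phones):
    """Triphones of one pronunciation, padding both ends with silence (simplification)."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]

def trim_down(tied_state_of, state_params, lexicon, task_vocab):
    """Keep only the tied states reachable from the task vocabulary."""
    needed = set()
    for word in task_vocab:
        for triphone in word_triphones(lexicon[word]):
            needed.add(tied_state_of(triphone))
    return {state: state_params[state] for state in needed}

# Toy general model: the "tree" ties all contexts of a phone to one state.
lexicon = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
toy_tree = lambda triphone: f"{triphone[1]}_state"
general_model = {f"{p}_state": i for i, p in enumerate(["y", "eh", "s", "n", "ow", "aa"])}

trimmed = trim_down(toy_tree, general_model, lexicon, task_vocab=["no"])
```

    In this toy case the trimmed model keeps two of six states. The retained parameters could then be re-estimated on task specific data, as the abstract describes.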

    Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US07587321B2

    Publication date: 2009-09-08

    Application No.: US10332652

    Filing date: 2001-05-08

    IPC classification: G10L15/14

    CPC classification: G10L15/187 G10L15/1815

    Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.

    Search method based on single triphone tree for large vocabulary continuous speech recognizer
    6.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US06980954B1

    Publication date: 2005-12-27

    Application No.: US10130857

    Filing date: 2000-09-30

    CPC classification: G10L15/08 G10L2015/022

    Abstract: A search method based on a single triphone tree for a large vocabulary continuous speech recognizer is disclosed, in which speech signals are received. Tokens are propagated in a phonetic tree to integrate a language model to recognize the received speech signals. By propagating tokens, which are preserved in tree nodes and record the path history, a single triphone tree can be used in a one-pass searching process, thereby reducing speech recognition processing time and system resource use.
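
    The token propagation idea can be sketched on a much simpler structure than the one the patent describes. The example below uses a phone-level (not triphone) lexical tree, keeps one best token per node, and applies the language model score when a token leaves a word-end node and re-enters the root. All names, the one-state-per-phone topology, and the toy scores are assumptions made for illustration.

```python
import math

class Node:
    def __init__(self, phone=None):
        self.phone = phone
        self.children = {}
        self.word = None  # word that ends at this node, if any

def build_tree(lexicon):
    """Share common pronunciation prefixes in a single tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node(phone))
        node.word = word
    return root

def decode(root, frames, lm_logprob):
    """frames: list of dicts mapping phone -> acoustic log-likelihood for that frame."""
    tokens = {root: (0.0, ())}  # node -> (score, word history carried by the token)
    for frame in frames:
        new_tokens = {}

        def offer(node, score, history):
            if node not in new_tokens or score > new_tokens[node][0]:
                new_tokens[node] = (score, history)

        for node, (score, history) in tokens.items():
            sources = [(node, score, history)]
            if node.word is not None:
                # Word end: add the language model score and re-enter the same tree.
                prev = history[-1] if history else None
                sources.append((root, score + lm_logprob(prev, node.word),
                                history + (node.word,)))
            for src, s, h in sources:
                if src is not root:  # self-loop: stay in the current phone
                    offer(src, s + frame.get(src.phone, -math.inf), h)
                for child in src.children.values():  # advance to the next phone
                    offer(child, s + frame.get(child.phone, -math.inf), h)
        tokens = new_tokens
    finals = [(s + lm_logprob(h[-1] if h else None, n.word), h + (n.word,))
              for n, (s, h) in tokens.items() if n.word is not None]
    return list(max(finals, key=lambda t: t[0])[1]) if finals else []

tree = build_tree({"hi": ["h", "ay"], "a": ["ah"]})
frames = [{"h": -1, "ay": -5, "ah": -5},
          {"h": -5, "ay": -1, "ah": -5},
          {"h": -5, "ay": -5, "ah": -1}]
best_words = decode(tree, frames, lambda prev, word: math.log(0.5))
```

    Because each token carries its own word history, the tree does not need to be copied per language model context, which is the property the abstract attributes to the single-tree, one-pass search.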

    Method and apparatus for tone-sensitive acoustic modeling
    7.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US5884261A

    Publication date: 1999-03-16

    Application No.: US271639

    Filing date: 1994-07-07

    Abstract: Tone-sensitive acoustic models are generated by first generating acoustic vectors which represent the input data. The input data is separated into multiple frames, and an acoustic vector is generated for each frame to represent the input data over that frame. A tone-sensitive parameter, which indicates the tone of the input data at the corresponding frame, is then generated for each frame. Tone-sensitive parameters are generated in accordance with two embodiments. First, a pitch detector may be used to calculate a pitch for each of the frames. If a pitch cannot be detected for a particular frame, then a pitch is created for that frame based on the pitch values of surrounding frames. Second, the cross covariance between the autocorrelation coefficients for each frame and its successive frame may be generated and used as the tone-sensitive parameter. Feature vectors are then created for each frame by appending the tone-sensitive parameter for a frame to the acoustic vector for the same frame. Then, using these feature vectors, acoustic models are created which represent the input data.
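
    The first embodiment (pitch per frame, with gaps filled from surrounding frames) can be sketched as follows. Linear interpolation is only one assumed way to create a pitch "based on the pitch values of surrounding frames"; the abstract does not name a method. NaN is used here as a hypothetical marker for frames where the pitch detector found nothing, and the function names are illustrative.

```python
import numpy as np

def fill_missing_pitch(pitch):
    """Replace undetected pitch values (NaN) by interpolating between neighboring frames."""
    pitch = np.asarray(pitch, dtype=float)
    detected = ~np.isnan(pitch)
    if not detected.any():
        return np.zeros_like(pitch)  # no pitch anywhere: fall back to zeros
    frame_idx = np.arange(len(pitch))
    # np.interp holds the nearest detected value at the edges of the utterance.
    return np.interp(frame_idx, frame_idx[detected], pitch[detected])

def append_tone_parameter(acoustic_vectors, pitch):
    """Build feature vectors by appending the tone-sensitive parameter to each frame."""
    tone = fill_missing_pitch(pitch)[:, None]
    return np.hstack([acoustic_vectors, tone])

acoustic = np.zeros((4, 2))                      # 4 frames, 2 acoustic dimensions (toy)
raw_pitch = [100.0, np.nan, 120.0, np.nan]       # Hz, with two undetected frames
features = append_tone_parameter(acoustic, raw_pitch)
```

    Every frame then has a defined pitch value, so the appended dimension can be modeled as a continuous variable alongside the other acoustic features.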
