Apparatus, method, and medium for dialogue speech recognition using topic domain detection
    1.
    Patent application
    Legal status: Active

    Publication No.: US20070100618A1

    Publication date: 2007-05-03

    Application No.: US11589165

    Filing date: 2006-10-30

    IPC classification: G10L15/00

    CPC classification: G10L15/1822 G10L15/1815

    Abstract: An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. An apparatus includes a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established, a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search, and a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form. Accuracy and efficiency for a dialogue sentence are improved.
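
    The two-pass flow described in this abstract can be illustrated with a toy sketch. It is not the patented implementation: the topic vocabularies, the per-topic unigram probabilities, and the function names detect_topic and rescore are all hypothetical, and the backward decoding over the lattice is simplified here to rescoring a short list of first-pass hypotheses.

```python
import math
from collections import Counter

# Hypothetical topic vocabularies; the abstract does not list any.
TOPIC_WORDS = {
    "weather": {"rain", "sunny", "forecast", "temperature"},
    "travel": {"train", "ticket", "flight", "hotel"},
}

# Hypothetical unigram probabilities standing in for the
# "specific topic domain language model database".
TOPIC_LM = {
    "weather": {"rain": 0.2, "train": 0.01, "tomorrow": 0.1, "will": 0.1, "it": 0.1},
    "travel": {"rain": 0.01, "train": 0.2, "tomorrow": 0.1, "will": 0.1, "it": 0.1},
}
UNSEEN = 1e-4  # assumed floor probability for words missing from a topic model


def detect_topic(lattice_words):
    """Pick the topic whose vocabulary overlaps most with the lattice words."""
    counts = Counter()
    for word in lattice_words:
        for topic, vocab in TOPIC_WORDS.items():
            if word in vocab:
                counts[topic] += 1
    # On a tie, the first topic in dictionary order wins.
    return max(TOPIC_WORDS, key=lambda topic: counts[topic])


def rescore(hypotheses, topic):
    """Second pass: add the topic model log-probability to each first-pass score."""
    lm = TOPIC_LM[topic]

    def total(hypothesis):
        words, first_pass_logp = hypothesis
        return first_pass_logp + sum(math.log(lm.get(w, UNSEEN)) for w in words)

    return max(hypotheses, key=total)
```

    With a lattice containing "rain", "forecast", and "train", detect_topic selects the weather domain, and rescore then prefers the hypothesis containing "rain" even though its first-pass score is slightly lower.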

    Method, medium, and system retrieving a media file based on extracted partial keyword
    2.
    Granted patent
    Legal status: Active

    Publication No.: US08356032B2

    Publication date: 2013-01-15

    Application No.: US11651042

    Filing date: 2007-01-09

    IPC classification: G06F7/00 G06F17/30

    Abstract: A method, medium, and system retrieving a media file associated with a partial keyword which is generated by using a named entity extracted from the media file, when a query is received from a user, with the media file being associated with the partial keyword being retrieved by identifying the partial keyword associated with the query through speech recognition. That is, a media file may be retrieved by extracting the named entity from the media file, performing a word segmentation for the extracted named entity, generating the partial keyword from the word-segmented named entity, and retrieving the corresponding media file by using the partial keyword.
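
    The steps in this abstract (extract a named entity, segment it into words, generate partial keywords, retrieve by partial keyword) can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace splitting stands in for word segmentation, a partial keyword is taken to be any contiguous run of words, the query is assumed to be already recognized text, and the names partial_keywords, build_index, and retrieve are hypothetical.

```python
from collections import defaultdict


def partial_keywords(named_entity):
    """Return every contiguous word sequence of a named entity, lowercased."""
    words = named_entity.lower().split()  # assumption: whitespace segmentation
    keys = set()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            keys.add(" ".join(words[start:end]))
    return keys


def build_index(media_files):
    """Map each partial keyword to the media files whose entities produced it.

    media_files: dict of file path -> list of named entities (e.g. song titles).
    """
    index = defaultdict(set)
    for path, entities in media_files.items():
        for entity in entities:
            for key in partial_keywords(entity):
                index[key].add(path)
    return index


def retrieve(index, recognized_query):
    """Look up files for a query that speech recognition has turned into text."""
    return sorted(index.get(recognized_query.lower().strip(), set()))
```

    For a library holding "Let It Be" and "Let It Go", the spoken query "let it" would return both files, while "it be" would return only the first.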

    Recognition confidence measuring by lexical distance between candidates
    4.
    Granted patent
    Legal status: Active

    Publication No.: US08990086B2

    Publication date: 2015-03-24

    Application No.: US11495562

    Filing date: 2006-07-31

    CPC classification: G10L15/08 G10L15/187

    Abstract: A recognition confidence measurement method, medium and system which can more accurately determine whether an input speech signal is an in-vocabulary, by extracting an optimum number of candidates that match a phone string extracted from the input speech signal and estimating a lexical distance between the extracted candidates is provided. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string and phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is an in-vocabulary, based on the lexical distance.
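
    One way to picture the candidate matching and lexical distance steps is the sketch below. The abstract does not say how the distance is computed or in which direction the decision goes, so several things here are assumptions: Levenshtein distance over phoneme sequences is used as the distance, the candidates are the n closest dictionary entries, and the input is accepted as in-vocabulary when the top candidate lies close to the other candidates. All function names and the threshold value are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def top_candidates(phonemes, dictionary, n=3):
    """Return the n dictionary words whose phoneme strings best match the input."""
    ranked = sorted(dictionary,
                    key=lambda w: (edit_distance(phonemes, dictionary[w]), w))
    return ranked[:n]


def mean_lexical_distance(candidates, dictionary):
    """Average distance from the best candidate to the remaining candidates."""
    best, others = dictionary[candidates[0]], candidates[1:]
    if not others:
        return 0.0
    return sum(edit_distance(best, dictionary[w]) for w in others) / len(others)


def is_in_vocabulary(phonemes, dictionary, n=3, threshold=2.0):
    """Assumed rule: accept when the candidates lie close to one another."""
    candidates = top_candidates(phonemes, dictionary, n)
    return mean_lexical_distance(candidates, dictionary) <= threshold
```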

    Apparatus, method, and medium for dialogue speech recognition using topic domain detection
    5.
    Granted patent
    Legal status: Active

    Publication No.: US08301450B2

    Publication date: 2012-10-30

    Application No.: US11589165

    Filing date: 2006-10-30

    CPC classification: G10L15/1822 G10L15/1815

    Abstract: An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. An apparatus includes a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established, a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search, and a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form. Accuracy and efficiency for a dialogue sentence are improved.

    Method and apparatus for recognizing speech by measuring confidence levels of respective frames
    6.
    Granted patent
    Legal status: Active

    Publication No.: US08271283B2

    Publication date: 2012-09-18

    Application No.: US11355082

    Filing date: 2006-02-16

    IPC classification: G10L15/04 G10L15/00

    CPC classification: G10L15/08 G10L15/142

    Abstract: Disclosed herein is a method and apparatus to recognize speech by measuring the confidence levels of respective frames. The method includes the operations of obtaining frequency features of a received speech signal for the respective frames having a predetermined length, calculating a keyword model-based likelihood and a filler model-based likelihood for each of the frames, calculating a confidence score based on the two types of likelihoods, and deciding whether the received speech signal corresponds to a keyword or a non-keyword based on the confidence scores. Also, the method includes the operation of transforming the confidence scores by applying transform functions of clusters, which include the confidence scores or are close to the confidence scores, to the confidence scores.
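
    The per-frame scoring in this abstract can be sketched as a log-likelihood ratio between the keyword model and the filler model. This is an illustration only: the abstract does not give the score formula, so the ratio, the averaging rule, the threshold, and the function names are assumptions, and the cluster-based transform of the scores is left out.

```python
def frame_confidences(keyword_loglik, filler_loglik):
    """Per-frame confidence: keyword log-likelihood minus filler log-likelihood."""
    if len(keyword_loglik) != len(filler_loglik) or not keyword_loglik:
        raise ValueError("need one keyword and one filler score per frame")
    return [k - f for k, f in zip(keyword_loglik, filler_loglik)]


def is_keyword(keyword_loglik, filler_loglik, threshold=0.0):
    """Assumed decision rule: mean frame confidence above a threshold."""
    scores = frame_confidences(keyword_loglik, filler_loglik)
    return sum(scores) / len(scores) > threshold
```

    A frame where the keyword model explains the signal better than the filler model gets a positive score, so an utterance with mostly positive frames is accepted as a keyword.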

    Apparatus and method for recognizing voice
    7.
    Granted patent
    Legal status: Active

    Publication No.: US08140334B2

    Publication date: 2012-03-20

    Application No.: US11475963

    Filing date: 2006-06-28

    IPC classification: G10L15/14 G10L15/00

    CPC classification: G10L15/142

    Abstract: An apparatus and method for recognizing voice. The apparatus includes a feature vector extraction unit dividing an input voice signal into predetermined unit regions, and extracting feature vectors corresponding to each of the unit regions; a predicted node extraction unit extracting a list of second nodes whose travels to a first node corresponding to the extracted feature vectors are predicted, with reference to a network of one or more nodes; a single waveform similarity calculation unit calculating degrees of single waveform similarity of the first node and the second nodes of the list by substituting the extracted feature vectors into single waveform probability distributions that constitute voice signals corresponding to the second nodes; a multiple waveform similarity calculation unit calculating degrees of multiple waveform similarity by substituting the extracted feature vectors into multiple waveform probability distributions that constitute single waveform probability distributions usable to calculate the degrees of single waveform similarity in a preset range; and an output unit outputting a function-performing signal corresponding to a multiple waveform probability distribution that enables calculation of a highest of the calculated degrees of multiple waveform similarity.
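
    One possible reading of this abstract is a two-stage scoring scheme: a cheap score from a single distribution per node shortlists the nodes, and the more expensive mixture score is computed only for nodes whose cheap score falls within a preset range. The sketch below follows that reading and is an assumption throughout: it interprets "single waveform probability distribution" as one diagonal Gaussian and "multiple waveform probability distribution" as a Gaussian mixture, and the beam parameter and function names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_mixture(x, components):
    """Log density of a Gaussian mixture; components are (weight, mean, var)."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def best_node(x, nodes, beam=5.0):
    """Shortlist nodes by single-Gaussian score, then rank by mixture score."""
    coarse = {name: log_gauss(x, n["mean"], n["var"]) for name, n in nodes.items()}
    best_coarse = max(coarse.values())
    # Keep only nodes whose coarse score is within the assumed range of the best.
    shortlist = [name for name, score in coarse.items() if score >= best_coarse - beam]
    return max(shortlist, key=lambda name: log_mixture(x, nodes[name]["mixture"]))
```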

    Method and apparatus for discriminative estimation of parameters in maximum a posteriori (MAP) speaker adaptation condition and voice recognition method and apparatus including these
    9.
    Granted patent
    Legal status: Lapsed

    Publication No.: US07324941B2

    Publication date: 2008-01-29

    Application No.: US10898382

    Filing date: 2004-07-26

    IPC classification: G10L15/28

    CPC classification: G10L15/07

    Abstract: A method and apparatus for discriminative estimation of parameters in a maximum a posteriori (MAP) speaker adaptation condition, and a voice recognition apparatus having the apparatus and a voice recognition method using the method are provided. The method for discriminative estimation of parameters in a maximum a posteriori (MAP) speaker adaptation condition, in which at least speaker-independent model parameters and prior density parameters, which are standards in recognizing a speaker's voice, are obtained as the result of model training after fetching training sets on a plurality of speakers from a training database, has the steps of (a) classifying adaptation data among training sets for respective speakers; (b) obtaining model parameters adapted from adaptation data on each speaker by using the initial values of the parameters; (c) searching a plurality of candidate hypotheses on each uttered sentence of training sets by using the adapted model parameters, and calculating gradients of speaker-independent model parameters by measuring the degree of errors on each training sentence; and (d) when training sets of all speakers are adapted, updating parameters, which were set at the initial stage, based on the calculated gradients.
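
    As background for step (b), the widely used MAP update for a Gaussian mean interpolates between the prior (speaker-independent) mean and the speaker's adaptation data. The sketch below shows only that textbook update. It is not taken from the patent and does not cover the discriminative gradient steps (c) and (d). The function name, the prior weight tau, and the per-frame posteriors are illustrative assumptions.

```python
def map_adapt_mean(prior_mean, tau, frames, posteriors):
    """Textbook MAP estimate of one Gaussian mean.

    prior_mean: speaker-independent mean vector
    tau:        prior weight (larger values trust the prior more)
    frames:     adaptation feature vectors
    posteriors: occupation probability of this Gaussian for each frame
    """
    total = sum(posteriors)
    if tau + total <= 0:
        raise ValueError("tau plus total occupancy must be positive")
    weighted = [sum(g * x[d] for g, x in zip(posteriors, frames))
                for d in range(len(prior_mean))]
    return [(tau * m + w) / (tau + total) for m, w in zip(prior_mean, weighted)]
```

    With no adaptation frames the estimate stays at the prior mean, and as more frames arrive it moves toward the speaker's own data.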

    Method, medium, and system retrieving a media file based on extracted partial keyword
    10.
    Patent application
    Legal status: Active

    Publication No.: US20070198511A1

    Publication date: 2007-08-23

    Application No.: US11651042

    Filing date: 2007-01-09

    IPC classification: G06F17/30

    Abstract: A method, medium, and system retrieving a media file associated with a partial keyword which is generated by using a named entity extracted from the media file, when a query is received from a user, with the media file being associated with the partial keyword being retrieved by identifying the partial keyword associated with the query through speech recognition. That is, a media file may be retrieved by extracting the named entity from the media file, performing a word segmentation for the extracted named entity, generating the partial keyword from the word-segmented named entity, and retrieving the corresponding media file by using the partial keyword.
